A paper from the Amazon Web Services (AWS) AI Lab, reported by The Byte, found that a "shocking amount of the web" is already made up of poor-quality AI-generated and translated content.
They note that although the paper has yet to be peer-reviewed, "shocking" feels like the right word. According to the study, over half of the sentences on the internet — 57.1 percent, specifically — have been translated into two or more other languages. The poor quality and staggering scale of these translations suggest that large language model (LLM)-powered AI tools were used both to create and to translate the material. The phenomenon is especially prominent in "lower-resource languages," or languages with less readily available data with which to effectively train AI models.