Updated · Futurism · Jun 27
AI Contractors Use Chatbots to Make Training Data as Data Demand Doubles Every 9 Months

3 articles · Updated · Futurism · Jun 27

Summary

  • Workers hired to create fresh AI training data are widely using other chatbots to produce that material, according to multiple contractors interviewed by New Scientist.
  • Low-paid, short-term contracts are driving the shortcut: workers said LLMs help them avoid mistakes, keep gigs and finish hyper-specific tasks faster, while simple edits can mask obvious chatbot phrasing.
  • Companies already issue rules against the practice and try to detect it, but contractors said enforcement is weak and only the most obvious AI-generated submissions tend to be caught.
  • The workaround lands as training-data use has doubled every nine months since 2010 and supplies of clean, original data are tightening, pushing firms to pay humans to generate new inputs.
  • Experts have long warned that feeding AI-generated material back into models can destabilize large language models, raising risks for the broader AI race.
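
To give a sense of scale for the nine-month doubling figure in the summary, here is a minimal sketch of the arithmetic. It assumes smooth exponential growth at a constant rate, which the source does not state; the function name growth_factor is a hypothetical helper for illustration, not something taken from the reporting.

```python
def growth_factor(years: float, doubling_months: float = 9.0) -> float:
    """Multiplicative growth after `years` for a quantity that doubles
    every `doubling_months` (assumes a constant exponential rate)."""
    doublings = years * 12.0 / doubling_months
    return 2.0 ** doublings


# Growth implied by the nine-month doubling claim (illustrative only).
per_year = growth_factor(1)     # 2 ** (12 / 9)
per_decade = growth_factor(10)  # 2 ** (120 / 9)

print(f"per year:   x{per_year:.2f}")
print(f"per decade: x{per_decade:,.0f}")
```

Under that assumption, a nine-month doubling time works out to roughly 2.5x growth per year and about 10,000x per decade, which helps explain why supplies of clean, original data are tightening.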

Insights

Are tech companies creating a workforce now forced to poison the very AI systems that are replacing them?
As AI models begin to feed on their own output, is the industry spiraling toward an inevitable 'model collapse'?
With the EU's AI Act now in force, can new regulations stop the hidden tide of AI-generated data?

The Data Quality Crisis in AI: Preventing Model Collapse and Ensuring Human-Verified Training Pipelines

Overview

AI training pipelines face an immediate problem: the line between human-generated and AI-generated content is blurring. Contractors hired to write training data are widely using chatbots to produce it, a practice known as 'AI cannibalism' that raises serious concerns about data quality and reliability. The practice is hard to detect and prevent, especially as cognitive automation becomes more common in documentation tasks. Professionals therefore need new methods to interpret and validate AI-generated content, particularly because current AI models often struggle with the nuances of human language. Together, these issues point to an urgent need for better oversight and new approaches to maintaining the integrity of AI systems.

...