Updated · Futurism · Jun 27

AI Contractors Use Chatbots to Make Training Data as Data Demand Doubles Every 9 Months

3 articles · Updated · Futurism · Jun 27

Workers hired to create fresh AI training data are widely using other chatbots to produce that material, according to multiple contractors interviewed by New Scientist.
Low-paid, short-term contracts are driving the shortcut: workers said LLMs help them avoid mistakes, keep gigs and finish hyper-specific tasks faster, while simple edits can mask obvious chatbot phrasing.
Companies already issue rules against the practice and try to detect it, but contractors said enforcement is weak and only the most obvious AI-generated submissions tend to be caught.
The workaround lands as training-data use has doubled every nine months since 2010 and supplies of clean, original data are tightening, pushing firms to pay humans to generate new inputs.
Experts have long warned that feeding AI-generated material back into models can destabilize large language models, raising risks for the broader AI race.

Sources

Center: 100%

Futurism · 1d ago

AI Workers Use Other Chatbots to Generate Training Data Due to Poor Contracts

Financial Times · 15h ago

Robots, not chatbots, will realise AI's potential

Facebook · 1d ago

Workers hired to train next-generation AI models are reportedly using AI chatbots like ChatGPT to generate the conversations and evaluation data they are supposed to create themselves, according to a report by

5 Sources

Are tech companies creating a workforce now forced to poison the very AI systems that are replacing them?

As AI models begin to feed on their own output, is the industry spiraling toward an inevitable 'model collapse'?

With the EU's AI Act now in force, can new regulations stop the hidden tide of AI-generated data?

The Data Quality Crisis in AI: Preventing Model Collapse and Ensuring Human-Verified Training Pipelines

Overview

AI training pipelines face a growing data-quality problem as human-written and AI-generated content become harder to tell apart. Contractors are widely using chatbots to create the training data they were hired to write themselves, a practice known as 'AI cannibalism', and this raises serious concerns about the quality and reliability of that data. The practice is difficult to detect and prevent, especially as cognitive automation becomes more common in documentation tasks. Professionals therefore need new methods to interpret and validate AI-generated content, particularly because current AI models often struggle with the nuances of human language. Together, these issues point to an urgent need for better oversight and for new ways to maintain the integrity of AI systems.

...