Researchers Tie AI Story Repetition, Seen in 88% of Outputs, to Alignment Training

3 articles · Updated · Gizmodo · Jun 11

Across 20,000 stories generated by GPT-5.4 Mini, Claude Haiku 4.5 and Gemini 3.1 Flash-Lite, 11 words recurred in 88% of outputs, and “Elias the lighthouse keeper” appeared in about two-thirds.
Cornell researchers said pretraining data did not explain the pattern and instead pointed to alignment and safety tuning that may have elevated “safe” characters while steering models away from copyrighted or adult-content material.
WildChat, an open dataset of millions of human-chatbot conversations, was cited as a likely source of the pattern because it has been widely reused in training and fine-tuning across AI labs.
404 Media and software engineer Daniel May found that the “Elias” name has already spilled beyond chatbot prompts into fantasy books, ambient music listings and even a handbook on alternative cancer treatments.
The finding adds to evidence that generative AI falls into narrow creative grooves rather than producing broad originality, echoing prior research that image models repeatedly default to a small set of motifs.