Updated
Updated · KDnuggets · Jun 8
Study Finds 19 LLMs Corrupt Up to 25% of Documents After 20 Editing Rounds

Summary

  • The DELEGATE-52 benchmark, spanning 52 professional domains, found that even top models such as Gemini Pro, Claude Opus and GPT-5 failed to preserve documents during delegated editing, corrupting about 25% of content after 20 round-trip interactions.
  • The test asked models to make an edit and then reverse it. Instead of restoring the original file, the models let errors compound over repeated turns, so small changes grew into broader structural decay.
  • Weaker models mainly deleted material, while frontier systems more often kept the document's shape and word count but silently altered facts with plausible-sounding replacements that were harder to spot.
  • Corruption worsened with longer documents, extra distractor files and less structured domains, while code-like tasks held up better than natural-language or niche formatting work.
  • Agentic tools such as file access or code execution did little to fix the problem, underscoring the need for stronger verification before using LLMs as unsupervised document editors.
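The round-trip test described above can be sketched as a small Python harness. This is a minimal illustration under stated assumptions, not the DELEGATE-52 code. The `Editor` callable, the toy `faithful_editor` and `lossy_editor` stubs, and the line-level `corruption_ratio` metric (built on the standard library's `difflib.SequenceMatcher`) are all hypothetical stand-ins. A real harness would wrap an LLM call and use the study's own scoring.

```python
import difflib
from typing import Callable

# Hypothetical signature: an editor takes (document, instruction) and returns
# the edited document. A real harness would wrap an LLM call here.
Editor = Callable[[str, str], str]


def corruption_ratio(original: str, restored: str) -> float:
    """Fraction of the original's lines that did not survive unchanged.

    Assumed stand-in metric (line-level diff), not the study's scoring method.
    """
    a = original.splitlines()
    b = restored.splitlines()
    if not a:
        return 0.0
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    kept = sum(block.size for block in matcher.get_matching_blocks())
    return 1.0 - kept / len(a)


def round_trip(doc: str, editor: Editor, forward: str, inverse: str,
               rounds: int = 20) -> list[float]:
    """Apply an edit and its inverse `rounds` times, recording drift per round."""
    current = doc
    history = []
    for _ in range(rounds):
        current = editor(current, forward)
        current = editor(current, inverse)
        # Compare against the ORIGINAL document so accumulated errors show up.
        history.append(corruption_ratio(doc, current))
    return history


def faithful_editor(doc: str, instruction: str) -> str:
    # Toy editor: "add" appends a marker line, any other instruction removes it.
    lines = doc.splitlines()
    if instruction == "add":
        return "\n".join(lines + ["[marker]"])
    return "\n".join(line for line in lines if line != "[marker]")


def lossy_editor(doc: str, instruction: str) -> str:
    # Toy editor that also silently drops the last content line on every
    # "remove", mimicking the deletion-style failures of weaker models.
    result = faithful_editor(doc, instruction)
    if instruction == "remove":
        result = "\n".join(result.splitlines()[:-1])
    return result


doc = "\n".join(f"line {i}" for i in range(10))
print(round_trip(doc, faithful_editor, "add", "remove")[-1])  # 0.0
print(round_trip(doc, lossy_editor, "add", "remove"))
```

Measuring each round against the original, rather than against the previous round, is what lets compounding drift show up in the history. A line-level diff flags any altered line, including a silently substituted fact, but it cannot tell whether the change matters.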

Insights

If even top AIs silently corrupt documents, are we building tools we can never truly trust?
The smartest AIs create the hardest-to-spot errors. How can you find the corruption hiding in plain sight?

Microsoft DELEGATE-52 Benchmark Exposes Severe LLM Degradation: Up to 50% Content Corruption in Extended Editing Tasks

Overview

The Microsoft Research DELEGATE-52 study, published in April 2026, is a stark warning about the reliability of large language models (LLMs) in long-running document editing. The researchers evaluated 19 LLMs across 52 professional domains using real-world, lengthy documents and simulated 20 iterative editing interactions; even advanced models struggled to preserve document integrity over time. Because errors accumulate and some changes are subtle and hard to detect, the findings point to significant risks for enterprises that rely on LLMs in complex workflows. The report underscores the urgent need for human oversight when AI is used for critical document management.
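A back-of-the-envelope calculation shows how small per-round errors can add up to the reported totals. Assume, purely for illustration (the study does not state this model), that each round trip independently leaves a fixed fraction r of the content intact. After n rounds the intact fraction is r^n, so a total corruption c implies:

```latex
\begin{aligned}
1 - r^{n} = c \quad &\Longrightarrow \quad r = (1 - c)^{1/n} \\
c = 0.25,\ n = 20: \quad r &= 0.75^{1/20} \approx 0.9857 \quad (\approx 1.4\%\ \text{lost per round}) \\
c = 0.50,\ n = 20: \quad r &= 0.50^{1/20} \approx 0.9659 \quad (\approx 3.4\%\ \text{lost per round})
\end{aligned}
```

Under this toy model, a loss of only a few percent per round is enough to reach the 25% and 50% figures cited for these benchmarks. That is how errors that look negligible in any single edit can still add up over 20 interactions.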

...