Microsoft Study Finds 19 LLMs Corrupt Documents, Losing Up to 50% Content in Complex Workflows
InfoWorld · May 13
Microsoft researchers reported that the 19 LLMs tested on the DELEGATE-52 benchmark were unreliable delegates: frontier models lost about 25% of document content after 20 editing interactions, and average degradation across all models reached 50%.
The benchmark's 310 work environments, spanning 52 professional domains, showed that errors compound over longer workflows, larger documents, and distractor files; Python was the only domain where most models were judged ready.
Analysts said the preprint is a warning about document integrity rather than a verdict against enterprise AI, arguing that repeated delegation can silently distort contracts, ledgers, codebases and other consequential records.
Enterprises were urged to add guardrails such as domain tuning, verification steps, and human review, because stronger models can leave content looking intact while subtly making it wrong.
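A minimal sketch of what one such verification step might look like, in Python. Everything here is an assumption for illustration: the llm_edit callable stands in for any model call, the 5 percent threshold is arbitrary, and the retention metric is a simple difflib similarity, not the measure used in the study.

    import difflib

    LOSS_THRESHOLD = 0.05  # flag edits that drop more than ~5% of the original text


    def content_retention(original: str, edited: str) -> float:
        """Rough fraction of the original text still present in the edited version."""
        matcher = difflib.SequenceMatcher(None, original, edited)
        matched = sum(block.size for block in matcher.get_matching_blocks())
        return matched / max(len(original), 1)


    def guarded_edit(document: str, instruction: str, llm_edit) -> str:
        """Apply one LLM edit, but hold suspicious content loss for human review."""
        edited = llm_edit(document, instruction)  # hypothetical model call
        retention = content_retention(document, edited)
        if retention < 1.0 - LOSS_THRESHOLD:
            # Silent-corruption risk: refuse to overwrite the document automatically.
            raise RuntimeError(
                f"Edit kept only {retention:.0%} of the original text; "
                "routing to human review instead of saving."
            )
        return edited

A deliberate summarization would legitimately fail this check, so a real pipeline would tune the threshold per task, or diff only the sections the instruction was not supposed to touch.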
Silent Corruption: Microsoft’s 2026 DELEGATE-52 Benchmark Reveals 25% Document Degradation by Leading LLMs
Overview
A major Microsoft Research study released in early 2026 offers a sobering look at the reliability of large language models (LLMs) in delegated knowledge work. Despite organizations investing 36 percent of their digital budgets in AI automation, the study finds that LLMs are unreliable delegates outside a few well-covered domains such as Python coding. While LLMs handle some tasks competently, their performance drops sharply on longer workflows and in less common domains, exposing a critical gap between current capabilities and the high expectations for autonomous AI agents. The research urges users to supervise AI closely, as trusting LLMs with important work documents remains risky.