MIT Sloan Researchers Model AGI Verification Gap as AI Coding Accuracy Jumps to 71.7%
Updated · MIT Sloan News · Jun 11
Summary
A new paper by Xiang Hui and Jane Wu argues AGI’s economic payoff will be limited by a widening verification gap: AI can produce work cheaply, but checking whether it is safe, correct or complete remains slow and scarce.
SWE-bench results cited in the paper show AI coding accuracy rising from 4.4% to 71.7% in a year, while task length is doubling quickly, outpacing human reviewers’ time and experience.
The researchers say firms will compete less on deploying AI than on underwriting its risks, warning that using AI to verify AI can replicate the same errors and create false confidence.
Employment among younger workers in AI-exposed roles has already fallen about 16%, the paper says, eroding the entry-level training pipeline needed to build future verification skills.
For companies and policymakers, the prescription is to scale automation only as fast as it can be trusted, with stronger monitoring, accountability and human oversight built into deployment.
With AI erasing junior jobs, where will tomorrow's expert human verifiers come from?
Is the true business of AI not software, but underwriting the massive risks it creates?
The AI Coding Verification Gap: Systemic Risks, Technical Debt, and the Urgent Need for Robust Oversight
Overview
The rapid rise of AI in coding has produced sharp gains in accuracy, with AI agents now posting much higher scores on benchmarks like SWE-bench. As AI systems take on more complex, higher-risk tasks with greater autonomy, a critical 'verification gap' has emerged: checking whether AI-generated solutions are truly correct is becoming harder and more time-consuming, which raises the risk of misplaced trust. The report argues that this shift in software development demands new strategies to ensure reliability and maintain trust as AI's role continues to expand.