Four AI Systems Pass 7 of 10 Math Problems in First Proof Benchmark

1 article · Updated · The Washington Post · Jun 14

Seven of 10 unpublished research problems received a passing AI solution in First Proof’s second benchmark, based on work from four systems graded by 30 mathematicians at Harvard.
The project used privately solved but unpublished problems to test AI under controlled conditions and give mathematicians a more transparent check on company claims about breakthrough performance.
First Proof said some answers were flawless or novel and one used a strategy that impressed referees, while other attempts failed or needed minor revisions.
The results land weeks after OpenAI said an internal model disproved an 80-year-old Erdős conjecture, intensifying debate over whether AI is a threat to mathematics or a powerful but limited tool.
Researchers behind the benchmark said models still lag humans in choosing worthwhile questions, setting broader agendas and failing gracefully when a proof attempt breaks down.

Sources

Left 100%

The Washington Post · 5h ago

First Proof Initiative: Four AI Systems Pass Seven of Ten Math Problems

With AI solving problems humans can't, what is the future role for mathematicians?

As AI conquers math's biggest challenges, who will control the tools that define truth?

If an AI proves a theorem that no human can understand, is it still mathematical progress?

"First Proof Benchmark: How AI is Reshaping Mathematical Discovery and Human Collaboration (2026)"

Overview

The "First Proof" benchmark, launched in early 2026, marks a major step in measuring AI's mathematical abilities. Designed by leading mathematicians, it tests AI systems on real, unpublished research problems, so the systems are evaluated on novel and complex questions rather than recycled or artificial ones. While the benchmark shows that AI can solve some advanced problems, it also highlights key challenges, such as the difficulty of communicating with AI and the risk of machines producing many incorrect proofs. These findings reveal both the promise and the limits of AI in mathematics.

...
