Updated · MIT News · Jun 3
MIT, Harvard Lift Llama 4 Scout Battleship Win Rate to 82% at 1% of GPT-5 Cost

Summary

  • Llama 4 Scout beat human Battleship players 82% of the time after MIT and Harvard researchers added a Monte Carlo inference method; before the changes, it won just 8% of games.
  • The gain came from teaching models to ask more informative questions in a language-based “Collaborative Battleship” test, targeting a core weakness in AI agents operating under uncertainty.
  • Researchers also improved answer accuracy by translating questions into Python-style verification steps. This lifted smaller models’ performance in the spotter role (the player who answers questions about the hidden board) by 15% on average, and GPT-4o-mini improved by nearly 30%.
  • The approach carried over to “Guess Who?”, where Llama 4 Scout’s success rate rose from 30% to more than 72% and GPT-4o’s from 62% to 90%, suggesting broader value for search-heavy tasks.
  • The team says the game remains a simplified test bed, but argues better question-asking could help AI agents tackle scientific discovery, coding and diagnosis-style problems with large search spaces.
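The summary names two mechanisms without detailing them: a Monte Carlo inference method that helps a model choose informative questions, and translating questions into Python-style verification steps so that answers are checked by code. Below is a minimal sketch of how those two ideas can fit together. It is not the researchers’ implementation. It assumes a hypothetical toy board (8 cells hiding one ship of length 3), hand-written question predicates in a `QUESTIONS` table, and expected information gain as the measure of how informative a question is. All names are illustrative.

```python
import math
import random
from typing import Callable, Dict, FrozenSet, List, Tuple

# Hypothetical toy setup (not from the study): a 1-D board of 8 cells
# hiding a single ship of length 3.
BOARD_SIZE = 8
SHIP_LEN = 3
Board = FrozenSet[int]  # cells occupied by the ship


def all_boards() -> List[Board]:
    """Enumerate every placement of the ship on the toy board."""
    return [frozenset(range(s, s + SHIP_LEN))
            for s in range(BOARD_SIZE - SHIP_LEN + 1)]


def sample_boards(observations: Dict[int, bool], n: int,
                  rng: random.Random) -> List[Board]:
    """Monte Carlo step: draw n boards uniformly from those consistent with
    the observed hits (True) and misses (False). Enumerating first only
    works because the toy board is tiny."""
    consistent = [b for b in all_boards()
                  if all((cell in b) == hit for cell, hit in observations.items())]
    if not consistent:
        raise ValueError("no board is consistent with the observations")
    return [rng.choice(consistent) for _ in range(n)]


# Each natural-language question is paired with an executable predicate
# over a board (hypothetical, hand-written translations).
QUESTIONS: Dict[str, Callable[[Board], bool]] = {
    "Is cell 0 part of the ship?": lambda b: 0 in b,
    "Is the ship entirely in the left half?": lambda b: max(b) < BOARD_SIZE // 2,
    "Does the ship cover cell 4?": lambda b: 4 in b,
}


def binary_entropy(p: float) -> float:
    """Entropy in bits of a yes/no answer that is 'yes' with probability p."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def expected_info_gain(predicate: Callable[[Board], bool],
                       boards: List[Board]) -> float:
    """For a noiseless yes/no answer, EIG equals the entropy of the
    answer distribution, estimated here from the sampled boards."""
    p_yes = sum(predicate(b) for b in boards) / len(boards)
    return binary_entropy(p_yes)


def best_question(observations: Dict[int, bool], n: int = 2000,
                  seed: int = 0) -> Tuple[str, Dict[str, float]]:
    """Score every candidate question on the samples and return the most informative."""
    boards = sample_boards(observations, n, random.Random(seed))
    scores = {q: expected_info_gain(f, boards) for q, f in QUESTIONS.items()}
    return max(scores, key=scores.get), scores


def spotter_answer(question: str, true_board: Board) -> bool:
    """Answer by executing the question's predicate on the hidden board."""
    return QUESTIONS[question](true_board)


if __name__ == "__main__":
    question, scores = best_question(observations={})
    print(question, scores)
    print(spotter_answer(question, frozenset({5, 6, 7})))  # False: cell 4 is empty
```

With no observations, the six possible placements split 3/3 on the cell-4 question. That question therefore scores close to 1 bit and is chosen over the other two. Scoring questions on sampled boards rather than on an exact enumeration is what makes this a Monte Carlo estimate. On a full Battleship board with several ships, enumeration is impractical, so the sampler does the real work.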

Insights

Can a cheap AI that mastered a game truly rival giants like GPT-5 in high-stakes fields such as medicine?
If AI learns to ask better questions, how do we prevent it from missing the one fatal detail a human wouldn't?
As AI drives scientific discovery, are we augmenting human intellect or starting to outsource our own curiosity?

Llama 4 Scout’s 82% Win Rate and 1% Cost vs GPT-5: Open-Source AI Breakthrough

Overview

On June 3, 2026, researchers at MIT CSAIL and Harvard SEAS reported large gains in both capability and cost-efficiency for Meta’s Llama 4 Scout model. Its win rate against human players in Battleship rose from 8% to 82%, a more than tenfold increase, at about 1% of the cost of GPT-5. The results suggest that improving how an AI agent decides what to ask next can substantially strengthen its strategic decision-making, with implications for how future AI agents are developed.
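The “more than tenfold” figure follows directly from the two reported Battleship win rates, shown here as a ratio and as an absolute change:

```latex
\frac{82\%}{8\%} = 10.25, \qquad 82\% - 8\% = 74 \text{ percentage points}
```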

...