Updated · MIT News · Jun 17
MIT Researchers Show Policy Gradients Beat Specialized Algorithms in 5 Hidden-Information Games

Summary

  • MIT-led researchers tested five imperfect-information games and found that neural networks trained with policy gradient methods achieved lower exploitability than specialized game-theoretic algorithms, and also won head-to-head matchups against them.
  • The largest games contain 30 billion states, which makes rigorous evaluation difficult, so the team built a benchmark that scales exploitability analysis far beyond prior studies, which typically handled games about 100,000 times smaller.
  • A single line of code adds the benchmark to OpenSpiel, and the software is free to run on an ordinary laptop rather than a supercomputer.
  • The April ICLR paper challenges a long-held view that game-theory-specific methods should dominate two-player zero-sum settings with hidden information such as poker, bidding, trading, negotiations and military planning.
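
For readers unfamiliar with the exploitability metric mentioned above, here is a minimal sketch of how it can be computed in a tiny two-player zero-sum matrix game. This is an illustration only and is not the researchers' benchmark or the OpenSpiel API: the game (rock-paper-scissors), the function names, and the convention of halving the total best-response gain are assumptions chosen for clarity. The games in the study are far larger and sequential, so this shows the idea rather than the method.

```python
# Illustrative sketch: exploitability in a small zero-sum matrix game.
# All names here are hypothetical; nothing is taken from the paper's code.

# Payoff to the row player; rows and columns are rock, paper, scissors.
RPS = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def row_values(payoff, col_strategy):
    """Expected row-player payoff of each pure row action against col_strategy."""
    return [sum(p * q for p, q in zip(row, col_strategy)) for row in payoff]


def col_values(payoff, row_strategy):
    """Expected row-player payoff when the column player picks each pure column."""
    n_cols = len(payoff[0])
    return [
        sum(row_strategy[i] * payoff[i][j] for i in range(len(payoff)))
        for j in range(n_cols)
    ]


def exploitability(payoff, row_strategy, col_strategy):
    """Average gain a best-responding opponent gets against each player.

    Assumed convention: (best row response value - best column response
    value) / 2, which is zero exactly at a Nash equilibrium.
    """
    best_row = max(row_values(payoff, col_strategy))
    best_col = min(col_values(payoff, row_strategy))  # column player minimizes
    return (best_row - best_col) / 2


uniform = [1 / 3, 1 / 3, 1 / 3]
rock_heavy = [0.5, 0.25, 0.25]

print(exploitability(RPS, uniform, uniform))
print(exploitability(RPS, rock_heavy, uniform))
```

A lower value means the strategies are harder to exploit. Uniform play in rock-paper-scissors cannot be exploited at all, while the rock-heavy strategy loses ground to an opponent who favors paper.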

Insights

If general AI now beats specialized game theory, are decades of strategic AI research suddenly obsolete?
If AI can now autonomously hack complex systems, what new defenses can counter these machine-speed cyberattacks?
As AI accelerates military decisions, how can we ensure meaningful human control remains possible at machine speed?