MIT Researchers Show Policy Gradients Beat Specialized Algorithms in 5 Hidden-Information Games
Updated · MIT News · Jun 17
Summary
Across five imperfect-information games, MIT-led researchers found that neural networks trained with policy gradient methods achieved lower exploitability scores than specialized game-theoretic algorithms and also won head-to-head matchups.
Because the largest games contain 30 billion states, rigorous evaluation was difficult, so the team built a benchmark that scales exploitability analysis far beyond prior studies, which typically handled games about 100,000 times smaller.
A single line of code adds the benchmark to OpenSpiel, and the software is free to run on an ordinary laptop rather than a supercomputer.
The April ICLR paper challenges a long-held view that game-theory-specific methods should dominate two-player zero-sum settings with hidden information such as poker, bidding, trading, negotiations and military planning.
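For readers unfamiliar with the metric, exploitability measures how much a best-responding opponent could gain against a given strategy; a score of zero means the strategy cannot be exploited. The sketch below is not the researchers' benchmark. It is a minimal illustration on rock-paper-scissors, a game small enough to check by hand, and it assumes the common convention of averaging the two players' best-response gains. The function name and the payoff table are illustrative choices, not taken from the paper.

```python
def exploitability(payoff, row_strategy, col_strategy):
    """Average best-response gain in a two-player zero-sum matrix game.

    payoff[i][j] is the row player's payoff; the column player receives
    the negative. Averaging over the two players is an assumed convention.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    # Best the row player can do against the column player's fixed strategy.
    row_best = max(
        sum(payoff[i][j] * col_strategy[j] for j in range(n_cols))
        for i in range(n_rows)
    )
    # The column player's best response minimizes the row player's payoff.
    col_best = min(
        sum(row_strategy[i] * payoff[i][j] for i in range(n_rows))
        for j in range(n_cols)
    )
    return (row_best - col_best) / 2


# Rock-paper-scissors payoffs for the row player (rows and columns: R, P, S).
RPS = [[0, -1, 1],
       [1, 0, -1],
       [-1, 1, 0]]

uniform = [1 / 3, 1 / 3, 1 / 3]
always_rock = [1.0, 0.0, 0.0]

print(exploitability(RPS, uniform, uniform))      # unexploitable pair
print(exploitability(RPS, always_rock, uniform))  # always-rock can be punished
```

Uniform play on both sides scores zero, while always playing rock scores 0.5 because the opponent can switch to paper. In games with billions of states the best-response search cannot be enumerated this directly, which is the evaluation difficulty the summary describes.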