OpenAI Says 1,000-Plus Entrants Used AI Agents in 16 MB Parameter Golf Challenge

2 articles · Updated · OpenAI · May 12
  • More than 2,000 submissions from over 1,000 participants across eight weeks led OpenAI to frame Parameter Golf as an early test of how AI coding agents reshape research contests.
  • A 16 MB artifact cap and a 10-minute training limit on 8×H100s pushed entrants toward optimizer tuning, quantization, test-time strategies, and other unconventional modeling ideas (a back-of-the-envelope parameter budget appears after this list).
  • The vast majority of submitters said they used agents, which lowered experimentation costs and widened access; RunPod’s $1 million compute sponsorship also helped broaden participation.
  • That agent-driven pace created review and attribution problems: copied invalid ideas spread quickly enough that OpenAI built a Codex-based triage bot to flag submissions on days with hundreds of entries (a hypothetical sketch of such triage follows this list).
  • OpenAI said the contest also worked as a talent-discovery tool and suggested open technical competitions may increasingly depend on managing powerful AI agents as much as judging model quality.
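To make the 16 MB cap concrete, here is a minimal sketch of the parameter budget it implies, assuming the cap applies to raw weight bytes (the contest's exact counting rules aren't given in the summary):

```python
# Back-of-the-envelope parameter budget under a 16 MB artifact cap.
# Assumption: the cap counts raw weight bytes only; serialization
# overhead and metadata are ignored here.

CAP_BYTES = 16 * 1024 * 1024  # treating the article's "16 MB" as 16 MiB

BYTES_PER_PARAM = {
    "fp32": 4,
    "fp16/bf16": 2,
    "int8": 1,
    "int4": 0.5,
}

for dtype, nbytes in BYTES_PER_PARAM.items():
    max_params = CAP_BYTES / nbytes
    print(f"{dtype:>9}: ~{max_params / 1e6:.1f}M parameters fit in 16 MB")
```

At int8, roughly four times as many parameters fit as at fp32 (about 16.8M versus 4.2M), which is the arithmetic that makes quantization such a natural lever under the cap.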
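The summary does not describe how the Codex-based triage bot worked. Purely to illustrate the general pattern of LLM pre-screening on high-volume days, here is a hypothetical sketch using the public OpenAI Chat Completions API; the model name, prompt, and flag labels below are assumptions, not reported details:

```python
# Hypothetical submission-triage sketch. OpenAI has not published its
# Codex-based tool; this only shows one way an LLM could pre-screen
# entries before human review. Model, prompt, and labels are invented.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TRIAGE_PROMPT = (
    "You review contest submissions for a model-compression challenge. "
    "Label the submission VALID, DUPLICATE (restates a known-invalid idea), "
    "or NEEDS_HUMAN, and give a one-sentence reason."
)

def triage(submission_text: str, known_invalid_ideas: list[str]) -> str:
    """Ask an LLM to flag a submission before a human reviewer sees it."""
    context = "Known-invalid ideas so far:\n- " + "\n- ".join(known_invalid_ideas)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative; the article only says "Codex-based"
        messages=[
            {"role": "system", "content": TRIAGE_PROMPT},
            {"role": "user", "content": context + "\n\nSubmission:\n" + submission_text},
        ],
    )
    return resp.choices[0].message.content
```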
  • What unforeseen security threats emerge when thousands of AI coding agents are unleashed in a competitive environment?
  • As AI agents dominate coding, how will we measure and reward genuine human ingenuity in technical challenges?
  • Is this extreme focus on model compression signaling the end of the era of ever-larger AI models?