Updated
Updated · South China Morning Post · Jul 3
ByteDance Says AI Agents Double Learning Speed Every 3 Months on 134 Long Tasks

3 articles · Updated · South China Morning Post · Jul 3

Summary

  • ByteDance’s Seed AI team said AI agents can double their learning speed every three months when they keep interacting with real-world environments after deployment.
  • The result points to a new scaling law for agentic AI as developers look beyond simply adding more training data and computing power to improve models.
  • To test that post-deployment learning, the team built EdgeBench, a benchmark with 134 ultra-long-horizon tasks across software engineering, scientific discovery, mathematics and professional knowledge work.
  • Each EdgeBench task requires at least 12 hours of continuous agent operation, targeting a gap the researchers said remains poorly understood even as the industry shifts toward AI agents.
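To put the "doubling every three months" figure in concrete terms, the short sketch below compounds it over longer periods. It assumes the doubling rate stays constant, which the summary does not claim; the function name `speedup_after` and its parameters are illustrative and do not come from ByteDance.

```python
def speedup_after(months, doubling_period_months=3.0):
    """Multiplier on learning speed after `months`, assuming steady doubling.

    Assumption: the rate doubles every `doubling_period_months` without
    slowing down. This is an illustration, not a result from the study.
    """
    return 2.0 ** (months / doubling_period_months)


if __name__ == "__main__":
    for months in (3, 6, 12):
        print(f"{months:>2} months: {speedup_after(months):.0f}x")
```

Under that assumption, four doubling periods fit into a year, so the multiplier after 12 months would be 2 to the 4th power, or 16 times the starting rate.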

Insights

If AI can double its learning speed every three months, are we prepared for its rapid, autonomous evolution in the real world?
A small AI model recently beat one 10x its size. Does this prove the future of AI is smart, not just big?
With AI's costs straining global power grids, can this new scaling law finally end the industry's 'brute-force' era?

Measuring Long-Term AI Progress: EdgeBench, Log-Sigmoid Scaling, and the New Era of Continuous Learning and Oversight

Overview

ByteDance's EdgeBench is a new benchmark that evaluates how well AI agents keep learning and improving over long stretches of operation, not only how they perform at the outset. Studies using EdgeBench have found that agent performance follows a predictable log-sigmoid curve during extended learning, which helps developers anticipate when an agent is likely to reach diminishing returns on a task. With that estimate, teams can allocate resources more efficiently and concentrate on the tasks where further gains are still possible in real-world applications.
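As a rough illustration of what a log-sigmoid curve implies, the sketch below models a score as a sigmoid of the logarithm of elapsed time and looks for the point where doubling the time budget adds very little. The exact functional form, the parameter values (`ceiling`, `midpoint_hours`, `steepness`) and the 0.02 threshold are all assumptions chosen for the example; the article does not give the formula or fitted values the researchers used.

```python
import math


def log_sigmoid_score(t_hours, ceiling=1.0, midpoint_hours=12.0, steepness=1.5):
    """Score as a sigmoid of log-time (hypothetical form and parameters).

    The score equals ceiling / 2 at `midpoint_hours` and saturates at `ceiling`.
    """
    z = steepness * (math.log(t_hours) - math.log(midpoint_hours))
    return ceiling / (1.0 + math.exp(-z))


def diminishing_returns_hour(threshold=0.02, ceiling=1.0, midpoint_hours=12.0,
                             steepness=1.5, max_doublings=30):
    """First time budget, doubling upward from the midpoint, at which one
    further doubling improves the score by less than `threshold`.

    Returns None if no such point is found within `max_doublings` steps.
    """
    t = midpoint_hours
    for _ in range(max_doublings):
        now = log_sigmoid_score(t, ceiling, midpoint_hours, steepness)
        later = log_sigmoid_score(2 * t, ceiling, midpoint_hours, steepness)
        if later - now < threshold:
            return t
        t *= 2
    return None


if __name__ == "__main__":
    for hours in (3, 12, 48, 192):
        print(f"{hours:>4} h -> score {log_sigmoid_score(hours):.3f}")
    print("diminishing returns from about", diminishing_returns_hour(), "hours")
```

In this toy setup the gain from each doubling of time shrinks steadily past the midpoint, which is the property that would let a team decide when to stop investing in one task and move resources to another.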

...