METR discusses AI evaluation and highlights Claude Opus 4.6 completing complex human tasks
Updated · Bloomberg · Apr 25
METR President Chris Painter and technical staff member Joel Becker explain how Claude Opus 4.6 accomplished a task typically requiring nearly 12 hours of human effort.
They walk through the methodology behind METR's viral chart, which tracks how quickly AI capabilities are advancing by measuring the length of tasks, in human working time, that models can complete autonomously.
The discussion underscores concerns about AI's potential for recursive self-improvement and the importance of evaluating complex, autonomous tasks as AI technology progresses.
With 'AI brain fry' affecting 14% of workers, how must we redesign work for human well-being?
As AI capabilities double every three months, can human governance and safety measures possibly keep pace?
If AI reshapes over half of US jobs, what is the national plan for this massive workforce transformation?
Beyond coding, how do we measure an AI's capacity for wisdom and complex real-world judgment?
As AI automates entry-level roles, who will gain the experience needed for future human oversight?
With AI labs creating automated researchers, are we on the verge of a new scientific revolution?