
Updated · ZDNet · May 14

Claude Mythos Beats GPT-5.5 in 2 UK Cyber Ranges, Solving a Previously Unsolved Test


A newer Claude Mythos checkpoint beat both its earlier benchmark and OpenAI’s GPT-5.5 in UK AI Security Institute tests just a month after the model’s initial release.
In 2 cyber ranges, Mythos solved “The Last Ones” in 6 of 10 attempts and “Cooling Tower” in 3 of 10—the first time any model completed the second range.
AISI said the results suggest AI cyber capabilities may be advancing faster than expected: by February it estimated that the length of cyber tasks models could complete was doubling every 4.7 months, down from an 8-month estimate in November 2025.
The institute cautioned its measurements likely understate frontier-model performance because tests were capped at 2.5 million tokens, while separate cyber-range experiments use up to 100 million and still show gains.
That leaves uncertainty over whether Mythos and GPT-5.5 are outliers or signs of a lasting acceleration, but it also means current safety benchmarks may already be nearing their measurement limits.
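A shorter doubling time compounds quickly. The sketch below compares the two AISI estimates over a 12-month window. It assumes clean exponential growth, which is a simplification and not AISI's own methodology, and the function name `growth_factor` is hypothetical.

```python
def growth_factor(months_elapsed: float, doubling_months: float) -> float:
    """Multiplier on completable task length after `months_elapsed` months,
    assuming pure exponential growth with the given doubling time (an assumption)."""
    return 2 ** (months_elapsed / doubling_months)


# Compare the November 2025 (8-month) and February (4.7-month) doubling estimates over one year.
for label, doubling in [("November 2025 estimate", 8.0), ("February estimate", 4.7)]:
    print(f"{label}: x{growth_factor(12, doubling):.2f} per year")
```

Under these assumptions, an 8-month doubling time means task length grows roughly 2.8x in a year, while a 4.7-month doubling time means roughly 5.9x.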


AI Models Achieve Human-Level Cyberattack Performance: UK AISI Finds Claude Mythos and GPT-5.5 Complete 20-Hour Simulations in Minutes

Overview

In April 2026, the UK AI Security Institute reported that frontier AI models such as Anthropic's Claude Mythos Preview and OpenAI's GPT-5.5 had reached a new level of cyberattack capability. These systems can now execute complex cyber operations with considerable autonomy and speed, putting pressure on existing cybersecurity defenses. Notably, Claude Mythos Preview became the first AI model to fully complete a demanding corporate network attack simulation, a task that typically takes a human about 20 hours, highlighting the speed and sophistication of current AI on cyber tasks. The result signals a major shift in the cybersecurity landscape.
