3 articles · Updated · The Cloudflare Blog · May 18
More than 50 Cloudflare repositories were scanned with Anthropic's Mythos Preview under Project Glasswing, and the company said the model could chain low-severity flaws into working exploits and generate proof-of-concept code.
Mythos Preview stood out less for spotting isolated bugs than for stitching attack primitives together, compiling and running exploit code, and iterating on failures until exploitability was proven.
Cloudflare said the model's built-in refusals were inconsistent, sometimes blocking legitimate research and sometimes allowing the same task when it was phrased differently, and concluded that emergent guardrails alone are not enough for a broader release.
Around 50 parallel hunting agents in Cloudflare's custom harness improved coverage and triage quality, while generic coding agents aimed at whole repositories produced too much noise and too little meaningful coverage.
Cloudflare said the advance will demand more of defenders than simply patching faster, arguing that security teams need architectural defenses that limit exploitability as capable cyber models spread.
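To make the parallel-agent point concrete, here is a minimal Python sketch of the general pattern: many narrowly scoped hunting agents run side by side, and a triage step keeps only deduplicated findings that were actually reproduced. It is an illustration under stated assumptions, not Cloudflare's harness. Every name in it (`Finding`, `hunt`, `run_hunt`, the file paths, the canned results) is hypothetical, and the cap of 50 workers only mirrors the rough figure reported above.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


# Hypothetical record type; not taken from Cloudflare's harness.
@dataclass(frozen=True)
class Finding:
    target: str       # the narrow scope the agent was given
    summary: str      # short description of the suspected flaw
    reproduced: bool  # True only if a proof of concept actually ran


def hunt(target: str) -> list[Finding]:
    """Stand-in for one hunting agent working on a single narrow target.

    A real agent would call a model and run code in a sandbox. This stub
    returns canned results so that the sketch stays runnable.
    """
    canned = {
        "parser/header.c": [
            Finding("parser/header.c", "length check off by one", True)
        ],
        "cache/evict.c": [
            Finding("cache/evict.c", "possible stale pointer", False)
        ],
    }
    return canned.get(target, [])


def run_hunt(targets: list[str], max_agents: int = 50) -> list[Finding]:
    """Fan targets out to at most max_agents concurrent workers, then triage."""
    with ThreadPoolExecutor(max_workers=max_agents) as pool:
        batches = list(pool.map(hunt, targets))
    findings = [f for batch in batches for f in batch]
    # Triage (an assumed policy): drop exact duplicates, preserving order,
    # and keep only findings backed by a working reproduction.
    unique = dict.fromkeys(findings)
    return [f for f in unique if f.reproduced]


confirmed = run_hunt(
    ["parser/header.c", "cache/evict.c", "tls/handshake.c", "parser/header.c"]
)
```

In this toy run, the duplicate target and the unreproduced finding are both filtered out, so `confirmed` holds a single entry. Giving each agent one narrow target, rather than pointing one agent at a whole repository, is the design choice the summary credits with less noise.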
Claude Mythos Preview and Project Glasswing: Industry Response to a 255% Spike in AI Vulnerabilities
Overview
Advanced AI models such as Anthropic's Claude Mythos Preview are changing how software vulnerabilities are found. This unreleased large language model is anticipated to identify complex software vulnerabilities autonomously and chain them together, a significant step toward proactive security. Cloudflare has built a vulnerability discovery harness to test these capabilities in real-world conditions by scanning live code across its critical infrastructure. Together, the model and the harness promise to move organizations beyond traditional, human-centric threat detection and to change how vulnerabilities are discovered and mitigated in increasingly complex digital systems.