Anthropic Details Fable 5 Safeguards, Proposes 5-Level Jailbreak Severity Scale

3 articles · Updated · Anthropic · Jul 2

Anthropic said Fable 5 is now globally available again and published a fuller account of the cyber safeguards that accompany the model, plus an early draft framework for rating AI jailbreaks.
The safeguards rest on four classifier categories: prohibited requests and high-risk dual-use requests are blocked, low-risk dual-use prompts are monitored and sometimes blocked, and benign defensive uses are generally allowed.
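As an illustration only, the category-to-handling policy described here can be sketched as a small lookup table. Every name below (`Category`, `Action`, `POLICY`, `default_action`) is hypothetical and not taken from Anthropic's account. The summary does not say when a monitored prompt is also blocked, so the sketch records that case as a single "monitor" action rather than guessing at the rule.

```python
from enum import Enum


class Category(Enum):
    """The four classifier categories named in the summary (identifiers are assumed)."""
    PROHIBITED = "prohibited"
    HIGH_RISK_DUAL_USE = "high-risk dual-use"
    LOW_RISK_DUAL_USE = "low-risk dual-use"
    BENIGN = "benign"


class Action(Enum):
    """Default handling per category (identifiers are assumed)."""
    BLOCK = "block"
    MONITOR = "monitor"  # monitored and sometimes blocked; the blocking criteria are not stated
    ALLOW = "allow"      # "generally allowed" in the summary, so a default rather than a guarantee


# Hypothetical mapping that mirrors the wording of the summary, nothing more.
POLICY = {
    Category.PROHIBITED: Action.BLOCK,
    Category.HIGH_RISK_DUAL_USE: Action.BLOCK,
    Category.LOW_RISK_DUAL_USE: Action.MONITOR,
    Category.BENIGN: Action.ALLOW,
}


def default_action(category: Category) -> Action:
    """Return the default handling for a classified request."""
    return POLICY[category]


if __name__ == "__main__":
    for category in Category:
        print(f"{category.value}: {default_action(category).value}")
```

A lookup table keeps the sketch honest about what is known: it encodes only the default outcome per category and leaves the unstated escalation logic out.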
Anthropic said it widened Fable 5’s safety margin versus earlier models, using classifiers alongside access controls, safety training and offline monitoring to catch dangerous cyber behavior while still permitting defensive work such as secure coding and incident response.
The proposed Cyber Jailbreak Severity framework scores findings on four axes (capability gain, breadth, ease of weaponization and discoverability), then maps them to five bands, CJS-0 through CJS-4.
Anthropic called the framework a draft meant to support common language with governments and industry partners, and opened a HackerOne program for researchers to submit Fable 5 cyber jailbreaks.