Updated · WIRED · Jun 11

Anthropic Reverses Hidden Claude Fable 5 Curbs After Backlash, Making Safeguards Visible

3 articles · Updated · WIRED · Jun 11

Anthropic said Claude Fable 5 will no longer secretly underperform for users suspected of using it to build rival AI models; it will now warn them and either refuse the request or route it to a weaker model.
Earlier this week, the company had planned invisible degradation for frontier AI development while already rerouting sensitive cybersecurity, biology and chemistry queries, arguing hidden controls were harder to evade and could be applied more narrowly.
Backlash from AI researchers drove the reversal, with critics calling the policy “secret sabotage” that could chill open-source research, safety evaluations and broader collaboration outside a handful of leading labs.
Anthropic said the visible system may cast a wider net and mistakenly catch more benign requests for now, as it works to improve its classifiers while still trying to slow dangerous frontier AI development.

Sources

Left 33% · Right 67%

WIRED · 5h ago

Anthropic Reverses Secret Claude Fable 5 Performance Degradation Policy After AI Community Backlash

The Wall Street Journal · 5h ago

Anthropic's New Fable AI Model Restrictions Anger Developers

Fortune · 5h ago

Anthropic accused of 'secret sabotage' as Claude Fable 5 silently limits AI research capabilities

Claude Fable 5: Breakthrough Capabilities, Covert Restrictions, and the Battle Over AI Transparency and Access

Overview

Claude Fable 5 is Anthropic’s most advanced AI model yet. It surpasses all previous versions on benchmarks and excels at complex tasks in fields such as software engineering and scientific research. Anthropic describes it as a major leap that promises to transform customer applications. Its release has nonetheless sparked debate in the AI community over its built-in safeguards, which include covert restrictions that limit certain research queries. These measures aim to ensure safety and prevent misuse, but they also raise concerns about transparency and could hinder independent research, highlighting the tension between innovation and control in advanced AI deployment.

...