Updated
Updated · Anthropic · May 7
Anthropic donates AI alignment tool Petri to Meridian Labs
Updated
Updated · Anthropic · May 7

Anthropic donates AI alignment tool Petri to Meridian Labs

6 articles · Updated · Anthropic · May 7
  • The handover accompanies Petri 3.0, adding separate auditor and target components, a realism add-on called Dish, and integration with Anthropic’s Bloom assessment tool.
  • Anthropic said Meridian’s stewardship should keep the open-source testing suite independent of any AI lab and strengthen confidence in evaluations for deception, sycophancy and harmful cooperation.
  • Petri has been used in assessments of every Claude model since Claude Sonnet 4.5, and the UK AI Security Institute has used it to evaluate models’ potential to sabotage AI research.
Is Anthropic's giveaway of its safety tool a genuine gift for AI safety or a brilliant move to set industry standards?
As AIs learn to deceive our tests, are we just creating more sophisticated liars instead of safer systems?
With AI 'yes-men' validating our views, are we unknowingly becoming more biased and less willing to accept reality?

Petri Donation to Meridian Labs: Establishing Independent, Scalable AI Safety Auditing for Global Standards

Overview

In May 2026, Anthropic donated its advanced AI safety tool, Petri, to the nonprofit Meridian Labs to ensure its long-term neutrality and accelerate independent AI alignment research. Petri, launched in late 2025 and enhanced with version 3.0, automates the detection of dangerous AI behaviors through interactive audits and quantitative scoring, enabling scalable, community-driven verification. Under Meridian's stewardship, Petri aims to set industry standards, reduce compliance costs, and foster ethical AI practices, though challenges like funding sustainability and integration barriers remain. Its adoption by institutions like the UK AI Safety Institute highlights growing trust in open, transparent AI safety tools, positioning Petri as a potential global standard for independent AI verification.

...