Microsoft Launches ASSERT Framework to Test App-Specific AI Behavior, Turning Policies Into Scored Checks
Updated
Updated · TechCrunch · Jun 2
Microsoft Launches ASSERT Framework to Test App-Specific AI Behavior, Turning Policies Into Scored Checks
3 articles · Updated · TechCrunch · Jun 2
Microsoft on Tuesday released ASSERT, an open-source framework that converts natural-language goals and policies into application-specific AI tests developers can score and inspect.
ASSERT structures acceptable and unacceptable behaviors, generates scenarios and test cases, runs them against a target system, and logs intermediate actions and tool calls to pinpoint failures.
Developers can add system context, tools, and constraints so the framework checks product-specific rules—such as blocking external emails or limiting confidential data to C-level executives.
Microsoft said the tool addresses gaps left by broad model benchmarks and can be used during development, after deployment, and for continuous monitoring.
The launch fits a wider industry push toward repeatable AI evaluation and regression testing as groups including Stanford HELM, MLCommons AILuminate, and METR expand behavior benchmarks.
Can automated testing truly tame unpredictable AI, or does it create a false sense of security for businesses?
With AI testing tools on the rise, is the Subject Matter Expert now the most critical role in technology?
Closing the AI Governance Gap: Microsoft Launches ASSERT and ACS for Autonomous Agent Safety and Compliance
Overview
On June 2, 2026, Microsoft launched ASSERT and ACS, two frameworks designed to address the growing challenges in AI governance and boost operational confidence for autonomous agents. This launch comes at a critical time, as many organizations struggle to build a strong governance layer, leading to significant risk gaps in AI deployment. Teams often lose confidence in the 'operate layer' due to manual evaluation processes, lack of systematic improvement paths, and incomplete traces at the agent boundary. ASSERT and ACS aim to close these gaps by providing a structured, closed-loop system for evaluating, controlling, and continuously improving AI agents.