Updated · InfoWorld · Jun 11
Microsoft Open-Sources ASSERT for AI Agents, Citing 99% of Firms Skip Pre-Production Evaluation

Summary

  • Microsoft released ASSERT, an MIT-licensed framework that turns natural-language requirements, governance documents and product specs into executable tests, datasets, metrics and scorecards for enterprise AI agents.
  • The tool targets a gap in agent deployment: Gartner says 99% of organizations do not evaluate AI agents before production, even though agent failures can include policy drift, unsafe outputs in edge cases and behavior changes after launch.
  • Microsoft said ASSERT uses LLMs as judges. In internal validation, those judges agreed with human reviewers 80% to 90% of the time, which is enough to automate much of the testing but not to replace human oversight in regulated or high-risk cases.
  • More than 45% of organizations already use AI agents and another 25% are piloting them, Forrester said, but behavioral evaluation is still mostly ad hoc rather than a standardized release gate.
  • ASSERT also puts Microsoft into a crowded AI evaluation market, where open source may reduce lock-in but does not remove concerns over bias, neutrality and vendor influence on scoring rules.
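To make the 80%-to-90% agreement figure concrete, here is a minimal Python sketch of how agreement between an LLM judge and human reviewers is commonly measured. It is not ASSERT's code or API: the function names, the pass/fail labels and the ten sample verdicts are hypothetical, and the articles do not say which agreement metric Microsoft used. Alongside raw percent agreement, the sketch computes Cohen's kappa, which corrects for the agreement two raters would reach by chance.

```python
from collections import Counter

def percent_agreement(judge, human):
    """Fraction of cases where the LLM judge's verdict matches the human reviewer's."""
    if len(judge) != len(human) or not judge:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(j == h for j, h in zip(judge, human))
    return matches / len(judge)

def cohens_kappa(judge, human):
    """Chance-corrected agreement: (p_o - p_e) / (1 - p_e)."""
    p_o = percent_agreement(judge, human)
    n = len(judge)
    jc, hc = Counter(judge), Counter(human)
    # p_e: probability both raters pick the same label if each labels at random
    # with its own observed label frequencies.
    p_e = sum(jc[label] * hc[label] for label in set(jc) | set(hc)) / (n * n)
    if p_e == 1:
        # Both raters used a single identical label everywhere; treat as perfect.
        return 1.0
    return (p_o - p_e) / (1 - p_e)

# Hypothetical verdicts on ten test cases (illustrative data, not ASSERT results).
judge = ["pass", "pass", "fail", "pass", "fail", "pass", "pass", "pass", "fail", "pass"]
human = ["pass", "pass", "fail", "fail", "fail", "pass", "pass", "pass", "pass", "pass"]

print(percent_agreement(judge, human))        # 0.8
print(round(cohens_kappa(judge, human), 3))   # 0.524
```

With these toy labels the judge matches the humans 80% of the time, yet kappa is only roughly 0.52, because both raters mark most cases as passing and would often agree by chance. Reporting a chance-corrected figure next to raw agreement is one way to gauge how much human oversight an automated judge still needs.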

Insights

Can an open-source tool from one tech giant be trusted to impartially govern all enterprise AI systems?
As AI judges AI, could hidden biases in evaluation frameworks become the next major enterprise security risk?
With the EU AI Act's 2027 deadline looming, is this new framework the key to avoiding massive fines?

Microsoft ASSERT Unveiled: Transforming AI Agent Safety and Trust with an Open-Source Evaluation Framework

Overview

At Build 2026, Microsoft unveiled ASSERT, an open-source framework for testing AI behavior, as part of its broader push to become an AI-native enterprise. ASSERT targets a need that existing evaluation methods often fail to meet: repeatable, systematic testing of increasingly sophisticated AI models. By letting developers test agent behavior rigorously early in development, ASSERT aims to narrow the gap between rapid innovation and the demand for trustworthy, reliable AI. The release reflects a broader industry move toward stronger AI governance and positions Microsoft at the forefront of responsible AI development.

...