LF AI & Data Launches DocLang Group to Build AI-Native Document Standard With 5 Founding Contributors
Updated · Computerworld · Jun 10
Summary
The LF AI & Data Foundation on Tuesday formed a working group to create DocLang, an open specification for machine-readable documents across AI and agentic workflows.
IBM, Nvidia and Red Hat founded the group, with ABBYY and HumanSignal contributing. The foundation argues that PDFs, JPEGs and other human-first formats raise the cost and reduce the reliability of AI extraction.
DocLang is pitched as a vendor-neutral standard rather than a converter or API, akin to JSON for data or HTML for the web, so that enterprises can prepare, exchange and govern document data at scale.
Analysts said the effort could improve preprocessing and token efficiency, but warned adoption will still require governance, compliance controls, metadata management and broader organizational readiness.
DocLang Unveiled: The Open, AI-Ready Document Standard Set to Replace PDF and Slash Enterprise AI Costs
Overview
On June 9, 2026, the LF AI & Data Foundation announced DocLang, a new AI-native document format designed to become the international standard for unstructured content. DocLang addresses the growing gap between traditional, human-centric formats like PDF and the needs of modern AI systems, which require structured, semantic metadata for efficient processing. Built on extensive research, DocLang is meant to make documents directly understandable by AI, reducing the need for complex extraction and improving workflow efficiency. Its open, vendor-neutral governance aims to foster broad industry adoption and drive the next era of AI-powered document intelligence.