IBM, Nvidia and Red Hat Launch DocLang for AI Documents as LLMs Struggle With PDFs
Updated · InfoWorld · Jun 16
Summary
DocLang debuted as a working group hosted by the Linux Foundation, tasked with creating an open, vendor-neutral format for business documents that is built for LLM tokenizers rather than human readers.
PDFs, JPEGs and other legacy formats raise costs and reduce reliability when enterprises try to extract meaning from them for generative AI and agentic systems, the group said.
The specification is designed as a structured, machine-readable layer for any document type, building on the Docling toolkit, which converts PDFs, word-processing files and spreadsheets into structured data.
ABBYY and Human Signal joined the effort. Analysts said an AI-native standard could improve efficiency and lower risk, provided the preprocessing stays invisible to users and the original human-readable documents are preserved.
Governance may become the next hurdle: Info-Tech said organizations adopting DocLang will need controls and reviews to scale its use securely and accountably.
DocLang Launches: The Missing Substrate for Reliable, Compliant AI Document Processing
Overview
DocLang, launched on June 12, 2026, is an open, AI-native document standard designed to bridge the gap between traditional document formats and what modern AI systems need to consume. It is intended as a foundational layer for document intelligence and agentic AI workflows. It works alongside Docling, a toolkit for parsing and structuring documents, so that AI systems can work with document content more efficiently and reliably. Because the format is interoperable and machine-readable, DocLang aims to help AI systems interpret and use document content accurately, supporting more dependable automation and analysis.