Updated · InfoWorld · Jun 16
IBM, Nvidia and Red Hat Launch DocLang for AI Documents as LLMs Struggle With PDFs

3 articles · Updated · InfoWorld · Jun 16

Summary

  • DocLang debuted as a Linux Foundation-hosted working group to create an open, vendor-neutral format for business documents built for LLM tokenizers rather than human readers.
  • PDFs, JPEGs and other legacy formats raise cost and reduce reliability when enterprises extract meaning for generative AI and agentic systems, the group said.
  • The specification is designed as a structured, machine-readable layer for any document type, building on the Docling toolkit that converts PDFs, word-processing files and spreadsheets into structured data.
  • ABBYY and HumanSignal joined the effort, while analysts said an AI-native standard could improve efficiency and lower risk if preprocessing stays invisible to users and preserves human-readable documents.
  • Governance may become the next hurdle: Info-Tech said organizations adopting DocLang will need controls and reviews to scale its use securely and accountably.

Insights

Is DocLang truly future-proof, or will the next generation of AI make it obsolete?
Can a document format built for AI still be truly usable for the average person?

DocLang Launches: The Missing Substrate for Reliable, Compliant AI Document Processing

Overview

DocLang, launched on June 12, 2026, is an open, AI-native document standard intended to bridge the gap between traditional document formats and what modern AI systems need. It provides a foundational layer for document intelligence and agentic AI workflows, and it works in tandem with Docling, a system for processing and structuring documents. By offering an interoperable, machine-readable format, DocLang aims to help AI systems understand and use document content more efficiently and reliably.
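
To make the idea of a structured, machine-readable layer concrete, here is a minimal Python sketch. It is not the DocLang specification or the Docling API: the `Block` and `StructuredDoc` classes, their field names, and the sample report content are all hypothetical, invented only for illustration. The point is that a consumer can select typed elements such as tables directly, rather than inferring them from page layout.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class Block:
    """One typed unit of a document (hypothetical schema, not the DocLang spec)."""
    kind: str                                             # e.g. "heading", "paragraph", "table"
    text: str = ""                                        # used by headings and paragraphs
    rows: list[list[str]] = field(default_factory=list)   # used only by tables


@dataclass
class StructuredDoc:
    """A document as an ordered list of typed blocks (hypothetical)."""
    title: str
    blocks: list[Block] = field(default_factory=list)

    def to_json(self) -> str:
        # Serialize to JSON so the block types stay explicit for downstream tools.
        return json.dumps(asdict(self), indent=2)

    def tables(self) -> list[Block]:
        # Select tables by type instead of re-parsing visual layout.
        return [b for b in self.blocks if b.kind == "table"]


# Made-up sample content, for illustration only.
doc = StructuredDoc(
    title="Quarterly report",
    blocks=[
        Block(kind="heading", text="Revenue"),
        Block(kind="paragraph", text="Revenue grew in both regions."),
        Block(kind="table", rows=[["Region", "Revenue"], ["EMEA", "10"], ["APAC", "12"]]),
    ],
)

print(doc.to_json())
```

In a sketch like this, an agent that needs the revenue figures would call `doc.tables()` and read the rows, with no OCR or layout heuristics involved. Whether DocLang exposes anything similar is an assumption, not something the announcement states.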

...