Mistral Releases OCR 4 With 85.20 Benchmark Score and 170-Language Support
Updated · Mistral AI · Jun 23
Summary
Mistral unveiled OCR 4 as a document-intelligence model that adds bounding boxes, block classification and inline confidence scores, extending OCR from text extraction to structured document understanding.
85.20 on OlmOCRBench and a 72% average human-preference win rate underpin Mistral’s performance claims, though the company said public benchmark scores can be distorted by annotation and formatting artifacts.
170 languages across 10 language groups are supported, with Mistral highlighting stronger accuracy on rare and low-resource languages and a compact design that can run in a single self-hosted container.
$4 per 1,000 pages via API (falling to $2 with batch processing) positions OCR 4 for enterprise search, RAG and workflow automation, while Document AI layers schema-based JSON extraction on the same endpoint for $5 per 1,000 pages.
OCR 4 is also being tied into Mistral’s Search Toolkit and distributed through Mistral Studio, Amazon SageMaker and Microsoft Foundry, signaling a broader push into enterprise document-processing infrastructure.
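To put the quoted prices in concrete terms, here is a minimal Python sketch that estimates the cost of a job at each price point. It assumes strictly linear per-page pricing with no minimums, volume discounts or rounding rules, none of which the summary specifies. The tier names (`ocr_api`, `ocr_batch`, `document_ai`) and the `estimate_cost` helper are hypothetical labels chosen for illustration, not Mistral identifiers.

```python
from decimal import Decimal

# USD prices per 1,000 pages, as quoted in the summary above.
# The tier names are illustrative labels, not official Mistral identifiers.
PRICE_PER_1000_PAGES = {
    "ocr_api": Decimal("4"),      # standard API calls
    "ocr_batch": Decimal("2"),    # batch processing
    "document_ai": Decimal("5"),  # schema-based JSON extraction
}


def estimate_cost(pages: int, tier: str) -> Decimal:
    """Estimate the USD cost of a job, assuming linear per-page pricing."""
    if pages < 0:
        raise ValueError("pages must be non-negative")
    return PRICE_PER_1000_PAGES[tier] * pages / Decimal(1000)


if __name__ == "__main__":
    for tier in PRICE_PER_1000_PAGES:
        print(f"{tier}: ${estimate_cost(250_000, tier):.2f}")
```

Under those assumptions, a 250,000-page archive would come to $1,000 through the standard API, $500 with batch processing, and $1,250 with Document AI extraction.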
Mistral AI has released Mistral OCR 4, a model designed to turn scanned documents and images into structured, machine-readable data. Unlike traditional OCR, it goes beyond plain text extraction: it preserves original formatting such as headings, tables, and images, and adds bounding boxes and confidence scores. Mistral OCR 4 is available immediately on La Plateforme, with access through cloud partners to follow, and a self-hosted, on-premises option is offered for organizations handling sensitive data.