Baidu releases Unlimited OCR model for long-horizon document parsing

Baidu has released a new optical character recognition model called Unlimited OCR, positioning it as a step beyond recent OCR systems focused on document understanding and parsing. The company posted the model on Hugging Face, along with code on GitHub and a paper on arXiv, signaling a broader push to make the system easier for developers to evaluate and deploy.

The model is described as supporting what Baidu calls one-shot long-horizon parsing. In practical terms, that means it is aimed at extracting and organizing text from complex inputs such as single document images, batches of page images, and PDFs. Baidu says the release is intended to push the capabilities of DeepSeek-OCR further, though the announcement does not provide comparative benchmarks in the source material.

According to the published instructions, Unlimited OCR can be run through Hugging Face Transformers on NVIDIA GPUs. Baidu lists a specific set of tested software dependencies, including recent versions of PyTorch, Transformers, Pillow, Matplotlib, Einops, and PyMuPDF. The model is loaded through standard AutoModel and AutoTokenizer calls with remote code enabled, suggesting that users will rely on model-specific implementation details rather than a fully generic OCR interface.

Two modes for different document types

Baidu says the model supports two configurations for single-image use. One mode, referred to as gundam, uses a smaller image size and cropping behavior. The other, called base, works at the full image size and does not use cropping. For multi-page inputs and PDFs, the company directs users to use the base configuration. Baidu also sets a long maximum output length of 32,768 tokens for inference, indicating that the model is designed to handle substantial amounts of extracted content.

For PDFs, the release materials recommend converting pages into images before passing them into the model. Baidu provides example code that uses PyMuPDF to render each page at 300 dpi, then feeds the resulting images into a multi-page parsing function. The examples also include settings meant to reduce repeated output, such as custom n-gram controls.

In addition to local inference, Baidu says Unlimited OCR can be served with SGLang, an inference framework that exposes an OpenAI-compatible API. The instructions show how to start a server with the model, set a 32,768-token context length, and stream responses from client applications. Baidu’s examples again distinguish between single-image and multi-image workflows, with base mode used for multi-page and PDF parsing.

The company also included a short visualization demo showing long-horizon OCR in action. Beyond the code and deployment guidance, the release credits DeepSeek-OCR, DeepSeek-OCR-2, and PaddleOCR as influences or supporting references.

Baidu’s citation entry lists the work as a 2026 arXiv preprint titled "Unlimited OCR Works." The model page shows early community interest as well, with more than 8,000 downloads last month listed on Hugging Face. While the release materials focus heavily on implementation details, they suggest Baidu is targeting developers who need OCR systems that can handle longer documents and more elaborate parsing tasks than conventional page-by-page text extraction.