---
title: "OCR — Glossary | LLMind"
description: "Optical Character Recognition — extracting text from an image or scanned page."
url: https://llmind.org/glossary/ocr/
source_format: html
---

# OCR

**Optical Character Recognition — extracting text from an image or scanned page.**

Optical Character Recognition (OCR) converts text that exists only as pixels in images, photographs, or scanned documents into machine-readable text. Classical OCR engines, such as Tesseract's legacy engine, rely on statistical pattern matching over character shapes (Tesseract itself has shipped an LSTM-based recognizer since version 4). Modern vision-model OCR (Mistral Vision, GPT-4V, Amazon Textract) uses deep learning to reach higher accuracy and to extract richer structure such as tables, form fields, and layout information.

## What it handles

OCR extracts text from scanned PDFs, smartphone photos of documents, screenshots, and any image containing readable text. Modern OCR tools go beyond text extraction: they detect tables, identify form fields, extract equations, and preserve spatial relationships. Accuracy depends heavily on image quality, font size, and language.

## The cost problem

OCR is expensive per call: hosted providers usually bill per page or per request, and self-hosted engines spend compute on every run. Running it repeatedly on the same file wastes both. If you process a document several times (indexing, searching, analyzing), you pay for OCR on every pass even though the text hasn't changed.
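To make the arithmetic concrete, here is a minimal sketch of how spend scales with repeated passes. The function name `ocr_spend` and every number below are hypothetical, illustrative inputs, not real provider pricing.

```python
def ocr_spend(pages: int, price_per_page: float, passes: int, cached: bool) -> float:
    """Total OCR spend for one document that is processed `passes` times.

    With a cache, only the first pass is billed; without one, every pass is.
    """
    billable_passes = min(passes, 1) if cached else passes
    return pages * price_per_page * billable_passes


# Illustrative inputs only (assumed, not taken from any price list).
pages, price_per_page, passes = 200, 0.01, 3

uncached = ocr_spend(pages, price_per_page, passes, cached=False)
with_cache = ocr_spend(pages, price_per_page, passes, cached=True)
print(f"uncached: ${uncached:.2f}  cached: ${with_cache:.2f}")
```

Under these assumed inputs the uncached pipeline pays three times what the cached one does, and the gap grows linearly with each extra pass.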

## LLMind's role

LLMind caches OCR output inside the file's XMP packet as signed semantic metadata. Run OCR once; downstream AI pipelines read the cached OCR text instead of re-calling the OCR provider. The cache is portable, verifiable, and saves money by eliminating redundant OCR calls.
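The run-once, read-many pattern can be sketched in a few lines. This is not LLMind's actual format or API: the record layout, the helper names (`make_record`, `read_cached_text`, `get_text`), and the use of an HMAC as the signature are all assumptions made for illustration. The sketch also keeps the record next to the image bytes rather than writing it into an XMP packet, so the content hash covers only the image data.

```python
import hashlib
import hmac
import json

DEMO_KEY = b"demo-signing-key"  # hypothetical key, for illustration only


def _digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def _sign(payload: dict, key: bytes) -> str:
    # Canonical JSON so the same payload always yields the same signature.
    body = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def make_record(image_bytes: bytes, ocr_text: str, key: bytes) -> dict:
    """Build a signed cache record tying OCR text to the image content."""
    payload = {"content_sha256": _digest(image_bytes), "ocr_text": ocr_text}
    return {"payload": payload, "signature": _sign(payload, key)}


def read_cached_text(image_bytes: bytes, record: dict, key: bytes):
    """Return the cached text, or None if the record fails verification."""
    expected = _sign(record["payload"], key)
    if not hmac.compare_digest(expected, record["signature"]):
        return None  # record was altered after signing
    if record["payload"]["content_sha256"] != _digest(image_bytes):
        return None  # record belongs to different image content
    return record["payload"]["ocr_text"]


def get_text(image_bytes: bytes, record, key: bytes, run_ocr):
    """Use the cache when it verifies; otherwise call OCR and re-cache."""
    if record is not None:
        cached = read_cached_text(image_bytes, record, key)
        if cached is not None:
            return cached, record
    text = run_ocr(image_bytes)
    return text, make_record(image_bytes, text, key)


# Demo with a stand-in OCR function that counts how often it is called.
provider_calls = []


def fake_ocr(image_bytes: bytes) -> str:
    provider_calls.append(len(image_bytes))
    return "placeholder text"


scan = b"raw image bytes"
text_first, record = get_text(scan, None, DEMO_KEY, fake_ocr)
text_second, record = get_text(scan, record, DEMO_KEY, fake_ocr)
print(len(provider_calls))  # the second read is served from the record
```

The verification step is what makes the cache safe to carry between systems: a record that was edited, or that was copied onto a different image, fails the check and the pipeline falls back to a fresh OCR call.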

## Related terms

-   [IDP](https://llmind.org/glossary/idp/)
-   [File enrichment engine](https://llmind.org/glossary/file-enrichment-engine/)

## See also

-   [OCR once read forever](https://llmind.org/learn/ocr-once-read-forever/)
-   [OCR cache use case](https://llmind.org/use-cases/ocr-cache/)
-   [Spec](https://llmind.org/spec/)
