Enrichment vs. chunking
Chunking is a RAG pipeline stage. File enrichment is a file property. The two work at different layers of the stack, and they compose well together.
The question comes up almost every time someone new hears about file enrichment: does it replace chunking? No. They solve different problems at different layers, and a well-built system usually does both.
What chunking is, and why it exists
Chunking is a RAG pipeline stage. When you want retrieval-augmented generation to find relevant content inside a large document, you split the document into smaller pieces — chunks — and embed each piece as a vector. At query time, you embed the user's question, find the nearest chunk vectors, and stuff the matching chunks into the LLM's context window.
Every RAG framework has its own chunking strategy: fixed token counts, paragraph boundaries, semantic similarity, hierarchical trees. None is perfect. Chunking is inherently lossy: you turn a document into a bag of fragments, and context is lost wherever a boundary separates sentences that depend on each other.
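To make the mechanics concrete, here is a minimal sketch of the chunk, embed, and retrieve loop described above. It assumes fixed-size word windows with overlap and uses a toy word-count "embedding" in place of a real embedding model; the function names (chunk_words, embed, retrieve) and the sample text are illustrative only and do not come from any particular framework.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


doc = (
    "The invoice total is 4200 euros. Payment is due within thirty days. "
    "Late payments incur a two percent fee. Contact billing for questions."
)
chunks = chunk_words(doc, size=8, overlap=2)
top = retrieve("when is payment due", chunks, k=1)
print(chunks[0])  # the first window ends mid-sentence, at "Payment is"
print(top[0])
```

The first window cuts the second sentence after "Payment is", which is the kind of boundary loss described above. The overlap lets the next window pick the sentence up again, at the cost of storing some words twice.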
What enrichment is
Enrichment is a file-level operation. You run it once per file; the result — extracted text, document structure, summary, signed checksum — lives inside the file's own metadata. Any downstream tool reading the file gets the result for free.
See What is file enrichment? for the full explanation.
Different layers, different jobs
The cleanest mental model:
- Enrichment runs at the file / content layer. It answers “what does this file contain?” — once.
- Chunking runs at the RAG pipeline layer. It answers “how should I split this content for retrieval?” — per use case.
Enrichment solves the parsing and understanding problem once, at the source. Chunking solves the retrieval problem inside each pipeline: it runs at index time so that the right passages can be found at query time. They don't compete.
Enrichment makes chunking easier, not obsolete
When a RAG pipeline encounters an LLMind-enriched PDF, it skips re-parsing. The llmind:text field gives clean extracted text; llmind:structure gives the document's heading and table layout as JSON. The pipeline can still chunk that content for its vector index, and it can chunk it more intelligently because it has the document structure up front.
Put differently: enrichment removes the OCR and parsing stages from your RAG ingest. It does not remove chunking, embedding, or retrieval.
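As a rough illustration of structure-aware chunking, the sketch below cuts text at heading boundaries so that each chunk is one whole section. The structure format used here (a list of headings with a title and a character offset) is an assumption made for the example; this page does not specify the actual llmind:structure schema, so treat the field names "title" and "start" as hypothetical.

```python
def chunk_by_sections(text: str, headings: list[dict]) -> list[dict]:
    """Cut `text` at heading offsets so each chunk holds one complete section.

    `headings` is a hypothetical structure: [{"title": str, "start": int}, ...],
    where `start` is the character offset of the heading in `text`.
    """
    ordered = sorted(headings, key=lambda h: h["start"])
    # Section boundaries: start of text, every heading offset, end of text.
    bounds = [0] + [h["start"] for h in ordered] + [len(text)]
    # The first span (before any heading) is an untitled preamble.
    titles = [""] + [h["title"] for h in ordered]
    chunks = []
    for title, start, end in zip(titles, bounds, bounds[1:]):
        body = text[start:end].strip()
        if body:  # skip empty spans, e.g. when the text opens with a heading
            chunks.append({"title": title, "text": body})
    return chunks


# Hypothetical stand-ins for extracted text and its heading layout.
text = "Overview\nThis report covers Q3.\nRevenue\nRevenue grew 12%.\n"
headings = [
    {"title": "Overview", "start": text.index("Overview")},
    {"title": "Revenue", "start": text.index("Revenue\n")},
]
sections = chunk_by_sections(text, headings)
for s in sections:
    print(s["title"], "->", repr(s["text"]))
```

Because the boundaries come from the document's own headings, no chunk mixes content from two sections. Long sections could still be split further with any of the usual strategies.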
When to use each (and when to use both)
Use chunking when
- You're building a RAG pipeline that needs fast semantic retrieval
- You have a large corpus and need relevance ranking across many documents
- Queries don't map to entire documents — users want passages, not files
Use enrichment when
- Files need to be read by multiple AI tools (Claude, ChatGPT, Cursor, NotebookLM, MCP servers)
- Large PDFs, scanned documents, or audio transcripts re-parse slowly on every load
- You need tamper-evident provenance on document content and semantic metadata
- You want files to be natively readable by AI tools without a retrieval layer at all
Use both when
- Your RAG pipeline ingests from multiple upstream sources and needs consistent, cached parsing
- You want to reduce pipeline cost and latency by removing parse-and-OCR from ingest
- Downstream tools outside the RAG pipeline also need to read the files
The practical difference
A RAG system without enrichment re-parses and re-OCRs every file on every ingest. A RAG system with enrichment reads the cached layer and skips straight to chunking and embedding. Same retrieval quality; lower cost, faster ingest, reproducible results.
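The difference can be sketched as a single branch at ingest. This example assumes the file's metadata has already been loaded into a plain dict and that the cached text sits under the llmind:text key mentioned earlier; how the metadata is actually read from a file is not covered here, and text_for_ingest and slow_parse are hypothetical names used only for illustration.

```python
from typing import Callable


def text_for_ingest(path: str, metadata: dict[str, str], parse: Callable[[str], str]) -> str:
    """Return cached text if the file carries it, otherwise fall back to parsing."""
    cached = metadata.get("llmind:text")
    if cached:
        return cached  # enriched file: no parse or OCR needed
    return parse(path)  # plain file: pay the parsing cost


parse_calls: list[str] = []


def slow_parse(path: str) -> str:
    """Stand-in for an expensive parse-and-OCR step; records each call."""
    parse_calls.append(path)
    return f"parsed text of {path}"


enriched = text_for_ingest("a.pdf", {"llmind:text": "cached text"}, slow_parse)
plain = text_for_ingest("b.pdf", {}, slow_parse)
print(parse_calls)  # only the non-enriched file reached the parser
```

Everything after this branch (chunking, embedding, indexing) is the same for both paths.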
Try it
pipx install 'llmind-cli[all]'
llmind enrich myfile.pdf