Chunking

Splitting a document into smaller passages for embedding and retrieval in a RAG pipeline.

Chunking is the process of breaking a large document into smaller, semantically coherent passages, typically 128 to 512 tokens each, though the right size depends on the use case. Each chunk is then embedded and indexed in a vector database for similarity retrieval. Chunking is a critical preprocessing step in RAG pipelines because it controls the granularity of retrieval.

Why chunk

There are two reasons to chunk: LLMs have a context-window limit, and embeddings work best on coherent passages rather than entire documents. Chunking decides where passage boundaries fall. You can chunk by token count, by paragraph, by section header, or by semantic similarity. The choice significantly affects downstream retrieval quality.
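
As a rough illustration of two of these strategies, the sketch below chunks by token count and by paragraph. It is a minimal sketch under stated assumptions, not a reference implementation: whitespace-separated words stand in for real tokenizer tokens, the function names and default sizes are hypothetical, and the `overlap` parameter is an extra not discussed above.

```python
from typing import List


def chunk_by_tokens(text: str, size: int = 256, overlap: int = 0) -> List[str]:
    """Fixed-size chunks of `size` tokens; consecutive chunks share `overlap` tokens.

    Assumption: a "token" here is a whitespace-separated word, which only
    approximates what a real model tokenizer would produce.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    tokens = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + size]))
        if start + size >= len(tokens):
            break  # the last window already reached the end of the text
    return chunks


def chunk_by_paragraph(text: str, max_tokens: int = 256) -> List[str]:
    """Pack whole paragraphs (separated by blank lines) into chunks.

    A chunk is closed before it would exceed `max_tokens`. A single paragraph
    longer than `max_tokens` is kept whole as its own oversized chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    count = 0
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        n = len(para.split())
        if current and count + n > max_tokens:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Chunking by paragraph keeps each passage aligned with the author's own boundaries, at the cost of uneven chunk sizes. Chunking by token count gives predictable sizes but can cut a passage mid-thought.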

The chunk-size tradeoff

Small chunks lose context (a sentence fragment may not be meaningful without surrounding text). Large chunks dilute the signal (a 1000-token chunk with only 50 tokens relevant to the query wastes space in the LLM's context window). Optimal chunk size is domain-specific and often requires experimentation.
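
The arithmetic behind the dilution example can be made explicit. The sketch below is illustrative only: the function names and the 4000-token context budget are hypothetical, chosen just to show how chunk size trades off against the number of chunks that fit.

```python
def relevant_fraction(chunk_tokens: int, relevant_tokens: int) -> float:
    """Share of a chunk's tokens that actually bear on the query."""
    return relevant_tokens / chunk_tokens


def chunks_that_fit(context_budget: int, chunk_tokens: int) -> int:
    """How many whole chunks of a given size fit in a token budget."""
    return context_budget // chunk_tokens


if __name__ == "__main__":
    # The example from the text: 50 relevant tokens in a 1000-token chunk.
    print(f"{relevant_fraction(1000, 50):.0%} of the chunk is relevant")
    # Hypothetical 4000-token budget reserved for retrieved passages.
    for size in (128, 512, 1000):
        print(size, "tokens per chunk ->", chunks_that_fit(4000, size), "chunks")
```

Larger chunks leave room for fewer retrieved passages, so each irrelevant token costs more of the available context.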

Enrichment vs chunking

LLMind takes a different approach. Instead of chunking files for retrieval, it enriches each file with a structured summary of its semantic content. The LLM reads this summary directly, so it gets the file's context without a chunking or search step.

See also