---
title: "Chunking — Glossary | LLMind"
description: "Splitting a document into smaller passages for embedding and retrieval in a RAG pipeline."
url: https://llmind.org/glossary/chunking/
source_format: html
---

# Chunking

**Splitting a document into smaller passages for embedding and retrieval in a RAG pipeline.**

Chunking is the process of breaking a large document into smaller, semantically coherent passages, typically 128 to 512 tokens each, though the right size depends on the use case. Each chunk is then embedded and indexed in a vector database for similarity retrieval. Chunking is a critical preprocessing step in RAG pipelines because it sets the granularity of retrieval: the retriever cannot return anything smaller than a chunk.
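To make the embed, index, and retrieve steps concrete, here is a minimal sketch. It rests on loud assumptions: `embed` is a toy word-count stand-in for a real embedding model, the "index" is a plain Python list rather than a vector database, and all function names are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Embed every chunk once; a real pipeline would store these in a vector database."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(index: list[tuple[str, Counter]], query: str, k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(item[1], q), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


chunks = ["cats purr when happy", "stocks fell on monday", "dogs bark at strangers"]
index = build_index(chunks)
print(retrieve(index, "why do cats purr"))  # ['cats purr when happy']
```

The retriever only ever returns whole chunks, which is why the chunk boundaries chosen upstream determine what the LLM eventually sees.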

## Why chunk

LLMs have a limited context window, and embeddings represent a coherent passage better than an entire document. Chunking decides where the passage boundaries fall: you can chunk by token count, by paragraph, by section header, or by semantic similarity. The choice significantly affects downstream retrieval quality.
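Two of these strategies can be sketched in a few lines. This is an illustration under stated assumptions, not a reference implementation: whitespace-separated words stand in for tokens (a real tokenizer will count differently), paragraphs are assumed to be separated by blank lines, and both function names are hypothetical.

```python
import re


def chunk_by_tokens(text: str, max_tokens: int = 256) -> list[str]:
    """Fixed-size chunking: group words into windows of at most max_tokens.

    Whitespace-separated words are used as a rough proxy for tokens.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]


def chunk_by_paragraph(text: str) -> list[str]:
    """Paragraph chunking: split on blank lines and drop empty pieces."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


print(chunk_by_tokens("a b c d e", max_tokens=2))   # ['a b', 'c d', 'e']
print(chunk_by_paragraph("one\n\ntwo\n \nthree\n"))  # ['one', 'two', 'three']
```

Fixed-size windows give predictable chunk lengths but can cut a sentence in half; paragraph splitting keeps the author's boundaries but produces chunks of uneven size.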

## The chunk-size tradeoff

Small chunks lose context: a sentence fragment may not be meaningful without its surrounding text. Large chunks dilute the signal: a 1000-token chunk with only 50 tokens relevant to the query spends 95% of its share of the LLM's context window on irrelevant text. Optimal chunk size is domain-specific and usually has to be found by experiment.
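The dilution arithmetic can be made explicit. The sketch below reuses the 1000-token / 50-token figures from the example above; the `top_k` parameter (how many chunks the retriever passes to the LLM) and both function names are assumptions introduced for illustration.

```python
def relevant_fraction(relevant_tokens: int, chunk_tokens: int) -> float:
    """Share of a chunk that actually bears on the query."""
    return relevant_tokens / chunk_tokens


def wasted_tokens(relevant_tokens: int, chunk_tokens: int, top_k: int = 1) -> int:
    """Context-window tokens spent on irrelevant text across top_k retrieved chunks."""
    return (chunk_tokens - relevant_tokens) * top_k


print(relevant_fraction(50, 1000))       # 0.05
print(wasted_tokens(50, 1000, top_k=5))  # 4750
```

With five such chunks retrieved, 4750 of the 5000 tokens placed in the context window carry nothing the query needs.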

## Enrichment vs chunking

LLMind takes a different approach. Instead of chunking documents for retrieval, it enriches each file with a structured summary of its semantic content. The LLM reads this summary directly, so it gets the file's context without a chunking or search step.

## Related terms

-   [RAG](https://llmind.org/glossary/rag/)
-   [Embedding](https://llmind.org/glossary/embedding/)
-   [Vector database](https://llmind.org/glossary/vector-database/)

## See also

-   [Enrichment vs chunking](https://llmind.org/learn/enrichment-vs-chunking/)
-   [Spec](https://llmind.org/spec/)