---
title: "RAG — Glossary | LLMind"
description: "Retrieval-Augmented Generation — a pattern where an LLM retrieves relevant chunks before generating a response."
url: https://llmind.org/glossary/rag/
source_format: html
---

# RAG

**Retrieval-Augmented Generation — a pattern where an LLM retrieves relevant chunks before generating a response.**

Retrieval-Augmented Generation is a standard architecture for giving LLMs access to a large corpus. The pattern is straightforward: chunk a document corpus, embed each chunk as a dense vector, store these embeddings in a vector database, retrieve the top-k most relevant chunks at query time, then feed the retrieved chunks into the LLM's prompt alongside the user's question. The LLM synthesizes a response grounded in the retrieved content.
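The last step described above, feeding retrieved chunks into the prompt alongside the user's question, can be sketched as follows. This is a minimal illustration, not a prescribed format: the function name `build_prompt`, the numbered-context layout, and the instruction wording are all assumptions, and no particular LLM API is called.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt from retrieved chunks and the user's question.

    The template wording and numbering scheme are illustrative assumptions.
    """
    # Number each chunk so the model (and a reader) can refer back to it.
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


prompt = build_prompt(
    "What is chunking?",
    ["Chunking splits a corpus into coherent passages."],
)
print(prompt)
```

The resulting string would then be sent to whichever LLM the application uses. Keeping the retrieved text in a clearly delimited context section is one common way to encourage answers grounded in that text.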

## The pattern

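RAG requires four steps before generation: chunking (break the corpus into coherent passages), embedding (convert each passage to a vector using an embedding model), storage (index the vectors in a vector DB for efficient similarity search), and retrieval (find the top-k chunks most similar to the query). The retrieved chunks are then placed in the prompt for the LLM to answer from. Because only the top-k chunks enter the prompt, the LLM can work over a corpus far larger than its context window, and prompt size, and with it inference cost, stays bounded as the corpus grows.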
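The four steps above can be sketched end to end with the standard library alone. This is a toy under stated assumptions, not a production design: `embed` uses word counts as a stand-in for a real dense embedding model, `ToyVectorStore` is a hypothetical in-memory list scanned linearly rather than a real vector DB index, and the fixed-size word-window chunker ignores passage coherence.

```python
import math
import re
from collections import Counter


def chunk(text: str, max_words: int = 40) -> list[str]:
    """Chunking: split text into passages of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Embedding stand-in: a sparse word-count vector.

    A real system would call a dense embedding model here (assumption).
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """Storage: hypothetical in-memory store with brute-force search."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, passage: str) -> None:
        self.items.append((passage, embed(passage)))

    def top_k(self, query: str, k: int = 2) -> list[str]:
        """Retrieval: return the k passages most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [passage for passage, _ in ranked[:k]]


corpus = [
    "Vector databases index embeddings for fast similarity search.",
    "Chunking splits a corpus into coherent passages.",
    "Sourdough bread needs a long, slow fermentation.",
]

store = ToyVectorStore()
for doc in corpus:
    for passage in chunk(doc):
        store.add(passage)

results = store.top_k("How does similarity search over embeddings work?", k=1)
print(results)
```

In this toy corpus the query shares words only with the first document, so that passage ranks first. A real deployment would swap the word-count vectors for model embeddings and the linear scan for an approximate nearest-neighbour index, but the shape of the pipeline stays the same.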

## Why it's everywhere

RAG is the default answer to "give an LLM access to a large corpus." It works well for millions of passages and scales gracefully. Vector DBs are mature, embedding models are fast and cheap, and the pattern is language-agnostic. Nearly every LLM-powered search and chatbot application uses RAG in some form.

## Where LLMind sits

LLMind is not a RAG framework. Instead, it caches signed semantic metadata inside files so the retrieval step has richer content to work with — or so an agent can skip retrieval entirely by reading enriched files through an MCP server. Rather than chunking for similarity search, LLMind enriches files with structured summaries the LLM reads without needing a vector DB.

## Related terms

-   [Vector database](https://llmind.org/glossary/vector-database/)
-   [Chunking](https://llmind.org/glossary/chunking/)
-   [Embedding](https://llmind.org/glossary/embedding/)

## See also

-   [Agent file access patterns](https://llmind.org/learn/agent-file-access-patterns/)
-   [Enrichment vs chunking](https://llmind.org/learn/enrichment-vs-chunking/)
-   [Spec](https://llmind.org/spec/)