Context window

The maximum amount of text an LLM can read at once, measured in tokens.

A context window is the maximum number of tokens an LLM can process in a single request. A token averages roughly 4 characters of English text, so a 100,000-token window corresponds to roughly 400,000 characters. The context window caps how much information the LLM can consider at once when generating a response.
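The arithmetic above can be sketched in a few lines of Python. This is only a rough estimate under the 4-characters-per-token heuristic; real tokenizers vary by model and language, and the function names and the `reserved_for_output` parameter are illustrative assumptions, not part of any library.

```python
import math

# Rough average taken from the heuristic above; actual tokenizers differ.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Estimate the token count of text from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def fits_in_window(text: str, window_tokens: int, reserved_for_output: int = 0) -> bool:
    """Check whether the estimated prompt size fits in the window.

    reserved_for_output is a hypothetical budget set aside for the reply,
    assuming the model counts generated tokens against the same window.
    """
    return estimate_tokens(text) + reserved_for_output <= window_tokens


if __name__ == "__main__":
    document = "x" * 400_000  # stand-in for a 400,000-character document
    print(estimate_tokens(document))          # 100000
    print(fits_in_window(document, 100_000))  # True
    print(fits_in_window(document, 100_000, reserved_for_output=1_000))  # False
```

For anything beyond a quick sanity check, counting tokens with the target model's own tokenizer is more reliable than a character-based estimate.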

Current scale

Context windows in modern models range from about 8K to more than 2M tokens. Claude 4, GPT-5, and Gemini Pro offer 200K+ context windows. Filling a larger window is more expensive (requests are typically priced per token) and slower (there are more tokens to process), but it unlocks new use cases: processing entire books, analyzing long conversations, or handling complex multi-document reasoning.
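To make the cost side of this concrete, here is a minimal sketch that assumes simple per-token pricing. The price used is a hypothetical placeholder, not any provider's actual rate, and `request_cost` is an illustrative helper rather than a real API.

```python
def request_cost(input_tokens: int, price_per_million_tokens: float) -> float:
    """Cost of one request's input under linear per-token pricing."""
    return input_tokens / 1_000_000 * price_per_million_tokens


if __name__ == "__main__":
    # Hypothetical price in USD per million input tokens; substitute real rates.
    price = 3.0
    for tokens in (8_000, 200_000, 2_000_000):
        print(f"{tokens:>9,} tokens -> ${request_cost(tokens, price):.3f}")
```

Under this assumption, cost grows in direct proportion to how much of the window is filled, so a prompt that uses a 2M-token window costs 250 times as much as one that uses 8K tokens.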

Why context limits matter

Even with large windows, stuffing everything into a single prompt gets expensive and slow. Structured summaries, such as those LLMind writes into files, let agents spend far fewer tokens of the context window on the same information. A file enriched with semantic metadata conveys more meaning per token than raw text.

Trade-off with retrieval

Larger context windows reduce the need for retrieval-augmented generation (RAG). Structured per-file metadata reduces it even further: agents read enriched files and get instant, precise context without any retrieval step at all.
