AI agent local files: 4 access patterns compared
If you're building an AI agent that needs to reason over a directory of local files (docs, specs, customer research), you have four common access patterns. They differ in infrastructure complexity, cost, update latency, and reasoning quality. This page walks through all four and explains when each fits.
Pattern 1: Context window dump
Read every file in the directory, concatenate them into a single prompt, and feed the whole thing to the agent. The agent operates on the full corpus at once.
Infrastructure: zero. Cost: one inference call per query. Latency: immediate (no retrieval, no lookups).
Limits: the context window. If your corpus exceeds the agent's context window (Claude's is 200K tokens, which long documents fill quickly), the pattern breaks: whatever doesn't fit gets cut, and the agent works from incomplete information. It suits small, stable corpora such as a short team handbook, a single specification document, or a project charter.
Best for: tiny corpora (20 pages or fewer), static reference material, and queries that require reading the full corpus every time.
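A minimal Python sketch of the context dump, assuming plain-text files (`.md`, `.txt`) and a crude four-characters-per-token estimate in place of a real tokenizer. The function names are illustrative, not part of any library. It refuses to build an over-budget prompt rather than truncating silently.

```python
from pathlib import Path

# Crude heuristic (an assumption, not a real tokenizer): ~4 characters per
# token for English prose. Use your provider's token counter for real budgets.
CHARS_PER_TOKEN = 4
CONTEXT_BUDGET_TOKENS = 200_000  # Claude's context window, as cited above


def estimate_tokens(text: str) -> int:
    """Rough token count; good enough to decide whether a dump can fit."""
    return len(text) // CHARS_PER_TOKEN + 1


def load_directory(root: str, suffixes=(".md", ".txt")) -> dict[str, str]:
    """Read every matching file under root, recursively, as UTF-8 text."""
    base = Path(root).expanduser()
    return {
        str(p.relative_to(base)): p.read_text(encoding="utf-8")
        for p in base.rglob("*")
        if p.is_file() and p.suffix in suffixes
    }


def build_context_dump(files: dict[str, str], budget: int = CONTEXT_BUDGET_TOKENS) -> str:
    """Concatenate all files into one prompt, each under a header naming its path.

    Raises ValueError instead of truncating, because a truncated dump leaves
    the agent reasoning over incomplete information without knowing it.
    """
    prompt = "\n\n".join(f"=== {name} ===\n{text}" for name, text in sorted(files.items()))
    needed = estimate_tokens(prompt)
    if needed > budget:
        raise ValueError(f"corpus needs ~{needed} tokens; budget is {budget}")
    return prompt


# Usage: prompt = build_context_dump(load_directory("~/docs"))
```

In practice you would set the budget below the full window to leave room for the question and the agent's answer.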
Pattern 2: RAG + vector DB
The traditional AI-engineering answer. Chunk your files into small pieces (passages, paragraphs, or sentences). Embed each chunk with an embedding model (OpenAI, Cohere, Voyage AI, or open-source). Store the embeddings in a vector database (Pinecone, Qdrant, Weaviate, Chroma, pgvector). At query time, embed the user's question and retrieve the top-k chunks by cosine similarity. Feed the top-k chunks into the agent's prompt.
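The retrieval loop above, sketched in Python with two loudly labeled stand-ins: a hashed bag-of-words vector replaces a real embedding model (it captures word overlap, not meaning), and an in-memory list replaces the vector database. `InMemoryIndex`, `chunk`, and `embed` are hypothetical names, and chunking is by paragraph.

```python
import math
import re
import zlib
from collections import Counter

DIM = 512  # size of the toy embedding space


def chunk(text: str) -> list[str]:
    """Split a document into paragraph chunks (blank-line separated)."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def embed(text: str) -> list[float]:
    """Toy stand-in for an embedding model: hashed word counts, L2-normalized."""
    vec = [0.0] * DIM
    for word, count in Counter(re.findall(r"[a-z0-9]+", text.lower())).items():
        vec[zlib.crc32(word.encode("utf-8")) % DIM] += count
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    # Both vectors are unit length, so the dot product equals cosine similarity.
    return sum(x * y for x, y in zip(a, b))


class InMemoryIndex:
    """Stand-in for a vector DB: rows of (embedding, chunk text, source file)."""

    def __init__(self) -> None:
        self.rows: list[tuple[list[float], str, str]] = []

    def add_document(self, source: str, text: str) -> None:
        for piece in chunk(text):
            self.rows.append((embed(piece), piece, source))

    def search(self, query: str, k: int = 3) -> list[tuple[str, str]]:
        """Embed the query and return the top-k (source, chunk) pairs."""
        q = embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [(source, piece) for _, piece, source in ranked[:k]]
```

The pairs returned by `search` are what gets pasted into the agent's prompt. Swapping `embed` for a real model and `InMemoryIndex` for a real vector DB changes the plumbing, not the shape of the loop.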
Infrastructure: an embedding model (unless you use an embedding API), a vector DB, a chunker, and a retriever. Cost: one embedding API call per query to embed the question, plus vector DB hosting and query overhead. Latency: the embedding call plus the DB lookup; the lookup itself typically takes milliseconds, though that depends on index size. Update latency: when files change, you re-chunk and re-embed them, which can be expensive.
Limits: the embedding model must be good enough for your domain. Generic embeddings (such as OpenAI's text-embedding-3-small) work well for general English text but can struggle with domain-specific jargon, technical specs, and languages underrepresented in their training data. Re-embedding on every file change is expensive if your corpus is large or actively maintained.
Best for: large corpora (500 or more documents) that need fine-grained semantic search, multilingual search (given a multilingual embedding model), or queries like "find all documents related to concept X" across the corpus.
Pattern 3: MCP-only (filesystem MCP)
Expose the directory through an MCP server (the Anthropic reference filesystem MCP, or a custom implementation). When the agent needs context, it calls the MCP server to read a file. No chunking, no embedding, no vector DB. The agent reads files on demand—the MCP server just handles file I/O.
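What the server does on each tool call is plain file I/O. The Python sketch below is not the reference server's code and does not use the MCP SDK: `FileTools`, `list_directory`, and `read_file` are hypothetical handler names. Confining every path to one root directory mirrors, in spirit, the reference server's allowed-directories restriction.

```python
from pathlib import Path


class FileTools:
    """Hypothetical handlers for two filesystem tools: list a directory and
    read a file. Every path is confined to `root`."""

    def __init__(self, root: str) -> None:
        self.root = Path(root).expanduser().resolve()

    def _resolve(self, rel_path: str) -> Path:
        # Resolve symlinks and "..", then refuse anything outside the root.
        target = (self.root / rel_path).resolve()
        if target != self.root and self.root not in target.parents:
            raise PermissionError(f"{rel_path!r} is outside the allowed directory")
        return target

    def list_directory(self, rel_path: str = ".") -> list[str]:
        """Names in a directory; subdirectories get a trailing slash."""
        return [
            p.name + ("/" if p.is_dir() else "")
            for p in sorted(self._resolve(rel_path).iterdir())
        ]

    def read_file(self, rel_path: str) -> str:
        # Read live from disk on every call: no cache, so edits show up at once.
        return self._resolve(rel_path).read_text(encoding="utf-8")
```

An MCP server would register handlers like these as tools; the agent decides which ones to call and with which paths.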
Infrastructure: an MCP server (minimal code; Anthropic provides a reference implementation). Cost: no retrieval cost and no external API calls, though every file the agent opens adds tokens to its context. Latency: a file read from disk (milliseconds for small files; seconds for PDFs that need parsing). Update latency: instant (files are read live from disk; there is no caching or indexing step).
Limits: the agent has to know which files to read. This works well if files are named descriptively or organized by folder. If your agent needs to find "all documents about X," it can't run a semantic search; it has to browse the directory and reason about filenames. In large directories (more than 100 files), this can be slow unless the naming is very clear.
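Without semantic search, file selection comes down to matching the question against paths. A deliberately naive Python sketch of that heuristic (the functions are hypothetical; a real agent reasons over the directory listing in the model rather than counting word overlaps) shows why descriptive names matter: a file named `notes1.md` can never be found this way.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase words in a path or question, splitting on /, -, _, ., spaces."""
    return {t for t in re.split(r"[^a-z0-9]+", text.lower()) if t}


def shortlist(paths: list[str], question: str, limit: int = 5) -> list[str]:
    """Rank relative paths by how many question words they contain.

    Files sharing no words with the question are dropped entirely, which is
    exactly the failure mode of vague names like notes1.md.
    """
    words = tokenize(question)
    scored = [(len(words & tokenize(p)), p) for p in paths]
    hits = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [p for _, p in hits[:limit]]
```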
Best for: small, browsable directories (100 files or fewer) with clear filenames, agent-driven exploration, cost-sensitive deployments, or when the corpus changes frequently.
Pattern 4: MCP + LLMind-enriched files
Same as Pattern 3, but every file is enriched with LLMind upfront. Before pointing the agent at the directory, run:
llmind enrich --recursive ~/docs/
Each file now carries a semantic layer: description, entities, structural summary, and text extracted from images and PDFs. When the agent reads a file through the MCP server, it gets both the raw bytes and the pre-computed metadata. The agent can reason over the semantic layer (fast, structured) or parse the raw content (slower, but full detail).
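The choice between the semantic layer and the raw content can be sketched as a small dispatch function. It assumes nothing about how LLMind stores or exposes its metadata: `get_metadata` and `read_raw` are injected placeholders, and the `description`/`entities` keys are hypothetical, chosen to mirror the fields listed above.

```python
from typing import Callable, Optional


def read_for_agent(
    path: str,
    read_raw: Callable[[str], str],
    get_metadata: Callable[[str], Optional[dict]],
    need_full_detail: bool = False,
) -> str:
    """Return the cheapest representation that meets the agent's need.

    `get_metadata` is a placeholder accessor (hypothetical); plug in whatever
    your setup provides for reading the enrichment layer.
    """
    meta = get_metadata(path)
    if meta and not need_full_detail:
        lines = [f"description: {meta.get('description', '')}"]
        if meta.get("entities"):
            lines.append("entities: " + ", ".join(meta["entities"]))
        return "\n".join(lines)
    # Not enriched yet, or the agent asked for full detail: use raw content.
    return read_raw(path)
```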
Infrastructure: an MCP server (minimal). Cost: no per-query retrieval cost (enrichment is a one-time upfront cost). Latency: a file read from disk (metadata reads in milliseconds; the raw content is available too). Update latency: when files change, re-enrich the changed files (minutes to hours depending on corpus size and file complexity).
Limits: enrichment is upfront work. If your corpus changes every hour, re-enriching is expensive. If it is stable or changes about daily, you pay the enrichment cost once per change instead of on every read. Enrichment also requires running the LLMind CLI, which works best on Unix-like machines (Linux, macOS).
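Re-enriching only the changed files (or, for Pattern 2, re-embedding only them) needs a way to tell which files changed. One common approach, sketched here in Python as an assumption rather than anything LLMind does internally, is to keep a manifest of SHA-256 content hashes and diff it against a fresh scan.

```python
import hashlib
from pathlib import Path


def fingerprint(root: str) -> dict[str, str]:
    """Map each file under root (as a relative path) to the SHA-256 of its bytes."""
    base = Path(root).expanduser()
    return {
        str(p.relative_to(base)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(base.rglob("*"))
        if p.is_file()
    }


def diff_manifests(old: dict[str, str], new: dict[str, str]) -> tuple[list[str], list[str]]:
    """Return (new_or_modified, deleted) paths between two fingerprints."""
    stale = sorted(path for path, digest in new.items() if old.get(path) != digest)
    deleted = sorted(path for path in old if path not in new)
    return stale, deleted
```

Persist the manifest between runs (for instance with `json.dump`) and hand the stale paths to the enrichment step. If enrichment writes into the files or adds files to the directory, take the snapshot after it finishes so its own output isn't flagged as an edit on the next scan.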
Best for: stable, complex corpora (PDFs, scanned images, audio), agents that need to reason without re-parsing raw files, deployments that want zero embedding cost, and fast agent iteration. Ideal for knowledge bases, research repositories, and document-heavy workflows.
Picking a pattern: decision tree
Is your corpus tiny (20 pages or fewer) and static? Pattern 1 (context dump) is simplest.
Is your corpus large (500 or more documents) and do you need semantic search across it? Pattern 2 (RAG + vector DB) is the traditional answer for large, searchable corpora.
Is your corpus small-to-medium (roughly 100 documents or fewer) with clear filenames? Pattern 3 (MCP-only) needs only an MCP server, adds no retrieval cost, and lets the agent browse by filename.
Is your corpus stable, complex (PDFs, images, audio), and read-heavy? Pattern 4 (MCP + LLMind) provides pre-computed metadata, fast agent reads, and zero embedding cost.
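The decision tree above, encoded as a Python function. The `Corpus` fields and thresholds come from the four questions, checked in the order listed with the first match winning; the ordering and the fallback for corpora that match none of them are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Corpus:
    pages: int                   # total pages across all files
    documents: int               # number of files
    static: bool                 # rarely or never changes
    needs_semantic_search: bool  # "find everything about X" queries
    clear_filenames: bool        # names and folders describe contents
    complex_formats: bool        # PDFs, scanned images, audio
    read_heavy: bool             # many more reads than updates


def pick_pattern(c: Corpus) -> str:
    """Walk the four questions in order; the first match wins."""
    if c.pages <= 20 and c.static:
        return "Pattern 1: context window dump"
    if c.documents >= 500 and c.needs_semantic_search:
        return "Pattern 2: RAG + vector DB"
    if c.documents <= 100 and c.clear_filenames:  # Pattern 3's 100-file ceiling
        return "Pattern 3: MCP-only"
    if c.static and c.complex_formats and c.read_heavy:
        return "Pattern 4: MCP + LLMind"
    # Assumption: nothing matched cleanly, so consider combining patterns.
    return "No single match: consider combining patterns"
```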
These patterns also overlap, and many workflows combine them: Pattern 1 for initial context (a summary of the entire corpus for the agent), then Pattern 3 or 4 for detailed file reads. Or Pattern 2 for semantic search, then Pattern 4 to give the agent context once a specific file is identified.
Cost and latency comparison
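Assume a corpus of 100 documents averaging 20 pages each. That is about 2,000 pages, far more than a 200K-token context window holds, so Pattern 1 could only dump a subset of it: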
| Pattern | Infrastructure | Per-query cost (beyond inference) | Latency | Update latency |
|---|---|---|---|---|
| Pattern 1 (context dump) | None | None, but every query resends the full corpus | Immediate (no retrieval step) | Immediate (re-read files) |
| Pattern 2 (RAG + vector DB) | Embedding model, vector DB | One embedding call + DB lookup | Milliseconds (retrieval) plus the embedding call | Hours (re-embed corpus) |
| Pattern 3 (MCP-only) | MCP server | None | Milliseconds to seconds (file read and parse) | Immediate |
| Pattern 4 (MCP + LLMind) | MCP server + one-time enrichment | None | Milliseconds (metadata read) | Minutes (re-enrich changed files) |
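A back-of-the-envelope check on why Pattern 1 struggles with this corpus. The words-per-page and tokens-per-word figures are assumptions (rough values for dense English prose), so treat the result as an order of magnitude, not a measurement.

```python
DOCS = 100
PAGES_PER_DOC = 20
WORDS_PER_PAGE = 500       # assumption: dense, single-spaced prose
TOKENS_PER_WORD = 1.3      # assumption: rough average for English text
CONTEXT_WINDOW = 200_000   # Claude's window, as cited in Pattern 1

pages = DOCS * PAGES_PER_DOC
tokens = round(pages * WORDS_PER_PAGE * TOKENS_PER_WORD)
print(f"{pages:,} pages ~ {tokens:,} tokens ~ {tokens / CONTEXT_WINDOW:.1f}x the context window")
# prints: 2,000 pages ~ 1,300,000 tokens ~ 6.5x the context window
```

Even at half these densities, the corpus would still be more than three times the window.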
FAQ
How do I give an AI agent file context without a vector DB?
Pattern 3 (MCP-only) or Pattern 4 (MCP + LLMind). Expose the directory via an MCP filesystem server, and the agent reads files on demand without a vector database. With Pattern 4, enrich the files first so the agent gets pre-computed metadata without needing an embedding model or retrieval step.
When do I still need a vector database?
When you need fine-grained semantic search over a large corpus. If you have 1,000 documents and want to find "all documents about supply chain disruption" through semantic similarity across document fragments, a vector DB is the right tool. If your corpus is small (roughly 100 or fewer documents) or searchable by filename or tags, Patterns 3 and 4 are simpler and faster.