LLMind concepts
LLMind concepts in one page. The semantic layer, the three layer types, the signing scheme, and why putting structured metadata inside the file beats a vector database for AI retrieval.
What LLMind embeds
LLMind writes a structured, signed semantic layer into a file's XMP packet. The layer is defined by the LRFS specification (LLM-Ready File Specification) and bound to a stable XMP namespace at https://llmind.org/ns/1.0/. Any tool that can read XMP, which covers most image, document, and video pipelines, can find the layer, parse it, and use the data without re-running OCR, parsing, or embedding.
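To make the lookup concrete, here is a minimal Python sketch that scans a file's raw bytes for an XMP packet and lists the properties it finds in the LLMind namespace. It assumes the packet is stored uncompressed as UTF-8 and uses the conventional `x:xmpmeta` wrapper. Compressed or UTF-16 packets (for example inside some PDF streams or PNG chunks) are missed, which a real XMP library would handle. The property names in any sample data are hypothetical.

```python
from __future__ import annotations

import re
import xml.etree.ElementTree as ET

# Namespace URI stated in the docs above.
LLMIND_NS = "https://llmind.org/ns/1.0/"
_PREFIX = "{" + LLMIND_NS + "}"

# Raw byte scan for XMP envelopes; assumes UTF-8 and the usual "x:" prefix.
_XMPMETA_RE = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)


def find_xmp_packets(data: bytes) -> list[bytes]:
    """Return every <x:xmpmeta> block found in the file's bytes."""
    return _XMPMETA_RE.findall(data)


def llmind_properties(data: bytes) -> dict[str, str]:
    """Collect element text and attributes that live in the LLMind namespace.

    Nested RDF containers (rdf:Bag, rdf:Seq) are reported as empty strings
    in this sketch; a real reader would walk into them.
    """
    props: dict[str, str] = {}
    for packet in find_xmp_packets(data):
        root = ET.fromstring(packet)
        for elem in root.iter():
            if elem.tag.startswith(_PREFIX):
                props[elem.tag[len(_PREFIX):]] = (elem.text or "").strip()
            for key, value in elem.attrib.items():
                if key.startswith(_PREFIX):
                    props[key[len(_PREFIX):]] = value
    return props
```

Usage: `llmind_properties(open("report.pdf", "rb").read())` returns a dict of whatever LLMind-namespace properties the packet carries, or `{}` if none are found.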
The three layers
A single LRFS payload carries up to three layer types. Each answers a different question, and each can be present or absent independently of the others.
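One way to picture that independence is an in-memory view where each layer is optional. The Python sketch below is hypothetical: `LrfsPayload` and its plain-dict layers are illustrative stand-ins, and the real RDF structure is defined in the payload format spec chapter.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class LrfsPayload:
    # Hypothetical view of one payload; each layer is optional and independent.
    descriptive: Optional[dict[str, Any]] = None
    structural: Optional[dict[str, Any]] = None
    provenance: Optional[dict[str, Any]] = None

    def present_layers(self) -> list[str]:
        """Names of the layers this payload actually carries (None means absent)."""
        return [
            name
            for name in ("descriptive", "structural", "provenance")
            if getattr(self, name) is not None
        ]
```

A reader can check `present_layers()` and fall back to its own parsing only for the layers that are missing.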
1. Descriptive
Title, summary, language, and high-level entities: roughly ten lines of JSON-equivalent text answering "what is this file about?". Useful for retrieval ranking, search-index population, and agent-side filtering before a deeper read.
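As an illustration of agent-side filtering, the Python sketch below narrows a set of files using only their descriptive layers, before anything else is read. The field names mirror the prose above, but the exact LRFS keys and the language-tag format are assumptions, and `prefilter` is a hypothetical helper.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DescriptiveLayer:
    # Field names follow the prose; the real LRFS keys are an assumption.
    title: str
    summary: str
    language: str  # e.g. "en"; the tag format is an assumption
    entities: list[str] = field(default_factory=list)


def prefilter(index: dict[str, DescriptiveLayer], language: str, entity: str) -> list[str]:
    """Return paths whose descriptive layer matches, without reading file bodies."""
    wanted = entity.casefold()
    return sorted(
        path
        for path, layer in index.items()
        if layer.language == language
        and any(e.casefold() == wanted for e in layer.entities)
    )
```

Only the files that survive this cheap pass need their structural layer or full content opened.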
2. Structural
Section headings, page-level outlines, table-of-contents anchors, extracted text by region, and OCR transcriptions. This is the expensive part of parsing a PDF or image, computed once and cached inside the file. See "OCR once, read forever" for the workflow argument.
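The compute-once pattern can be sketched in a few lines of Python. Everything here is hypothetical: `Region` is an illustrative shape for one extracted region, and a plain dict named `embedded` stands in for layers that LLMind would store in each file's XMP packet. The point is only that OCR runs on the first read and later reads reuse the cached regions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Region:
    # Hypothetical shape for one extracted region; not the LRFS schema.
    page: int
    bbox: tuple[float, float, float, float]  # x0, y0, x1, y1
    text: str


def structural_layer(
    embedded: dict[str, list[Region]],
    path: str,
    run_ocr: Callable[[str], list[Region]],
) -> list[Region]:
    """Return the cached structural layer, running OCR only on the first read.

    `embedded` stands in for the in-file layer; it is a dict only for illustration.
    """
    if path not in embedded:
        embedded[path] = run_ocr(path)  # expensive step, paid once
    return embedded[path]  # later reads skip OCR entirely


def page_text(regions: list[Region], page: int) -> str:
    """Reassemble one page's text, ordering regions by y0.

    Assumes y grows downward, as in image coordinates (PDF space is the reverse).
    """
    on_page = sorted((r for r in regions if r.page == page), key=lambda r: r.bbox[1])
    return "\n".join(r.text for r in on_page)
```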
3. Provenance
A signed audit trail: who enriched the file, when, and with which LLMind version and model. The signature also binds a SHA-256 checksum of the file, so any byte-level tampering is detectable. See the signed semantic metadata glossary entry and the signing scheme spec chapter for the cryptographic detail.
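For a feel of what the provenance layer records, the Python sketch below streams a file through SHA-256 and assembles a who/when/version/model record. The record keys and `provenance_record` helper are hypothetical. Which bytes the real checksum covers (for example, whether the XMP packet itself is excluded so that writing the layer does not change the hash) is defined by the spec; this sketch simply hashes the whole file.

```python
from __future__ import annotations

import hashlib
from datetime import datetime, timezone


def file_sha256(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in chunks and return the hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(path: str, enricher: str, llmind_version: str, model: str) -> dict[str, str]:
    # Keys are hypothetical; they mirror the who/when/version/model fields in the prose.
    return {
        "enricher": enricher,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "llmind_version": llmind_version,
        "model": model,
        "sha256": file_sha256(path),
    }
```

Reading in fixed-size chunks keeps memory flat for large PDFs and videos.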
Why files, not vector databases
A vector database is a separate piece of infrastructure that holds embeddings keyed to your files. It needs to be deployed, kept in sync as files change, and queried at retrieval time. The semantic-layer-in-file pattern moves the data the agent needs from a sidecar service into the file itself.
Files are portable: copying a PDF to a new machine carries every layer that LLMind embedded. Files are tool-agnostic: any pipeline that reads XMP gets the metadata, with no SDK lock-in. Files are signed: the layer is tamper-evident without an external signing service. And the only infrastructure cost is the enrichment run, done once per file and repeated only if the file changes: no synced index, no retrieval fleet.
The signing scheme in one paragraph
LRFS canonicalizes the payload's RDF/XML representation into a deterministic byte sequence, then computes HMAC-SHA256 over those bytes plus the file's SHA-256 checksum. The enricher holds the HMAC key, so verification works in one of two ways: verifiers share the symmetric key (private verification), or the Ed25519 alternative is used so that anyone with the enricher's public key can verify (public verification). Modifying any byte of the payload, or of the underlying file, invalidates the signature. See /spec/signing-scheme/ for the algorithm and key-handling rules.
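As a rough illustration of the HMAC path, the Python sketch below signs and verifies a payload with the standard library's `hmac` module. Two parts are assumptions: it substitutes W3C Canonical XML 2.0 (`xml.etree.ElementTree.canonicalize`) for the LRFS canonicalization, and it assumes the signed message is the canonical bytes followed by the raw 32-byte file digest. The Ed25519 public-verification path is not shown.

```python
from __future__ import annotations

import hashlib
import hmac
import xml.etree.ElementTree as ET


def canonical_bytes(rdf_xml: str) -> bytes:
    # Stand-in canonicalization: Canonical XML 2.0 from the standard library.
    # LRFS specifies its own canonical form; this only approximates it.
    return ET.canonicalize(xml_data=rdf_xml).encode("utf-8")


def sign(key: bytes, rdf_xml: str, file_digest: bytes) -> bytes:
    """HMAC-SHA256 over canonical payload bytes followed by the file's SHA-256.

    `file_digest` is the raw 32-byte digest. Because it is fixed-length and
    appended last, the message splits back into its two parts unambiguously.
    The exact byte layout LRFS uses is an assumption here.
    """
    if len(file_digest) != hashlib.sha256().digest_size:
        raise ValueError("expected a raw 32-byte SHA-256 digest")
    message = canonical_bytes(rdf_xml) + file_digest
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, rdf_xml: str, file_digest: bytes, tag: bytes) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(key, rdf_xml, file_digest), tag)
```

Canonicalizing first means two serializations of the same payload (different attribute order, `<d/>` versus `<d></d>`) produce the same tag, while any change to a value or to the file digest does not. `hmac.compare_digest` avoids leaking, through timing, how much of a forged tag was correct.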
What LLMind is NOT
LLMind is the engine that writes the layer. It is intentionally not:
- A parser, OCR engine, or IDP tool — it consumes the output of those tools and embeds it.
- A RAG framework — there is no retrieval orchestrator inside LLMind.
- A vector database — no embeddings, no nearest-neighbor index.
- An enterprise search system.
It is one focused thing: a way to put a signed, structured semantic layer inside any file. Everything else is composable on top.
Where to go next
- Quickstart — install and enrich your first file in 5 minutes.
- Recipes — copy-paste workflows for common tasks.
- LLMind CLI — the surface for developers and dataset pipelines.
- MCP integrations — connect Claude Desktop, Cursor, and more to enriched files.
- Payload format spec chapter — the canonical RDF structure.
- Signing scheme spec chapter — full cryptographic detail.
- LLM-ready files — the broader concept.