LLMind concepts

Published 2026-05-02

LLMind concepts on one page: the semantic layer, the three layer types, the signing scheme, and why putting structured metadata inside the file beats a vector database for AI retrieval.

What LLMind embeds

LLMind writes a structured, signed semantic layer into a file's XMP packet. The layer is defined by the LRFS specification (LLM-Ready File Specification) and bound to a stable XMP namespace at https://llmind.org/ns/1.0/. Any tool that can read XMP — and that's almost any image, document, or video pipeline — can find the layer, parse it, and use the data without re-running OCR, parsing, or embedding.
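As a rough illustration of what "any tool that can read XMP" means in practice, the sketch below scans a file's raw bytes for XMP packet wrappers and checks whether any packet mentions the LRFS namespace URI. The helper names (find_xmp_packets, has_lrfs_layer) and the sample bytes are made up for this example and are not part of LLMind. A plain byte scan also assumes the packet is stored uncompressed and in UTF-8; a real pipeline would normally use an XMP library instead.

```python
import re

# Namespace URI quoted in the text above.
LRFS_NS = b"https://llmind.org/ns/1.0/"

# XMP packets are wrapped in <?xpacket begin=...?> ... <?xpacket end=...?> markers.
_XPACKET = re.compile(
    rb"<\?xpacket begin=.*?\?>(.*?)<\?xpacket end=.*?\?>",
    re.DOTALL,
)


def find_xmp_packets(data: bytes) -> list[bytes]:
    """Return the body of every XMP packet found by a raw byte scan."""
    return [match.group(1) for match in _XPACKET.finditer(data)]


def has_lrfs_layer(data: bytes) -> bool:
    """Return True if any XMP packet mentions the LRFS namespace URI."""
    return any(LRFS_NS in packet for packet in find_xmp_packets(data))


# Hypothetical file content: binary data with one embedded XMP packet.
# The "lrfs" prefix is an assumption; only the namespace URI comes from the text.
sample = (
    b"%PDF-1.7 ...binary..."
    b'<?xpacket begin="\xef\xbb\xbf" id="W5M0MpCehiHzreSzNTczkc9d"?>'
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about="" xmlns:lrfs="https://llmind.org/ns/1.0/"/>'
    b"</rdf:RDF></x:xmpmeta>"
    b'<?xpacket end="w"?>'
    b"...more binary..."
)

print(has_lrfs_layer(sample))
```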

The three layers

A single LRFS payload carries three independent layer types. Each layer answers a different question, and each can be present or absent independently.

1. Descriptive

Title, summary, language, and high-level entities: roughly ten lines of JSON-equivalent text that answer "what is this file about?". Useful for retrieval ranking, search index population, and agent-side filtering before a deeper read.

2. Structural

Section headings, page-level outlines, table-of-contents anchors, extracted text by region, and OCR transcriptions. This is the expensive part of parsing a PDF or image, computed once and cached inside the file. See "OCR once, read forever" for the workflow argument.

3. Provenance

A signed audit trail: who enriched the file, when, with what version of LLMind, and with which model. It also carries a SHA-256 file checksum bound into the signature, so any byte-level tampering is detectable. See the signed semantic metadata glossary entry and the signing scheme spec chapter for the cryptographic detail.
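One way to picture the three layers is as three optional fields on a single payload object. The sketch below is only a mental model: every class and field name is hypothetical, chosen to mirror the descriptions above, and the real structure is the RDF defined in the payload format spec chapter.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DescriptiveLayer:
    # "What is this file about?"
    title: str
    summary: str
    language: str
    entities: list[str] = field(default_factory=list)


@dataclass
class StructuralLayer:
    # Parsing and OCR output, computed once and cached in the file.
    headings: list[str] = field(default_factory=list)
    text_by_region: dict[str, str] = field(default_factory=dict)


@dataclass
class ProvenanceLayer:
    # Who enriched the file, when, and with what.
    enriched_by: str
    enriched_at: str
    llmind_version: str
    model: str
    file_sha256: str
    signature: str


@dataclass
class LrfsPayload:
    # Each layer can be present or absent independently.
    descriptive: Optional[DescriptiveLayer] = None
    structural: Optional[StructuralLayer] = None
    provenance: Optional[ProvenanceLayer] = None

    def present_layers(self) -> list[str]:
        """Names of the layers this payload actually carries."""
        names = ("descriptive", "structural", "provenance")
        return [name for name in names if getattr(self, name) is not None]


# A payload that carries only the descriptive layer (illustrative values).
payload = LrfsPayload(
    descriptive=DescriptiveLayer(
        title="Q3 invoice",
        summary="Invoice for consulting services.",
        language="en",
        entities=["invoice"],
    )
)
print(payload.present_layers())
```

An agent could use a check like present_layers() to decide whether a cheap descriptive read is enough or whether it needs the structural layer before answering.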

Why files, not vector databases

A vector database is a separate piece of infrastructure that holds embeddings keyed to your files. It needs to be deployed, kept in sync as files change, and queried at retrieval time. The semantic-layer-in-file pattern moves the data the agent needs from a sidecar service into the file itself.

Files are portable: copying a PDF to a new machine carries every layer that LLMind embedded. Files are tool-agnostic: any pipeline that reads XMP gets the metadata; no SDK lock-in. Files are signed: the layer is tamper-evident without an external signing service. And the only infrastructure cost is the one-time enrichment run — no synced index, no retrieval fleet.

The signing scheme in one paragraph

LRFS canonicalizes the payload's RDF/XML representation into a deterministic byte sequence, then computes HMAC-SHA256 over those bytes plus the file's SHA-256 checksum. The HMAC key is held by the enricher; verifiers either share the symmetric key (private verification) or rely on an Ed25519 alternative for public verification. Modifying any byte of the payload, or of the underlying file, invalidates the signature. See /spec/signing-scheme/ for the algorithm and key-handling rules.
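The symmetric path described above can be sketched with Python's standard hmac and hashlib modules. This is a minimal sketch under stated assumptions, not the LRFS algorithm: canonical_bytes is a whitespace-normalizing stand-in for the real RDF/XML canonicalization, the order in which the canonical bytes and the checksum are concatenated is a guess, and file_bytes stands for whichever bytes the spec says the checksum covers. The function names are hypothetical.

```python
import hashlib
import hmac


def canonical_bytes(payload_xml: str) -> bytes:
    """Stand-in canonicalization: collapse whitespace, encode as UTF-8.

    Assumption: the real rules live in the signing scheme spec chapter.
    """
    return " ".join(payload_xml.split()).encode("utf-8")


def sign(payload_xml: str, file_bytes: bytes, key: bytes) -> str:
    """HMAC-SHA256 over the canonical payload plus the file's SHA-256 digest."""
    file_digest = hashlib.sha256(file_bytes).digest()
    message = canonical_bytes(payload_xml) + file_digest  # assumed ordering
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify(payload_xml: str, file_bytes: bytes, key: bytes, signature: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign(payload_xml, file_bytes, key)
    return hmac.compare_digest(expected, signature)


# Illustrative values only.
key = b"shared-secret-held-by-the-enricher"
payload_xml = "<rdf:Description>  <title>Q3 invoice</title>  </rdf:Description>"
file_bytes = b"%PDF-1.7 example content"

signature = sign(payload_xml, file_bytes, key)
print(verify(payload_xml, file_bytes, key, signature))
print(verify(payload_xml, file_bytes + b"x", key, signature))
```

The second check fails because changing a single byte of the file changes its SHA-256 digest, which in turn changes the message the HMAC is computed over.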

What LLMind is NOT

LLMind is the engine that writes the layer. It is intentionally not:

A parser, OCR engine, or IDP tool — it consumes the output of those tools and embeds it.
A RAG framework — there is no retrieval orchestrator inside LLMind.
A vector database — no embeddings, no nearest-neighbor index.
An enterprise search system.

It is one focused thing: a way to put a signed, structured semantic layer inside any file. Everything else is composable on top.

Where to go next

Quickstart — install and enrich your first file in 5 minutes.
Recipes — copy-paste workflows for common tasks.
LLMind CLI — the surface for developers and dataset pipelines.
MCP integrations — connect Claude Desktop, Cursor, and more to enriched files.
Payload format spec chapter — the canonical RDF structure.
Signing scheme spec chapter — full cryptographic detail.
LLM-ready files — the broader concept.