File enrichment glossary

Definitions for the terms that matter when you work with LLM-ready files — file enrichment, XMP, C2PA, MCP, tamper-evident metadata, and the rest.

LLM-ready file — A file whose metadata carries a structured, AI-readable semantic layer so language models can consume it without re-parsing the raw content.
File enrichment engine — Software that writes structured, signed semantic metadata into the file's own XMP packet.
Self-describing file — A file whose embedded metadata includes the information needed for AI tools to understand its content without external lookups.
Semantic layer for files — A structured, AI-readable metadata stratum inside the file itself — analogous to a BI semantic layer, applied to files.
LRFS — The LLM-Ready File Specification — LLMind's open spec for in-file semantic metadata and signing.
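C2PA — Coalition for Content Provenance and Authenticity — the body behind an open standard for cryptographically signed provenance metadata in media files such as images, video, and audio.
Content Credentials — The user-facing name for C2PA-signed provenance metadata, promoted by the Adobe-led Content Authenticity Initiative, which also publishes open-source tooling for it.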
Tamper-evident metadata — File metadata that can be cryptographically verified as unmodified since signing.
Signed semantic metadata — Metadata describing a file's meaning, cryptographically signed so consumers can verify integrity.
Provenance — The verifiable origin and modification history of a file or its content.
HMAC-SHA256 — A keyed message-authentication code built on SHA-256 — LLMind's default signing primitive for LRFS layers.
File checksum — A fixed-length digest (typically SHA-256) of a file's byte contents; any change to the bytes yields a different digest with overwhelming probability.
XMP — Extensible Metadata Platform — the Adobe-originated standard (ISO 16684-1) for embedded, structured file metadata, serialized as RDF/XML.
EXIF — Exchangeable Image File Format — a long-established, widely used standard for image metadata, typically camera-origin data such as exposure settings and capture time.
IPTC — International Press Telecommunications Council — the body behind the IPTC Photo Metadata Standard for news photography and editorial imagery.
Dublin Core — A minimal 15-element metadata vocabulary for describing any resource — often reused as an XMP schema.
XMP namespace — A URI that identifies the vocabulary of XMP properties a file uses — LLMind's is https://llmind.org/ns/1.0/.
Sidecar file — A separate file (often .xmp) that stores metadata for a primary file, rather than embedding metadata inside it.
Embedded metadata — Metadata stored inside the file itself — e.g., in the XMP packet — rather than in a separate sidecar or database.
MCP — Model Context Protocol — Anthropic's open standard for connecting AI agents to tools, files, and data sources.
RAG — Retrieval-Augmented Generation — a pattern where relevant chunks are retrieved from an external corpus and added to the prompt before the LLM generates a response.
Vector database — A database optimized for nearest-neighbor search over high-dimensional embedding vectors — common in RAG stacks.
Chunking — Splitting a document into smaller passages for embedding and retrieval in a RAG pipeline.
Embedding — A dense vector representation of text (or other data) where semantic similarity corresponds to vector proximity.
Context window — The maximum amount of text an LLM can process in a single request, prompt and generated output combined, measured in tokens.
AI agent — An LLM-powered program that can plan, call tools, and act in a loop to accomplish a goal.
AI Overview — Google Search's generative answer feature that synthesizes responses from multiple sources and cites them.
OCR — Optical Character Recognition — extracting text from an image or scanned page.
IDP — Intelligent Document Processing — AI-assisted extraction of structured data from documents, including OCR plus understanding.
DAM — Digital Asset Management — software that stores, organizes, and distributes media files across an organization.
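
To make the File checksum, HMAC-SHA256, and Tamper-evident metadata entries concrete, here is a minimal sketch using only Python's standard library. The metadata field names, the canonical-JSON encoding, and the demo key are assumptions made for illustration; this is not the LRFS wire format, which this glossary does not describe.

```python
import hashlib
import hmac
import json


def file_checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the file bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def sign_metadata(metadata: dict, key: bytes) -> str:
    """Return an HMAC-SHA256 tag over a canonical JSON encoding of the metadata.

    Sorting keys and fixing separators is an assumed canonicalization so that
    the same metadata always serializes to the same bytes.
    """
    payload = json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_metadata(metadata: dict, key: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign_metadata(metadata, key)
    return hmac.compare_digest(expected, tag)


# Hypothetical inputs for illustration only.
key = b"demo-key-not-for-production"
content = b"example file bytes"
metadata = {"summary": "Example document", "checksum": file_checksum(content)}
tag = sign_metadata(metadata, key)

untouched_ok = verify_metadata(metadata, key, tag)   # metadata unchanged since signing
tampered = dict(metadata, summary="Edited summary")  # one field altered after signing
tampered_ok = verify_metadata(tampered, key, tag)
```

Because HMAC is a symmetric construction, anyone verifying the tag needs the same secret key that produced it. In this sketch the unchanged metadata verifies and the edited copy does not, which is the property the "tamper-evident" entry refers to.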
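
The XMP, XMP namespace, and Embedded metadata entries can be illustrated by reading one property out of a packet. The sketch below uses Python's standard XML parser on a hand-written packet body. The namespace URI is the one quoted in this glossary, but the property name `summary` and its value are hypothetical, and the `<?xpacket?>` wrapper that surrounds a real embedded packet is omitted for brevity.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LLMIND_NS = "https://llmind.org/ns/1.0/"  # namespace URI quoted in the glossary

# Hypothetical packet body: the "summary" property is an assumption,
# not a documented LRFS field.
PACKET = f"""<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="{RDF_NS}">
    <rdf:Description rdf:about="" xmlns:llmind="{LLMIND_NS}">
      <llmind:summary>Quarterly report, 12 pages</llmind:summary>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""


def read_property(packet: str, namespace: str, name: str):
    """Return the text of a simple XMP property, or None if it is absent."""
    root = ET.fromstring(packet)
    # Properties live inside rdf:Description elements; the namespace URI,
    # not the prefix, is what identifies the vocabulary.
    for description in root.iter(f"{{{RDF_NS}}}Description"):
        element = description.find(f"{{{namespace}}}{name}")
        if element is not None:
            return element.text
    return None


summary = read_property(PACKET, LLMIND_NS, "summary")
missing = read_property(PACKET, LLMIND_NS, "author")  # not present in the packet
```

The lookup is keyed on the namespace URI rather than the `llmind:` prefix, since a prefix is only a local abbreviation and another writer could legally choose a different one for the same vocabulary.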
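
The Chunking, Embedding, and Vector database entries describe steps that are easy to show in miniature. The following sketch splits text into overlapping word windows and ranks toy vectors by cosine similarity. The chunk size, the overlap, and the hand-made two-dimensional vectors are assumptions for illustration; real pipelines use a learned embedding model and an indexed store rather than a linear scan.

```python
import math


def chunk_words(text: str, size: int, overlap: int) -> list[str]:
    """Split text into chunks of `size` words; neighbours share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query, vectors) -> int:
    """Index of the stored vector most similar to the query (brute-force scan)."""
    return max(range(len(vectors)), key=lambda i: cosine_similarity(query, vectors[i]))


chunks = chunk_words("a b c d e f g", size=3, overlap=1)

# Toy stand-ins for embeddings; a real system would compute these with a model.
stored = [[0.0, 1.0], [1.0, 0.0], [-1.0, 0.0]]
best = nearest([1.0, 0.1], stored)
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from either side, at the cost of storing some words twice. `cosine_similarity` raises `ZeroDivisionError` for an all-zero vector, so callers should filter those out first.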
