File enrichment glossary
Definitions for the terms that matter when you work with LLM-ready files — file enrichment, XMP, C2PA, MCP, tamper-evident metadata, and the rest.
- LLM-ready file — A file whose metadata carries a structured, AI-readable semantic layer so language models can consume it without re-parsing.
- File enrichment engine — Software that writes structured, signed semantic metadata into the file's own XMP packet.
- Self-describing file — A file whose embedded metadata includes the information needed for AI tools to understand its content without external lookups.
- Semantic layer for files — A structured, AI-readable metadata stratum inside the file itself — analogous to a BI semantic layer, applied to files.
- LRFS — The LLM-Ready File Specification — LLMind's open spec for in-file semantic metadata and signing.
- C2PA — Coalition for Content Provenance and Authenticity — an open standard for cryptographically signed provenance metadata in media files such as images, video, audio, and documents.
- Content Credentials — The user-facing name for C2PA-signed provenance metadata, promoted by the Adobe-led Content Authenticity Initiative, which also publishes open-source C2PA tooling.
- Tamper-evident metadata — File metadata that can be cryptographically verified as unmodified since signing.
- Signed semantic metadata — Metadata describing a file's meaning, cryptographically signed so consumers can verify integrity.
- Provenance — The verifiable origin and modification history of a file or its content.
- HMAC-SHA256 — A keyed message-authentication code built on SHA-256 — LLMind's default signing primitive for LRFS layers.
- File checksum — A fixed-length digest (typically SHA-256) computed from a file's bytes; any change to the contents produces a different digest with overwhelming probability.
- XMP — Extensible Metadata Platform — a standard created by Adobe (now ISO 16684-1) for embedded, structured file metadata, serialized as RDF/XML.
- EXIF — Exchangeable Image File Format — a long-established, widely used standard for image metadata, typically camera-origin data such as exposure settings and capture time.
- IPTC — International Press Telecommunications Council — the news-industry body whose Photo Metadata Standard is used for news photography and editorial imagery.
- Dublin Core — A minimal 15-element metadata vocabulary for describing any resource — often reused as an XMP schema.
- XMP namespace — A URI that identifies the vocabulary of XMP properties a file uses — LLMind's is https://llmind.org/ns/1.0/.
- Sidecar file — A separate file (often .xmp) that stores metadata for a primary file, rather than embedding metadata inside it.
- Embedded metadata — Metadata stored inside the file itself — e.g., in the XMP packet — rather than in a separate sidecar or database.
- MCP — Model Context Protocol — Anthropic's open standard for connecting AI agents to tools, files, and data sources.
- RAG — Retrieval-Augmented Generation — a pattern where relevant chunks are retrieved and supplied to an LLM before it generates a response.
- Vector database — A database optimized for nearest-neighbor search over high-dimensional embedding vectors — common in RAG stacks.
- Chunking — Splitting a document into smaller passages for embedding and retrieval in a RAG pipeline.
- Embedding — A dense vector representation of text (or other data) where semantic similarity corresponds to vector proximity.
- Context window — The maximum amount of text an LLM can read at once, measured in tokens.
- AI agent — An LLM-powered program that can plan, call tools, and act in a loop to accomplish a goal.
- AI Overview — Google's generative answer feature that synthesizes responses from multiple sources and cites them.
- OCR — Optical Character Recognition — extracting text from an image or scanned page.
- IDP — Intelligent Document Processing — AI-assisted extraction of structured data from documents, combining OCR with content understanding.
- DAM — Digital Asset Management — software that stores, organizes, and distributes media files across an organization.
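
To make the entries on file checksums, HMAC-SHA256, and tamper-evident metadata concrete, here is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the LRFS format: the metadata field names (`summary`, `checksum`), the JSON canonicalization, and the demo key are all hypothetical, and a real engine would write the result into the file's XMP packet rather than keep it in a dict.

```python
import hashlib
import hmac
import json


def file_checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the file bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def sign_metadata(metadata: dict, key: bytes) -> str:
    """Return an HMAC-SHA256 tag over a canonical JSON encoding of the metadata."""
    # Sorted keys and fixed separators make the same dict serialize identically
    # every time, so signer and verifier hash the same bytes.
    payload = json.dumps(metadata, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_metadata(metadata: dict, key: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    expected = sign_metadata(metadata, key)
    return hmac.compare_digest(expected, tag)


# Hypothetical inputs: a demo key and a made-up metadata layout.
key = b"demo-key-not-for-production"
content = b"example file bytes"
metadata = {"summary": "Quarterly report", "checksum": file_checksum(content)}
tag = sign_metadata(metadata, key)

print(verify_metadata(metadata, key, tag))  # True: metadata is unchanged

# Changing any field after signing invalidates the tag.
tampered = dict(metadata, summary="Annual report")
print(verify_metadata(tampered, key, tag))  # False: tampering is detected
```

Because HMAC is a keyed construction, only holders of the shared key can produce or check a valid tag. That differs from C2PA-style public-key signatures, which anyone can verify without being able to sign.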