File enrichment engine

Software that writes structured, signed semantic metadata into a file's own XMP packet.

A file enrichment engine is a backend service that takes files (images, PDFs, audio, video) and writes semantic metadata directly into them. It runs after parsers and analysis tools have extracted meaning from the content. Instead of storing results in a database, the engine persists them inside the file itself as structured, signed XMP metadata. The result is an LLM-ready file: one that a language model, or any XMP-aware tool, can read without a separate lookup, because the metadata travels with the file wherever it is copied.

What it does

The engine accepts input from multiple sources: OCR engines extract text, computer vision models identify objects and scenes, and language models generate summaries. The engine normalizes this output into a standardized RDF schema, canonicalizes it, signs it, and writes it to the file's XMP packet. It handles the details: namespace declarations, layer-specific HMAC-SHA256 tags or Ed25519 signatures, XMP compliance rules, and format-specific write paths for JPEG, PNG, PDF, TIFF, and audio formats.
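
The normalize, canonicalize, sign, and write steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the engine's actual implementation: the namespace URI, the field names (source, text, labels, signature), and the helper functions are all hypothetical, and sorted-key JSON stands in for whatever RDF canonicalization a real engine would apply. Only the HMAC-SHA256 path is shown, using the standard library.

```python
import hashlib
import hmac
import json
from xml.sax.saxutils import escape

# Hypothetical namespace URI, chosen for illustration only.
ENRICH_NS = "https://example.com/ns/enrich/1.0/"


def normalize(source: str, raw: dict) -> dict:
    """Map one analyzer's raw output onto a small shared set of fields."""
    return {
        "source": source,
        "text": str(raw.get("text", "")).strip(),
        "labels": sorted({str(label).lower() for label in raw.get("labels", [])}),
    }


def canonicalize(layer: dict) -> bytes:
    """Stand-in canonical form (assumption): sorted keys, no extra whitespace, UTF-8."""
    return json.dumps(
        layer, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def sign(layer: dict, key: bytes) -> str:
    """Return a hex HMAC-SHA256 tag over the canonical bytes of one layer."""
    return hmac.new(key, canonicalize(layer), hashlib.sha256).hexdigest()


def verify(layer: dict, key: bytes, tag: str) -> bool:
    """Recompute the tag and compare it in constant time."""
    return hmac.compare_digest(sign(layer, key), tag)


def to_xmp(layer: dict, tag: str) -> str:
    """Serialize one signed layer as an XMP packet string."""
    items = "".join(f"<rdf:li>{escape(label)}</rdf:li>" for label in layer["labels"])
    return (
        '<?xpacket begin="\ufeff" id="W5M0MpCehiHzreSzNTczkc9d"?>\n'
        '<x:xmpmeta xmlns:x="adobe:ns:meta/">\n'
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">\n'
        f'<rdf:Description rdf:about="" xmlns:enrich="{ENRICH_NS}">\n'
        f'<enrich:source>{escape(layer["source"])}</enrich:source>\n'
        f'<enrich:text>{escape(layer["text"])}</enrich:text>\n'
        f'<enrich:labels><rdf:Bag>{items}</rdf:Bag></enrich:labels>\n'
        f'<enrich:signature>{tag}</enrich:signature>\n'
        '</rdf:Description>\n'
        '</rdf:RDF>\n'
        '</x:xmpmeta>\n'
        '<?xpacket end="w"?>'
    )


# Demo key for illustration; a real deployment would load keys from a secret store.
KEY = b"demo-key-not-for-production"
layer = normalize("ocr", {"text": "  Invoice 2024 <draft> ", "labels": ["Invoice", "invoice"]})
tag = sign(layer, KEY)
packet = to_xmp(layer, tag)
```

The sketch stops at producing the packet string. Embedding that packet in a JPEG, PNG, PDF, or TIFF container is the format-specific part and is not shown here. Because the tag covers the canonical bytes rather than the serialized XML, a verifier would need to rebuild the same layer from the packet before calling verify.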

Why 'engine'

The term reflects its design: not a one-off script but a reusable, composable layer that plugs between your analysis pipeline and your storage. It abstracts away XMP complexity, versioning, signing schemes, and format specifics. A single engine instance can enrich thousands of files per day across different file types, format versions, and enrichment sources.
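
One way to picture the "reusable, composable layer" is as a small object that holds pluggable analyzers and per-format writers. The sketch below is only an illustration of that shape: the class name EnrichmentEngine, its methods, and the toy analyzer and writer are hypothetical, and the writer appends a plain-text trailer as a placeholder rather than embedding a real XMP packet.

```python
from dataclasses import dataclass, field
from pathlib import PurePath
from typing import Callable, Dict, List

# Hypothetical plug-in signatures, chosen for illustration.
Analyzer = Callable[[bytes], dict]      # file bytes -> metadata fields
Writer = Callable[[bytes, dict], bytes]  # file bytes + metadata -> enriched bytes


@dataclass
class EnrichmentEngine:
    analyzers: List[Analyzer] = field(default_factory=list)
    writers: Dict[str, Writer] = field(default_factory=dict)

    def add_analyzer(self, analyzer: Analyzer) -> None:
        self.analyzers.append(analyzer)

    def add_writer(self, suffix: str, writer: Writer) -> None:
        # Writers are keyed by lowercase file suffix, e.g. ".jpg".
        self.writers[suffix.lower()] = writer

    def enrich(self, filename: str, data: bytes) -> bytes:
        suffix = PurePath(filename).suffix.lower()
        writer = self.writers.get(suffix)
        if writer is None:
            raise ValueError(f"no writer registered for {suffix!r}")
        # Merge the output of every analyzer; later analyzers override earlier keys.
        metadata: dict = {}
        for analyzer in self.analyzers:
            metadata.update(analyzer(data))
        return writer(data, metadata)


def byte_count(data: bytes) -> dict:
    """Toy analyzer standing in for OCR, vision, or summarization."""
    return {"size": len(data)}


def toy_writer(data: bytes, metadata: dict) -> bytes:
    """Placeholder writer: appends a trailer instead of embedding XMP."""
    trailer = ";".join(f"{key}={value}" for key, value in sorted(metadata.items()))
    return data + b"\n#meta:" + trailer.encode("utf-8")


engine = EnrichmentEngine()
engine.add_analyzer(byte_count)
engine.add_writer(".txt", toy_writer)
result = engine.enrich("note.TXT", b"hello")
```

Keeping analyzers and writers behind plain callables means a new enrichment source or file format can be added without touching the enrich method, which is the kind of reuse the term "engine" is meant to convey.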

See also