XMP metadata in Python: libraries, patterns, and gotchas

Published 2026-04-22 · 8 min read

Python has no single "official" XMP library. Depending on the file type, you'll reach for py3exiv2 (images), pikepdf (PDFs), pillow (images, limited), or piexif (EXIF, minimal XMP). This guide covers the common patterns, the gotchas, and how LLMind sits on top of this plumbing.

The Python XMP landscape

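If you're working with XMP metadata in Python, you need to pick a library before you write a single line. The ecosystem is fragmented by file type and binding style: py3exiv2 and pyexiv2 bind the C++ exiv2 library for images (JPEG, PNG, TIFF); pikepdf wraps QPDF for PDFs; pillow has built-in image support but limited XMP handling; and piexif is a pure-Python, EXIF-focused library with minimal XMP support.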

None of these libraries provide a high-level "write structured semantic data" API. They're low-level primitives. You get a dictionary of XMP tags; you manage the RDF structure and namespace details yourself. That's where LLMind enters the story.

Read XMP from a JPEG

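Here's the pattern for reading XMP from an image file. The Image / read_xmp() API shown here belongs to the pyexiv2 package. py3exiv2 also installs a module named pyexiv2, but it exposes a different, ImageMetadata-based API:

import pyexiv2  # module name for both bindings; there is no importable "py3exiv2" module

with pyexiv2.Image('photo.jpg') as img:
    # read_xmp() returns a dict mapping XMP keys to their values
    xmp_dict = img.read_xmp()
    for key, value in xmp_dict.items():
        print(key, value)

    # Access a specific property
    if 'Xmp.dc.title' in xmp_dict:
        print("Title found", xmp_dict['Xmp.dc.title'])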

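The keys follow exiv2's key convention, Xmp.prefix.localName, where dc is the Dublin Core prefix (a standard, pre-registered namespace). Simple values are usually strings, while array properties come back as lists. Depending on the binding, language-alternative properties such as dc:title can come back as a mapping keyed by language rather than a bare string.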
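Because value shapes vary by property, it can help to normalize them before further processing. The following is a minimal sketch that uses a hand-written dictionary in place of a real read result; the helper names split_xmp_key and as_list are hypothetical and not part of any library:

```python
def split_xmp_key(key):
    """Split an exiv2-style key such as 'Xmp.dc.title' into (prefix, local_name)."""
    family, prefix, local_name = key.split(".", 2)
    if family != "Xmp":
        raise ValueError(f"not an XMP key: {key!r}")
    return prefix, local_name


def as_list(value):
    """Return the value as a list so callers can treat scalars and arrays alike."""
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


# Hand-written sample standing in for the dictionary a reader returns.
sample = {
    "Xmp.dc.creator": ["Alice", "Bob"],
    "Xmp.xmp.CreatorTool": "ExampleTool 1.0",
}

normalized = {split_xmp_key(k): as_list(v) for k, v in sample.items()}
for (prefix, name), values in normalized.items():
    print(prefix, name, values)
```

Keying the result by (prefix, local name) keeps the namespace visible, which matters once custom namespaces appear next to Dublin Core.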

One gotcha: py3exiv2 requires exiv2 installed as a system dependency. On macOS, brew install exiv2. On Linux, apt-get install libexiv2-dev. Windows is more complex; consider pikepdf if you're Windows-only and don't need images.

Write XMP to a PDF

For PDFs, pikepdf's metadata API is cleaner than exiv2:

from pikepdf import Pdf

with Pdf.open('document.pdf', allow_overwriting_input=True) as pdf:
    with pdf.open_metadata() as meta:
        # Standard properties (Dublin Core)
        meta['dc:title'] = 'My Document Title'
        meta['dc:creator'] = ['Alice', 'Bob']  # Arrays work
        meta['dc:description'] = 'A summary of this PDF.'

        # Custom namespace: declare it first.
        # 'NAMESPACE_URI' is a placeholder; substitute your namespace's URI.
        meta.register_xml_namespace('NAMESPACE_URI', 'myns')
        meta['myns:customField'] = 'custom value'
        meta['myns:version'] = '2.1'

    pdf.save()
    print('PDF metadata written successfully')

The open_metadata() context manager handles the RDF/XML serialization. You just set key-value pairs. The critical gotcha: if you use a custom namespace (one that's not pre-registered in pikepdf's defaults), you must call register_xml_namespace() before assigning to it. Otherwise pikepdf has no URI to resolve the prefix to, and the write fails or ends up under the wrong prefix.

Note allow_overwriting_input=True: combined with the argument-less pdf.save(), it overwrites the PDF in place. To write a new file instead, omit the flag and pass an output path to pdf.save().
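To see why the namespace declaration matters, it helps to look at the kind of RDF/XML that a metadata writer has to produce. The sketch below builds a simplified fragment with the standard library only. It is not pikepdf's exact output and omits the packet wrapper a real XMP block carries, and the myns URI is a hypothetical value chosen for illustration:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
MY_NS = "https://example.com/ns/myns/1.0/"  # hypothetical URI, for illustration only

# Registering prefixes controls how ElementTree spells them on output.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("myns", MY_NS)

rdf = ET.Element(f"{{{RDF_NS}}}RDF")
desc = ET.SubElement(rdf, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": ""})

# An ordered array (dc:creator) is an rdf:Seq of rdf:li items.
creator = ET.SubElement(desc, f"{{{DC_NS}}}creator")
seq = ET.SubElement(creator, f"{{{RDF_NS}}}Seq")
for name in ["Alice", "Bob"]:
    ET.SubElement(seq, f"{{{RDF_NS}}}li").text = name

# A simple custom property is an element in the custom namespace.
ET.SubElement(desc, f"{{{MY_NS}}}customField").text = "custom value"

xml_text = ET.tostring(rdf, encoding="unicode")
print(xml_text)

# Round trip: the property is identified by URI plus local name, not by prefix.
parsed = ET.fromstring(xml_text)
custom_value = parsed.find(f".//{{{MY_NS}}}customField").text
creators = [li.text for li in parsed.iter(f"{{{RDF_NS}}}li")]
```

The prefix is only a spelling. Consumers match on the namespace URI, which is why an unregistered or mismatched prefix produces metadata that other tools cannot find.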

The "design your own schema" problem

Once you can read and write XMP, you face a design problem: all these libraries let you write arbitrary tags, but the decisions about what to write, and under which names, are still yours.

This is the friction point. You're not just writing metadata; you're inventing a schema and hoping every consumer understands it.
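To make the friction concrete, here is a sketch of what a hand-rolled schema tends to turn into once it is written down. Everything in it is hypothetical: the CustomSchema class, the example.com URI, the field names, and the schemaVersion tag are illustrative choices, not a convention any of the libraries above define:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CustomSchema:
    """A hand-rolled XMP schema: every attribute here is a decision you own."""
    namespace_uri: str
    prefix: str
    version: str
    fields: tuple

    def to_tags(self, values):
        """Map plain field names to prefixed tags, rejecting unknown fields."""
        unknown = set(values) - set(self.fields)
        if unknown:
            raise KeyError(f"fields not in schema: {sorted(unknown)}")
        tags = {f"{self.prefix}:{name}": value for name, value in values.items()}
        tags[f"{self.prefix}:schemaVersion"] = self.version
        return tags


schema = CustomSchema(
    namespace_uri="https://example.com/ns/myns/1.0/",  # hypothetical
    prefix="myns",
    version="2.1",
    fields=("customField", "summary"),
)
tags = schema.to_tags({"customField": "custom value"})
print(tags)
```

Each consumer of the file needs the same definition to interpret these tags, which is exactly the coordination cost described above.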

How LLMind sits on top

LLMind solves the schema-design problem by publishing LRFS (LLM-Ready File Specification), a fully-specified schema under a stable namespace. Instead of hand-rolling XMP tags, you use LLMind:

from llmind import enrich

# One-liner. Writes structured, signed semantic metadata
# under the stable llmind.org namespace, with HMAC signature.
enrich('research_paper.pdf')

Internally, LLMind:

Every downstream tool (Claude agent, LangChain, custom RAG pipeline) reads the same layer, with the same namespace semantics, signed with the same key. No schema design work. No round-trip surprises.

FAQ

Which Python XMP library should I use?

It depends on your file type and binding preference. For images (JPEG, PNG, TIFF), py3exiv2 or pyexiv2 are mature and feature-complete but require a C++ binary dependency (exiv2). For PDFs, pikepdf is the de facto standard and wraps QPDF. For lightweight image metadata, pillow has built-in support but limited XMP handling. For EXIF-focused workflows with minimal XMP, piexif is lightweight and pure Python. If you're starting a new project, pikepdf (PDF) or py3exiv2 (images) are the safest bets.
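If your code handles mixed inputs, the recommendation above can be encoded as a small lookup. This is a sketch only: the suggest_library helper and the LIBRARY_BY_SUFFIX table are hypothetical names, and the mapping simply restates the defaults suggested in this answer:

```python
from pathlib import Path

# Defaults restated from the answer above; extend or override as needed.
LIBRARY_BY_SUFFIX = {
    ".pdf": "pikepdf",
    ".jpg": "py3exiv2",
    ".jpeg": "py3exiv2",
    ".png": "py3exiv2",
    ".tif": "py3exiv2",
    ".tiff": "py3exiv2",
}


def suggest_library(path):
    """Return the suggested library name for a file, or None if unknown."""
    return LIBRARY_BY_SUFFIX.get(Path(path).suffix.lower())


print(suggest_library("photo.JPG"))
print(suggest_library("document.pdf"))
```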

Can I use LLMind alongside my existing XMP code?

Yes. LLMind writes to the https://llmind.org/ns/1.0/ namespace, which is independent of other namespaces (Dublin Core, IPTC, etc.). Your existing XMP code can coexist with LLMind's enrichment layer. The HMAC signature protects only the LLMind namespace, so you can add or modify other metadata without affecting the enrichment layer's integrity.
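The idea of a signature scoped to one namespace can be sketched with the standard library. This is not LLMind's actual signing scheme, which this article does not specify. The llmind: prefix, the summary field, the JSON canonicalization, and the demo key are all assumptions made for illustration:

```python
import hashlib
import hmac
import json

LLMIND_PREFIX = "llmind:"  # hypothetical prefix for illustration


def sign_namespace(tags, key, prefix=LLMIND_PREFIX):
    """HMAC-SHA256 over the tags under one prefix, in a canonical order."""
    scoped = {k: v for k, v in tags.items() if k.startswith(prefix)}
    payload = json.dumps(scoped, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


key = b"demo-key-not-for-production"
tags = {
    "dc:title": "My Document Title",
    "llmind:summary": "A summary of this PDF.",
}
signature = sign_namespace(tags, key)

# Editing another namespace leaves the signature valid...
tags["dc:title"] = "Renamed"
unchanged = hmac.compare_digest(signature, sign_namespace(tags, key))

# ...while editing the signed namespace invalidates it.
tags["llmind:summary"] = "Tampered"
still_valid = hmac.compare_digest(signature, sign_namespace(tags, key))
print(unchanged, still_valid)
```

Because only the scoped tags enter the payload, other tools can keep editing Dublin Core or IPTC fields without breaking verification.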
