---
title: "LRFS payload format — XMP canonicalization and layer schema | LLMind"
description: "The LRFS payload format chapter. XMP packet structure, RDF layer schema, canonicalization rules, and backward-compatibility for in-file LLM-ready metadata."
url: https://llmind.org/spec/payload-format/
source_format: html
---
# LRFS payload format

LRFS v1.0 · Published 2026-04-22

This chapter specifies how an LLM-Ready File Specification (LRFS) payload is structured, serialized, and canonicalized. It is normative — implementers conforming to LRFS v1 MUST follow the rules in this chapter.

## 1\. Host packet

The LRFS payload is carried inside an XMP packet per ISO 16684-1. The packet must be placed in a location specific to each host file format:

-   **JPEG**: APP1 segment
-   **PNG**: iTXt chunk with keyword `XML:com.adobe.xmp`
-   **PDF**: Metadata stream (XMP per PDF 1.4+)
-   **MP3**: ID3v2 PRIV frame carrying XMP
-   **WAV**: RIFF chunk with ID `_PMX`
-   **M4A**: top-level `uuid` box carrying the XMP packet (ISO base media file format)

Implementers MUST follow the XMP specification (ISO 16684-1) and the format-specific appendices for precise byte-level placement rules. This chapter does not restate XMP packet serialization; it assumes the packet is correctly embedded per the standard.
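As an illustration of the PNG case, here is a minimal Python sketch that walks the chunk list and returns the text of the first `iTXt` chunk whose keyword is `XML:com.adobe.xmp`. It follows the `iTXt` layout from the PNG specification (keyword, compression flag, compression method, language tag, translated keyword, text). It does not verify chunk CRCs, and the function name `find_png_xmp` is hypothetical.

```python
import struct
import zlib
from typing import Optional

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
XMP_KEYWORD = b"XML:com.adobe.xmp"

def find_png_xmp(data: bytes) -> Optional[str]:
    """Return the XMP packet from the first iTXt chunk keyed XML:com.adobe.xmp, or None."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Chunk layout: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        # CRCs are not checked in this sketch.
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length
        if chunk_type == b"IEND":
            break
        if chunk_type != b"iTXt":
            continue
        keyword, _, rest = body.partition(b"\x00")
        if keyword != XMP_KEYWORD or len(rest) < 2:
            continue
        compression_flag = rest[0]  # rest[1] is the compression method (0 = zlib)
        # Skip the null-terminated language tag and translated keyword.
        _language, _, rest = rest[2:].partition(b"\x00")
        _translated, _, text = rest.partition(b"\x00")
        if compression_flag:
            text = zlib.decompress(text)
        return text.decode("utf-8")
    return None
```

The other host formats need their own walkers (JPEG segment markers, RIFF chunk lists, and so on), but the pattern is the same: find the container unit that the format reserves for XMP, then decode its payload as UTF-8.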

## 2\. Namespace binding

The LRFS namespace URI is `https://llmind.org/ns/1.0/`. Implementers MUST bind the prefix `llmind` to this URI in the XMP RDF document. Example:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:llmind="https://llmind.org/ns/1.0/">
  <rdf:Description rdf:about="">
    <!-- LRFS properties (layers) follow -->
  </rdf:Description>
</rdf:RDF>
```

Other namespace prefixes (e.g., `dc` for Dublin Core, `xmp` for XMP Basic, `xmpRights` for XMP Rights Management) MAY coexist in the same RDF document.
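A writer or validator can confirm the binding by observing namespace declarations while the XML is parsed. The sketch below uses `xml.etree.ElementTree.iterparse` from Python's standard library with `start-ns` events. The helper name `llmind_prefix_bound` is hypothetical, and treating any binding of `llmind` to a different URI as a failure is an assumption (one reasonable reading of the rule above). Other prefixes such as `dc` are simply ignored.

```python
import io
import xml.etree.ElementTree as ET

LRFS_NS = "https://llmind.org/ns/1.0/"

def llmind_prefix_bound(rdf_xml: str) -> bool:
    """True if `llmind` is declared at least once and only ever bound to the LRFS URI."""
    source = io.BytesIO(rdf_xml.encode("utf-8"))
    # "start-ns" events yield (prefix, uri) for every xmlns declaration, in document order.
    uris = [uri for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",))
            if prefix == "llmind"]
    return bool(uris) and all(uri == LRFS_NS for uri in uris)
```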

## 3\. Layer model

An LRFS payload contains zero or more named _layers_. A layer is an RDF property whose subject is the file resource (typically the unnamed root resource in XMP). The following layers are defined in LRFS v1.0:

-   **`llmind:description`** — Free-form natural-language description of the file. Value: string literal (xsd:string).
-   **`llmind:entities`** — RDF Bag of entity descriptors. Each entity is a structured property with `entity:name` (string), `entity:type` (one of: `person`, `place`, `organization`, `concept`, `artifact`), and `entity:confidence` (float 0.0 to 1.0).
-   **`llmind:structure`** — Structured summary of the file's content organization. For PDFs: chapter/section tree with page ranges. For audio: speaker turns with timestamps. For images: layout regions and bounding boxes. Value: structured RDF resource with child properties.
-   **`llmind:transcription`** — For audio files: a transcript with per-segment timestamps. Value: string literal with embedded timestamp markers (format: `[HH:MM:SS] text`).
-   **`llmind:lineage`** — Provenance information. Child properties: `lineage:source` (URL or description), `lineage:created` (ISO 8601 datetime), `lineage:transformations` (RDF Bag of transformation records), `lineage:license` (SPDX identifier).
-   **`llmind:ocr`** — OCR output cache. Child properties: `ocr:provider` (string), `ocr:text` (string), `ocr:pages` (RDF Bag of page records, each with `page:number`, `page:text`, and optional `page:boxes`).
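The `llmind:transcription` marker format is simple enough to parse with a regular expression. This sketch makes three assumptions: markers may appear anywhere in the literal (not only at line starts), hours are always two digits as `HH` suggests, and any text before the first marker is discarded. `parse_transcript` is a hypothetical helper that returns each segment's offset in seconds alongside its text.

```python
import re
from typing import List, Tuple

# One [HH:MM:SS] marker; minutes and seconds are limited to 00-59.
MARKER = re.compile(r"\[(\d{2}):([0-5]\d):([0-5]\d)\]")

def parse_transcript(value: str) -> List[Tuple[int, str]]:
    """Split a transcription literal into (offset_seconds, text) segments."""
    matches = list(MARKER.finditer(value))
    segments = []
    for i, m in enumerate(matches):
        hh, mm, ss = (int(g) for g in m.groups())
        # A segment's text runs until the next marker, or to the end of the literal.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(value)
        segments.append((hh * 3600 + mm * 60 + ss, value[m.end():end].strip()))
    return segments
```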

Each layer is _optional_. An LRFS payload with zero layers is valid but conveys no semantic information. Implementations MUST gracefully handle missing layers.
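A reader that follows this rule treats every layer lookup as optional. The sketch below, which uses only Python's standard library, collects the v1.0 layers that are present, leaves absent ones out of the result, and sets aside unknown `llmind:*` properties instead of failing (consistent with the version-compatibility rules in section 7). `read_layers` is hypothetical, and the sketch reads only property elements. RDF/XML also allows simple literals to be written as attributes, and this sketch does not handle that form.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple

LRFS_NS = "https://llmind.org/ns/1.0/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Layer names defined by LRFS v1.0.
V1_LAYERS = {"description", "entities", "structure", "transcription", "lineage", "ocr"}

def read_layers(rdf_xml: str) -> Tuple[Dict[str, ET.Element], List[str]]:
    """Return (known layers that are present, names of unknown llmind:* properties)."""
    root = ET.fromstring(rdf_xml)
    found: Dict[str, ET.Element] = {}
    unknown: List[str] = []
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        for prop in desc:
            if not prop.tag.startswith("{"):
                continue  # un-namespaced element: not an LRFS property
            ns, _, local = prop.tag[1:].partition("}")
            if ns != LRFS_NS:
                continue  # dc:, xmp:, etc. are not LRFS layers
            if local in V1_LAYERS:
                found[local] = prop
            else:
                unknown.append(local)  # ignored, never a validation failure
    return found, unknown
```

Callers then use `found.get("ocr")` and similar lookups, so a missing layer is just `None` rather than an error.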

## 4\. Canonicalization for signing

Before signing, each layer MUST be canonicalized so that signatures can be verified reproducibly. The canonicalization algorithm is W3C RDF Dataset Canonicalization (RDFC-1.0). The RFC 8785 JSON Canonicalization Scheme (JCS) is NOT used. LRFS canonicalizes the RDF triples themselves, so that semantically equivalent RDF/XML serializations produce identical canonical bytes.

The process:

1.  Extract the RDF triples for the layer from the XMP RDF graph.
2.  Apply RDF Dataset Canonicalization (https://www.w3.org/TR/rdf-canon/) to these triples.
3.  Serialize the canonical triples as N-Quads, one statement per line in Unicode code point order, with UTF-8 encoding and `\n` (LF) line endings.
4.  The resulting byte string is the canonical form for that layer.
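The full W3C RDF Dataset Canonicalization algorithm (RDFC-1.0) exists mainly to assign stable labels to blank nodes. That machinery is too involved for a short example. The sketch below covers only steps 3 and 4 for a layer whose triples contain no blank nodes. It serializes each triple as an N-Quads line in the default graph, escapes literals as the RDF 1.1 N-Triples canonical form does, sorts the lines by code point, and encodes the result as UTF-8 with LF endings. Layers such as `llmind:entities` do use blank nodes, so this is not a substitute for a complete RDFC-1.0 implementation. The `IRI`/`Literal` types, the helper names, and the subject IRI `urn:example:file` are all hypothetical. Language-tagged literals are not handled.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple, Union

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

@dataclass(frozen=True)
class IRI:
    value: str

@dataclass(frozen=True)
class Literal:
    lexical: str
    datatype: Optional[str] = None  # None is treated as xsd:string

Term = Union[IRI, Literal]

def escape_literal(text: str) -> str:
    # Canonical N-Triples escapes only backslash, double quote, LF and CR.
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))

def serialize_term(term: Term) -> str:
    if isinstance(term, IRI):
        return f"<{term.value}>"
    quoted = f'"{escape_literal(term.lexical)}"'
    if term.datatype in (None, XSD_STRING):
        return quoted  # plain string literals carry no datatype suffix
    return f"{quoted}^^<{term.datatype}>"

def canonical_layer_bytes(triples: Iterable[Tuple[Term, Term, Term]]) -> bytes:
    """Canonical bytes for a blank-node-free layer: sorted, de-duplicated N-Quads lines in UTF-8."""
    lines = {" ".join(serialize_term(t) for t in triple) + " .\n" for triple in triples}
    # Python orders str values by code point, which is the required sort order.
    return "".join(sorted(lines)).encode("utf-8")
```

Because the lines are de-duplicated and sorted, the input order of the triples does not affect the output bytes. That property is what makes a signature over these bytes reproducible.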

**Critical:** Implementers MUST NOT sign the serialized RDF/XML representation directly. Signing must use the canonical N-Quads form. RDF/XML can serialize the same triples in different textual forms, leading to different byte strings and signature mismatches.
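To see why, consider one triple written in two legal RDF/XML forms: once as a property attribute and once as a property element. The sketch below uses only Python's standard library, and the helper `literal_properties` and the sample literal are hypothetical. Its triple extraction handles only this simple, literal-valued case. It shows that both documents carry the same statement while their bytes, and therefore any hash or signature computed over them, differ.

```python
import hashlib
import xml.etree.ElementTree as ET
from typing import Set, Tuple

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

HEADER = ('<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" '
          'xmlns:llmind="https://llmind.org/ns/1.0/">')
# The same single triple, written once as a property attribute...
ATTRIBUTE_FORM = HEADER + '<rdf:Description rdf:about="" llmind:description="A harbor at dusk"/></rdf:RDF>'
# ...and once as a property element.
ELEMENT_FORM = HEADER + ('<rdf:Description rdf:about="">'
                         '<llmind:description>A harbor at dusk</llmind:description>'
                         '</rdf:Description></rdf:RDF>')

def literal_properties(rdf_xml: str) -> Set[Tuple[str, str]]:
    """(predicate, literal) pairs for a single subject, read from either RDF/XML form."""
    pairs = set()
    for desc in ET.fromstring(rdf_xml).iter(f"{{{RDF_NS}}}Description"):
        for name, value in desc.attrib.items():
            if not name.startswith(f"{{{RDF_NS}}}"):  # skip rdf:about itself
                pairs.add((name, value))  # property-attribute form
        for child in desc:
            pairs.add((child.tag, child.text or ""))  # property-element form
    return pairs

same_triples = literal_properties(ATTRIBUTE_FORM) == literal_properties(ELEMENT_FORM)
same_bytes = (hashlib.sha256(ATTRIBUTE_FORM.encode()).digest()
              == hashlib.sha256(ELEMENT_FORM.encode()).digest())
print(same_triples, same_bytes)  # True False
```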

## 5\. Numeric precision

Floating-point values in confidence scores, temporal offsets, and similar numeric properties MUST be represented with at most 6 decimal places in the canonical N-Quads form. An implementer who rounds to a different number of digits produces a different canonical byte string and fails signature verification. Rounding is round-half-to-even: 0.123456789 rounds to 0.123457 (the nearest value), and an exact tie such as 0.1234565 rounds to 0.123456 (the even neighbor).
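Python's `decimal` module implements round-half-to-even directly through `ROUND_HALF_EVEN`, so a sketch of the lexical-form step is short. Two choices here are assumptions, not part of the rule above. First, the sketch rounds the float's shortest decimal representation (`str(value)`) rather than its exact binary value, so a value written as `0.1234565` is treated as an exact tie. Second, it always emits exactly six digits after the point (so 0.5 becomes `0.500000`). Trimming trailing zeros would be an equally valid reading of "at most 6", and signer and verifier must make the same choice. `canonical_decimal` is a hypothetical name.

```python
from decimal import Decimal, ROUND_HALF_EVEN

SIX_PLACES = Decimal("0.000001")

def canonical_decimal(value: float) -> str:
    """Round to 6 decimal places with ties to even and return the lexical form."""
    # str() gives Python's shortest round-tripping decimal form, so ties are
    # judged on the decimal digits a writer would see, not on the binary float.
    rounded = Decimal(str(value)).quantize(SIX_PLACES, rounding=ROUND_HALF_EVEN)
    return format(rounded, "f")  # fixed-point, never exponent notation
```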

## 6\. Character encoding

UTF-8 encoding is mandatory throughout. An XMP packet can declare its encoding in an XML declaration. LRFS requires `encoding="UTF-8"`, and readers MUST reject payloads that declare or use any other encoding (e.g., UTF-16, ISO-8859-1).
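This rule can be checked before any XML parsing. The sketch below runs three checks in order. It rejects a UTF-16 or UTF-32 byte-order mark. It then rejects an XML declaration whose `encoding` is anything other than UTF-8, compared case-insensitively because XML encoding names are matched that way. Finally, it requires the bytes to decode as strict UTF-8. It assumes that a packet without an XML declaration is acceptable, since XML then defaults to UTF-8. `is_acceptable_utf8_packet` is a hypothetical helper.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"
# UTF-16 LE/BE and UTF-32 BE byte-order marks (UTF-32 LE also begins with FF FE).
FOREIGN_BOMS = (b"\xff\xfe", b"\xfe\xff", b"\x00\x00\xfe\xff")
XML_DECL_ENCODING = re.compile(rb'<\?xml[^>]*?\bencoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']')

def is_acceptable_utf8_packet(data: bytes) -> bool:
    if data.startswith(FOREIGN_BOMS):
        return False
    body = data[len(UTF8_BOM):] if data.startswith(UTF8_BOM) else data
    declared = XML_DECL_ENCODING.match(body)  # match() only looks at the very start
    if declared and declared.group(1).decode("ascii").upper() != "UTF-8":
        return False
    try:
        data.decode("utf-8")  # strict: any invalid byte sequence raises
    except UnicodeDecodeError:
        return False
    return True
```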

## 7\. Version compatibility

Within the v1.x version family, new optional layers MAY be added in minor versions (e.g., v1.1, v1.2). Readers of an older v1.x MUST ignore unknown `llmind:*` properties and MUST NOT fail validation.

Any change that invalidates existing payloads or breaks existing signatures requires a new major version with a new namespace URI (e.g., `https://llmind.org/ns/2.0/`). The v1.0 namespace will remain stable forever and will never be reused.

## 8\. Related chapters

-   [Signing scheme](https://llmind.org/spec/signing-scheme/) — HMAC-SHA256, ed25519, and file-level checksums
-   [Conformance](https://llmind.org/spec/conformance/) — Reader and writer conformance levels
-   [Namespace landing](https://llmind.org/ns/1.0/) — Namespace reservation and binding
-   [File enrichment glossary](https://llmind.org/glossary/) — definitions of terms used throughout the specification

Reading the full specification? See the [consolidated LRFS v1.0 view](https://llmind.org/spec/lrfs-v1.0/).
