LRFS payload format
This chapter specifies how an LLM-Ready File Specification (LRFS) payload is structured, serialized, and canonicalized. It is normative — implementers conforming to LRFS v1 MUST follow the rules in this chapter.
1. Host packet
The LRFS payload is carried inside an XMP packet per ISO 16684-1. The packet MUST be placed in the location defined for each host file format:
- JPEG: APP1 segment
- PNG: iTXt chunk with keyword XML:com.adobe.xmp
- PDF: Metadata stream (XMP per PDF 1.4+)
- MP3: ID3v2 PRIV frame carrying XMP
- WAV: RIFF INFO chunk or xmp subchunk
- M4A: XMP box within the file container
Implementers MUST reference the XMP specification (ISO 16684-1) and the format-specific appendices for precise byte-level placement rules. This chapter does not duplicate XMP packet serialization — it assumes the packet is correctly embedded per the standard.
2. Namespace binding
The LRFS namespace URI is https://llmind.org/ns/1.0/. Implementers MUST bind
the prefix llmind to this URI in the XMP RDF document. Example:
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:llmind="https://llmind.org/ns/1.0/">
<!-- LRFS properties follow -->
</rdf:RDF>
Other namespace prefixes (e.g., dc for Dublin Core, xmp
for XMP, xmpRights) MAY coexist in the same RDF document.
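A reader may want to confirm the binding before looking for LRFS properties. The following is a minimal sketch using only the Python standard library; the function names `prefix_bindings` and `binds_llmind` are illustrative and not part of LRFS. It assumes the RDF document has already been extracted from the host file as a string.

```python
import io
import xml.etree.ElementTree as ET

LLMIND_NS = "https://llmind.org/ns/1.0/"


def prefix_bindings(rdf_xml: str) -> dict:
    """Collect prefix -> namespace URI declarations from an RDF/XML string.

    If a prefix is redeclared on a nested element, the last declaration wins;
    a stricter reader might want to treat that as an error instead.
    """
    bindings = {}
    source = io.BytesIO(rdf_xml.encode("utf-8"))
    # "start-ns" events carry a (prefix, uri) pair for every xmlns declaration.
    for _event, (prefix, uri) in ET.iterparse(source, events=("start-ns",)):
        bindings[prefix] = uri
    return bindings


def binds_llmind(rdf_xml: str) -> bool:
    """True if the prefix llmind is bound to the LRFS v1 namespace URI."""
    return prefix_bindings(rdf_xml).get("llmind") == LLMIND_NS
```

Checking the prefix, and not only the URI, mirrors the requirement that the prefix llmind itself be bound; a reader that only cares about the URI could compare the values of the mapping instead.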
3. Layer model
An LRFS payload contains zero or more named layers. A layer is an RDF property whose subject is the file resource (typically the unnamed root resource in XMP). The following layers are defined in LRFS v1.0:
- llmind:description: Free-form natural-language description of the file. Value: string literal (xsd:string).
- llmind:entities: RDF Bag of entity descriptors. Each entity is a structured property with entity:name (string), entity:type (one of: person, place, organization, concept, artifact), and entity:confidence (float 0.0 to 1.0).
- llmind:structure: Structured summary of the file's content organization. For PDFs: chapter/section tree with page ranges. For audio: speaker turns with timestamps. For images: layout regions and bounding boxes. Value: structured RDF resource with child properties.
- llmind:transcription: For audio files: a transcript with per-segment timestamps. Value: string literal with embedded timestamp markers (format: [HH:MM:SS] text).
- llmind:lineage: Provenance information. Child properties: lineage:source (URL or description), lineage:created (ISO 8601 datetime), lineage:transformations (RDF Bag of transformation records), lineage:license (SPDX identifier).
- llmind:ocr: OCR output cache. Child properties: ocr:provider (string), ocr:text (string), ocr:pages (RDF Bag of page records, each with page:number, page:text, and optional page:boxes).
Each layer is optional. An LRFS payload with zero layers is valid but conveys no semantic information. Implementations MUST gracefully handle missing layers.
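The entity constraints and the "missing layers" rule can be illustrated in a few lines of Python. This is a sketch under assumptions: the `Entity` class, the `get_layer` helper, and the idea of holding parsed layers in a dict keyed by qualified property name are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass

# Allowed values of entity:type, as listed for llmind:entities.
ENTITY_TYPES = {"person", "place", "organization", "concept", "artifact"}


@dataclass(frozen=True)
class Entity:
    """One entity descriptor: entity:name, entity:type, entity:confidence."""
    name: str
    type: str
    confidence: float

    def __post_init__(self):
        if self.type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.type!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"confidence out of range: {self.confidence!r}")


def get_layer(payload: dict, layer: str, default=None):
    """Return a layer's value, or a default when the layer is absent.

    `payload` is assumed to map names such as "llmind:entities" to values
    that have already been parsed out of the RDF graph.
    """
    return payload.get(f"llmind:{layer}", default)
```

With this shape, a payload with zero layers is simply an empty dict, and `get_layer({}, "entities", [])` returns an empty list instead of raising.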
4. Canonicalization for signing
Before signing, each layer MUST be canonicalized to enable reproducible signature verification. The canonicalization algorithm is the W3C RDF Dataset Canonicalization algorithm. RFC 8785 (JCS) is NOT used: LRFS relies on RDF-specific canonicalization so that semantically equivalent RDF/XML serializations yield the same canonical form.
The process:
- Extract the RDF triples for the layer from the XMP RDF graph.
- Apply RDF Dataset Canonicalization (https://www.w3.org/TR/rdf-canon/) to these triples.
- Serialize the canonical triples as N-Quads with UTF-8 encoding and \n line endings.
- The resulting byte string is the canonical form for that layer.
Critical: Implementers MUST NOT sign the serialized RDF/XML representation directly. Signatures MUST be computed over the canonical N-Quads form. RDF/XML can serialize the same triples in different textual forms, which yields different byte strings and therefore signature mismatches.
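To show why the canonical form is independent of serialization order, here is a deliberately reduced sketch in Python. It is not an implementation of RDF Dataset Canonicalization: it handles only triples without blank nodes, where canonicalization reduces to emitting one N-Quads line per distinct triple and sorting the lines. Structured layers in XMP typically involve blank nodes, so a real implementation would need a full canonicalization library. The `Triple` type, the subject IRI `urn:example:file`, and the escaping table are assumptions made for this illustration and should be checked against the N-Quads specification.

```python
from typing import Iterable, NamedTuple, Optional

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"

# Characters written with a short backslash escape inside literals.
_ECHAR = {"\b": "\\b", "\t": "\\t", "\n": "\\n", "\f": "\\f",
          "\r": "\\r", '"': '\\"', "\\": "\\\\"}


class Triple(NamedTuple):
    subject: str                    # IRI (no blank nodes in this sketch)
    predicate: str                  # IRI
    obj: str                        # IRI, or the lexical form of a literal
    obj_is_iri: bool = False
    datatype: Optional[str] = None  # datatype IRI for typed literals


def _escape(text: str) -> str:
    out = []
    for ch in text:
        if ch in _ECHAR:
            out.append(_ECHAR[ch])
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append(f"\\u{ord(ch):04X}")  # remaining control characters
        else:
            out.append(ch)
    return "".join(out)


def _object_term(t: Triple) -> str:
    if t.obj_is_iri:
        return f"<{t.obj}>"
    term = f'"{_escape(t.obj)}"'
    # xsd:string is the implicit datatype and is left off.
    if t.datatype and t.datatype != XSD_STRING:
        term += f"^^<{t.datatype}>"
    return term


def canonical_nquads(triples: Iterable[Triple]) -> bytes:
    """Sorted, de-duplicated N-Quads lines, UTF-8 encoded, \\n line endings."""
    lines = {f"<{t.subject}> <{t.predicate}> {_object_term(t)} .\n"
             for t in triples}
    return "".join(sorted(lines)).encode("utf-8")
```

Because the lines are collected into a set and then sorted, feeding the same triples in a different order, or with duplicates, produces the same byte string, which is the property signature verification depends on.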
5. Numeric precision
Floating-point values in confidence scores, temporal offsets, and similar numeric properties MUST be represented with at most 6 decimal places in the canonical N-Quads form. Rounding to a different number of digits produces a different canonical byte string, and signature verification then fails. Round to the nearest value at 6 decimal places, with ties going to the even digit (round-half-to-even): for example, 0.123456789 rounds to 0.123457.
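The rounding rule can be sketched with Python's `decimal` module, which avoids the binary floating-point artifacts that would otherwise affect tie cases. The function name `canonical_decimal` is hypothetical. One assumption is worth flagging: this sketch always emits exactly six decimal places, whereas this chapter does not state whether trailing zeros are kept or stripped.

```python
from decimal import Decimal, ROUND_HALF_EVEN

SIX_PLACES = Decimal("0.000001")


def canonical_decimal(value: str) -> str:
    """Round a decimal string to six places, ties to even.

    Takes a string so that the input is interpreted exactly; passing a
    Python float would first go through binary floating point.
    """
    rounded = Decimal(value).quantize(SIX_PLACES, rounding=ROUND_HALF_EVEN)
    return format(rounded, "f")  # plain notation, never an exponent


print(canonical_decimal("0.123456789"))  # 0.123457
print(canonical_decimal("0.1234565"))    # 0.123456 (tie, 6 is already even)
print(canonical_decimal("0.1234575"))    # 0.123458 (tie, rounds up to even 8)
```

The two tie cases show where round-half-to-even differs from the more familiar round-half-up, which would give 0.123457 for the second input.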
6. Character encoding
UTF-8 encoding is mandatory throughout. XMP packets specify their own encoding in
the XML declaration. LRFS requires encoding="UTF-8"; readers MUST reject
payloads that declare or use any other encoding (e.g., UTF-16, ISO-8859-1).
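A reader can enforce both halves of this rule, the declared encoding and the actual bytes, before parsing. The following is a minimal sketch; `decode_packet` is a hypothetical helper, and treating a missing XML declaration as acceptable (because XML defaults to UTF-8) is an assumption that this chapter does not settle.

```python
import re

# Matches the encoding pseudo-attribute of a leading XML declaration.
_XML_DECL = re.compile(
    rb'^\s*<\?xml\s[^>]*?encoding\s*=\s*["\']([A-Za-z0-9._-]+)["\']'
)
_UTF8_BOM = b"\xef\xbb\xbf"


def decode_packet(packet: bytes) -> str:
    """Return the packet as text, or raise ValueError if it is not UTF-8."""
    if packet.startswith(_UTF8_BOM):
        packet = packet[len(_UTF8_BOM):]
    match = _XML_DECL.match(packet)
    if match:
        declared = match.group(1).decode("ascii")
        # XML encoding names are case-insensitive.
        if declared.upper() != "UTF-8":
            raise ValueError(f"unsupported declared encoding: {declared}")
    # Strict decoding rejects bytes that are not valid UTF-8, whatever the
    # declaration says. UnicodeDecodeError is a subclass of ValueError.
    return packet.decode("utf-8", errors="strict")
```

Decoding strictly after checking the declaration catches payloads that claim UTF-8 but contain other bytes, as well as UTF-16 payloads whose declaration the byte-level pattern cannot see.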
7. Version compatibility
Within the v1.x version family, new optional layers MAY be added in minor
versions (e.g., v1.1, v1.2). Readers implementing an older v1.x version MUST ignore
unknown llmind:* properties and MUST NOT fail validation because of them.
Any change that invalidates existing payloads or breaks existing signatures
requires a new major version with a new namespace URI (e.g.,
https://llmind.org/ns/2.0/). The v1.0 namespace
will remain stable forever and will never be reused.
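The two compatibility rules, ignoring unknown properties within v1.x and switching namespace on a major version, can be sketched as follows. The helper names are hypothetical, the property dict keyed by qualified name is an illustrative representation, and the URI pattern is extrapolated from the two namespace URIs that appear in this chapter.

```python
import re
from typing import Optional

# Layers defined in LRFS v1.0.
KNOWN_V1_LAYERS = {"description", "entities", "structure",
                   "transcription", "lineage", "ocr"}

# Assumed pattern: https://llmind.org/ns/<major>.0/
_NS_PATTERN = re.compile(r"^https://llmind\.org/ns/(\d+)\.0/$")


def major_version(namespace_uri: str) -> Optional[int]:
    """Return the major version encoded in an LRFS namespace URI, if any."""
    match = _NS_PATTERN.match(namespace_uri)
    return int(match.group(1)) if match else None


def select_known_layers(properties: dict) -> dict:
    """Keep the llmind:* layers this reader understands.

    Unknown llmind:* properties, such as layers added in a later v1.x
    minor version, are skipped silently and never raise.
    """
    prefix = "llmind:"
    return {
        name: value
        for name, value in properties.items()
        if name.startswith(prefix) and name[len(prefix):] in KNOWN_V1_LAYERS
    }
```

A v1.0 reader built this way keeps working when it meets a v1.1 payload, and can detect a v2 payload by comparing `major_version` of the bound namespace against 1.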
8. Related chapters
- Signing scheme — HMAC-SHA256, ed25519, and file-level checksums
- Conformance — Reader and writer conformance levels
- Namespace landing — Namespace reservation and binding
- File enrichment glossary — definitions of terms used throughout the specification