---
title: "LRFS v1.0 — Consolidated Specification | LLMind"
description: "The full LRFS v1.0 specification on one page: payload format, signing scheme, and conformance levels. Citation-ready URL for the LLM-Ready File Specification."
url: https://llmind.org/spec/lrfs-v1.0/
source_format: html
---
# LRFS v1.0 — Consolidated Specification

LRFS v1.0 · Released · Last updated 2026-04-24

The full LLM-Ready File Specification on one page. This document consolidates the three normative chapters — payload format, signing scheme, and conformance — in a single citation-ready URL.

## Table of contents

1.  [Payload format](#payload-format)
2.  [Signing scheme](#signing-scheme)
3.  [Conformance](#conformance)
4.  [Namespace](#namespace)
5.  [References](#references)

## 1\. Payload format

This chapter specifies how an LLM-Ready File Specification (LRFS) payload is structured, serialized, and canonicalized. It is normative — implementers conforming to LRFS v1 MUST follow the rules in this chapter.

### 1\. Host packet

The LRFS payload is carried inside an XMP packet per ISO 16684-1. The packet MUST be placed in the location defined for each host file format:

-   **JPEG**: APP1 segment
-   **PNG**: iTXt chunk with keyword `XML:com.adobe.xmp`
-   **PDF**: Metadata stream (XMP per PDF 1.4+)
-   **MP3**: ID3v2 PRIV frame carrying XMP
-   **WAV**: RIFF INFO chunk or XMP (`_PMX`) chunk
-   **M4A**: XMP box within the file container

Implementers MUST reference the XMP specification (ISO 16684-1) and the format-specific appendices for precise byte-level placement rules. This chapter does not duplicate XMP packet serialization — it assumes the packet is correctly embedded per the standard.

### 2\. Namespace binding

The LRFS namespace URI is `https://llmind.org/ns/1.0/`. Implementers MUST bind the prefix `llmind` to this URI in the XMP RDF document. Example:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:llmind="https://llmind.org/ns/1.0/">
  <rdf:Description rdf:about="">
    <!-- LRFS properties follow -->
  </rdf:Description>
</rdf:RDF>
```

Other namespace prefixes (e.g., `dc` for Dublin Core, `xmp` for XMP, `xmpRights`) MAY coexist in the same RDF document.
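The binding is to the URI, so a reader should match properties by namespace URI rather than by the literal prefix `llmind`. Below is a minimal Python sketch using the standard library's `xml.etree.ElementTree`. It assumes the RDF/XML has already been extracted from the XMP packet. `llmind_properties` is a hypothetical helper that reads only simple literal properties, in element or attribute form, and not nested structures such as `llmind:entities`.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LLMIND_NS = "https://llmind.org/ns/1.0/"
PREFIX = "{" + LLMIND_NS + "}"  # ElementTree spells qualified names as "{uri}local"


def llmind_properties(rdf_xml: str) -> dict:
    """Return {local_name: value} for simple llmind:* properties.

    Matching is on the namespace URI, so any prefix bound to
    https://llmind.org/ns/1.0/ is recognized, and a prefix spelled
    "llmind" but bound to some other URI is not.
    """
    root = ET.fromstring(rdf_xml)
    props = {}
    for desc in root.iter("{" + RDF_NS + "}Description"):
        # XMP allows simple properties as attributes of rdf:Description...
        for name, value in desc.attrib.items():
            if name.startswith(PREFIX):
                props[name[len(PREFIX):]] = value
        # ...or as child elements.
        for child in desc:
            if child.tag.startswith(PREFIX):
                props[child.tag[len(PREFIX):]] = (child.text or "").strip()
    return props
```

A nested layer such as `llmind:entities` comes back as an empty string in this sketch. Parsing its `rdf:Bag` is out of scope here.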

### 3\. Layer model

An LRFS payload contains zero or more named _layers_. A layer is an RDF property whose subject is the file resource (typically the unnamed root resource in XMP). The following layers are defined in LRFS v1.0:

-   **`llmind:description`** — Free-form natural-language description of the file. Value: string literal (xsd:string).
-   **`llmind:entities`** — RDF Bag of entity descriptors. Each entity is a structured property with `entity:name` (string), `entity:type` (one of: `person`, `place`, `organization`, `concept`, `artifact`), and `entity:confidence` (float 0.0 to 1.0).
-   **`llmind:structure`** — Structured summary of the file's content organization. For PDFs: chapter/section tree with page ranges. For audio: speaker turns with timestamps. For images: layout regions and bounding boxes. Value: structured RDF resource with child properties.
-   **`llmind:transcription`** — For audio files: a transcript with per-segment timestamps. Value: string literal with embedded timestamp markers (format: `[HH:MM:SS] text`).
-   **`llmind:lineage`** — Provenance information. Child properties: `lineage:source` (URL or description), `lineage:created` (ISO 8601 datetime), `lineage:transformations` (RDF Bag of transformation records), `lineage:license` (SPDX identifier).
-   **`llmind:ocr`** — OCR output cache. Child properties: `ocr:provider` (string), `ocr:text` (string), `ocr:pages` (RDF Bag of page records, each with `page:number`, `page:text`, and optional `page:boxes`).

Each layer is _optional_. An LRFS payload with zero layers is valid but conveys no semantic information. Implementations MUST treat a missing layer as absent, not as an error.
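Two of these layers have small, checkable value formats. The Python sketch below does two things:

-   It validates an already-parsed entity descriptor against the v1.0 type list and confidence range.
-   It splits a `llmind:transcription` literal on its `[HH:MM:SS]` markers.

`Entity` and `parse_transcription` are hypothetical helpers, not names from the specification. The sketch assumes minutes and seconds run from 00 to 59, and it discards any text before the first marker.

```python
import re
from dataclasses import dataclass

# Values permitted for entity:type in LRFS v1.0.
ENTITY_TYPES = frozenset({"person", "place", "organization", "concept", "artifact"})

# A transcription marker: "[HH:MM:SS]".
MARKER = re.compile(r"\[(\d{2}):([0-5]\d):([0-5]\d)\]")


@dataclass(frozen=True)
class Entity:
    name: str
    type: str
    confidence: float

    def __post_init__(self) -> None:
        if self.type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity:type {self.type!r}")
        # Written as a negated range check so NaN is rejected too.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError(f"entity:confidence {self.confidence} is outside 0.0-1.0")


def parse_transcription(value: str) -> list:
    """Split a transcription literal into (offset_seconds, text) pairs."""
    # re.split with three groups yields [before, hh, mm, ss, text, hh, mm, ss, text, ...]
    parts = MARKER.split(value)
    segments = []
    for i in range(1, len(parts), 4):
        hh, mm, ss, text = parts[i:i + 4]
        segments.append((int(hh) * 3600 + int(mm) * 60 + int(ss), text.strip()))
    return segments
```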

### 4\. Canonicalization for signing

Before signing, each layer MUST be canonicalized so that signatures can be verified reproducibly. The canonicalization algorithm is W3C RDF Dataset Canonicalization (RDFC-1.0). The JSON Canonicalization Scheme (RFC 8785, JCS) is NOT used. An RDF-specific algorithm is required so that different RDF/XML serializations of the same triples canonicalize to the same bytes.

The process:

1.  Extract the RDF triples for the layer from the XMP RDF graph.
2.  Apply RDF Dataset Canonicalization (RDFC-1.0, https://www.w3.org/TR/rdf-canon/) to these triples.
3.  Serialize the canonical triples as N-Quads with UTF-8 encoding and LF (`\n`) line endings.
4.  The resulting byte string is the canonical form for that layer.

**Critical:** Implementers MUST NOT sign the serialized RDF/XML representation directly. Signing must use the canonical N-Quads form. RDF/XML can serialize the same triples in different textual forms, leading to different byte strings and signature mismatches.
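Steps 3 and 4 can be illustrated for the simplest case. The sketch below is deliberately *not* RDFC-1.0. It handles only ground triples (IRIs and plain or typed literals, with no blank nodes and no language tags) in the default graph. That is enough to show why the canonical byte string does not depend on the order or prefixes used in RDF/XML.

Real layers such as `llmind:entities` contain blank nodes, so implementations need a conformant RDFC-1.0 implementation. `IRI`, `Literal`, and `canonical_nquads` are hypothetical names.

```python
from typing import Iterable, NamedTuple, Tuple, Union

XSD_STRING = "http://www.w3.org/2001/XMLSchema#string"


class IRI(NamedTuple):
    value: str


class Literal(NamedTuple):
    lexical: str
    datatype: str = XSD_STRING


Term = Union[IRI, Literal]
Triple = Tuple[IRI, IRI, Term]


def _term(t: Term) -> str:
    if isinstance(t, IRI):
        return "<" + t.value + ">"
    # Escape backslash first, then double quote, LF and CR.
    text = (t.lexical.replace("\\", "\\\\").replace('"', '\\"')
            .replace("\n", "\\n").replace("\r", "\\r"))
    if t.datatype == XSD_STRING:
        return '"' + text + '"'  # plain strings omit the ^^xsd:string suffix
    return '"' + text + '"^^<' + t.datatype + ">"


def canonical_nquads(triples: Iterable[Triple]) -> bytes:
    """Ground triples -> sorted N-Quads lines (default graph), UTF-8, LF endings."""
    lines = {f"{_term(s)} {_term(p)} {_term(o)} .\n" for s, p, o in triples}
    # De-duplicating (the set) and sorting make the bytes independent of
    # the order in which triples were read from the RDF/XML.
    return "".join(sorted(lines)).encode("utf-8")
```

Two RDF/XML documents that list the same ground properties in different orders, or under different prefixes, yield byte-identical output here. That is the property the signature depends on.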

### 5\. Numeric precision

Floating-point values in confidence scores, temporal offsets, and similar numeric properties MUST be represented with at most 6 decimal places in the canonical N-Quads form. Implementations that round to a different number of digits, or use a different rounding rule, will produce different canonical byte strings and fail signature verification.

Use round-half-to-even:

-   0.123456789 rounds to 0.123457.
-   An exact tie such as 0.0000025 rounds to the even digit, giving 0.000002.
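Here is a sketch of this rule using Python's `decimal` module, which implements round-half-to-even directly. It makes two assumptions that go beyond the text:

-   A float is rounded from its shortest decimal string (`str(value)`), not from its exact binary expansion.
-   The result always carries exactly six digits, so `0.5` becomes `0.500000`. The specification says "at most 6", so check the canonicalization test vectors to see whether trailing zeros are trimmed.

`canonical_decimal` is a hypothetical name.

```python
from decimal import ROUND_HALF_EVEN, Decimal

SIX_PLACES = Decimal("0.000001")


def canonical_decimal(value) -> str:
    """Round a numeric value to six decimal places, ties to even."""
    # str() first, so 0.1 is rounded as "0.1" rather than as the nearest
    # binary double 0.1000000000000000055511151231257827...
    d = Decimal(str(value))
    if not d.is_finite():
        raise ValueError(f"non-finite value: {value!r}")
    return format(d.quantize(SIX_PLACES, rounding=ROUND_HALF_EVEN), "f")
```

Half-even and half-up differ only on exact ties. For example, `0.0000035` gives `0.000004` under both, but `0.0000025` gives `0.000002` under half-even and `0.000003` under half-up.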

### 6\. Character encoding

UTF-8 encoding is mandatory throughout. XMP packets specify their own encoding in the XML declaration. LRFS requires `encoding="UTF-8"`, and readers MUST reject payloads that declare or use any other encoding (e.g., UTF-16, ISO-8859-1).
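Here is a reader-side sketch of this rule, assuming the XMP packet bytes have already been extracted from the host file. It rejects in four cases:

-   a UTF-16 byte-order mark;
-   a declared non-UTF-8 encoding;
-   bytes that do not decode as strict UTF-8;
-   any NUL byte.

The NUL-byte check is a heuristic for BOM-less UTF-16. NUL cannot appear in XML text, and UTF-16 encodings of ASCII characters contain it. `decode_packet` is a hypothetical helper.

```python
import re

UTF8_BOM = b"\xef\xbb\xbf"
# The encoding pseudo-attribute of a leading XML declaration, if there is one.
ENCODING_DECL = re.compile(rb"""\s*<\?xml[^>]*?\bencoding\s*=\s*["']([A-Za-z0-9._-]+)["']""")


def decode_packet(packet: bytes) -> str:
    """Return the XMP packet as text, or raise ValueError if it is not UTF-8."""
    if packet.startswith((b"\xff\xfe", b"\xfe\xff")):
        raise ValueError("UTF-16 byte-order mark; only UTF-8 is permitted")
    if b"\x00" in packet:
        # NUL never occurs in XML text; here it usually means BOM-less UTF-16.
        raise ValueError("NUL byte in packet; not UTF-8 XML")
    body = packet[len(UTF8_BOM):] if packet.startswith(UTF8_BOM) else packet
    match = ENCODING_DECL.match(body)
    if match:
        declared = match.group(1).decode("ascii")
        if declared.upper() != "UTF-8":
            raise ValueError(f"declared encoding {declared!r} is not UTF-8")
    try:
        return body.decode("utf-8")  # strict mode: malformed sequences raise
    except UnicodeDecodeError as exc:
        raise ValueError(f"packet is not valid UTF-8: {exc}") from exc
```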

### 7\. Version compatibility

Within the v1.x version family, new optional layers MAY be added in minor versions (e.g., v1.1, v1.2). Readers of an older v1.x MUST ignore unknown `llmind:*` properties and MUST NOT fail validation.

Any change that invalidates existing payloads or breaks existing signatures requires a new major version with a new namespace URI (e.g., `https://llmind.org/ns/2.0/`). The v1.0 namespace will remain stable forever and will never be reused.
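A v1.0 reader can honour the ignore-unknown rule by separating the properties it understands from the rest, instead of rejecting the payload. The sketch assumes properties have already been extracted by namespace URI, so anything under `/ns/2.0/` never reaches it. It treats a layer's `_signature` and `_signature_key` properties, and `llmind:file_checksum`, as known. `is_v1_0_property` and `split_properties` are hypothetical names.

```python
# Layers defined in LRFS v1.0 (payload format, section 3).
V1_0_LAYERS = frozenset(
    {"description", "entities", "structure", "transcription", "lineage", "ocr"}
)


def is_v1_0_property(name: str) -> bool:
    """True for a v1.0 layer, its signature properties, or llmind:file_checksum."""
    if name == "file_checksum":
        return True
    for suffix in ("_signature_key", "_signature"):
        if name.endswith(suffix):
            return name[: -len(suffix)] in V1_0_LAYERS
    return name in V1_0_LAYERS


def split_properties(props: dict) -> tuple:
    """Partition llmind:* properties into (known, unknown) without failing."""
    known = {k: v for k, v in props.items() if is_v1_0_property(k)}
    unknown = {k: v for k, v in props.items() if not is_v1_0_property(k)}
    return known, unknown
```

A layer from a later minor version (say, a hypothetical `llmind:sentiment`) lands in `unknown`. The caller may log it but must not fail validation because of it.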

### 8\. Related chapters

-   [Signing scheme](https://llmind.org/spec/signing-scheme/) — HMAC-SHA256, ed25519, and file-level checksums
-   [Conformance](https://llmind.org/spec/conformance/) — Reader and writer conformance levels
-   [Namespace landing](https://llmind.org/ns/1.0/) — Namespace reservation and binding
-   [File enrichment glossary](https://llmind.org/glossary/) — definitions of terms used throughout the specification

## 2\. Signing scheme

This chapter specifies how LRFS payloads are signed so downstream consumers can verify integrity and detect tampering. Per-layer signing is mandatory for conformance level L3. File-level checksums are mandatory for L2 and above.

### 1\. Per-layer signing: HMAC-SHA256

HMAC-SHA256 provides symmetric-key signing for private corpora where all parties share a pre-shared key. The signature is computed as follows:

1.  For each layer `llmind:<layer>`, canonicalize the layer's RDF triples per the payload-format chapter (§4).
2.  Compute `HMAC-SHA256(key, canonical_bytes)` where `canonical_bytes` is the N-Quads canonical form as UTF-8 bytes.
3.  Encode the resulting 32-byte MAC as base64 (RFC 4648, no padding).
4.  Store the base64 MAC as the value of `llmind:<layer>_signature`.
5.  Store an opaque key identifier as `llmind:<layer>_signature_key`. This identifier is NOT the key itself; it is metadata that helps the verifier locate the correct key. Example: `"prod-key-2026-q2"` or a UUID.

Key management is out of scope for the LRFS specification. Implementers are responsible for documenting their own key distribution and rotation policies. Keys MUST NOT be stored in the LRFS payload itself.
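Steps 2 to 5 map directly onto Python's standard `hmac`, `hashlib`, and `base64` modules. Below is a minimal writer-side sketch. It assumes the canonical bytes from step 1 are already available. `sign_layer_hmac` and the layout of the returned dictionary are hypothetical.

```python
import base64
import hashlib
import hmac


def b64_nopad(data: bytes) -> str:
    """RFC 4648 base64 (standard alphabet) with trailing '=' padding removed."""
    return base64.b64encode(data).rstrip(b"=").decode("ascii")


def sign_layer_hmac(key: bytes, canonical_bytes: bytes, key_id: str) -> dict:
    """Return the signature and key-identifier values a writer stores for one layer."""
    mac = hmac.new(key, canonical_bytes, hashlib.sha256).digest()  # 32 bytes
    return {"signature": b64_nopad(mac), "signature_key": key_id}
```

For a layer named `description`, the writer would store the two values as `llmind:description_signature` and `llmind:description_signature_key`. A 32-byte MAC always encodes to 43 unpadded base64 characters.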

### 2\. Per-layer signing: ed25519

ed25519 provides asymmetric signing for public verification. Any party with the signer's public key can verify the signature without possessing the private key. The process:

1.  For each layer `llmind:<layer>`, canonicalize the layer's RDF triples per the payload-format chapter (§4).
2.  Sign the canonical bytes with the ed25519 private key to produce a 64-byte signature.
3.  Encode the signature as base64 (RFC 4648, no padding).
4.  Store the base64 signature as `llmind:<layer>_signature`.
5.  Store the public-key reference as `llmind:<layer>_signature_key`. The reference MUST be a dereferenceable URL, such as an HTTPS endpoint serving the public key in PEM format or a `.well-known` path.

Implementers MAY fetch and cache public keys. The specification RECOMMENDS fetching over HTTPS and validating the TLS certificate chain. Implementers SHOULD set a reasonable TTL (e.g., 1 hour) on cached keys and allow manual key rotation via configuration.
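Here is a sketch of the recommended cache. The fetch function is supplied by the caller, which keeps the HTTPS and TLS-validation policy outside the cache. `PublicKeyCache` is a hypothetical name. The one-hour default mirrors the example TTL above, and `invalidate` stands in for the manual rotation hook. The sketch does not fall back to a stale key when a refresh fails; that is a design choice, not a requirement of the text.

```python
import time
from typing import Callable, Optional


class PublicKeyCache:
    """TTL cache for public keys referenced by llmind:<layer>_signature_key URLs."""

    def __init__(self, fetch: Callable[[str], bytes], ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # e.g. an HTTPS GET that validates the TLS chain
        self._ttl = ttl_seconds
        self._clock = clock          # injectable for testing
        self._entries: dict = {}     # url -> (fetched_at, key_bytes)

    def get(self, url: str) -> bytes:
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        key = self._fetch(url)       # fetch errors propagate to the caller
        self._entries[url] = (now, key)
        return key

    def invalidate(self, url: Optional[str] = None) -> None:
        """Drop one cached key, or all of them, e.g. after a manual key rotation."""
        if url is None:
            self._entries.clear()
        else:
            self._entries.pop(url, None)
```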

The ed25519 private key MUST NOT leave the signing environment. Keys stored in HSMs, KMS systems, or secure enclaves are acceptable.

### 3\. File-level checksum

In addition to per-layer signing, the writer computes a file-level SHA-256 checksum covering every byte of the file except the XMP packet. This lets verifiers detect modifications to the file body even when the XMP packet is rewritten or updated.

The algorithm:

1.  Extract all bytes from the original file EXCEPT the XMP packet itself.
2.  Compute the SHA-256 digest of these bytes.
3.  Encode the digest as hexadecimal (lowercase, 64 ASCII characters).
4.  Store the hex digest as the value of `llmind:file_checksum` inside the XMP packet.

**Implementation note:** The exact byte-range exclusion depends on the host file format:

-   **JPEG**: exclude the APP1 segment that carries the XMP packet. Exif metadata also uses APP1 segments, and those remain covered by the checksum.
-   **PNG**: exclude the iTXt chunk that carries the packet.
-   **PDF**: exclude the Metadata stream.

Format-specific rules will be defined in format appendices in v1.1. For v1.0, implementers should rely on the XMP specification and format-specific guidelines.
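Once the packet's byte range is known, steps 2 and 3 are a single streaming hash. Below is a minimal sketch, assuming the caller has already located the excluded range for the host format. `file_checksum` is a hypothetical name.

```python
import hashlib


def file_checksum(file_bytes: bytes, xmp_start: int, xmp_end: int) -> str:
    """SHA-256 over every byte except file_bytes[xmp_start:xmp_end], as lowercase hex.

    Locating the excluded range (for example, the APP1 segment or iTXt chunk
    that carries the packet) is format-specific and left to the caller.
    """
    if not 0 <= xmp_start <= xmp_end <= len(file_bytes):
        raise ValueError("XMP byte range out of bounds")
    digest = hashlib.sha256()
    digest.update(file_bytes[:xmp_start])  # body before the packet
    digest.update(file_bytes[xmp_end:])    # body after the packet
    return digest.hexdigest()              # 64 lowercase hex characters
```

Replacing the packet with a different one leaves the result unchanged, while changing any byte outside the range changes it.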

### 4\. Verification algorithm

Pseudo-code for verifying an LRFS payload:

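```
function verifyLRFS(file, hmacKeys, publicKeyCache):
  # Extract the XMP packet and load all llmind:* properties
  xmpPacket = extractXMP(file)
  rdf = parseRDF(xmpPacket)

  # Per-layer verification; every FAIL aborts and rejects the payload
  for layer in ["description", "entities", "structure", ...]:
    if llmind:<layer> not in rdf:
      continue  # Layer not present, skip

    # A present layer without signature metadata is an error, not a skip
    if llmind:<layer>_signature not in rdf or llmind:<layer>_signature_key not in rdf:
      FAIL("Missing signature for layer " + layer)

    # Canonicalize the layer's triples (payload format §4)
    canonicalBytes = canonicalize(getRDFTriplesFor(layer))

    # Decode the stored signature (unpadded base64) and read the key reference
    signature = base64DecodeNoPadding(llmind:<layer>_signature)
    keyRef = llmind:<layer>_signature_key

    if isHMAC(signature, keyRef):              # 32-byte MAC; keyRef is a key identifier
      hmacKey = hmacKeys.lookup(keyRef)
      if hmacKey is missing:
        FAIL("Unknown HMAC key identifier " + keyRef)
      expected = HMAC_SHA256(hmacKey, canonicalBytes)
      if not constantTimeEquals(signature, expected):
        FAIL("HMAC signature mismatch for layer " + layer)
    elif isEd25519(signature, keyRef):         # 64-byte signature; keyRef is a URL
      pubKey = publicKeyCache.get(keyRef)      # Dereference (and cache) the URL
      if not ed25519Verify(pubKey, canonicalBytes, signature):
        FAIL("ed25519 signature invalid for layer " + layer)
    else:
      FAIL("Unrecognized signature algorithm for layer " + layer)

  # File-level checksum
  if llmind:file_checksum not in rdf:
    FAIL("Missing file checksum")
  bodyBytes = extractFileBodyExcludingXMP(file)
  if llmind:file_checksum != SHA256_hex(bodyBytes):
    FAIL("File checksum mismatch: body modified")

  PASS("All signatures and checksums valid")
```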
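The HMAC branch of this algorithm can be made concrete with the Python standard library. This sketch assumes the canonical bytes are already computed and keys are looked up by their identifier. It shows the fail-closed behaviour the security considerations require: every problem raises, and the function returns only on success. It also rejects a signature of the wrong length, which covers the algorithm/length-mismatch case.

`VerificationError`, `verify_layer_hmac`, and the `keys` mapping are hypothetical. The ed25519 branch is omitted because it needs a third-party library.

```python
import base64
import binascii
import hashlib
import hmac
from typing import Mapping, Optional


class VerificationError(Exception):
    """Raised for every verification failure; the caller must reject the payload."""


def b64decode_nopad(text: str) -> bytes:
    """Strictly decode RFC 4648 base64 written without '=' padding."""
    try:
        return base64.b64decode(text + "=" * (-len(text) % 4), validate=True)
    except binascii.Error as exc:
        raise VerificationError(f"malformed base64 signature: {exc}") from exc


def verify_layer_hmac(canonical_bytes: bytes, signature: Optional[str],
                      key_ref: Optional[str], keys: Mapping[str, bytes]) -> None:
    """Return normally only if the layer's HMAC-SHA256 signature is valid."""
    if not signature or not key_ref:
        raise VerificationError("missing signature or signature key")
    mac = b64decode_nopad(signature)
    if len(mac) != hashlib.sha256().digest_size:
        # e.g. a 64-byte, ed25519-sized value where an HMAC was expected
        raise VerificationError(f"expected a 32-byte HMAC, got {len(mac)} bytes")
    key = keys.get(key_ref)  # key_ref is an identifier, never the key itself
    if key is None:
        raise VerificationError(f"unknown key identifier {key_ref!r}")
    expected = hmac.new(key, canonical_bytes, hashlib.sha256).digest()
    if not hmac.compare_digest(mac, expected):  # constant-time comparison
        raise VerificationError("HMAC signature mismatch")
```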

### 5\. Key rotation and revocation

The current LRFS v1.0 specification does not define a revocation protocol. An ed25519 public key is either trusted or not; there is no list of revoked keys.

Implementers using ed25519 SHOULD publish a key-rotation policy alongside their public keys. For example, a key published at `https://example.com/.well-known/llmind-keys` can be accompanied by its rotation schedule and guidance. Any future revocation protocol will be specified in LRFS v2.0, which, as a backward-incompatible release, will use a new namespace URI.

### 6\. Security considerations

**HMAC keys:** Shared keys MUST be distributed out-of-band via secure channels (e.g., manual distribution, KMS, or secure provisioning systems). Keys MUST NOT be embedded in code, configuration files, or version control systems.

**ed25519 private keys:** Private keys MUST NOT leave the signing environment. Use hardware security modules (HSMs), key management services (KMS), or secure enclaves.

**Tamper detection:** An attacker with write access to a file's body bytes but not the signing key cannot forge a valid signature. The signed layers protect against silent modification. However, an attacker WITH access to the signing key or the shared HMAC key can modify both content and signatures.

**Signature verification must fail closed:** Any verification error (missing signature, invalid format, algorithm mismatch, signature mismatch, checksum mismatch) MUST cause the operation to reject the payload. Implementations MUST NOT silently ignore verification errors or fall back to unsigned data.

### 7\. Related chapters

-   [Payload format](https://llmind.org/spec/payload-format/) — Canonicalization algorithm (§4)
-   [Conformance](https://llmind.org/spec/conformance/) — L3 conformance requires full signing support
-   [File enrichment glossary](https://llmind.org/glossary/) — definitions of terms used throughout the specification

## 3\. Conformance

This chapter specifies what a third-party implementation must do to claim conformance with LLM-Ready File Specification v1. Three conformance levels (L1–L3) are defined; an implementation MAY claim the highest level it fully supports.

### 1\. Conformance levels

LRFS defines three levels of conformance, each building on the previous:

-   **L1 (Read)** — The implementation can parse LRFS payloads from all 6 supported file formats (JPEG, PNG, PDF, MP3, WAV, M4A), extract all defined layers, and return layer contents to the caller. Signature verification is NOT required at L1. L1 readers can load unsigned payloads or payloads with unknown signatures.
-   **L2 (Write)** — The implementation produces LRFS payloads that any conformant L1 reader can parse and extract. Layers MAY be written without signatures. L2 writers MUST produce valid RDF and correct XMP packet placement per format, and MUST compute file-level checksums.
-   **L3 (Signed Read + Write)** — Full L1 + L2 capabilities plus cryptographic signature verification (both HMAC-SHA256 and ed25519) and signature generation. L3 writers MUST produce signed layers and file-level checksums. L3 readers MUST verify all signatures before accepting payload content.

An implementation MAY claim multiple levels. For example, an implementation might claim "L2 writer and L3 reader" — it can write unsigned payloads but can read and verify signed payloads.

### 2\. L1 requirements (Read)

An L1-conformant reader MUST:

-   Parse XMP packets from all 6 supported file formats (JPEG APP1, PNG iTXt, PDF Metadata stream, MP3 ID3v2 PRIV, WAV RIFF, M4A XMP box) according to ISO 16684-1 and format-specific guidelines.
-   Extract all `llmind:*` properties from the RDF graph and return them to the caller with their types intact.
-   Ignore unknown `llmind:*` properties without failing. These are properties not defined in v1.0, such as those added in v1.1 or later. Forward compatibility is mandatory.
-   Reject payloads with non-UTF-8 encoding and report the error distinctly.
-   Parse RDF/XML that conforms to RDF 1.1. Reject malformed RDF.

An L1 reader SHOULD report parse errors distinctly from signature errors. An L1 reader MAY cache parsed layers in memory.

### 3\. L2 requirements (Write)

An L2-conformant writer MUST:

-   Produce valid XMP packets placed correctly per each host file format (JPEG APP1 segment, PNG iTXt chunk, PDF Metadata stream, etc.).
-   Bind the `llmind:` prefix to the namespace URI `https://llmind.org/ns/1.0/` in the RDF document.
-   Use UTF-8 encoding throughout and declare `encoding="UTF-8"` in the XML declaration.
-   Produce RDF/XML that parses as valid RDF 1.1 under standard tools (e.g., RDF parsers from Apache Jena, W3C RDF libraries, or equivalent).
-   NOT write properties outside the LRFS v1.x-defined set into the `llmind:` namespace. Application-specific properties MUST use a clearly distinct namespace (e.g., bound to a prefix such as `custom:` or `myapp:`).
-   Compute file-level SHA-256 checksums and store them as `llmind:file_checksum`.

An L2 writer SHOULD preserve any existing XMP packet content (e.g., Dublin Core properties like `dc:creator`, rights properties like `xmpRights:*`) when adding LRFS properties. Overwriting existing metadata is permitted but not recommended.
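One way to honour this SHOULD with Python's standard library is to parse the existing RDF/XML, create or replace only the `llmind:description` element, and leave every other property in place. This sketch works under several assumptions:

-   `add_description` is a hypothetical helper.
-   It edits only the first `rdf:Description`.
-   `ElementTree` discards XML comments and may rename unregistered prefixes on output. The RDF triples stay intact, but the text changes.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LLMIND_NS = "https://llmind.org/ns/1.0/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("llmind", LLMIND_NS)


def add_description(rdf_xml: str, text: str) -> str:
    """Set llmind:description on the first rdf:Description, keeping other properties."""
    root = ET.fromstring(rdf_xml)
    desc = root.find("{" + RDF_NS + "}Description")
    if desc is None:
        raise ValueError("no rdf:Description element")
    prop = desc.find("{" + LLMIND_NS + "}description")
    if prop is None:
        prop = ET.SubElement(desc, "{" + LLMIND_NS + "}description")
    prop.text = text  # only this element is created or replaced
    return ET.tostring(root, encoding="unicode")
```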

### 4\. L3 requirements (Signed)

An L3-conformant implementation MUST support both reading and writing signed layers:

-   **Signing algorithms:** Implement both HMAC-SHA256 and ed25519 signatures. An L3 writer MUST sign all layers using at least one algorithm; an L3 reader MUST verify signatures computed with either algorithm.
-   **Canonicalization:** Canonicalize each layer per the payload-format chapter (§4, RDF Dataset Canonicalization) before signing or verifying.
-   **File-level checksums:** Compute SHA-256 file-level checksums per the signing-scheme chapter (§3). L3 readers MUST verify checksums; L3 writers MUST generate them.
-   **Verification:** Implement the verification algorithm (signing-scheme §4). Any verification error (invalid signature, checksum mismatch, missing signature) MUST cause the operation to REJECT the payload and report an error.
-   **Fail closed:** Do NOT silently ignore signature errors or fall back to unsigned data. Signature failures are fatal.

L3 readers SHOULD support key caching and TTL management for ed25519 public keys fetched from URLs. L3 writers SHOULD use secure key management (HSM, KMS, secure enclaves) for ed25519 private keys.

### 5\. Test vectors

A public conformance test suite lives at [`/spec/test-vectors/`](https://llmind.org/spec/test-vectors/) on this site. Each vector is a self-contained JSON file with `inputs` and `expected` fields that any LRFS implementation (in any language) can validate against. The machine-readable index at [`/spec/test-vectors/index.json`](https://llmind.org/spec/test-vectors/index.json) enumerates every vector with its category and spec section. The test suite covers:

-   Sample files in all 6 formats with LRFS payloads
-   Signed and unsigned payloads
-   Intentionally malformed payloads for error handling
-   Canonicalization test cases
-   Key references and test keys for ed25519 and HMAC

Implementations claim conformance by running the test suite and publishing their results (PASS/FAIL per test).
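Running the suite amounts to feeding each vector's `inputs` to the implementation and comparing the result against `expected`. Below is a minimal Python runner. It assumes the vector files have been downloaded into one directory, and that a vector passes when the implementation's return value equals `expected` exactly. The real comparison rule may depend on each vector's category.

`run_vectors` and the `check` callable are hypothetical. `index.json` is skipped because it is the index rather than a vector.

```python
import json
from pathlib import Path
from typing import Any, Callable, Dict


def run_vectors(directory: Path, check: Callable[[Any], Any]) -> Dict[str, str]:
    """Run every vector file in `directory`; return {file name: "PASS" or "FAIL"}."""
    results = {}
    for path in sorted(directory.glob("*.json")):
        if path.name == "index.json":
            continue  # the index enumerates vectors; it is not one itself
        vector = json.loads(path.read_text(encoding="utf-8"))
        try:
            passed = check(vector["inputs"]) == vector["expected"]
        except Exception:
            passed = False  # a crash is a failure, never a skip
        results[path.name] = "PASS" if passed else "FAIL"
    return results
```

Publishing the returned mapping gives the per-test PASS/FAIL report described above.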

### 6\. Claiming conformance

An implementation MAY state one of the following:

-   **"Conforms to LRFS v1.0, level L1"** — Read-only
-   **"Conforms to LRFS v1.0, level L2"** — Write (unsigned)
-   **"Conforms to LRFS v1.0, level L3"** — Full signed read+write

Conformance is self-reported; there is no central certification body. Third parties can challenge a conformance claim by running the official test suite against the implementation.

Implementations SHOULD document:

-   Which file formats are supported (subset of JPEG, PNG, PDF, MP3, WAV, M4A)
-   Which layers are supported (subset of description, entities, structure, transcription, lineage, ocr)
-   For L3: which cryptographic algorithms are implemented (HMAC-SHA256, ed25519, or both)

### Failure-mode conformance

Beyond the canonicalization, signing, and file-checksum vectors, LRFS v1.0 publishes [failure-mode test vectors](https://llmind.org/spec/test-vectors/) that a conformant reader MUST handle correctly. A reader that passes every happy-path vector may still crash on malformed input or silently accept tampered files — failure-mode coverage is the distinction between a parser and a conformant verifier.

The published failure-mode categories cover:

-   **Tampered signatures** — the reader MUST reject payloads whose signatures fail verification.
-   **Malformed XMP** — the reader MUST surface a structured parse error, not crash, on syntactically invalid input.
-   **Algorithm/length mismatch** — the reader MUST reject payloads whose declared signing algorithm is inconsistent with the signature byte length.
-   **Empty payload** — the reader MUST distinguish a valid but empty LRFS container (no layers, no signature) from a corrupt payload, returning an empty layer set without error.
-   **Unknown namespace version** — per the payload-format chapter (§7, Version compatibility), readers of v1.x MUST ignore payloads using a future major-version namespace (e.g., `/ns/2.0/`) rather than attempt to parse them.

Each fixture is a self-contained JSON file with `inputs`, an `expected_outcome` (one of `reject`, `ignore`, or `parse_error`), and a `reason` field that explains the spec requirement. Implementations self-report conformance by running these fixtures against their reader; see [/spec/implementations/](https://llmind.org/spec/implementations/) for the published implementations directory.
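A harness needs to map a reader's behaviour onto these outcomes. The specification does not define a reader API, so the sketch below assumes one:

-   The reader signals failures with two exception types.
-   It returns `None` when it deliberately ignores a payload.
-   It returns a dict of layers otherwise.

The exception classes, the `"ok"` label for a successful read, and both function names are hypothetical. The empty-payload category expects a successful, empty result, which the harness represents as `"ok"`.

```python
from typing import Any, Callable, Optional


class ParseError(Exception):
    """Syntactically invalid XMP or RDF: expected_outcome "parse_error"."""


class RejectedPayload(Exception):
    """Well-formed payload that failed verification: expected_outcome "reject"."""


Reader = Callable[[Any], Optional[dict]]


def observed_outcome(read: Reader, inputs: Any) -> str:
    """Classify one reader call as "parse_error", "reject", "ignore", or "ok"."""
    try:
        layers = read(inputs)
    except ParseError:
        return "parse_error"
    except RejectedPayload:
        return "reject"
    # Assumed reader convention: None means the payload was deliberately ignored.
    return "ignore" if layers is None else "ok"


def fixture_passes(read: Reader, fixture: dict) -> bool:
    try:
        return observed_outcome(read, fixture["inputs"]) == fixture["expected_outcome"]
    except Exception:
        return False  # any other exception is a crash, which never conforms
```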

### 7\. Related chapters

-   [Payload format](https://llmind.org/spec/payload-format/) — Normative RDF and canonicalization rules
-   [Signing scheme](https://llmind.org/spec/signing-scheme/) — Normative signing and verification algorithms
-   [File enrichment glossary](https://llmind.org/glossary/) — definitions of terms used throughout the specification

## 4\. Namespace

The stable XMP namespace for LRFS v1.x is [https://llmind.org/ns/1.0/](https://llmind.org/ns/1.0/). It is reserved in perpetuity for LRFS v1.x. Breaking changes will use `/ns/2.0/` and leave 1.0 untouched forever.

## 5\. References

-   [LRFS test vectors](https://llmind.org/spec/test-vectors/) — 10 public JSON fixtures for self-verifying implementations.
-   [LRFS implementations](https://llmind.org/spec/implementations/) — directory of known LRFS-conformant implementations.
-   [LRFS — glossary entry](https://llmind.org/glossary/lrfs/)
-   [LLM-Ready File Specification (chapter TOC view)](https://llmind.org/spec/)
-   arXiv preprint: in preparation (link will be added upon submission).
