Provenance

The verifiable origin and modification history of a file or its content.

Provenance is the documented origin and history of an object — where it came from, who handled it, and what transformations were applied. In museums, provenance is the chain of custody that proves an artifact is authentic. In digital files, provenance means knowing who created the file, when, with what device, and what edits were made afterward. C2PA is a standard for embedding provenance into images. For datasets, provenance tracks source URLs, transformations, and license terms.

In file metadata

File provenance traditionally includes creator (who made it), timestamp (when), device (with what camera or software), and editing history (subsequent changes). C2PA formalizes this by signing provenance manifests: a viewer can cryptographically verify that a JPEG came from a Canon camera operated by Alice on March 15, 2024, and was edited twice afterward. Without signatures, provenance is just a human claim. With them, it's a machine-verifiable fact.

In AI datasets

Dataset provenance tracks where each file came from (source URL), what preprocessing was applied, and what license governs its use. This is critical for legal compliance and reproducibility: if a model was trained on licensed data, you need to document which license applies and whether commercial use is permitted. LRFS's lineage layer captures all of this — source, transformation log, timestamp, and license — per file, all signed.

See also