LLMind benchmarks

Reference measurements for LLMind file enrichment: file-size overhead, signing throughput, and read time compared with re-parsing the same file with LlamaParse, Docling, Textract, or Mistral OCR.

Measurements pending. The tables below describe what Sprint 3 measures against the reference corpus at huggingface.co/datasets/llmind/reference-enriched-pdfs-v1. Value cells stay empty until the measurement protocol runs. The methodology is committed at docs/benchmarks/sprint-3-methodology.md and the raw CSV at docs/benchmarks/sprint-3-data.csv. Third parties can rerun the protocol and publish alternative numbers against the same corpus.

Test corpus

The corpus is 100 public-domain PDFs at huggingface.co/datasets/llmind/reference-enriched-pdfs-v1: 40 small (<500KB), 40 medium (500KB–5MB), and 20 large (5–50MB). The content mix covers technical papers, scanned facsimiles, government reports, and synthetic research PDFs. Every file is licensed for redistribution (CC0, public domain, or US government works).
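The size classes above can be checked mechanically when someone assembles or audits a corpus. The following is a minimal sketch, not part of the LLMind tooling: the function names are hypothetical, KB and MB are assumed to be decimal (the text does not say whether KiB/MiB is meant), and a file of exactly 5MB is assumed to count as large because the stated ranges overlap at that boundary.

```python
from collections import Counter

# Assumption: decimal units. The corpus description does not say KB vs. KiB.
KB = 1000
MB = 1000 * KB


def size_bucket(num_bytes: int) -> str:
    """Classify a file size into the corpus size classes.

    Assumption: boundaries are half-open, so exactly 5MB is "large".
    """
    if num_bytes < 500 * KB:
        return "small"
    if num_bytes < 5 * MB:
        return "medium"
    if num_bytes <= 50 * MB:
        return "large"
    return "out-of-range"


def bucket_counts(sizes):
    """Count how many files fall into each size class."""
    return Counter(size_bucket(n) for n in sizes)


if __name__ == "__main__":
    # Synthetic sizes for illustration only, not real corpus files.
    print(bucket_counts([120 * KB, 2 * MB, 30 * MB]))
```

Running `bucket_counts` over the real file sizes should give 40 small, 40 medium, and 20 large if the corpus matches its description.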

File-size overhead

The XMP semantic layer LLMind writes adds bytes to the file. This table measures how many — as a percentage of the original, and in absolute bytes.

| Measurement | Value | Notes |
| --- | --- | --- |
| avg-overhead-percent | | Average XMP payload size as % of original file bytes across the 100-PDF corpus. |
| median-overhead-percent | | Median (less sensitive to huge files). |
| p95-overhead-percent | | 95th percentile: worst case for small PDFs. |
| avg-absolute-bytes | | Average bytes added per file (for intuition; expect single-digit KB). |
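The four overhead statistics can be derived from one pair of sizes per file. The sketch below is illustrative rather than the project's measurement script: it assumes the bytes added equal the enriched file size minus the original size, uses the nearest-rank definition for the 95th percentile (the methodology document may define it differently), and all function names are hypothetical.

```python
import math
import statistics


def overhead_percent(original_bytes: int, enriched_bytes: int) -> float:
    """Bytes added by enrichment as a percentage of the original size."""
    return 100.0 * (enriched_bytes - original_bytes) / original_bytes


def percentile_nearest_rank(values, pct: int) -> float:
    """Nearest-rank percentile: the smallest value with at least pct% at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(pairs):
    """pairs is an iterable of (original_bytes, enriched_bytes), one per file."""
    pairs = list(pairs)
    percents = [overhead_percent(o, e) for o, e in pairs]
    added = [e - o for o, e in pairs]
    return {
        "avg-overhead-percent": statistics.mean(percents),
        "median-overhead-percent": statistics.median(percents),
        "p95-overhead-percent": percentile_nearest_rank(percents, 95),
        "avg-absolute-bytes": statistics.mean(added),
    }


if __name__ == "__main__":
    # Synthetic sizes for illustration only, not measured results.
    print(summarize([(1000, 1100), (2000, 2100), (4000, 4100)]))
```

With a fixed number of bytes added per file, the percentage overhead shrinks as the original grows, which is why the median and the 95th percentile are reported alongside the average.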

Signing throughput

How fast LLMind can sign the semantic layer on commodity hardware. Measured in isolation (pure crypto; no OCR / parse in the hot path).

| Algorithm | Throughput | Notes |
| --- | --- | --- |
| hmac-sha256 | | HMAC-SHA256 signing of the semantic layer (default algorithm). |
| ed25519 | | ed25519 signing (optional; for public-key verification). |
| file-checksum-sha256 | | SHA-256 file-content checksum (excludes XMP packet). |
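A pure-crypto throughput measurement of this kind can be approximated with the Python standard library alone. This is a rough sketch under stated assumptions, not the Sprint 3 protocol: the payload is random bytes standing in for a semantic layer, the units (signatures per second, MB per second) are a guess because the table does not fix them, and the function names are hypothetical. ed25519 is omitted because it needs a third-party library.

```python
import hashlib
import hmac
import os
import time


def sign_layer(key: bytes, layer: bytes) -> bytes:
    """HMAC-SHA256 over the serialized layer bytes."""
    return hmac.new(key, layer, hashlib.sha256).digest()


def signatures_per_second(key: bytes, layer: bytes, iterations: int = 2000) -> float:
    """Time repeated signing of the same payload; no parsing or I/O in the loop."""
    start = time.perf_counter()
    for _ in range(iterations):
        sign_layer(key, layer)
    elapsed = time.perf_counter() - start
    return iterations / elapsed


def checksum_mb_per_second(data: bytes, repeats: int = 5) -> float:
    """SHA-256 throughput over a buffer, in decimal megabytes per second."""
    start = time.perf_counter()
    for _ in range(repeats):
        hashlib.sha256(data).digest()
    elapsed = time.perf_counter() - start
    return len(data) * repeats / elapsed / 1_000_000


if __name__ == "__main__":
    key = os.urandom(32)
    layer = os.urandom(4096)         # assumption: a layer of a few KB
    body = os.urandom(5_000_000)     # assumption: a medium-sized file body
    print(f"hmac-sha256: {signatures_per_second(key, layer):.0f} signatures/s")
    print(f"sha256: {checksum_mb_per_second(body):.0f} MB/s")
```

The numbers this prints depend heavily on the CPU and on the Python build, so they are only comparable between runs on the same machine.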

Read-time comparison

The core value proposition: reading the cached LRFS semantic layer from XMP is orders of magnitude faster than re-parsing the same PDF. Times are reported in milliseconds per file; lower is better.

| Operation | Time per file | Notes |
| --- | --- | --- |
| llmind-cached-read | | Time for a consumer to parse the XMP packet and extract the layer. |
| llamaparse-reparse | | Time to re-parse the PDF with LlamaParse (cloud API call; includes network). |
| docling-reparse | | Time to re-parse with Docling (local). |
| textract-reparse | | Time to re-parse with AWS Textract (cloud API call; includes network). |
| mistral-ocr-reparse | | Time to re-parse with Mistral OCR (cloud API call). |
| speedup-factor-vs-llamaparse | | Speedup from reading cached LRFS vs. re-running LlamaParse. |
| speedup-factor-vs-textract | | Speedup from reading cached LRFS vs. re-running AWS Textract. |
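The speedup rows are ratios of two timings. A minimal sketch of how such a ratio might be computed follows; it assumes the per-operation figure is the median of several wall-clock runs and that speedup is baseline time divided by cached-read time, neither of which is confirmed by this page. The helper names are hypothetical.

```python
import statistics
import time


def time_ms(operation, repeats: int = 5) -> float:
    """Median wall-clock time of a zero-argument callable, in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        operation()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


def speedup_factor(baseline_ms: float, cached_ms: float) -> float:
    """How many times faster the cached read is than the baseline re-parse."""
    if cached_ms <= 0:
        raise ValueError("cached_ms must be positive")
    return baseline_ms / cached_ms


if __name__ == "__main__":
    # Stand-in workloads for illustration only; a real run would call the
    # cached XMP read and the re-parse API in place of these sleeps.
    cached = time_ms(lambda: time.sleep(0.001))
    baseline = time_ms(lambda: time.sleep(0.05))
    print(f"speedup: {speedup_factor(baseline, cached):.1f}x")
```

For the cloud-API baselines, the measured time includes network latency, so the ratio reflects the conditions of the run as much as the parser itself.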

What's not measured

Some adjacent questions are deliberately out of scope. The methodology document at docs/benchmarks/sprint-3-methodology.md explains each exclusion in detail.

Reproducibility

The corpus, the methodology, and the raw CSV are all committed to git. Third-party reviewers can rerun each step and publish alternative numbers. Results vary with hardware generation (especially for HMAC-SHA256 throughput), network conditions (for cloud-API baselines), and file selection.

The Sprint 3 CSV stays frozen as the reference point. Future re-runs land in their own files: docs/benchmarks/sprint-4-data.csv, and so on.