Benchmark Results

This page is a snapshot of benchmark results recorded on a specific machine. For methodology, flags, and how to re-run, see Benchmarks.

Note: Timing and throughput are machine-specific. Compression ratios, sizes, and fidelity metrics are determined by the codec and are reproducible.

Run metadata

| Field | Value |
|---|---|
| Date | 2026-04-16 |
| Tensogram version | 0.13.0 |
| CPU | Apple M4, 10 cores / 10 threads |
| OS | macOS 26.3 (Darwin 25.3.0) |
| Rust | rustc 1.94.1 |
| ecCodes | 2.46.0 |
| Methodology | 10 timed iterations, 3 warmup, median reported |
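The warmup-then-median protocol can be sketched as follows (a minimal illustrative harness, not the actual benchmark code):

```rust
use std::time::Instant;

/// Time `f` with `warmup` untimed runs, then `iters` timed runs;
/// return the median elapsed time in milliseconds.
fn median_ms(warmup: usize, iters: usize, mut f: impl FnMut()) -> f64 {
    for _ in 0..warmup {
        f(); // warm caches, fault in buffers, let clocks settle
    }
    let mut times: Vec<f64> = (0..iters)
        .map(|_| {
            let t0 = Instant::now();
            f();
            t0.elapsed().as_secs_f64() * 1e3
        })
        .collect();
    times.sort_by(|a, b| a.partial_cmp(b).unwrap());
    times[times.len() / 2] // upper median for even iteration counts
}

fn main() {
    // Example: time summing a 1M-element vector, 3 warmup + 10 timed runs.
    let data: Vec<f64> = (0..1_000_000).map(|i| i as f64).collect();
    let ms = median_ms(3, 10, || {
        let s: f64 = data.iter().sum();
        std::hint::black_box(s); // keep the work from being optimised away
    });
    println!("median: {ms:.3} ms");
}
```

The median (rather than the mean) keeps a single OS-induced outlier iteration from skewing the reported figure.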

Codec Matrix

16 million float64 values (122 MiB). The test data is a synthetic smooth scientific-like field with values in the range 250–310 (a profile that also matches real temperature grids and other bounded-range physical measurements).

How fidelity is measured

After each encode→decode round-trip, the decoded values are compared to the original. Three error norms are reported, all absolute in the same units as the input:

  • Linf — the largest error for any single value. Answers: “what is the worst case?”
  • L1 — the average error across all values. Answers: “how far off are values on average?”
  • L2 (RMSE) — root mean square error. Like L1 but penalizes large outliers more heavily. Answers: “how large are the typical errors, weighted toward the worst ones?”

For lossless codecs all three are zero.
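As a concrete reference, the three norms over an original/decoded pair can be computed like this (an illustrative sketch, not Tensogram's internal code):

```rust
/// Compute (Linf, L1, L2/RMSE) of the element-wise error between
/// original and decoded values, in the same units as the input.
fn error_norms(original: &[f64], decoded: &[f64]) -> (f64, f64, f64) {
    assert_eq!(original.len(), decoded.len());
    let n = original.len() as f64;
    let (mut linf, mut sum_abs, mut sum_sq) = (0.0f64, 0.0f64, 0.0f64);
    for (&a, &b) in original.iter().zip(decoded) {
        let e = (a - b).abs();
        linf = linf.max(e); // worst single value
        sum_abs += e;       // accumulates mean absolute error
        sum_sq += e * e;    // accumulates mean squared error
    }
    (linf, sum_abs / n, (sum_sq / n).sqrt())
}

fn main() {
    let orig = [280.0, 281.0, 282.0];
    let dec = [280.1, 280.9, 282.0];
    let (linf, l1, l2) = error_norms(&orig, &dec);
    // Worst error ≈ 0.1; mean |error| ≈ 0.067; RMSE ≈ 0.082.
    println!("Linf={linf:.4} L1={l1:.4} L2={l2:.4}");
}
```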

Lossless compressors on raw floats

No encoding step — raw 64-bit floats compressed directly. Decoded values are bit-identical to the original.

| Method | Enc (ms) | Dec (ms) | Enc MB/s | Dec MB/s | Ratio | Size (MiB) |
|---|---|---|---|---|---|---|
| no compression [REF] | 3.7 | 3.7 | 32818 | 33226 | 100.0% | 122.1 |
| zstd level 3 | 128.5 | 114.5 | 950 | 1066 | 90.3% | 110.2 |
| LZ4 | 8.5 | 7.4 | 14328 | 16535 | 100.4% | 122.6 |
| Blosc2 | 51.9 | 26.6 | 2350 | 4584 | 75.2% | 91.8 |
| szip | 69.7 | 206.8 | 1753 | 590 | 100.9% | 123.2 |

Raw 64-bit floats have high entropy in their low-order mantissa bytes, so most lossless compressors cannot reduce their size; LZ4 and szip even expand the data slightly. Blosc2 is the exception: its byte-shuffle filter groups like bytes together, exposing compressible patterns and reaching a 75.2% ratio.
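The byte-shuffle idea is easy to illustrate: transposing the array so that byte k of every f64 is stored contiguously turns the slowly-varying sign/exponent bytes into long runs that a byte-oriented compressor can exploit. A sketch of the transform only (Blosc2's actual filter pipeline is more elaborate):

```rust
/// Byte-shuffle: regroup an f64 slice so that byte k of every value
/// is stored contiguously (plane k). Sketch of the transform only.
fn byte_shuffle(values: &[f64]) -> Vec<u8> {
    let n = values.len();
    let mut out = vec![0u8; n * 8];
    for (i, v) in values.iter().enumerate() {
        for (k, byte) in v.to_le_bytes().iter().enumerate() {
            out[k * n + i] = *byte; // plane k, element i
        }
    }
    out
}

fn main() {
    // A smooth field: neighbouring values share sign/exponent bytes.
    let field: Vec<f64> = (0..8).map(|i| 280.0 + i as f64 * 0.01).collect();
    let n = field.len();
    let shuffled = byte_shuffle(&field);
    // The last plane (most significant byte of each little-endian f64)
    // is a constant run — trivially compressible after the shuffle.
    let top_plane = &shuffled[7 * n..8 * n];
    assert!(top_plane.iter().all(|&b| b == top_plane[0]));
    println!("top byte plane: {top_plane:?}");
}
```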

SimplePacking (quantization) + lossless compressors

Values are quantized to N bits, then compressed. Fidelity depends only on the bit width, not on the compressor — see the fidelity table below.
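For reference, a minimal sketch of this kind of linear quantization (assuming plain min/scale packing; Tensogram's actual SimplePacking, like GRIB's, may additionally use binary/decimal scale factors):

```rust
/// Quantize values to `bits`-bit integers via linear scaling, storing
/// the field minimum and step size so the values can be reconstructed.
fn simple_pack(values: &[f64], bits: u32) -> (f64, f64, Vec<u64>) {
    let min = values.iter().cloned().fold(f64::INFINITY, f64::min);
    let max = values.iter().cloned().fold(f64::NEG_INFINITY, f64::max);
    let levels = (1u64 << bits) - 1; // representable steps across the range
    let scale = if max > min { (max - min) / levels as f64 } else { 1.0 };
    let packed = values.iter().map(|v| ((v - min) / scale).round() as u64).collect();
    (min, scale, packed)
}

/// Reconstruct approximate values from the packed integers.
fn simple_unpack(min: f64, scale: f64, packed: &[u64]) -> Vec<f64> {
    packed.iter().map(|&q| min + q as f64 * scale).collect()
}

fn main() {
    let values = [250.0, 280.5, 310.0];
    let (min, scale, packed) = simple_pack(&values, 16);
    let decoded = simple_unpack(min, scale, &packed);
    // Round-to-nearest bounds the error at half a quantization step.
    let worst = values.iter().zip(&decoded)
        .map(|(a, b)| (a - b).abs())
        .fold(0.0f64, f64::max);
    assert!(worst <= scale / 2.0 + 1e-12);
    println!("scale = {scale:e}, worst error = {worst:e}");
}
```

This also makes the table's structure plausible: the fidelity columns depend only on the step size chosen in the packing stage, so every compressor applied afterwards sees the same quantized integers.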

| Method | Enc (ms) | Dec (ms) | Enc MB/s | Dec MB/s | Ratio | Size (MiB) |
|---|---|---|---|---|---|---|
| 16-bit only | 17.3 | 15.1 | 7039 | 8078 | 25.0% | 30.5 |
| 16-bit + zstd | 54.2 | 36.2 | 2254 | 3375 | 24.4% | 29.7 |
| 16-bit + LZ4 | 19.7 | 22.2 | 6204 | 5493 | 25.1% | 30.6 |
| 16-bit + Blosc2 | 115.2 | 31.5 | 1060 | 3873 | 20.3% | 24.8 |
| 16-bit + szip | 53.9 | 99.3 | 2263 | 1229 | 14.6% | 17.8 |
| 24-bit only | 19.2 | 17.1 | 6347 | 7135 | 37.5% | 45.8 |
| 24-bit + zstd | 67.3 | 41.1 | 1813 | 2969 | 37.2% | 45.4 |
| 24-bit + LZ4 | 31.5 | 23.5 | 3871 | 5188 | 37.6% | 46.0 |
| 24-bit + Blosc2 | 124.9 | 40.0 | 978 | 3052 | 32.8% | 40.0 |
| 24-bit + szip | 63.3 | 133.5 | 1928 | 914 | 27.2% | 33.2 |
| 32-bit only | 21.2 | 25.3 | 5771 | 4825 | 50.0% | 61.0 |
| 32-bit + zstd | 97.8 | 37.0 | 1248 | 3299 | 49.8% | 60.8 |
| 32-bit + LZ4 | 37.1 | 45.1 | 3287 | 2706 | 50.2% | 61.3 |
| 32-bit + Blosc2 | 141.0 | 38.3 | 866 | 3183 | 45.3% | 55.3 |
| 32-bit + szip | 69.8 | 157.4 | 1748 | 775 | 39.7% | 48.4 |

Fidelity by bit width

| Bit width | Linf (max abs) | L1 (mean abs) | L2 (RMSE) |
|---|---|---|---|
| 16 bits | 4.9 × 10⁻⁴ | 2.4 × 10⁻⁴ | 2.8 × 10⁻⁴ |
| 24 bits | 1.9 × 10⁻⁶ | 9.5 × 10⁻⁷ | 1.1 × 10⁻⁶ |
| 32 bits | 7.5 × 10⁻⁹ | 3.7 × 10⁻⁹ | 4.3 × 10⁻⁹ |

For context: with input values around 280, a Linf of 1.9 × 10⁻⁶ means the worst-case relative error at 24 bits is roughly 7 parts per billion.
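These figures follow directly from the quantization step size: for a field spanning 250–310 (range 60) packed to N bits, the step is 60 / 2^N, and round-to-nearest bounds the worst-case error at half a step. A quick check against the table:

```rust
fn main() {
    let range = 310.0 - 250.0; // data range from the codec-matrix setup
    for bits in [16u32, 24, 32] {
        let step = range / (1u64 << bits) as f64;
        let max_err = step / 2.0; // round-to-nearest worst case
        println!("{bits:2} bits: step {step:.2e}, predicted Linf ≈ {max_err:.2e}");
    }
    // Predicted vs measured Linf:
    // 16 bits → ≈ 4.6e-4  (measured 4.9e-4)
    // 24 bits → ≈ 1.8e-6  (measured 1.9e-6)
    // 32 bits → ≈ 7.0e-9  (measured 7.5e-9)
}
```

The measured values sit slightly above the half-step bound, consistent with the packing step using one fewer level than 2^N (and with accumulation effects in the scaling arithmetic).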

Lossy floating-point compressors

These operate directly on raw f64 bytes without quantization.

| Method | Enc (ms) | Dec (ms) | Enc MB/s | Dec MB/s | Ratio | Size (MiB) |
|---|---|---|---|---|---|---|
| ZFP rate 16 | 220.1 | 304.2 | 555 | 401 | 25.0% | 30.5 |
| ZFP rate 24 | 248.0 | 468.5 | 492 | 261 | 37.5% | 45.8 |
| ZFP rate 32 | 288.0 | 581.0 | 424 | 210 | 50.0% | 61.0 |
| SZ3 abs 0.01 | 131.4 | 141.0 | 929 | 866 | 6.5% | 7.9 |

Fidelity by lossy codec

| Method | Linf (max abs) | L1 (mean abs) | L2 (RMSE) |
|---|---|---|---|
| ZFP rate 16 | 1.3 × 10⁻² | 1.6 × 10⁻³ | 2.0 × 10⁻³ |
| ZFP rate 24 | 5.6 × 10⁻⁵ | 6.1 × 10⁻⁶ | 7.9 × 10⁻⁶ |
| ZFP rate 32 | 1.9 × 10⁻⁷ | 2.4 × 10⁻⁸ | 3.1 × 10⁻⁸ |
| SZ3 abs 0.01 | 1.0 × 10⁻² | 5.0 × 10⁻³ | 5.8 × 10⁻³ |

Notable observations

  • 16-bit + szip achieves the best compression ratio (14.6%) among the SimplePacking combinations.
  • SZ3 achieves the smallest output overall (6.5%) with a max error of 0.01. If your application tolerates that error bound, this gives the best compression in this benchmark.
  • In this benchmark, higher ZFP rates gave proportionally smaller errors. ZFP fixed-rate modes hit their target ratio exactly by construction: rate N stores N bits per value, so 64-bit input compresses to N/64 (25% / 37.5% / 50%).

Reference Comparison: ecCodes GRIB Encoding

GRIB is a binary format widely used in operational weather forecasting, and ecCodes (from ECMWF) is a common implementation. Comparing against it gives a concrete, reproducible reference point for Tensogram’s quantisation + entropy-coding pipeline.

This benchmark runs Tensogram’s 24-bit SimplePacking + szip and ecCodes’ built-in packing methods on the same input. Both sides are timed end-to-end: from a float64 array to serialised compressed bytes (encode), and back (decode).

10 million float64 values (76 MiB), 24-bit packing. Note this is a smaller dataset than the 16-million-value codec matrix above, so timings and sizes are not directly comparable between sections.

| Method | Enc (ms) | Dec (ms) | Enc MB/s | Dec MB/s | Ratio | Size (MiB) |
|---|---|---|---|---|---|---|
| ecCodes CCSDS [REF] | 47.9 | 84.8 | 1594 | 900 | 27.2% | 20.8 |
| ecCodes simple packing | 32.6 | 7.9 | 2339 | 9660 | 37.5% | 28.6 |
| Tensogram 24-bit + szip | 43.7 | 80.4 | 1745 | 950 | 27.4% | 20.9 |

All three methods produce identical fidelity: Linf = 1.9 × 10⁻⁶, L1 = 9.5 × 10⁻⁷, L2 = 1.1 × 10⁻⁶.

Notable observations

  • Tensogram and ecCodes CCSDS achieve nearly identical compression (27.4% vs 27.2%) and identical fidelity at 24 bits.
  • Tensogram encode is slightly faster than ecCodes CCSDS (43.7 vs 47.9 ms) on this machine; decode is comparable (80.4 vs 84.8 ms).
  • ecCodes simple packing decodes fastest (7.9 ms) but produces a larger file (37.5% vs 27.2%).

Threading Scaling

The v0.13.0 multi-threaded coding pipeline lets callers spend a threads budget on encode/decode work. Results here show the effect of sweeping threads ∈ {0, 1, 2, 4, 8} on 16M f64 values (122 MiB) for seven representative codec combinations. threads=0 is the sequential baseline; speedups are measured against it.

Reminder: Transparent codecs (no codec, simple_packing, szip, lz4, zfp, sz3, shuffle) produce byte-identical encoded payloads across thread counts. Opaque codecs (blosc2, zstd with nb_workers > 0) may produce different compressed bytes while always round-tripping losslessly.

Lossless (no encoding)

| Method | Metric | threads=0 | threads=1 | threads=2 | threads=4 | threads=8 |
|---|---|---|---|---|---|---|
| none+none | enc MB/s | 32818 | 35929 | 36801 | 35173 | 35520 |
| none+none | speedup | 1.00x | 1.09x | 1.12x | 1.07x | 1.08x |
| none+lz4 | enc MB/s | 7733 | 3619 | 3559 | 2029 | 2513 |
| none+lz4 | speedup | 1.00x | 0.47x | 0.46x | 0.26x | 0.32x |
| none+zstd(3) | enc MB/s | 942 | 1163 | 2075 | 2259 | 1839 |
| none+zstd(3) | speedup | 1.00x | 1.23x | 2.20x | 2.40x | 1.95x |
| none+blosc2(lz4) | enc MB/s | 3150 | 3140 | 5030 | 7458 | 8906 |
| none+blosc2(lz4) | speedup | 1.00x | 1.00x | 1.60x | 2.37x | 2.83x |

SimplePacking + compression

| Method | Metric | threads=0 | threads=1 | threads=2 | threads=4 | threads=8 |
|---|---|---|---|---|---|---|
| sp(16)+none | enc MB/s | 12964 | 13268 | 15584 | 15643 | 14612 |
| sp(16)+none | enc speedup | 1.00x | 1.02x | 1.20x | 1.21x | 1.13x |
| sp(16)+none | dec speedup | 1.00x | 1.14x | 2.37x | 2.34x | 2.18x |
| sp(24)+szip | enc MB/s | 2273 | 2263 | 2351 | 2389 | 2427 |
| sp(24)+szip | speedup | 1.00x | 1.00x | 1.03x | 1.05x | 1.07x |
| sp(24)+blosc2(lz4) | enc MB/s | 2371 | 2350 | 3965 | 5554 | 6388 |
| sp(24)+blosc2(lz4) | enc speedup | 1.00x | 0.99x | 1.67x | 2.34x | 2.69x |

Notable observations

  • Memory-bound baselines (none+none, none+lz4) do not scale. The parallel dispatch overhead outweighs any gain when the work per task is already at memory bandwidth. none+lz4 actually regresses — leave threads=0 for lz4-only workloads.
  • blosc2 scales best. Encoding with blosc2+lz4 reaches 2.8× on 8 threads; the sp(24)+blosc2 combination reaches 2.7× on encode and 1.3× on decode.
  • zstd scales ~2.4× on encode at 4 threads via libzstd’s NbWorkers. Beyond 4 threads the benefit plateaus on this CPU.
  • simple_packing decode is 2.3× faster at 2+ threads — the internal chunk-parallel scatter saturates memory bandwidth quickly.
  • szip is single-threaded. The marginal gains shown for sp(24)+szip come from parallelising the simple_packing stage only; szip itself runs sequentially in v0.13.0.
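That last point is Amdahl's law in action: when only the packing stage parallelises, the sequential szip stage caps the achievable speedup regardless of thread count. A rough estimate using the codec-matrix encode times above (~19.2 ms for sp(24) alone vs ~63.3 ms for sp(24)+szip, so only about 30% of the pipeline can scale):

```rust
/// Amdahl's law: overall speedup on `n` threads when only a fraction
/// `p` of the work is parallelisable (and scales ideally).
fn amdahl(p: f64, n: f64) -> f64 {
    1.0 / ((1.0 - p) + p / n)
}

fn main() {
    // Packing's share of the sp(24)+szip encode time, estimated from
    // the codec matrix: ~19.2 ms packing out of ~63.3 ms total.
    let p = 19.2 / 63.3;
    for n in [1.0, 2.0, 4.0, 8.0] {
        println!("threads={n}: speedup ceiling {:.2}x", amdahl(p, n));
    }
    // The ceiling is ~1.36x at 8 threads even if the packing stage
    // scaled perfectly; the measured 1.07x is lower still because
    // packing itself only gains ~1.2x (compare sp(16)+none above).
}
```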

The raw numbers above were produced by the threads-scaling binary in rust/benchmarks. Re-run locally with:

```sh
cargo build --release -p tensogram-benchmarks
./target/release/threads-scaling \
    --num-points 16000000 \
    --iterations 5 \
    --warmup 2 \
    --threads 0,1,2,4,8
```