# The Encoding Pipeline

Every object payload passes through a three-stage pipeline on the way in (encoding) and out (decoding). The stages always run in the same order:

```mermaid
flowchart TD
    subgraph Encode["Encode Path"]
        direction TB
        A["Raw bytes"]
        B["Stage 1 — Encoding
        (lossy quantization)"]
        C["Stage 2 — Filter
        (byte shuffle)"]
        D["Stage 3 — Compression
        (szip / zstd / lz4 / blosc2 / zfp / sz3)"]
        A --> B --> C --> D
    end

    S[("Stored bytes")]

    subgraph Decode["Decode Path"]
        direction TB
        F["Stage 3 — Decompress"]
        G["Stage 2 — Unshuffle"]
        H["Stage 1 — Dequantize"]
        I["Raw bytes"]
        F --> G --> H --> I
    end

    D --> S --> F

    style A fill:#e8f5e9,stroke:#388e3c
    style S fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style I fill:#e8f5e9,stroke:#388e3c
    style Encode fill:#e3f2fd,stroke:#1565c0,color:#1565c0
    style Decode fill:#fce4ec,stroke:#c62828,color:#c62828
```

Each stage is independently configurable per object via fields in the `DataObjectDescriptor`. Set a stage to `"none"` to skip it. For callers with already-encoded payloads, a pipeline bypass is available via `encode_pre_encoded` (see Pre-encoded Payloads).
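The stage ordering above can be sketched as a simple composition. This is a minimal illustration, not the library's actual API: the `stage` dispatcher and `encode` signature are assumptions, and only the `"none"` pass-through is shown.

```rust
// Minimal sketch of the three-stage dispatch. Each stage is a byte
// transform selected by name; "none" is the identity (stage skipped).
// Names and signatures here are illustrative, not the real API.
fn stage(name: &str, bytes: Vec<u8>) -> Vec<u8> {
    match name {
        "none" => bytes, // skip the stage entirely
        other => unimplemented!("stage {other} not shown in this sketch"),
    }
}

fn encode(raw: Vec<u8>, encoding: &str, filter: &str, compression: &str) -> Vec<u8> {
    // Stages always run in the same order: encode -> filter -> compress.
    let encoded = stage(encoding, raw);
    let filtered = stage(filter, encoded);
    stage(compression, filtered)
}

fn main() {
    // With all three stages set to "none", the payload passes through unchanged.
    let raw = vec![1u8, 2, 3, 4];
    assert_eq!(encode(raw.clone(), "none", "none", "none"), raw);
}
```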

## Stage 1: Encoding

Encoding transforms values to reduce the number of bits needed to represent them. The only supported encoding right now is `simple_packing` — a lossy quantization that maps a bounded range of floating-point values onto N-bit integers. The bit layout matches GRIB 2 simple packing, so quantized payloads are interoperable with existing GRIB tooling.

| Value | Meaning |
|---|---|
| `"none"` | Pass through unchanged |
| `"simple_packing"` | Lossy quantization (see Simple Packing) |
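The quantization idea can be sketched as a linear mapping onto 16-bit integers. Real GRIB 2 simple packing uses separate binary and decimal scale factors; this sketch folds them into a single assumed `scale` for clarity, so it is a simplified illustration rather than the actual bit layout.

```rust
// Simplified simple_packing-style quantization:
//   packed = round((v - reference) / scale)
// Real GRIB 2 simple packing derives `scale` from binary/decimal scale
// factors; here it is a single folded parameter for illustration.
fn quantize(values: &[f32], reference: f32, scale: f32) -> Vec<u16> {
    values.iter().map(|v| ((v - reference) / scale).round() as u16).collect()
}

fn dequantize(packed: &[u16], reference: f32, scale: f32) -> Vec<f32> {
    packed.iter().map(|&p| reference + p as f32 * scale).collect()
}

fn main() {
    let values = [230.5f32, 231.0, 250.0];
    let (reference, scale) = (230.5, 0.5);
    let packed = quantize(&values, reference, scale);
    assert_eq!(packed, vec![0, 1, 39]);
    // Round trip is exact here because every value lies on the quantization grid;
    // in general the transform is lossy.
    assert_eq!(dequantize(&packed, reference, scale), values.to_vec());
}
```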

## Stage 2: Filter

Filters rearrange bytes to improve compression ratios. The `shuffle` filter reorders bytes by their significance level (all most-significant bytes first, then all second-most-significant bytes, etc.), which makes float data much more compressible because nearby values have similar high bytes.

| Value | Meaning |
|---|---|
| `"none"` | Pass through unchanged |
| `"shuffle"` | Byte-level shuffle (see Byte Shuffle Filter) |
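The shuffle transform described above is small enough to sketch in full. Function names are illustrative, but the byte transposition is the standard shuffle layout: byte 0 of every element, then byte 1 of every element, and so on.

```rust
// Byte-level shuffle: transpose an array of fixed-size elements so that
// all bytes at the same significance level are contiguous. Nearby floats
// share high bytes, so the shuffled stream compresses better.
fn shuffle(bytes: &[u8], elem_size: usize) -> Vec<u8> {
    let n = bytes.len() / elem_size;
    let mut out = Vec::with_capacity(bytes.len());
    for lane in 0..elem_size {
        for elem in 0..n {
            out.push(bytes[elem * elem_size + lane]);
        }
    }
    out
}

fn unshuffle(bytes: &[u8], elem_size: usize) -> Vec<u8> {
    let n = bytes.len() / elem_size;
    let mut out = vec![0u8; bytes.len()];
    let mut i = 0;
    for lane in 0..elem_size {
        for elem in 0..n {
            out[elem * elem_size + lane] = bytes[i];
            i += 1;
        }
    }
    out
}

fn main() {
    let raw = [0xAA, 0x01, 0xAA, 0x02, 0xAA, 0x03]; // three 2-byte elements
    let shuffled = shuffle(&raw, 2);
    // All first bytes first, then all second bytes: a run of 0xAA appears.
    assert_eq!(shuffled, vec![0xAA, 0xAA, 0xAA, 0x01, 0x02, 0x03]);
    assert_eq!(unshuffle(&shuffled, 2), raw.to_vec());
}
```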

## Stage 3: Compression

Compression reduces the total byte count. Six compressors are implemented, plus a pass-through:

| Value | Type | Random Access | Notes |
|---|---|---|---|
| `"none"` | Pass-through | Yes | No compression |
| `"szip"` | Lossless | Yes | CCSDS 121.0-B-3 via libaec |
| `"zstd"` | Lossless | No | Excellent ratio/speed tradeoff |
| `"lz4"` | Lossless | No | Fastest decompression |
| `"blosc2"` | Lossless | Yes | Multi-codec, chunk-level access |
| `"zfp"` | Lossy | Yes (fixed-rate) | Floating-point arrays |
| `"sz3"` | Lossy | No | Error-bounded scientific data |

See Compression for full details on each compressor, including parameters and random access support.

Note: ZFP and SZ3 operate directly on typed floating-point data. Use them with `encoding: "none"` and `filter: "none"` – they replace both encoding and compression.
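As an illustration of that rule, a fixed-rate ZFP descriptor might look like the sketch below. The `fixed_rate` parameter key and its value are assumptions for illustration only — see Compression for the actual parameter names; the remaining fields mirror the descriptor example shown later in this chapter.

```rust
// Hypothetical ZFP configuration: encoding and filter are disabled because
// ZFP consumes the typed floats directly. The "fixed_rate" key is assumed.
DataObjectDescriptor {
    obj_type: "ntensor".into(),
    ndim: 2,
    shape: vec![721, 1440],
    strides: vec![1440, 1],
    dtype: Dtype::Float32,
    byte_order: ByteOrder::Big,
    encoding: "none".into(),   // ZFP replaces the encoding stage
    filter: "none".into(),     // no shuffle: ZFP needs typed floats
    compression: "zfp".into(),
    params: BTreeMap::from([
        ("fixed_rate".into(), Value::Float(8.0)), // assumed parameter name
    ]),
    hash: None,
}
```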

## Typical Combinations

| Use case | encoding | filter | compression |
|---|---|---|---|
| Exact integers (e.g. a mask) | `none` | `none` | `none` |
| Lossy bounded-range floats | `simple_packing` | `none` | `szip` |
| Best lossless (floats) | `none` | `shuffle` | `szip` or `blosc2` |
| GRIB 2 CCSDS-interoperable | `simple_packing` | `none` | `szip` |
| Real-time streaming | `none` | `none` | `lz4` |
| Archival storage | `none` | `shuffle` | `zstd` |
| ML model weights | `none` | `none` | `blosc2` |
| Lossy float w/ random access | `none` | `none` | `zfp` (fixed_rate) |
| Error-bounded science | `none` | `none` | `sz3` |

## How It Looks in Code

The entire pipeline is configured through the `DataObjectDescriptor`:

```rust
DataObjectDescriptor {
    obj_type: "ntensor".into(),
    ndim: 2,
    shape: vec![721, 1440],
    strides: vec![1440, 1],
    dtype: Dtype::Float32,
    byte_order: ByteOrder::Big,
    encoding: "simple_packing".into(),
    filter: "none".into(),
    compression: "szip".into(),
    params: BTreeMap::from([
        ("reference_value".into(), Value::Float(230.5)),
        ("bits_per_value".into(), Value::Integer(16.into())),
    ]),
    hash: None, // set automatically during encoding
}
```

All encoding parameters (`reference_value`, `bits_per_value`, `szip_block_offsets`, etc.) go into the `params` map. The encoder populates additional params during encoding (such as block offsets for szip), and the decoder reads them back.

## Integrity Hashing

After all three stages, the stored bytes can be hashed. The hash is stored in the `DataObjectDescriptor`'s `hash` field alongside the encoded bytes. On decode, if `verify_hash: true` is set, the hash is recomputed and compared.

| Algorithm | Hash length | Notes |
|---|---|---|
| `xxh3` | 16 hex chars (64-bit) | Default. Fast, non-cryptographic |

Edge case: The hash covers the stored bytes (after encoding + filter + compression), not the original raw bytes. This means a hash mismatch always indicates storage or transmission corruption, not a quantization difference from lossy encoding.
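The verify flow can be sketched as below. xxh3 lives in an external crate in practice; to keep the sketch self-contained it substitutes an inline FNV-1a 64-bit hash as a stand-in, so the digest values differ from real xxh3 — only the flow (hash on encode, recompute and compare on decode) is the point.

```rust
// Sketch of the decode-time verify step, using FNV-1a 64 as a stand-in
// for xxh3. The check runs on the *stored* bytes, before decompression.
fn hash_hex(bytes: &[u8]) -> String {
    let mut h: u64 = 0xcbf29ce484222325; // FNV-1a offset basis
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3); // FNV-1a prime
    }
    format!("{h:016x}") // 16 hex chars, like a 64-bit xxh3 digest
}

fn verify(stored: &[u8], expected: &str) -> Result<(), String> {
    let actual = hash_hex(stored);
    if actual == expected {
        Ok(())
    } else {
        Err(format!("hash mismatch: expected {expected}, got {actual}"))
    }
}

fn main() {
    let stored = b"compressed payload";
    let recorded = hash_hex(stored); // set during encoding
    assert!(verify(stored, &recorded).is_ok());
    // Any corruption of the stored bytes is caught on decode.
    assert!(verify(b"corrupted payload", &recorded).is_err());
}
```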