Keyboard shortcuts

Press or to navigate between chapters

Press S or / to search in the book

Press ? to show this help

Press Esc to hide this help

Working with Files

The TensogramFile struct provides a high-level API for reading and writing .tgm files. It handles lazy scanning, buffered appending, and random access by message index.

Creating a File

#![allow(unused)]
fn main() {
use tensogram::{TensogramFile, EncodeOptions};

let mut file = TensogramFile::create("forecast.tgm")?;
}

This creates (or truncates) the file. No data is written yet.

Appending Messages

#![allow(unused)]
fn main() {
use std::collections::BTreeMap;
use tensogram::{
    GlobalMetadata, DataObjectDescriptor, ByteOrder, Dtype, EncodeOptions,
};

let global = GlobalMetadata::default();

let desc = DataObjectDescriptor {
    obj_type: "ntensor".to_string(),
    ndim: 2,
    shape: vec![100, 200],
    strides: vec![200, 1],
    dtype: Dtype::Float32,
    byte_order: ByteOrder::Big,
    encoding: "none".to_string(),
    filter: "none".to_string(),
    compression: "none".to_string(),
    masks: None,
    params: BTreeMap::new(),
};

    file.append(&global, &[(&desc, &data)], &EncodeOptions::default())?;
}

Each append encodes one message and appends it to the end of the file. You can call it as many times as you like — each message is independent and self-describing.

Typical pattern for writing a multi-message file (one message per parameter, run, subject, sample, experiment — whatever your pipeline produces):

#![allow(unused)]
fn main() {
let mut file = TensogramFile::create("output.tgm")?;

for key in ["2t", "10u", "10v", "msl"] {
    let (global, desc, data) = produce_field(key);
    file.append(&global, &[(&desc, &data)], &EncodeOptions::default())?;
}
}

Opening and Counting Messages

#![allow(unused)]
fn main() {
let mut file = TensogramFile::open("forecast.tgm")?;

// Streaming scan happens here (lazily, on first access)
let count = file.message_count()?;
println!("{} messages in file", count);
}

The first access triggers a streaming scan that reads preamble-sized chunks and seeks forward, so it never loads the entire file into memory. After that, every read_message call is a seek + read — no further scanning.

Reading Messages

#![allow(unused)]
fn main() {
use tensogram::{decode, DecodeOptions};

// Read raw bytes of message 3
let raw_bytes = file.read_message(3)?;

// Decode message 3
let (meta, objects) = decode(&raw_bytes, &DecodeOptions::default())?;

// Each element is (DataObjectDescriptor, Vec<u8>)
let (ref desc, ref data) = objects[0];
println!("shape: {:?}, dtype: {}", desc.shape, desc.dtype);
}

Both are O(1) after the initial scan: they seek to the stored offset and read length bytes.

Iterating Over All Messages

#![allow(unused)]
fn main() {
let mut file = TensogramFile::open("forecast.tgm")?;

for raw in file.iter()? {
    let raw = raw?;
    let meta = tensogram::decode_metadata(&raw)?;
    println!("version: {}", meta.version);
}
}

Memory note: For files with many large messages, prefer iterating by index with read_message(i) inside a loop to process one at a time.

Random Access by Index

One of Tensogram’s design goals is O(1) object access. After scanning, any message is reachable in constant time. Within a message, any object is reachable in constant time via the binary header’s offset table:

flowchart TD
    A["file.read_message(42)"]
    B["Message bytes"]
    C["Binary header"]
    D["Seek to payload 2"]
    E["Decode only object 2"]

    A -- "seek + read" --> B
    B --> C
    C -- "lookup offset for object 2" --> D
    D --> E

    style A fill:#388e3c,stroke:#2e7d32,color:#fff
    style E fill:#1565c0,stroke:#0d47a1,color:#fff

File Layout Diagram

forecast.tgm
├── [message 0] — TENSOGRM ... 39277777
├── [message 1] — TENSOGRM ... 39277777
├── [message 2] — TENSOGRM ... 39277777
│   ├── Preamble (24B)
│   ├── Header Metadata Frame (CBOR GlobalMetadata)
│   ├── Header Index Frame (CBOR offsets)
│   ├── Data Object Frame 0 (payload + CBOR descriptor)
│   └── Data Object Frame 1 (payload + CBOR descriptor)
│   └── Postamble (16B)
└── ...

No file-level header, no file-level index. All indexing is per-message, built in-memory at scan time.

Remote Access (optional)

Enable the remote feature to open .tgm files on S3, GCS, Azure, or HTTP with selective range-based reads:

[dependencies]
tensogram = { path = "...", features = ["remote"] }
#![allow(unused)]
fn main() {
use tensogram::{TensogramFile, DecodeOptions};

let mut file = TensogramFile::open_source("s3://bucket/forecast.tgm")?;

// Fetch only the second object from message 0 — no full download
let (meta, desc, data) = file.decode_object(0, 1, &DecodeOptions::default())?;
}

Supports header-indexed and footer-indexed files (read-only) from Rust, Python, xarray, and zarr. See the Remote Access guide for storage options, request budgets, and limitations.

Memory-Mapped I/O (optional)

Enable the mmap feature to use memory-mapped file access:

[dependencies]
tensogram = { path = "...", features = ["mmap"] }
#![allow(unused)]
fn main() {
let mut file = TensogramFile::open_mmap("forecast.tgm")?;

// Scan happens during open_mmap — no lazy scan needed
let count = file.message_count()?;

// Reads from the memory-mapped region (no additional seek)
let raw = file.read_message(0)?;
}

This is useful for large files where you want to avoid per-message seek + read overhead. The file is mapped read-only. All existing decode functions work unchanged.

Async I/O (optional)

Enable the async feature for tokio-based non-blocking file operations:

[dependencies]
tensogram = { path = "...", features = ["async"] }
#![allow(unused)]
fn main() {
let mut file = TensogramFile::open_async("forecast.tgm").await?;

// Read a message without blocking the async runtime
let raw = file.read_message_async(0).await?;

// Decode also runs on a blocking thread (safe for FFI codecs)
let (meta, objects) = file.decode_message_async(0, &opts).await?;
}

All CPU-intensive work (scanning, decoding, FFI calls to compression libraries) runs via tokio::task::spawn_blocking, so it won’t block the async runtime.

Edge Cases

Appending to an Existing File

TensogramFile::create truncates. To append to an existing file, use standard file I/O:

#![allow(unused)]
fn main() {
use std::io::Write;
let mut f = std::fs::OpenOptions::new().append(true).open("forecast.tgm")?;

let global = GlobalMetadata::default();
let message = encode(&global, &[(&desc, &data)], &EncodeOptions::default())?;
f.write_all(&message)?;
}

Or open the file with TensogramFile::open and use append() — the append method always writes at the end regardless of how the file was opened.

Corrupted Messages

The scanner skips corrupted messages and continues. A message is considered corrupted if:

  • The total_length field points to a location where 39277777 is not present
  • The header is truncated

The scanner recovers by advancing one byte and searching for the next TENSOGRM.

Empty Files

message_count() returns 0 for an empty file. read_message(0) returns an error.