anemoi-inference Integration

The tensogram-anemoi package provides a plug-and-play output for anemoi-inference, the ECMWF framework for running AI-based weather forecast models. Once installed, anemoi-inference automatically discovers the plugin via Python entry points — no code changes to anemoi-inference are required.

Installation

pip install tensogram-anemoi

Or from source:

pip install -e python/tensogram-anemoi/

Usage

In an anemoi-inference run config, specify tensogram as the output:

output:
  tensogram:
    path: forecast.tgm

All forecast steps are written to a single .tgm file as they are produced. Remote destinations (S3, GCS, Azure, …) are supported via fsspec:

output:
  tensogram:
    path: s3://my-bucket/forecast.tgm
    storage_options:
      key: ...
      secret: ...

Configuration options

All options after path must be supplied as keyword arguments.

| Option | Type | Default | Description |
|---|---|---|---|
| path | str | | Destination file path or remote URL |
| encoding | str | "none" | "none" or "simple_packing" |
| bits | int | None | Bits per value (required when encoding="simple_packing") |
| compression | str | "zstd" | "none", "zstd", "lz4", "szip", "blosc2" |
| dtype | str | "float32" | Field array dtype: "float32" or "float64" |
| storage_options | dict | {} | Forwarded to fsspec for remote paths |
| stack_pressure_levels | bool | False | Stack pressure-level fields into 2-D objects |
| variables | list[str] | None | Restrict output to a subset of variables |
| output_frequency | int | None | Write every N steps |
| write_initial_state | bool | None | Whether to write step 0 |
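
The constraints listed in the options table can be checked before a run starts. The following is a minimal sketch, assuming the `tensogram:` block of the run config has already been loaded into a plain dict. The helper name `check_tensogram_options` is hypothetical and not part of tensogram-anemoi; how the plugin itself validates options is not covered here.

```python
# Allowed values copied from the options table.
ALLOWED_ENCODINGS = {"none", "simple_packing"}
ALLOWED_COMPRESSIONS = {"none", "zstd", "lz4", "szip", "blosc2"}
ALLOWED_DTYPES = {"float32", "float64"}


def check_tensogram_options(options: dict) -> list[str]:
    """Return a list of problems found in a `tensogram:` config block.

    Hypothetical helper for illustration only.
    """
    problems = []
    if "path" not in options:
        problems.append("path is required")

    encoding = options.get("encoding", "none")
    if encoding not in ALLOWED_ENCODINGS:
        problems.append(f"unknown encoding: {encoding!r}")
    # bits has no default, so simple_packing needs an explicit value.
    if encoding == "simple_packing" and options.get("bits") is None:
        problems.append("bits is required when encoding is simple_packing")

    compression = options.get("compression", "zstd")
    if compression not in ALLOWED_COMPRESSIONS:
        problems.append(f"unknown compression: {compression!r}")

    dtype = options.get("dtype", "float32")
    if dtype not in ALLOWED_DTYPES:
        problems.append(f"unknown dtype: {dtype!r}")
    return problems


print(check_tensogram_options({"path": "forecast.tgm", "encoding": "simple_packing"}))
# ['bits is required when encoding is simple_packing']
```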

Pressure-level stacking

When stack_pressure_levels=True, all fields sharing the same GRIB param are merged into a single 2-D object of shape (n_grid, n_levels), sorted by level ascending. The "mars" namespace carries "levelist": [500, 850, ...] instead of a scalar "level" (following standard MARS convention). Non-pressure-level fields are always written as individual 1-D objects.

output:
  tensogram:
    path: forecast.tgm
    stack_pressure_levels: true
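
To make the stacking rule concrete, here is a rough sketch of the grouping it describes: fields are grouped by param, sorted by level ascending, and arranged as (n_grid, n_levels). It uses plain Python lists as stand-ins for arrays, and the function name and input layout are assumptions made for illustration, not the plugin's actual code.

```python
from collections import defaultdict


def stack_pressure_levels(fields):
    """Group per-level fields by param and stack them level-ascending.

    `fields` is a list of (mars, values) pairs, where `mars` is a dict with
    "param" and "level" and `values` is a list of n_grid numbers.
    Returns {param: (levelist, rows)} where rows[g][k] is the value at grid
    point g and level levelist[k], i.e. shape (n_grid, n_levels).
    """
    groups = defaultdict(list)
    for mars, values in fields:
        groups[mars["param"]].append((mars["level"], values))

    stacked = {}
    for param, members in groups.items():
        members.sort(key=lambda m: m[0])              # ascending by level
        levelist = [level for level, _ in members]
        columns = [values for _, values in members]
        rows = [list(row) for row in zip(*columns)]   # transpose to (n_grid, n_levels)
        stacked[param] = (levelist, rows)
    return stacked


fields = [
    ({"param": "t", "level": 850}, [280.0, 281.0]),
    ({"param": "t", "level": 500}, [250.0, 251.0]),
]
levelist, rows = stack_pressure_levels(fields)["t"]
print(levelist)  # [500, 850]
print(rows)      # [[250.0, 280.0], [251.0, 281.0]]
```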

Simple packing

For compact storage, use simple_packing with a bits value:

output:
  tensogram:
    path: forecast.tgm
    encoding: simple_packing
    bits: 16
    compression: zstd

Coordinate arrays (lat/lon) are never lossy-encoded; only field arrays are packed.
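
Since packing is lossy, it helps to see how the bits value bounds the error. The sketch below shows plain linear quantisation in the spirit of GRIB simple packing: values are stored as unsigned integers relative to a reference value and a scale. It is an assumption-laden illustration; tensogram's exact scale factors and rounding may differ, and the `pack` and `unpack` names are hypothetical.

```python
def pack(values, bits):
    """Quantise floats to unsigned integers of `bits` bits (linear scaling)."""
    reference = min(values)
    span = max(values) - reference
    max_int = (1 << bits) - 1
    scale = span / max_int if span > 0 else 1.0   # avoid division by zero for constant fields
    packed = [round((v - reference) / scale) for v in values]
    return reference, scale, packed


def unpack(reference, scale, packed):
    """Reconstruct approximate floats from the packed integers."""
    return [reference + scale * p for p in packed]


values = [250.0, 273.15, 300.0]
reference, scale, packed = pack(values, bits=16)
restored = unpack(reference, scale, packed)
max_error = max(abs(a - b) for a, b in zip(values, restored))

print(packed[0], packed[-1])   # 0 65535
print(max_error <= scale / 2)  # True
```

Under this scheme the worst-case error is half a quantisation step, (max - min) / (2^bits - 1) / 2, so each extra bit roughly halves it.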


Metadata reference

Each .tgm file produced by tensogram-anemoi contains one message per forecast step. This section documents exactly what is stored in each message and how to read it with the raw tensogram Python API.

Opening a file

import tensogram

tgm = tensogram.TensogramFile.open("forecast.tgm")
print(len(tgm), "steps")

meta, objects = tgm[0]   # first step

meta is the decoded message metadata. objects is a list of (descriptor, array) pairs, one entry per object in the message.

Object layout

Every message has the following fixed layout:

| Index | base[i]["name"] | Content |
|---|---|---|
| 0 | "grid_latitude" | Latitude coordinates, float64, shape (n_grid,) |
| 1 | "grid_longitude" | Longitude coordinates, float64, shape (n_grid,) |
| 2 … N | variable name or param name | Field data |

meta, objects = tgm[0]

lat_desc, lat_arr = objects[0]   # latitudes
lon_desc, lon_arr = objects[1]   # longitudes
fld_desc, fld_arr = objects[2]   # first field

The coordinate names "grid_latitude" and "grid_longitude" are intentionally distinct from the standard "latitude" / "longitude" names so that all objects in a message share a single flat grid dimension rather than each coordinate spawning its own dimension.
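
Because only the first two positions are fixed, fields are easier to find by name than by index. The following is a small sketch, assuming `meta.base` behaves like a list of dicts as shown in this section; the stand-in `base` list and the `index_by_name` helper are hypothetical.

```python
def index_by_name(base):
    """Map each object's "name" to its position in the message."""
    return {entry["name"]: i for i, entry in enumerate(base)}


# Stand-in for meta.base from a decoded message.
base = [
    {"name": "grid_latitude"},
    {"name": "grid_longitude"},
    {"name": "2t"},
]
positions = index_by_name(base)
print(positions["2t"])  # 2
```

With a real message, `objects[positions["2t"]]` would then give the matching (descriptor, array) pair.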

base[i] — per-object metadata

Each object has a corresponding entry in meta.base:

for i, entry in enumerate(meta.base):
    print(i, entry)

Each entry contains the following keys; the "Present on" column shows which objects carry each one:

| Key | Type | Present on | Description |
|---|---|---|---|
| "name" | str | all objects | Variable or coordinate name |
| "anemoi" | dict | all objects | anemoi-specific metadata (see below) |
| "mars" | dict | field objects only | MARS metadata (see below) |

"anemoi" namespace

| Key | Type | Present on | Description |
|---|---|---|---|
| "variable" | str | all objects | Internal anemoi-inference variable name |

For coordinates, "variable" is "latitude" or "longitude" (the canonical name, not the "grid_*" name stored in "name"):

assert meta.base[0]["name"] == "grid_latitude"
assert meta.base[0]["anemoi"]["variable"] == "latitude"

assert meta.base[1]["name"] == "grid_longitude"
assert meta.base[1]["anemoi"]["variable"] == "longitude"

For fields, "variable" is the internal anemoi-inference name (e.g. "t500" for 500 hPa temperature, "2t" for 2 m temperature):

assert meta.base[2]["anemoi"]["variable"] == "2t"

"mars" namespace

Coordinate objects carry no "mars" key. Every field object carries a "mars" dict combining keys from the anemoi-inference checkpoint with the temporal keys derived from the forecast state:

Temporal keys (present on every field object):

| Key | Type | Description | Example |
|---|---|---|---|
| "date" | str | Analysis/base date (YYYYMMDD) | "20240101" |
| "time" | str | Analysis/base time (HHMM) | "0000" |
| "step" | int or float | Forecast lead time in hours | 6, 1.5 |

Checkpoint keys (present when available in the model checkpoint):

| Key | Type | Description | Example |
|---|---|---|---|
| "param" | str | GRIB parameter short name | "2t", "t", "u" |
| "levtype" | str | Level type | "sfc", "pl", "ml" |
| "level" | int | Pressure level (unstacked fields only) | 500 |
| "levelist" | list[int] | Pressure levels (stacked fields only) | [500, 850, 1000] |

Reading field metadata:

meta, objects = tgm[0]

# Surface field (e.g. 2 m temperature)
entry = meta.base[2]
print(entry["name"])                    # "2t"
print(entry["anemoi"]["variable"])      # "2t"
print(entry["mars"]["param"])           # "2t"
print(entry["mars"]["date"])            # "20240101"
print(entry["mars"]["time"])            # "0000"
print(entry["mars"]["step"])            # 6

# Pressure-level field (unstacked)
entry = meta.base[3]
print(entry["mars"]["param"])           # "t"
print(entry["mars"]["levtype"])         # "pl"
print(entry["mars"]["level"])           # 500

With stack_pressure_levels=True, the pressure-level group has "levelist" instead of "level", and the array is 2-D:

entry = meta.base[2]                    # stacked t group
print(entry["mars"]["levelist"])        # [500, 850, 1000]
print(entry["mars"]["param"])           # "t"

desc, arr = objects[2]
print(arr.shape)                        # (n_grid, 3)  — columns sorted by level
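
Since the columns follow the order of "levelist", a single level can be pulled out of a stacked object by looking up its position there. The sketch below uses nested lists as a stand-in for the 2-D array, and `level_column` is a hypothetical helper name.

```python
def level_column(rows, levelist, level):
    """Return the values at one pressure level from a stacked (n_grid, n_levels) array."""
    k = levelist.index(level)   # raises ValueError if the level is absent
    return [row[k] for row in rows]


# Stand-in for a stacked "t" object with two grid points and three levels.
levelist = [500, 850, 1000]
rows = [
    [250.0, 280.0, 290.0],
    [251.0, 281.0, 291.0],
]
print(level_column(rows, levelist, 850))  # [280.0, 281.0]
```

If the array is a NumPy array, the same selection would be the slice `arr[:, k]`.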

meta.extra — message-level metadata

meta.extra carries metadata that applies to the whole message rather than individual objects.

"dim_names" — axis-size hints

dim_names = meta.extra["dim_names"]
# e.g. {"21600": "values"}
# or   {"21600": "values", "3": "level"}  (with stack_pressure_levels=True)

dim_names maps the string representation of an axis length to a semantic name. It exists to allow downstream tools to assign meaningful axis names without requiring any anemoi-specific knowledge. The grid axis is always labelled "values"; when pressure-level stacking is enabled, each unique level-axis size is labelled "level".
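
A downstream tool might apply these hints roughly as follows. This is a sketch under assumptions: the `axis_names` helper and the positional fallback name `dim_<i>` for sizes without a hint are both hypothetical choices, not tensogram behaviour.

```python
def axis_names(shape, dim_names):
    """Name each axis of `shape` using the size-keyed hints in dim_names.

    Axes whose size has no hint get a positional fallback name.
    """
    return tuple(dim_names.get(str(size), f"dim_{i}") for i, size in enumerate(shape))


dim_names = {"21600": "values", "3": "level"}
print(axis_names((21600,), dim_names))    # ('values',)
print(axis_names((21600, 3), dim_names))  # ('values', 'level')
print(axis_names((21600, 7), dim_names))  # ('values', 'dim_1')
```

Because the lookup is keyed by size alone, two axes of equal length would receive the same name, so a consumer may want to handle that case explicitly.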

Object descriptors

The descriptor in each (descriptor, array) pair from objects[i] carries the low-level encoding details:

desc, arr = objects[2]

print(desc.dtype)        # "float32" or "float64"
print(desc.shape)        # [n_grid] for flat, [n_grid, n_levels] for stacked
print(desc.encoding)     # "none" or "simple_packing"
print(desc.compression)  # "zstd", "lz4", etc.

Coordinate arrays are always float64 regardless of the dtype setting. Field arrays use the configured dtype ("float32" by default), promoted to float64 automatically when encoding="simple_packing".
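
The dtype rules above can be summarised as a small decision function. It only restates the stated rules for quick reference; `stored_dtype` is a hypothetical helper, not part of the tensogram API.

```python
def stored_dtype(is_coordinate: bool, dtype: str = "float32", encoding: str = "none") -> str:
    """Dtype an object is expected to have, per the rules stated above."""
    if is_coordinate:
        return "float64"   # coordinates ignore the dtype option
    if encoding == "simple_packing":
        return "float64"   # packing promotes field arrays
    return dtype           # otherwise the configured dtype applies


print(stored_dtype(is_coordinate=True))                              # float64
print(stored_dtype(is_coordinate=False))                             # float32
print(stored_dtype(is_coordinate=False, encoding="simple_packing"))  # float64
```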

Full inspection example

import tensogram

tgm = tensogram.TensogramFile.open("forecast.tgm")

for step_idx, (meta, objects) in enumerate(tgm):
    print(f"\n--- step {step_idx} ---")

    # Dimension hints
    print("dim_names:", meta.extra.get("dim_names", {}))

    for i, entry in enumerate(meta.base):
        desc, arr = objects[i]
        anemoi = entry.get("anemoi", {})
        mars = entry.get("mars", {})

        print(
            f"  [{i}] name={entry['name']!r:20s}"
            f"  variable={anemoi.get('variable')!r:10s}"
            f"  shape={arr.shape}"
            f"  dtype={desc.dtype}"
            + (f"  step={mars.get('step')}" if mars else "")
        )

Example output for a single step with surface fields and stacked pressure levels:

--- step 0 ---
dim_names: {'21600': 'values', '3': 'level'}
  [0] name='grid_latitude'    variable='latitude'   shape=(21600,)  dtype=float64
  [1] name='grid_longitude'   variable='longitude'  shape=(21600,)  dtype=float64
  [2] name='2t'               variable='2t'         shape=(21600,)  dtype=float32  step=6
  [3] name='t'                variable='t'          shape=(21600, 3)  dtype=float32  step=6
  [4] name='u'                variable='u'          shape=(21600, 3)  dtype=float32  step=6