Dimension Mapping#

A MARS request defines which data to retrieve from FDB. Each keyword with more than one value defines an axis and must be mapped to a Zarr dimension via AxisDefinition. Keywords with a single value may also be mapped — useful when MARS restricts a keyword to one value but you still want it as an explicit dimension in the resulting array.

From MARS Keywords to Zarr Dimensions#

Each AxisDefinition passed to add_part() becomes exactly one dimension in the resulting Zarr array.

  • The position of each AxisDefinition in the list determines its dimension index in the array.

  • An implicit final dimension always contains the grid points (decoded field values).

One-to-One Mapping#

In the simplest case, each MARS keyword maps to its own Zarr dimension.

[
    AxisDefinition(["date"], Chunking.SINGLE_VALUE),   # Dim 0
    AxisDefinition(["time"], Chunking.SINGLE_VALUE),   # Dim 1
    AxisDefinition(["param"], Chunking.SINGLE_VALUE),  # Dim 2
]

Given date=2020-01-01/to/2020-01-03, time=0/6/12/18, and param=165/166/167, the resulting array has shape (3, 4, 3, N) where N is the number of grid points.

Many-to-One Mapping#

Multiple MARS keywords can be flattened into a single Zarr dimension. A common use case is merging date and time into a unified datetime axis.

[
    AxisDefinition(["date", "time"], Chunking.SINGLE_VALUE),  # Dim 0
    AxisDefinition(["param"], Chunking.SINGLE_VALUE),         # Dim 1
]

The dimension size equals the product of the number of values of each keyword. With date having 3 values and time having 4:

Dimension size = 3 × 4 = 12

The rightmost key varies fastest (row-major order, like C and NumPy defaults). In ["date", "time"], time cycles through all its values before date advances:

Index:  0    1    2    3    4    5    6    7    8    9   10   11
date:   d0   d0   d0   d0   d1   d1   d1   d1   d2   d2   d2   d2
time:   t0   t1   t2   t3   t0   t1   t2   t3   t0   t1   t2   t3

index = time + date × num_times

Important

The order of keys matters. With ["time", "date"], date becomes the fastest-varying keyword instead of time.

Axis Mapping Visualized#

        graph LR
    subgraph MARS["MARS Request Keywords"]
        date["date (3 values)"]
        time["time (4 values)"]
        param["param (3 values)"]
        step["step (1 value)"]
    end

    subgraph AD["AxisDefinitions"]
        ad0["AxisDefinition 0<br> keys=['date', 'time']"]
        ad1["AxisDefinition 1<br> keys=['param']"]
        ad2["AxisDefinition 2<br> keys=['step']"]
    end

    subgraph Zarr["Zarr Array Dimensions"]
        dim0["Dim 0: datetime<br>size = 3 x 4 = 12"]
        dim1["Dim 1: param<br>size = 3"]
        dim2["Dim 2: step<br>size = 1"]
        dim3["Dim 3: grid points<br>(implicit)"]
    end

    date --> ad0
    time --> ad0
    param --> ad1
    step --> ad2
    ad0 --> dim0
    ad1 --> dim1
    ad2 --> dim2
    

Chunking#

Chunking determines how many values along a dimension are grouped into a single Zarr chunk:

Chunking mode

Behaviour

Chunk size along axis

SINGLE_VALUE

Each value along the axis is its own chunk

1

NONE

The entire axis is stored in a single chunk

Full axis length

For example, with date having 3 values and param having 3 values:

[
    AxisDefinition(["date"], Chunking.NONE),          # chunk size = 3
    AxisDefinition(["param"], Chunking.SINGLE_VALUE), # chunk size = 1
]
# Array shape:  (3, 3, N)
# Chunk shape:  (3, 1, N)

Memory Considerations#

Each chunk access loads the entire chunk into memory. With SINGLE_VALUE each chunk contains one set of grid-point values, keeping memory usage small. With NONE the chunk spans the full axis, and when multiple axes use NONE the chunk sizes compound.

For example, consider a grid with 1 million points (N = 1_000_000) and three axes all set to NONE:

[
    AxisDefinition(["date"], Chunking.NONE),   # 30 values
    AxisDefinition(["time"], Chunking.NONE),   # 4 values
    AxisDefinition(["param"], Chunking.NONE),   # 10 values
]
# Chunk shape: (30, 4, 10, 1_000_000)
# Chunk size:  30 × 4 × 10 × 1_000_000 × 4 bytes = ~4.5 GB

Accessing any element in this array loads the single 4.5 GB chunk. Switching to SINGLE_VALUE on all three axes reduces each chunk to a single field (1 × 1 × 1 × 1_000_000 × 4 bytes 4 MB).

Warning

Using NONE on multiple axes can cause unexpectedly large memory allocations. Start with SINGLE_VALUE on all axes and only switch individual axes to NONE when you know you always consume them in full.

Combining Multiple MARS Requests#

Call add_part() multiple times to combine data from different MARS requests into a single Zarr array. Use extendOnAxis() to specify which dimension grows when parts are joined. All other dimensions must have the same number of values across parts.

builder = SimpleStoreBuilder()

# Part 1: surface parameters
# Dimension D is count date x time
# Dimension P1 is count param
# Dimension N is the number of values in the grid
# Resulting shape of this part is [D, P1, N]
builder.add_part(
    "levtype=sfc,param=165/166,...",
    [
        AxisDefinition(["date", "time"], Chunking.SINGLE_VALUE),
        AxisDefinition(["param"], Chunking.SINGLE_VALUE),
    ],
    ExtractorType.GRIB,
)

# Part 2: pressure level parameters
# Dimension D is count date x time
# Dimension P2 is count param x levelist
# Dimension N is the number of values in the grid
# Resulting shape of this part is [D, P2, N]
builder.add_part(
    "levtype=pl,param=131/132,levelist=50/100,...",
    [
        AxisDefinition(["date", "time"], Chunking.SINGLE_VALUE),
        AxisDefinition(["param", "levelist"], Chunking.SINGLE_VALUE),
    ],
    ExtractorType.GRIB,
)

# Extend on the param dimension (index 1)
# Final shape will be [D, P1 + P2, N]
builder.extendOnAxis(1)
store = builder.build()

The datetime dimension (index 0) must have the same values in both parts. The param dimension (index 1) grows: 2 surface parameters + 4 pressure-level combinations (2 params × 2 levels) = 6 entries total.