Dimension Mapping
=================
A MARS request defines which data to retrieve from FDB. Each keyword
with more than one value defines an axis and **must** be mapped to a
Zarr dimension via :class:`~pychunked_data_view.AxisDefinition`.
Keywords with a single value **may** also be mapped — useful when MARS
restricts a keyword to one value but you still want it as an explicit
dimension in the resulting array.
From MARS Keywords to Zarr Dimensions
--------------------------------------
Each :class:`~pychunked_data_view.AxisDefinition` passed to
:meth:`~z3fdb.SimpleStoreBuilder.add_part` becomes **exactly one
dimension** in the resulting Zarr array.
- The position of each :class:`~pychunked_data_view.AxisDefinition` in
the list determines its dimension index in the array.
- An **implicit final dimension** always contains the grid points
(decoded field values).
One-to-One Mapping
~~~~~~~~~~~~~~~~~~
In the simplest case, each MARS keyword maps to its own Zarr dimension.
.. code-block:: python
[
AxisDefinition(["date"], Chunking.SINGLE_VALUE), # Dim 0
AxisDefinition(["time"], Chunking.SINGLE_VALUE), # Dim 1
AxisDefinition(["param"], Chunking.SINGLE_VALUE), # Dim 2
]
Given ``date=2020-01-01/to/2020-01-03``, ``time=0/6/12/18``, and
``param=165/166/167``, the resulting array has shape ``(3, 4, 3, N)``
where ``N`` is the number of grid points.
Many-to-One Mapping
~~~~~~~~~~~~~~~~~~~
Multiple MARS keywords can be flattened into a single Zarr dimension.
A common use case is merging ``date`` and ``time`` into a unified
datetime axis.
.. code-block:: python
[
AxisDefinition(["date", "time"], Chunking.SINGLE_VALUE), # Dim 0
AxisDefinition(["param"], Chunking.SINGLE_VALUE), # Dim 1
]
The dimension size equals the **product** of the number of values of
each keyword. With ``date`` having 3 values and ``time`` having 4:
.. code-block:: text
Dimension size = 3 × 4 = 12
The **rightmost key varies fastest** (row-major order, like C and
NumPy defaults). In ``["date", "time"]``, ``time`` cycles through all
its values before ``date`` advances:
.. code-block:: text
Index: 0 1 2 3 4 5 6 7 8 9 10 11
date: d0 d0 d0 d0 d1 d1 d1 d1 d2 d2 d2 d2
time: t0 t1 t2 t3 t0 t1 t2 t3 t0 t1 t2 t3
index = time + date × num_times
.. important::
The order of keys matters. With ``["time", "date"]``, ``date``
becomes the fastest-varying keyword instead of ``time``.
Axis Mapping Visualized
~~~~~~~~~~~~~~~~~~~~~~~
.. mermaid::
graph LR
subgraph MARS["MARS Request Keywords"]
date["date (3 values)"]
time["time (4 values)"]
param["param (3 values)"]
step["step (1 value)"]
end
subgraph AD["AxisDefinitions"]
ad0["AxisDefinition 0
keys=['date', 'time']"]
ad1["AxisDefinition 1
keys=['param']"]
ad2["AxisDefinition 2
keys=['step']"]
end
subgraph Zarr["Zarr Array Dimensions"]
dim0["Dim 0: datetime
size = 3 x 4 = 12"]
dim1["Dim 1: param
size = 3"]
dim2["Dim 2: step
size = 1"]
dim3["Dim 3: grid points
(implicit)"]
end
date --> ad0
time --> ad0
param --> ad1
step --> ad2
ad0 --> dim0
ad1 --> dim1
ad2 --> dim2
Chunking
--------
:class:`~pychunked_data_view.Chunking` determines how many values along
a dimension are grouped into a single Zarr chunk:
.. list-table::
:header-rows: 1
* - Chunking mode
- Behaviour
- Chunk size along axis
* - :attr:`~pychunked_data_view.Chunking.SINGLE_VALUE`
- Each value along the axis is its own chunk
- 1
* - :attr:`~pychunked_data_view.Chunking.NONE`
- The entire axis is stored in a single chunk
- Full axis length
For example, with ``date`` having 3 values and ``param`` having 3
values:
.. code-block:: python
[
AxisDefinition(["date"], Chunking.NONE), # chunk size = 3
AxisDefinition(["param"], Chunking.SINGLE_VALUE), # chunk size = 1
]
# Array shape: (3, 3, N)
# Chunk shape: (3, 1, N)
Memory Considerations
---------------------
Each chunk access loads the **entire chunk** into memory. With
:attr:`~pychunked_data_view.Chunking.SINGLE_VALUE` each chunk contains
one set of grid-point values, keeping memory usage small.
With :attr:`~pychunked_data_view.Chunking.NONE` the chunk spans the
full axis, and when multiple axes use ``NONE`` the chunk sizes compound.
For example, consider a grid with 1 million points (``N = 1_000_000``)
and three axes all set to ``NONE``:
.. code-block:: python
[
AxisDefinition(["date"], Chunking.NONE), # 30 values
AxisDefinition(["time"], Chunking.NONE), # 4 values
AxisDefinition(["param"], Chunking.NONE), # 10 values
]
# Chunk shape: (30, 4, 10, 1_000_000)
# Chunk size: 30 × 4 × 10 × 1_000_000 × 4 bytes = ~4.5 GB
Accessing **any** element in this array loads the single 4.5 GB chunk.
Switching to :attr:`~pychunked_data_view.Chunking.SINGLE_VALUE` on all
three axes reduces each chunk to a single field
(``1 × 1 × 1 × 1_000_000 × 4 bytes ≈ 4 MB``).
.. warning::
Using :attr:`~pychunked_data_view.Chunking.NONE` on multiple axes
can cause unexpectedly large memory allocations. Start with
:attr:`~pychunked_data_view.Chunking.SINGLE_VALUE` on all axes and
only switch individual axes to ``NONE`` when you know you always
consume them in full.
Combining Multiple MARS Requests
---------------------------------
Call :meth:`~z3fdb.SimpleStoreBuilder.add_part` multiple times to
combine data from different MARS requests into a single Zarr array.
Use :meth:`~z3fdb.SimpleStoreBuilder.extendOnAxis` to specify which
dimension grows when parts are joined. All other dimensions must have
the same number of values across parts.
.. code-block:: python
builder = SimpleStoreBuilder()
# Part 1: surface parameters
# Dimension D is count date x time
# Dimension P1 is count param
# Dimension N is the number of values in the grid
# Resulting shape of this part is [D, P1, N]
builder.add_part(
"levtype=sfc,param=165/166,...",
[
AxisDefinition(["date", "time"], Chunking.SINGLE_VALUE),
AxisDefinition(["param"], Chunking.SINGLE_VALUE),
],
ExtractorType.GRIB,
)
# Part 2: pressure level parameters
# Dimension D is count date x time
# Dimension P2 is count param x levelist
# Dimension N is the number of values in the grid
# Resulting shape of this part is [D, P2, N]
builder.add_part(
"levtype=pl,param=131/132,levelist=50/100,...",
[
AxisDefinition(["date", "time"], Chunking.SINGLE_VALUE),
AxisDefinition(["param", "levelist"], Chunking.SINGLE_VALUE),
],
ExtractorType.GRIB,
)
# Extend on the param dimension (index 1)
# Final shape will be [D, P1 + P2, N]
builder.extendOnAxis(1)
store = builder.build()
The datetime dimension (index 0) must have the same values in both
parts. The param dimension (index 1) grows: 2 surface parameters +
4 pressure-level combinations (2 params × 2 levels) = 6 entries total.