pyfdb.pyfdb
Classes
FDB | Constructor for FDB object.
Module Contents
- class FDB(config: str | dict | pathlib.Path | None = None, user_config: str | dict | pathlib.Path | None = None)
Constructor for FDB object.
- Parameters:
config (str | dict | Path | None, optional) – Config object for setting up the FDB. See Notes.
user_config (str | dict | Path | None, optional) – Config object for setting up user specific options, e.g., enabling sub-TOCs. See Notes.
- Return type:
FDB
Note
- Each config parameter is converted according to its type:
str is parsed as a YAML representation of the config
dict is interpreted as the hierarchical representation of the config, see example
Path is interpreted as the location of a config file, which is read as YAML
None is the fallback: the default config in $FDB_HOME is loaded
Using a single PyFDB instance per thread is safe. Sharing an instance across threads isn’t supported. The underlying FDB and its methods are thread-safe, but the caller needs to be aware that flush acts on all archive calls, including messages archived from other threads: a call to flush persists all archived messages, regardless of which thread archived them. For finer control, instantiate one FDB object per thread, so that flush only persists the messages archived through that same FDB object.
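One way to follow the one-object-per-thread advice is to create the FDB object lazily in thread-local storage. The following is a minimal sketch. The helper name `make_thread_local_fdb` is hypothetical, and `factory` is assumed to be something like `lambda: pyfdb.FDB(config)`.

```python
import threading
from typing import Any, Callable


def make_thread_local_fdb(factory: Callable[[], Any]) -> Callable[[], Any]:
    """Return a getter that lazily creates one FDB object per thread.

    Hypothetical helper: `factory` is assumed to build an FDB object,
    e.g. `lambda: pyfdb.FDB(config)`.
    """
    local = threading.local()

    def get_fdb() -> Any:
        # Each thread sees its own attributes on `local`, so every thread
        # gets (and keeps) a separate object created by `factory`.
        if not hasattr(local, "fdb"):
            local.fdb = factory()
        return local.fdb

    return get_fdb
```

Inside a worker, `fdb = get_fdb()` followed by `fdb.archive(...)` and `fdb.flush()` keeps each thread's archive and flush calls on its own FDB object.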
Examples
>>> fdb = pyfdb.FDB(fdb_config_path)
>>> config = {
...     "type": "local",
...     "engine": "toc",
...     "schema": "<schema_path>",
...     "spaces": [
...         {
...             "handler": "Default",
...             "roots": [
...                 {"path": "<db_store_path>"},
...             ],
...         }
...     ],
... }
>>> fdb = pyfdb.FDB(config)
Or leveraging the context manager:
>>> with pyfdb.FDB(fdb_config_path) as fdb:
...     # Call methods of fdb
...     pass
- logger
- __enter__()
- __exit__(exc_type, exc_value, exc_traceback)
- archive(data: bytes, identifier: pyfdb.pyfdb_type.MarsIdentifier | None = None)
Archive binary data into the underlying FDB. If an identifier is supplied, it is used to archive the data. No consistency checks are applied: the caller must ensure that the provided identifier matches the metadata present in data.
If no identifier is supplied, data is interpreted as GRIB data and the metadata is taken from the GRIB messages.
In any case, the supplied or derived metadata needs to provide values for all required keys of the FDB schema.
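pyfdb performs no consistency check on a supplied identifier, so a caller may want to check for missing schema keys before archiving. Here is a minimal sketch. It assumes the identifier is given as (key, value) pairs, as in `Identifier([("key-1", "value-1")])`, and that you already know the list of required keys from your own schema. The helper `missing_schema_keys` is hypothetical and not part of pyfdb.

```python
from collections.abc import Iterable


def missing_schema_keys(
    identifier: Iterable[tuple[str, str]], required: Iterable[str]
) -> list[str]:
    """Return the required schema keys that `identifier` does not provide.

    Hypothetical helper: `required` must come from your own FDB schema.
    """
    provided = {key for key, _value in identifier}
    # Preserve the order of `required` so error messages are predictable.
    return [key for key in required if key not in provided]
```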
- Parameters:
data (bytes) – The binary data to be archived. If no identifier is provided, the data is interpreted by ecCodes and may contain multiple GRIB messages.
identifier (Identifier | None, optional) – A unique identifier for the archived data. If provided, the data is stored under this identifier. If None, the data is archived without an explicit identifier; the metadata must then be derivable from the data, which is interpreted as GRIB data.
Note
Sometimes an identifier is also referred to as a Key.
- Return type:
None
Examples
>>> fdb = pyfdb.FDB()
>>> filename = data_path / "x138-300.grib"
>>> fdb.archive(data=filename.read_bytes())  # Archive
>>> fdb.archive(identifier=Identifier([("key-1", "value-1")]), data=filename.read_bytes())
>>> fdb.flush()  # Sync the archive call
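Without an identifier, data may hold several concatenated GRIB messages. To show what that means at the byte level, the following sketch splits such a buffer using the total length stored in section 0 of each message. This reflects the GRIB edition 1 and 2 section 0 layout. It is not how pyfdb or ecCodes work internally, and it ignores the GRIB1 "large message" length encoding. For real data, prefer ecCodes. The function name `split_grib_messages` is hypothetical.

```python
def split_grib_messages(data: bytes) -> list[bytes]:
    """Split concatenated GRIB messages using the length in section 0.

    Sketch only: bytes between messages are skipped, and the GRIB1
    "large message" length encoding is not handled.
    """
    messages: list[bytes] = []
    pos = 0
    while (start := data.find(b"GRIB", pos)) != -1:
        if start + 8 > len(data):
            raise ValueError(f"truncated section 0 at offset {start}")
        edition = data[start + 7]  # octet 8 holds the edition number
        if edition == 1:
            # GRIB1: octets 5-7 hold the total length (3 bytes, big-endian)
            length = int.from_bytes(data[start + 4 : start + 7], "big")
        elif edition == 2:
            # GRIB2: octets 9-16 hold the total length (8 bytes, big-endian)
            length = int.from_bytes(data[start + 8 : start + 16], "big")
        else:
            raise ValueError(f"unsupported GRIB edition {edition} at offset {start}")
        message = data[start : start + length]
        # A complete message has the declared length and ends with "7777".
        if len(message) != length or not message.endswith(b"7777"):
            raise ValueError(f"truncated or malformed message at offset {start}")
        messages.append(message)
        pos = start + length
    return messages
```

Usage: `for message in split_grib_messages(filename.read_bytes()): fdb.archive(data=message)` archives the messages one by one. Passing the whole buffer to a single archive call does the same job.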
- flush()
Flush all buffers and close all data handles of the underlying FDB, leaving the database in a consistent state. Always safe to call.
- Parameters:
None
- Return type:
None
Examples
>>> fdb = pyfdb.FDB()
>>> filename = data_path / "x138-300.grib"
>>> fdb.archive(data=filename.read_bytes())  # Archive
>>> fdb.flush()  # Data is synced
- retrieve(mars_selection: pyfdb.pyfdb_type.MarsSelection) → pyfdb.pyfdb_type.DataHandle
Retrieve data which is specified by a MARS selection.
- Parameters:
mars_selection – MARS selection which describes the data which should be retrieved
Note
The returned data handle doesn’t guarantee the order of the GRIB messages.
- Returns:
A data handle which contains unordered GRIB messages and can be read like a BytesLike object.
- Return type:
DataHandle
Examples
>>> mars_selection = {"key-1": "value-1", ...}
>>> data_handle = fdb.retrieve(mars_selection)
>>> data_handle.open()
>>> data_handle.read(4)
>>> data_handle.close()
Or leveraging the context manager:
>>> with fdb.retrieve(selection) as data_handle:
...     assert data_handle
...     assert data_handle.read(4) == b"GRIB"
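To pull a whole retrieval into memory, or to copy it elsewhere, you can read the handle in fixed-size chunks. This is a minimal sketch. It assumes that `read(n)` behaves like a file object's read and returns `b""` once the handle is exhausted. The text above only says the handle reads like a bytes-like object. The helper `read_all` is hypothetical.

```python
import io
from typing import Protocol


class Readable(Protocol):
    def read(self, size: int) -> bytes: ...


def read_all(handle: Readable, chunk_size: int = 1 << 20) -> bytes:
    """Read `handle` to the end in chunks.

    Assumes read() returns b"" at end of data, like a file object.
    """
    buffer = io.BytesIO()
    while chunk := handle.read(chunk_size):
        buffer.write(chunk)
    return buffer.getvalue()
```

Usage: `with fdb.retrieve(selection) as data_handle: payload = read_all(data_handle)`. Keep in mind that the order of the GRIB messages in `payload` is not guaranteed.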
- list(mars_selection: pyfdb.pyfdb_type.MarsSelection, include_masked: bool = False, level: int = 3) → collections.abc.Generator[pyfdb.pyfdb_iterator.ListElement, None, None]
List data present in the underlying FDB archive that can be retrieved.
- Parameters:
mars_selection (MarsSelection) – A MARS selection which describes the data which can be listed. If None is given, all data will be listed.
include_masked (bool, optional) – If True, the returned iterator also lists masked data; if False, the listed elements are unique.
level (int [1-3], optional) – The FDB schema level of the returned elements that match the selection. For example, a level of 1 returns the level 1 keys (of the FDB schema) that match the MARS selection.
- Returns:
A generator for ListElement describing FDB entries containing data of the MARS selection
- Return type:
Generator[ListElement, None, None]
Note
This call lists masked elements only if `include_masked` is `True`.
Examples
>>> selection = {
...     "type": "an",
...     "class": "ea",
...     "domain": "g",
...     "expver": "0001",
...     "stream": "oper",
...     "date": "20200101",
...     "levtype": "sfc",
...     "step": "0",
...     "time": "1800",
... }
>>> list_iterator = fdb.list(selection)  # level == 3
>>> elements = list(list_iterator)
>>> print(elements[0])
{class=ea,expver=0001,stream=oper,date=20200101,time=1800,domain=g} {type=an,levtype=sfc} {step=0,param=131}, tocfieldlocation[uri=uri[scheme=file,name=<location>],offset=10732,length=10732,remapkey={}], length=10732, timestamp=176253515
>>> list_iterator = fdb.list(selection, level=2)
>>> elements = list(list_iterator)
>>> print(elements[0])
{class=ea,expver=0001,stream=oper,date=20200101,time=1800,domain=g} {type=an,levtype=sfc}, length=0, timestamp=0
>>> list_iterator = fdb.list(selection, level=1)
>>> elements = list(list_iterator)
>>> print(elements[0])
{class=ea,expver=0001,stream=oper,date=20200101,time=1800,domain=g}, length=0, timestamp=0
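The printed elements start with one brace-delimited key group per schema level. If you need those keys as dictionaries, a parser can read the leading `{k=v,...}` groups. This sketch relies only on the textual form printed above, which is an assumption and may change. If `ListElement` exposes structured attributes (not documented here), prefer those. The function `parse_key_levels` is hypothetical.

```python
import re

# Matches one "{k=v,...}" group, optionally preceded by whitespace.
_LEADING_GROUP = re.compile(r"\s*\{([^{}]*)\}")


def parse_key_levels(text: str) -> list[dict[str, str]]:
    """Parse the leading '{k=v,...}' groups of a printed ListElement.

    Assumes the printed format shown in the examples above; parsing stops
    at the first text that is not a brace group (e.g. ", tocfieldlocation").
    """
    levels: list[dict[str, str]] = []
    pos = 0
    while match := _LEADING_GROUP.match(text, pos):
        items = [item for item in match.group(1).split(",") if item]
        levels.append(dict(item.split("=", 1) for item in items))
        pos = match.end()
    return levels
```

The number of returned dictionaries equals the `level` used in the `list` call: three for the default, one for `level=1`.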
- inspect(mars_selection: pyfdb.pyfdb_type.MarsSelection) → collections.abc.Generator[pyfdb.pyfdb_iterator.ListElement, None, None]
Inspect the content of the underlying FDB and return a generator of list elements describing the fields that match the MARS selection.
- Parameters:
mars_selection (MarsSelection) – A MARS selection for which the inspection is executed
- Returns:
A generator for ListElement describing FDB entries containing data of the MARS selection
- Return type:
Generator[ListElement, None, None]
Examples
>>> selection = {
...     "type": "an",
...     "class": "ea",
...     "domain": "g",
...     "expver": "0001",
...     "stream": "oper",
...     "date": "20200101",
...     "levtype": "sfc",
...     "step": "0",
...     "param": "167",
...     "time": "1800",
... }
>>> list_iterator = fdb.inspect(selection)
>>> elements = list(list_iterator)  # single element in iterator
>>> elements[0]
{class=ea,expver=0001,stream=oper,date=20200101,time=1800,domain=g} {type=an,levtype=sfc} {param=167,step=0}, TocFieldLocation[ uri=URI[scheme=<location>], offset=0, length=10732, remapKey={} ], length=10732, timestamp=1762537447
- status(mars_selection: pyfdb.pyfdb_type.MarsSelection) → collections.abc.Generator[pyfdb.pyfdb_iterator.StatusElement, None, None]
List the status of all FDB entries with their control identifiers, e.g., whether a certain database was locked for retrieval.
- Parameters:
mars_selection (MarsSelection) – A MARS selection which specifies the queried data
- Returns:
A generator for StatusElement describing FDB entries and their control identifier
- Return type:
Generator[StatusElement, None, None]
Examples
>>> selection = {
...     "type": "an",
...     "class": "ea",
...     "domain": "g",
... }
>>> status_iterator = fdb.status(selection)
>>> elements = list(status_iterator)
>>> elements[0]
StatusElement(
    control_identifiers=[],
    key={
        'class': ['ea'],
        'type': ['an'],
        'date': ['20200104'],
        'domain': ['g'],
        'expver': ['0001'],
        'stream': ['oper'],
        'time': ['2100']
    },
    location=/<path_to_root>/ea:0001:oper:20200104:2100:g
)
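A typical use of `status` is finding databases that currently have a control identifier set, for example because retrieval was disabled. This is a minimal sketch. It assumes that each `StatusElement` exposes `control_identifiers` and `location` as attributes. The repr above suggests this, but the attributes are not documented here. The helper `controlled_locations` is hypothetical.

```python
from collections.abc import Iterable
from typing import Any


def controlled_locations(elements: Iterable[Any]) -> list[str]:
    """Return the locations of databases with any control identifier set.

    Assumes each element has `control_identifiers` and `location` attributes.
    """
    return [str(el.location) for el in elements if el.control_identifiers]
```

Usage: `controlled_locations(fdb.status({"class": "ea"}))`.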
- wipe(mars_selection: pyfdb.pyfdb_type.MarsSelection, doit: bool = False, porcelain: bool = False, unsafe_wipe_all: bool = False) → collections.abc.Generator[pyfdb.pyfdb_iterator.WipeElement, None, None]
Wipe data from the database.
Delete FDB databases and the data they contain. The passed selection identifies the databases to delete. This is equivalent to a UNIX rm command: the function deletes either whole databases or whole indexes within databases.
- Parameters:
mars_selection (MarsSelection) – A MARS selection which specifies the affected data
doit (bool, optional) – If True, the wipe is executed; by default, only a dry run is performed
porcelain (bool, optional) – Restricts the output to the wiped files
unsafe_wipe_all (bool, optional) – Disables all safety checks and forces the wipe
- Returns:
A generator for WipeElement
- Return type:
Generator[WipeElement, None, None]
Note
Wipe elements don’t correspond directly to the wiped files, which can be a source of confusion. The individual wipe elements are strings of the wipe output.
Examples
>>> fdb = pyfdb.FDB(fdb_config_path)
>>> wipe_iterator = fdb.wipe({"class": "ea"})
>>> wiped_elements = list(wipe_iterator)
... Toc files to delete: <path_to_database>/toc ...
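Because `doit` defaults to False, a plain `wipe` call only reports what would be deleted. That makes a "dry run, confirm, then delete" pattern straightforward. The sketch below shows it. The helper `wipe_with_confirmation` and its `confirm` callback are hypothetical. The callback could, for instance, print the planned output and ask the user.

```python
from collections.abc import Callable
from typing import Any


def wipe_with_confirmation(
    fdb: Any,
    selection: dict[str, str],
    confirm: Callable[[list[Any]], bool],
) -> list[Any]:
    """Dry-run a wipe, pass the result to `confirm`, then wipe for real.

    Hypothetical helper; returns [] if nothing matched or confirm() declined.
    """
    planned = list(fdb.wipe(selection))  # doit=False: nothing is deleted yet
    if not planned or not confirm(planned):
        return []
    return list(fdb.wipe(selection, doit=True))
```

Both calls consume their generators with `list(...)`, so each wipe runs to completion.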
- purge(mars_selection: pyfdb.pyfdb_type.MarsSelection, doit: bool = False, porcelain: bool = False) → collections.abc.Generator[pyfdb.pyfdb_iterator.PurgeElement, None, None]
Remove duplicate data from the database.
Purge duplicate entries from the database and remove the associated data if the data is owned and not adopted. Data in the FDB5 is immutable. When it is overwritten with new data under the same key, it is masked but not removed. Masked data can no longer be accessed. Indexes and data files that only contain masked data may be removed.
If an index refers to data that is not owned by the FDB (in particular data which has been adopted from an existing FDB5), this data will not be removed.
- Parameters:
mars_selection (MarsSelection) – A MARS selection which describes the data which is purged.
doit (bool, optional) – If True, the purge is executed; by default, only a dry run is performed
porcelain (bool, optional) – Restricts the output to the purged files
- Returns:
A generator for PurgeElement
- Return type:
Generator[PurgeElement, None, None]
Examples
>>> fdb = pyfdb.FDB(fdb_config_path)
>>> purge_iterator = fdb.purge({"class": "ea"}, doit=True)
>>> purged_elements = list(purge_iterator)
>>> print(purged_elements[0])
{class=ea,expver=0001,stream=oper,date=20200104,time=1800,domain=g} {type=an,levtype=sfc} {step=0,param=167}, TocFieldLocation[ uri=URI[ scheme=file, name=<location> ], offset=32196, length=10732, remapKey={} ], length=10732, timestamp=176253976
- stats(mars_selection: pyfdb.pyfdb_type.MarsSelection) → collections.abc.Generator[pyfdb.pyfdb_iterator.StatsElement, None, None]
Report information about FDB databases, aggregating the information over all visited databases into a final summary.
- Parameters:
mars_selection (MarsSelection) – A MARS selection which specifies the affected data.
- Returns:
A generator for StatsElement
- Return type:
Generator[StatsElement, None, None]
Examples
>>> fdb = pyfdb.FDB(fdb_config_path)
>>> stats_iterator = fdb.stats(selection)
>>> for el in stats_iterator:
...     print(el)
Index Statistics:
Fields : 3
Size of fields : 32,196 (31.4414 Kbytes)
Reacheable fields : 3
Reachable size : 32,196 (31.4414 Kbytes)
DB Statistics:
Databases : 1
TOC records : 2
Size of TOC files : 2,048 (2 Kbytes)
Size of schemas files : 228 (228 bytes)
TOC records : 2
Owned data files : 1
Size of owned data files : 32,196 (31.4414 Kbytes)
Index files : 1
Size of index files : 131,072 (128 Kbytes)
Size of TOC files : 2,048 (2 Kbytes)
Total owned size : 165,544 (161.664 Kbytes)
Total size : 165,544 (161.664 Kbytes)
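To use these statistics in a script, for example to track the total owned size, you can turn the printed report into a dictionary. This sketch assumes each statistic is printed on its own line in the form `Name : value`. That assumption is based only on the output shown. Duplicate names keep their last value. The helpers `parse_stats` and `byte_count` are hypothetical.

```python
def parse_stats(text: str) -> dict[str, str]:
    """Map 'Name : value' lines of a printed StatsElement to a dict.

    Section headers such as 'Index Statistics:' have no value and are skipped.
    """
    stats: dict[str, str] = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep and value.strip():
            stats[name.strip()] = value.strip()
    return stats


def byte_count(value: str) -> int:
    """Turn a value like '32,196 (31.4414 Kbytes)' into 32196."""
    return int(value.split()[0].replace(",", ""))
```

Usage: `byte_count(parse_stats(str(el))["Total owned size"])`.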
- control(mars_selection: pyfdb.pyfdb_type.MarsSelection, control_action: pyfdb.pyfdb_type.ControlAction, control_identifiers: collections.abc.Collection[pyfdb.pyfdb_type.ControlIdentifier]) → collections.abc.Generator[pyfdb.pyfdb_iterator.ControlElement, None, None]
Enable or disable certain features of FDB databases, e.g., retrieve, list, etc.
- Parameters:
mars_selection (MarsSelection) – A MARS selection which specifies the affected data.
control_action (ControlAction) – Whether the given control identifiers should be enabled or disabled, e.g., ControlAction.ENABLE or ControlAction.DISABLE
control_identifiers (list[ControlIdentifier]) – Which actions should be modified, e.g., ControlIdentifier.RETRIEVE
- Returns:
A generator for ControlElement
- Return type:
Generator[ControlElement, None, None]
Note
Disabling a control identifier, e.g., ControlIdentifier.RETRIEVE, creates a retrieve.lock file in the corresponding FDB database. The same holds for all control identifiers. The lock file is removed once the identifier has been enabled again.
It’s important to consume the iterator; otherwise the lock file isn’t deleted, which can cause unexpected behavior. Also, because databases are reused internally, create a new FDB object before relying on a newly set control identifier, so that the status propagates.
Examples
>>> fdb = pyfdb.FDB(fdb_config_path)
>>> selection = {
...     "class": "ea",
...     "domain": "g",
...     "expver": "0001",
...     "stream": "oper",
...     "date": "20200101",
...     "time": "1800",
... }
>>> control_iterator = fdb.control(
...     selection,
...     ControlAction.DISABLE,
...     [ControlIdentifier.RETRIEVE],
... )
>>> elements = list(control_iterator)
>>> print(elements[0])
ControlElement(
    control_identifiers=[RETRIEVE],
    key={
        'class': ['ea'],
        'date': ['20200104'],
        'domain': ['g'],
        'expver': ['0001'],
        'stream': ['oper'],
        'time': ['2100']
    },
    location=/<path_to_root>/ea:0001:oper:20200104:2100:g
)
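The note above stresses that the control iterator must be consumed. A thin wrapper that always drains the generator removes the chance of forgetting this step. This is a sketch. The helper `apply_control` is hypothetical, and the action and identifier values are passed through unchanged.

```python
from collections.abc import Collection
from typing import Any


def apply_control(
    fdb: Any,
    selection: dict[str, str],
    action: Any,
    identifiers: Collection[Any],
) -> list[Any]:
    """Call fdb.control() and drain the generator, returning its elements.

    Draining the generator fully is required for the change (e.g. removing
    a lock file) to take effect, per the note above.
    """
    return list(fdb.control(selection, action, identifiers))
```

As the note says, create a fresh FDB object afterwards before relying on the new state, e.g. `apply_control(fdb, selection, ControlAction.ENABLE, [ControlIdentifier.RETRIEVE])` followed by `fdb = pyfdb.FDB(fdb_config_path)`.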
- axes(mars_selection: pyfdb.pyfdb_type.MarsSelection, level: int = 3) → pyfdb.pyfdb_iterator.IndexAxis
Return the axes of a MARS selection and their extent for a given level of the schema, as an IndexAxis object.
If a key isn’t specified in the selection, its entire extent (all values) is returned.
- Parameters:
mars_selection (MarsSelection) – A MARS selection which specifies the affected data.
level (int [1-3], optional) – Level of the FDB Schema. Only keys of the given level are returned.
- Returns:
A map containing Key-Value pairs of the axes and their extent
- Return type:
IndexAxis
Examples
>>> fdb = pyfdb.FDB(fdb_config_path)
>>> selection = {
...     "type": "an",
...     "class": "ea",
...     "domain": "g",
...     "expver": "0001",
...     "stream": "oper",
...     "levtype": "sfc",
...     "step": "0",
...     "time": "1800",
... }
>>> index_axis: IndexAxis = fdb.axes(selection)  # level == 3
>>> for k, v in index_axis.items():
...     print(f"k={k} | v={v}")
k=class | v=['ea']
k=date | v=['20200101', '20200102', '20200103', '20200104']
k=domain | v=['g']
k=expver | v=['0001']
k=levelist | v=['']
k=levtype | v=['sfc']
k=param | v=['131', '132', '167']
k=step | v=['0']
k=stream | v=['oper']
k=time | v=['1800']
k=type | v=['an']
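The axes give a quick upper bound on how many fields a selection can span: the product of the number of values per axis. In the output above, only `date` (4 values) and `param` (3 values) vary, so the bound is 1 × 4 × 1 × 1 × 1 × 1 × 3 × 1 × 1 × 1 × 1 = 12. Not every combination has to exist in the FDB. This sketch assumes `IndexAxis` supports `.items()`, as in the example. The helper `max_field_count` is hypothetical.

```python
from collections.abc import Sequence
from math import prod
from typing import Any


def max_field_count(axes: Any) -> int:
    """Upper bound on fields: product of the number of values per axis.

    `axes` is assumed to support .items(), like IndexAxis or a plain dict.
    """
    return prod(len(values) for _key, values in axes.items())
```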
- enabled(control_identifier: pyfdb.pyfdb_type.ControlIdentifier) → bool
Check whether a specific control identifier is enabled.
- Parameters:
control_identifier (ControlIdentifier) – A given control identifier
- Returns:
True if the given control identifier is set, False otherwise.
- Return type:
bool
Examples
>>> with open(fdb_config_path) as f:
...     fdb_config = yaml.safe_load(f)
>>> fdb_config["writable"] = False
>>> fdb = pyfdb.FDB(fdb_config)
>>> fdb.enabled(ControlIdentifier.NONE)  # == True
>>> fdb.enabled(ControlIdentifier.LIST)  # == True
>>> fdb.enabled(ControlIdentifier.RETRIEVE)  # == True
>>> fdb.enabled(ControlIdentifier.ARCHIVE)  # == False, default True
>>> fdb.enabled(ControlIdentifier.WIPE)  # == False, default True
>>> fdb.enabled(ControlIdentifier.UNIQUEROOT)  # == True
- dirty()
Return whether a flush of the FDB is needed, for example if data was archived since the last flush.
- Parameters:
None
- Returns:
True if an archive happened and a flush is needed, False otherwise.
- Return type:
bool
Examples
>>> fdb = FDB(fdb_config_file)
>>> filename = <data_path>
>>> with open(filename, "rb") as f:
...     fdb.archive(f.read())
>>> fdb.dirty()  # == True
>>> fdb.flush()
>>> fdb.dirty()  # == False
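`dirty()` and `flush()` combine naturally into a scope that persists its archive calls when it ends. Below is a minimal sketch using `contextlib`. The helper `flushing` is hypothetical. As a design choice, it flushes only when the block exits normally, so a batch that raised partway through is not persisted by the helper. The sketch makes no claim about what the FDB's own `__exit__` does.

```python
from collections.abc import Iterator
from contextlib import contextmanager
from typing import Any


@contextmanager
def flushing(fdb: Any) -> Iterator[Any]:
    """Yield `fdb` and flush on normal exit if anything was archived.

    Hypothetical helper: if the block raises, no flush is issued here.
    """
    yield fdb
    if fdb.dirty():
        fdb.flush()
```

Usage: `with flushing(fdb): fdb.archive(data)`.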
- config() → tuple[dict[str, Any], dict[str, Any]]
Return the system and user configuration of the underlying FDB.
- Parameters:
None
- Returns:
Python dictionaries describing the system and user configuration
- Return type:
tuple[dict[str, Any], dict[str, Any]]
Examples
>>> fdb = FDB(config_file)
>>> system_config, user_config = fdb.config()
>>> print(system_config)
>>> print(user_config)
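The returned system configuration can be inspected programmatically, for example to find where data is stored. This sketch assumes the dictionary has the same `spaces` → `roots` → `path` layout as the dict config shown for the constructor. That may not hold for every config. The helper `root_paths` is hypothetical.

```python
from typing import Any


def root_paths(system_config: dict[str, Any]) -> list[str]:
    """Collect root paths, assuming the constructor example's config layout."""
    return [
        root["path"]
        for space in system_config.get("spaces", [])
        for root in space.get("roots", [])
    ]
```

Usage: `system_config, _ = fdb.config()` followed by `print(root_paths(system_config))`.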
- __repr__() → str