Metadata-Version: 2.4
Name: h5yaml
Version: 0.3.2
Summary: Use YAML configuration file to generate HDF5/netCDF4 formatted files.
Project-URL: Homepage, https://github.com/rmvanhees/h5_yaml
Project-URL: Source, https://github.com/rmvanhees/h5_yaml
Project-URL: Issues, https://github.com/rmvanhees/h5_yaml/issues
Author-email: Richard van Hees <r.m.van.hees@sron.nl>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: CF metadata,HDF5,YAML,netCDF4
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Python: >=3.10
Requires-Dist: h5py>=3.14
Requires-Dist: netcdf4>=1.7
Requires-Dist: numpy>=2.0
Requires-Dist: pyyaml>=6.0
Provides-Extra: dev
Requires-Dist: pytest>=9.0; extra == 'dev'
Provides-Extra: test
Requires-Dist: pytest-cov>=7.0; extra == 'test'
Description-Content-Type: text/markdown

# H5YAML
[![image](https://img.shields.io/pypi/v/h5yaml.svg?label=release)](https://github.com/rmvanhees/h5yaml/)
[![image](https://img.shields.io/pypi/l/h5yaml.svg)](https://github.com/rmvanhees/h5yaml/LICENSE)
[![image](https://img.shields.io/pypi/dm/h5yaml.svg)](https://pypi.org/project/h5yaml/)
[![image](https://img.shields.io/pypi/status/h5yaml.svg?label=status)](https://pypi.org/project/h5yaml/)

## Description
This package lets you generate [HDF5](https://docs.h5py.org/en/stable/)/[netCDF4](https://unidata.github.io/netcdf4-python/)
formatted files as defined in a [YAML](https://yaml.org/) configuration file. This has several advantages:

 * you define the layout of your HDF5/netCDF4 file in YAML, which is human-readable and has an intuitive syntax.
 * you can reuse the YAML configuration file so that all your products have a consistent layout.
 * you can make updates by changing only the YAML configuration file.
 * you can obtain the layout of your HDF5/netCDF4 file as a Python dictionary, without accessing any HDF5/netCDF4 file.

The `h5yaml` package has two classes to generate an HDF5/netCDF4 formatted file.

 1. The class `H5Yaml` uses the [h5py](https://pypi.org/project/h5py/) package, which is a Pythonic interface to
    the HDF5 binary data format.
    Let 'h5_def.yaml' be your YAML configuration file, then ```H5Yaml("h5_def.yaml").create("foo.h5")``` will create
    the HDF5 file 'foo.h5'. This file can be read by netCDF4 software, because dimension scales are attached to each dataset.
 2. The class `NcYaml` uses the [netCDF4](https://pypi.org/project/netCDF4/) package, which provides an object-oriented
    Python interface to the netCDF version 4 library.
    Let 'nc_def.yaml' be your YAML configuration file, then ```NcYaml("nc_def.yaml").create("foo.nc")``` will create
    the netCDF4/HDF5 file 'foo.nc'.

The class `NcYaml` must be used when strict conformance to the netCDF4 format is required.
However, the `netCDF4` package has some limitations that `h5py` does not have. For example, it does
not allow variable-length variables to have a compound data-type.

## Installation
The package `h5yaml` is available from PyPI. To install it use `pip`:

> $ pip install [--user] h5yaml

The package `h5yaml` requires Python 3.10+ and the Python packages h5py (v3.14+), netCDF4 (v1.7+), numpy (v2.0+) and pyyaml (v6.0+).

**Note**: the packages `h5py` and `netCDF4` come with their own HDF5 libraries. If these differ, they may
collide and result in an *"HDF5 error"*.
In that case, install the development packages of HDF5 and netCDF4 (or compile them from source),
and reinstall `h5py` and `netCDF4` using the commands:

> $ pip uninstall h5py; pip install --no-binary=h5py h5py
> $ pip uninstall netCDF4; pip install --no-binary=netCDF4 netCDF4
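
Before reinstalling, it can help to check which HDF5 library version each package reports. The snippet below is a minimal diagnostic sketch, not part of `h5yaml`: the helper name `hdf5_versions` is hypothetical, and a version mismatch is only a hint that the libraries may collide, not proof.

```python
def hdf5_versions() -> dict:
    """Return the HDF5 library version reported by h5py and netCDF4.

    A value of None means the package could not be imported.
    """
    versions = {"h5py": None, "netCDF4": None}
    try:
        import h5py

        # HDF5 version h5py was built against
        versions["h5py"] = h5py.version.hdf5_version
    except ImportError:
        pass
    try:
        import netCDF4

        # HDF5 version used by the netCDF4 Python package
        versions["netCDF4"] = netCDF4.__hdf5libversion__
    except ImportError:
        pass
    return versions


versions = hdf5_versions()
print(versions)
if None not in versions.values() and versions["h5py"] != versions["netCDF4"]:
    print("h5py and netCDF4 report different HDF5 versions")
```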

## Usage

The YAML file should be structured as follows:

 * The top-level sections are: 'groups', 'dimensions', 'compounds', 'variables', 'attrs\_global' and 'attrs\_groups'.
   The sections 'attrs\_global' and 'attrs\_groups' were added in version 0.3.0.
 * The names of the attributes, groups, dimensions, compounds and variables should be specified as POSIX paths, however:
   * The names of groups should never start with a slash (they are always relative to root);
   * All other elements that are stored in root should also not start with a slash;
   * But these elements require a starting slash (absolute path) when they are not stored in root.
 * The section 'groups' is optional, but you should provide each group you want to use
   in your file. The 'groups' section in the YAML file may look like this:
   ```
   groups:
     - engineering_data
     - image_attributes
     - navigation_data
     - science_data
     - processing_control/input_data
   ```

 * The section 'dimensions' is obligatory: you should define the dimensions for each
   variable in your file. The 'dimensions' section may look like this:

   ```
   dimensions:
     days:
       _dtype: u4
       _size: 0
       long_name: days since 2024-01-01 00:00:00Z
     number_of_images:             # an unlimited dimension
       _dtype: u2
       _size: 0
     samples_per_image:            # a fixed dimension
       _dtype: u4
       _size: 307200
     /navigation_data/att_time:    # an unlimited dimension in a group with attributes
       _dtype: f8
       _size: 0
       _FillValue: -32767
       long_name: Attitude sample time (seconds of day)
       calendar: proleptic_gregorian
       units: seconds since %Y-%m-%d %H:%M:%S
       valid_min: 0
       valid_max: 92400
     n_viewport:                   # a fixed dimension with fixed values and attributes
       _dtype: i2
       _size: 5
       _values: [-50, -20, 0, 20, 50]
       long_name: along-track view angles at sensor
       units: degrees
   ```

 * The section 'compounds' is optional, but you should provide each compound data-type which
   you want to use in your file. For each compound element you have to provide its
   data-type and the attributes units and long_name. The 'compounds' section may look like
   this:

   ```
   compounds:
     stats_dtype:
       time: [u8, seconds since 1970-01-01T00:00:00, timestamp]
       index: [u2, '1', index]
       tbl_id: [u1, '1', binning id]
       saa: [u1, '1', saa-flag]
       coad: [u1, '1', co-addings]
       texp: [f4, ms, exposure time]
       lat: [f4, degree, latitude]
       lon: [f4, degree, longitude]
       avg: [f4, '1', '$S - S_{ref}$']
       unc: [f4, '1', '\u03c3($S - S_{ref}$)']
       dark_offs: [f4, '1', dark-offset]
   ```

 * The 'variables' are defined by their data-type ('_dtype') and dimensions ('_dims'),
   and optionally chunk sizes ('_chunks'), compression ('_compression') and variable length
   ('_vlen'). In addition, each variable can have as many attributes as you like,
   each defined by its name and value. The 'variables' section may look like this:

   ```
   variables:
     /science_data/detector_images:
       _dtype: u2
       _dims: [number_of_images, samples_per_image]
       _compression: 3
       _FillValue: 65535
       long_name: Detector pixel values
       coverage_content_type: image
       units: '1'
       valid_min: 0
       valid_max: 65534
     /image_attributes/nr_coadditions:
       _dtype: u2
       _dims: [number_of_images]
       _FillValue: 0
       long_name: Number of coadditions
       units: '1'
       valid_min: 1
     /image_attributes/exposure_time:
       _dtype: f8
       _dims: [number_of_images]
       _FillValue: -32767
       long_name: Exposure time
       units: seconds
     stats_163:
       _dtype: stats_dtype
       _dims: [days]
       _vlen: True
       comment: detector map statistics (MPS=163)
   ```
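
The naming rules above can be checked on a layout dictionary before any file is written. The sketch below is an illustration only: `check_layout` is a hypothetical helper, not part of `h5yaml`, and it assumes that the YAML file has already been loaded into a plain Python dictionary and that the entries in '_dims' are written exactly like the keys of the 'dimensions' section. The variable `/navigation_data/att_quat` is made up to trigger two messages.

```python
from posixpath import dirname


def check_layout(layout: dict) -> list[str]:
    """Return a list of naming problems found in a layout dictionary."""
    problems = []
    groups = set(layout.get("groups") or [])
    dimensions = layout.get("dimensions") or {}
    variables = layout.get("variables") or {}

    # Group names are always relative to root, so no leading slash.
    for name in sorted(groups):
        if name.startswith("/"):
            problems.append(f"group '{name}' must not start with a slash")

    for kind, section in (("dimension", dimensions), ("variable", variables)):
        for name in section:
            if "/" not in name:
                continue  # stored in root, written without a slash
            parent = dirname(name).lstrip("/")
            if not name.startswith("/"):
                problems.append(f"{kind} '{name}' is in a group and needs a leading slash")
            elif not parent:
                problems.append(f"{kind} '{name}' is in root and must not start with a slash")
            elif parent not in groups:
                problems.append(f"{kind} '{name}' uses undeclared group '{parent}'")

    # Every dimension used by a variable should be defined.
    for name, spec in variables.items():
        for dim in spec.get("_dims", []):
            if dim not in dimensions:
                problems.append(f"variable '{name}' uses undefined dimension '{dim}'")
    return problems


layout = {
    "groups": ["science_data", "image_attributes"],
    "dimensions": {
        "number_of_images": {"_dtype": "u2", "_size": 0},
        "samples_per_image": {"_dtype": "u4", "_size": 307200},
    },
    "variables": {
        "/science_data/detector_images": {
            "_dtype": "u2",
            "_dims": ["number_of_images", "samples_per_image"],
        },
        # hypothetical variable: its group and dimension are not declared
        "/navigation_data/att_quat": {"_dtype": "f4", "_dims": ["att_time"]},
    },
}
for problem in check_layout(layout):
    print(problem)
```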

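Each element of a compound is a triple of data-type, units and long_name. The sketch below shows one way to separate such a definition into a field list and two attribute dictionaries. The helper `split_compound` is hypothetical and how `h5yaml` handles these values internally is not described here. The resulting list of (name, type code) pairs has the form that `numpy.dtype` accepts for structured data-types.

```python
def split_compound(fields: dict) -> tuple[list, dict, dict]:
    """Split a compound definition into dtype fields, units and long names."""
    # each value is [data-type, units, long_name]
    dtype_fields = [(name, spec[0]) for name, spec in fields.items()]
    units = {name: spec[1] for name, spec in fields.items()}
    long_names = {name: spec[2] for name, spec in fields.items()}
    return dtype_fields, units, long_names


# a shortened version of the 'stats_dtype' compound shown above
stats_dtype = {
    "time": ["u8", "seconds since 1970-01-01T00:00:00", "timestamp"],
    "index": ["u2", "1", "index"],
    "texp": ["f4", "ms", "exposure time"],
}
dtype_fields, units, long_names = split_compound(stats_dtype)
print(dtype_fields)
```
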
### Notes and ToDo

* The layout of an HDF5 or netCDF4 file can be complex. From version 0.3.0, you can split the file definition over several YAML files and provide a list with the names of the YAML files as input to `H5Yaml` and `NcYaml`.

## Support [TBW]

## Road map

 * Release v0.1 : stable API to read your YAML files and generate the HDF5/netCDF4 file


## Authors and acknowledgment
The code is developed by R.M. van Hees (SRON).

## License

* Copyright: Richard van Hees (SRON) (https://www.sron.nl).
* License: Apache-2.0
