Metadata-Version: 2.1
Name: zcollection
Version: 2011.11.1
Summary: Zarr Collection
Home-page: https://github.com/CNES/zcollection
Author: CNES/CLS
Author-email: fbriol@gmail.com
License: BSD License
Keywords: zarr,collection,xarray,dask
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Natural Language :: English
Classifier: Operating System :: MacOS
Classifier: Operating System :: Microsoft :: Windows
Classifier: Operating System :: POSIX
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Physics
Requires-Python: >=3.8
Description-Content-Type: text/x-rst
License-File: LICENSE
Requires-Dist: dask>=2022.8.0
Requires-Dist: distributed
Requires-Dist: fasteners
Requires-Dist: fsspec
Requires-Dist: numcodecs
Requires-Dist: numpy>=1.20
Requires-Dist: pandas
Requires-Dist: xarray
Requires-Dist: zarr>=2.11
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"

ZCollection
===========

This project is a Python library for manipulating data partitioned into a
**collection** of `Zarr <https://zarr.readthedocs.io/en/stable/>`_ groups.

A collection divides a dataset into several partitions, which makes it easier
to ingest or update data from new products. Data can be partitioned by **date**
(hour, day, month, etc.) or by **sequence**.

A collection partitioned by date, with a monthly resolution, may look like this
on disk:

.. code-block:: text

    collection/
    ├── year=2022/
    │    ├── month=01/
    │    │    ├── time/
    │    │    │    ├── 0.0
    │    │    │    ├── .zarray
    │    │    │    └── .zattrs
    │    │    ├── var1/
    │    │    │    ├── 0.0
    │    │    │    ├── .zarray
    │    │    │    └── .zattrs
    │    │    ├── .zattrs
    │    │    ├── .zgroup
    │    │    └── .zmetadata
    │    └── month=02/
    │         ├── time/
    │         │    ├── 0.0
    │         │    ├── .zarray
    │         │    └── .zattrs
    │         ├── var1/
    │         │    ├── 0.0
    │         │    ├── .zarray
    │         │    └── .zattrs
    │         ├── .zattrs
    │         ├── .zgroup
    │         └── .zmetadata
    └── .zcollection
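
The directory names above follow a ``key=value`` pattern derived from the date
of the data. The sketch below only illustrates that naming scheme; it is not
the library's implementation, and the ``partition_path`` helper and its
``resolution`` argument are hypothetical names introduced for this example.

.. code-block:: python

    import datetime

    def partition_path(date, resolution="month"):
        """Build a relative partition path such as ``year=2022/month=01``.

        Hypothetical helper: it mimics the layout shown above and is not
        part of the zcollection API.
        """
        keys = ("year", "month", "day", "hour")
        if resolution not in keys:
            raise ValueError(f"unknown resolution: {resolution!r}")
        parts = []
        for key in keys:
            # Years are written with four digits, other fields with two.
            width = 4 if key == "year" else 2
            parts.append(f"{key}={getattr(date, key):0{width}d}")
            if key == resolution:
                break
        return "/".join(parts)

    print(partition_path(datetime.datetime(2022, 1, 15)))
    # year=2022/month=01
    print(partition_path(datetime.datetime(2022, 2, 3), resolution="day"))
    # year=2022/month=02/day=03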

Partition updates can be set to overwrite existing data with new data or to
update it using different **strategies**.
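
To make the difference between overwriting and updating concrete, here is a
minimal sketch using plain Python lists of ``(time, value)`` records. Both
functions are hypothetical illustrations, not the strategies shipped with the
library: the second one assumes a strategy that replaces only the time window
covered by the new data and keeps the existing records outside it.

.. code-block:: python

    def overwrite(existing, inserted):
        """Discard the existing records and keep only the new ones."""
        return list(inserted)

    def replace_time_window(existing, inserted):
        """Replace the records covered by the new data, keep the others.

        Assumes both lists are sorted by time (first tuple element).
        """
        if not inserted:
            return list(existing)
        start, end = inserted[0][0], inserted[-1][0]
        before = [record for record in existing if record[0] < start]
        after = [record for record in existing if record[0] > end]
        return before + list(inserted) + after

    existing = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
    inserted = [(2, "B"), (3, "C")]

    print(overwrite(existing, inserted))
    # [(2, 'B'), (3, 'C')]
    print(replace_time_window(existing, inserted))
    # [(1, 'a'), (2, 'B'), (3, 'C'), (4, 'd')]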

Data is handled with the `Dask library <https://dask.org/>`_, which allows
processing to scale.

It is possible to create views on a reference collection in order to add and
modify variables, while the reference collection itself is accessed read-only.

This library can store data on POSIX, S3, or any other file system supported by
the Python library `fsspec
<https://filesystem-spec.readthedocs.io/en/latest/>`_. Note, however, that only
POSIX and S3 file systems have been tested.
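
As a rough illustration of what ``fsspec`` provides, the sketch below uses its
in-memory file system, so it runs without touching the disk or a remote store.
It only shows the generic ``fsspec`` interface; the paths and the placeholder
file content are assumptions made for this example, and how this library
consumes a file system object is not shown here.

.. code-block:: python

    import fsspec

    # "memory" selects fsspec's in-memory backend; "file" would select the
    # local (POSIX) file system, and "s3" needs the optional s3fs package.
    fs = fsspec.filesystem("memory")

    # Hypothetical layout, mirroring the directory tree shown earlier.
    root = "/example_collection"
    fs.makedirs(f"{root}/year=2022/month=01", exist_ok=True)

    # Write a marker file with placeholder content (not the real format).
    fs.pipe(f"{root}/.zcollection", b"{}")

    marker_exists = fs.exists(f"{root}/.zcollection")
    partition_is_dir = fs.isdir(f"{root}/year=2022/month=01")

Because every backend exposes the same methods, code written against the
``fs`` object does not need to change when the protocol name changes.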
