Metadata-Version: 2.3
Name: polario
Version: 0.3.2
Summary: Polars IO
Project-URL: homepage, https://bneijt.github.io/polario/
Project-URL: repository, https://github.com/bneijt/polario
Author-email: Bram Neijt <bram@neijt.nl>
License: Apache-2.0
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.9
Requires-Dist: fsspec
Requires-Dist: polars[fsspec]>=0.16
Requires-Dist: pyarrow>11
Description-Content-Type: text/markdown

Polars IO utility library
=================

Helpers that make it easier to read and write Hive-partitioned Parquet datasets with Polars.

It is primarily a library for working with datasets, but it also includes a command-line interface
for inspecting Parquet files and datasets.
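
Hive partitioning stores partition column values in directory names as `key=value` segments, so a row with `p1 = 1` ends up under a directory such as `.../p1=1/`. The snippet below is a minimal standard-library sketch of that naming convention, assuming one directory level per partition column in the given order. `hive_partition_path` and `parse_hive_partition` are hypothetical helpers for illustration, not part of polario's API, and they do not escape special characters in values.

```python
from typing import Dict, Mapping, Sequence


def hive_partition_path(
    base: str, partition_columns: Sequence[str], values: Mapping[str, object]
) -> str:
    """Build a Hive-style directory path, one key=value segment per partition column.

    Hypothetical helper for illustration; special characters are not escaped.
    """
    segments = [f"{column}={values[column]}" for column in partition_columns]
    return "/".join([base.rstrip("/"), *segments])


def parse_hive_partition(path: str) -> Dict[str, str]:
    """Recover partition values (as strings) from the key=value segments of a path."""
    values: Dict[str, str] = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            values[key] = value
    return values


# Non-partition columns such as "v" do not appear in the path
print(hive_partition_path("file:///tmp", ["p1"], {"p1": 1, "v": 1}))  # file:///tmp/p1=1
print(parse_hive_partition("/data/p1=2/part-0.parquet"))  # {'p1': '2'}
```

Values parsed from a path come back as strings, so a reader has to cast them back to the column's type.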

Dataset
=======
Example usage of `polario.hive_dataset.HiveDataset`:
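```python
import polars as pl
from polario.hive_dataset import HiveDataset

# Two rows that differ in the partition column "p1"
df = pl.from_dicts(
    [
        {"p1": 1, "v": 1},
        {"p1": 2, "v": 1},
    ]
)

# A Hive-partitioned dataset rooted at /tmp, partitioned on "p1"
ds = HiveDataset("file:///tmp/", partition_columns=["p1"])

# Write the frame; each distinct "p1" value becomes its own partition
ds.write(df)

# Read the dataset back one partition at a time
for partition_df in ds.read_partitions():
    print(partition_df)
```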


To model data storage, we use three layers: dataset, partition and fragment.

- Each dataset is a lexically ordered set of partitions
- Each partition is a lexically ordered set of fragments
- Each fragment is a file on disk with rows in any order
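
The three-layer model can be illustrated with a small standard-library sketch that groups fragment file paths into partitions and sorts both levels lexically. This is a hypothetical illustration of the model, not polario's implementation: it assumes a partition is identified by the directory part of a path and a fragment by its file name, and `group_fragments` is a made-up helper name.

```python
from collections import defaultdict
from typing import Dict, List


def group_fragments(fragment_paths: List[str]) -> Dict[str, List[str]]:
    """Group fragment file paths into partitions, keeping both levels lexically ordered.

    Hypothetical helper: a partition is the directory part of a path, a fragment
    is the file name. Row order inside each fragment is not considered at all.
    """
    partitions: Dict[str, List[str]] = defaultdict(list)
    for path in fragment_paths:
        partition, _, fragment = path.rpartition("/")
        partitions[partition].append(fragment)
    # Dicts preserve insertion order, so inserting sorted keys yields an ordered dataset
    return {partition: sorted(partitions[partition]) for partition in sorted(partitions)}


dataset = group_fragments(["p1=2/b.parquet", "p1=1/b.parquet", "p1=1/a.parquet"])
print(dataset)  # {'p1=1': ['a.parquet', 'b.parquet'], 'p1=2': ['b.parquet']}
```

Because the ordering is lexical rather than numeric, a partition `p1=10` sorts before `p1=2`.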
