Metadata-Version: 2.1
Name: tsum
Version: 0.1.0
Summary: Summarize data in Dask DataFrames.
Author: Fasih Khatib
Author-email: hellofasih.confound928@passinbox.com
Requires-Python: >=3.9,<4.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Dist: attrs (>=23.2.0,<24.0.0)
Requires-Dist: cattrs (>=23.2.3,<24.0.0)
Requires-Dist: dask[dataframe,distributed] (>=2024.4.1,<2025.0.0)
Requires-Dist: frozendict (>=2.4.1,<3.0.0)
Description-Content-Type: text/markdown

## TSum - Table Summarization

> Given a table where rows correspond to records and columns correspond to attributes, we want to find a small number of patterns that succinctly summarize the dataset. 

TSum is a [table summarization algorithm published by Google Research](https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41683.pdf). This package is a Python implementation of the algorithm that uses Dask DataFrames for scale.
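
To make the idea of a "pattern" concrete, here is a minimal pure-Python sketch. It assumes a pattern is a mapping from attribute names to required values, with `None` acting as a wildcard. The names `PatternSketch`, `matches`, and `coverage` are hypothetical and are not part of the `tsum` API, and the sketch does not reproduce how TSum scores or selects patterns. It only shows how one pattern can stand in for several rows.

```python
from typing import Optional

# Illustrative types only; tsum's own Pattern class may look different.
Row = dict[str, str]
PatternSketch = dict[str, Optional[str]]


def matches(pattern: PatternSketch, row: Row) -> bool:
    """Return True if every non-wildcard attribute in the pattern equals the row's value."""
    return all(
        value is None or row.get(attr) == value
        for attr, value in pattern.items()
    )


def coverage(pattern: PatternSketch, rows: list[Row]) -> float:
    """Return the fraction of rows that the pattern matches."""
    if not rows:
        return 0.0
    return sum(matches(pattern, row) for row in rows) / len(rows)


# A made-up table with three attributes per record.
rows = [
    {"gender": "M", "age": "adult", "blood_pressure": "normal"},
    {"gender": "M", "age": "adult", "blood_pressure": "low"},
    {"gender": "F", "age": "child", "blood_pressure": "low"},
    {"gender": "F", "age": "adult", "blood_pressure": "normal"},
]

# "Adult males, any blood pressure" matches the first two rows.
pattern = {"gender": "M", "age": "adult", "blood_pressure": None}
print(coverage(pattern, rows))
```

A summary would then be a short list of such patterns that together describe most of the table.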

### Usage

```python
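import dask.dataframe as dd
from dask.distributed import LocalCluster
from tsum import Pattern, summarize

# Start a local cluster: one worker with 8 threads, dashboard on port 8787.
cluster = LocalCluster(n_workers=1, threads_per_worker=8, dashboard_address=":8787")
client = cluster.get_client()

# The table to summarize, as a Dask DataFrame.
ddf: dd.DataFrame = ...
# Compute the patterns that summarize the table.
patterns: list[Pattern] = summarize(ddf=ddf)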
```

