Metadata-Version: 2.1
Name: chunkyp
Version: 0.0.2
Summary: Ray-based preprocessing pipeline.
Home-page: https://github.com/neophocion/chunkyp
Author: Neo Phocion
Author-email: neophocion@protonmail.com
License: apache-2.0
Download-URL: https://github.com/neophocion/chunkyp/releases
Project-URL: Repo, https://github.com/neophocion/chunkyp
Keywords: ray,preprocessing,nlp,cleaning,workflow
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Topic :: Text Processing
Classifier: Topic :: Scientific/Engineering
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.5, <4
Description-Content-Type: text/markdown
Requires-Dist: ray (>=0.8.6)
Requires-Dist: psutil (>=5.7.0)
Provides-Extra: dev
Requires-Dist: pytest ; extra == 'dev'
Provides-Extra: test

# chunkyp

A small and concise data preprocessing library inspired by common NLP preprocessing workflows. 

Supports [ray](https://github.com/ray-project/ray).

## Installation
chunkyp is available on PyPI.
```bash
pip install chunkyp
```

To install the development version from source, run the following.
```bash
git clone https://github.com/neophocion/chunkyp
cd chunkyp
pip install -e .
```

## Usage

The simplest way to get started is to look at the Jupyter notebooks in [`notebooks/`](https://github.com/neophocion/chunkyp/tree/master/notebooks).

A small example:

```python
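from chunkyp import pipe, p  # the two names used below

# illustrative input (made up for this example): one dict per record
records = [{'field': 'Hello', 'field1': 'two words', 'field2': 2}]

res = pipe(
    records, # a list of dicts, or an iterator over dicts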
    p('field', lambda x: x.lower()),
    p('field', lambda x: x.upper(), 'new_field'),
    p(['field1', 'field2'], lambda x,y: len(x.split()) == y, 'new_field2'),
)

res = list(res)
res
```
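
To make the three steps above easier to follow, here is a rough plain-Python sketch of what they appear to do. It does not use chunkyp or ray and is not chunkyp's implementation. The helpers `apply_step` and `run`, the sample record, and the field names are all hypothetical. It also assumes that a step without an output field overwrites its input field, and that a step with several input fields passes them to the function as positional arguments.

```python
# Plain-Python sketch of the ASSUMED step semantics; not chunkyp's actual code.

def apply_step(record, fields, fn, out_field=None):
    """Apply fn to one or more input fields and store the result (hypothetical helper)."""
    names = [fields] if isinstance(fields, str) else list(fields)
    # Assumption: with no output field, a single-field step overwrites its input.
    target = out_field if out_field is not None else names[0]
    updated = dict(record)  # copy so the input record is left untouched
    updated[target] = fn(*(record[name] for name in names))
    return updated


def run(records, steps):
    """Lazily push every record through each step in order (hypothetical helper)."""
    for record in records:
        for fields, fn, out_field in steps:
            record = apply_step(record, fields, fn, out_field)
        yield record


# Made-up sample data and steps mirroring the example above.
records = [{"text": "Hello World", "n_words": 2}]
steps = [
    ("text", lambda x: x.lower(), None),                  # overwrite in place
    ("text", lambda x: x.upper(), "shouted"),             # write to a new field
    (["text", "n_words"], lambda x, y: len(x.split()) == y, "count_ok"),
]

result = list(run(records, steps))
print(result)
```

Because `run` is a generator, nothing is computed until the result is consumed, which is why this sketch, like the example above, wraps the result in `list(...)`.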


