Metadata-Version: 2.3
Name: chunking_experiment
Version: 0.1.0
Summary: A package for experimenting with different data chunking strategies
Project-URL: Homepage, https://github.com/JohnnyTeutonic/chunking_experiment
Project-URL: Bug Tracker, https://github.com/JohnnyTeutonic/chunking_experiment/issues
Author-email: Jonathan Reich <jonathanreich100@gmail.com>
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Requires-Dist: gradio>=3.0.0
Requires-Dist: numpy>=1.20.0
Requires-Dist: pandas>=1.5.0
Requires-Dist: pyarrow>=7.0.0
Requires-Dist: pytest>=7.0.0
Provides-Extra: dev
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: sphinx-autobuild>=2021.3.14; extra == 'docs'
Requires-Dist: sphinx-rtd-theme>=1.0; extra == 'docs'
Requires-Dist: sphinx>=4.0; extra == 'docs'
Description-Content-Type: text/markdown

# Chunking Experiment

[![codecov](https://codecov.io/gh/JohnnyTeutonic/ChunkingForPandas/branch/main/graph/badge.svg)](https://codecov.io/gh/JohnnyTeutonic/chunking_experiment)
[![Tests](https://github.com/JohnnyTeutonic/ChunkingForPandas/actions/workflows/test.yml/badge.svg)](https://github.com/JohnnyTeutonic/ChunkingForPandas/actions/workflows/test.yml)

A Python package to chunk pandas/numpy data with different chunking strategies.

## Requirements

- Python 3.10+
- Gradio
- pandas
- pytest
- NumPy
- pyarrow

## Installation

```bash
pip install chunking-experiment
```

Or, from a clone of the repository:

```bash
make install
```

## Usage

```python
from chunking_experiment import ChunkingExperiment
```

## Create an instance of the `ChunkingExperiment` class

```python
class_instance = ChunkingExperiment(
    "input.csv",
    "output.csv",
    n_chunks=3,
    chunking_strategy="rows",
)
```

Then perform the chunking:

```python
class_instance.process_chunks()
```
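
For intuition, here is a minimal standard-library sketch of what splitting a table into `n_chunks=3` by rows could look like. It is not the package's implementation: the helper name `row_chunk_bounds` is hypothetical, and the sizing rule (earlier chunks absorb any leftover rows, the same convention `numpy.array_split` uses) is an assumption about how uneven row counts are handled.

```python
def row_chunk_bounds(n_rows: int, n_chunks: int) -> list[tuple[int, int]]:
    """Return (start, stop) row ranges that cover n_rows in n_chunks pieces.

    Hypothetical helper for illustration; not part of chunking_experiment.
    """
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    # Every chunk gets `base` rows; the first `extra` chunks get one more.
    base, extra = divmod(n_rows, n_chunks)
    bounds = []
    start = 0
    for i in range(n_chunks):
        size = base + (1 if i < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


# Ten sample rows split into three row chunks.
rows = [{"id": i, "value": i * i} for i in range(10)]
chunks = [rows[a:b] for a, b in row_chunk_bounds(len(rows), 3)]
print([len(c) for c in chunks])  # [4, 3, 3]
```

When the row count is not divisible by `n_chunks`, some rule has to decide which chunks are larger. Check the package documentation for the rule it actually applies.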

## Run the web interface

Go to the app folder and run the following Python code:

```python
from gradio_interface import launch_interface
launch_interface()
```

Alternatively, start the web interface from the command line:

```bash
python gradio_interface.py
```

## Features

- Multiple chunking strategies (rows, columns, tokens)
- Support for CSV, JSON, NumPy and Parquet files
- Web interface using Gradio
- Comprehensive test suite
- Documentation using Sphinx
- Benchmarking the chunking strategies
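
To illustrate how a column-based strategy differs from a row-based one, the sketch below groups the columns of a small in-memory table rather than its rows. It is an assumption-laden illustration, not the package's code: `chunk_columns` is a hypothetical helper, the table is a plain dict of column lists, and the even-split rule is assumed.

```python
from itertools import islice


def chunk_columns(table: dict[str, list], n_chunks: int) -> list[dict[str, list]]:
    """Split a column-oriented table into n_chunks groups of whole columns.

    Hypothetical helper for illustration; not part of chunking_experiment.
    """
    if n_chunks <= 0:
        raise ValueError("n_chunks must be positive")
    names = iter(table)
    # Every group gets `base` columns; the first `extra` groups get one more.
    base, extra = divmod(len(table), n_chunks)
    groups = []
    for i in range(n_chunks):
        selected = list(islice(names, base + (1 if i < extra else 0)))
        groups.append({name: table[name] for name in selected})
    return groups


table = {
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "score": [0.5, 0.7, 0.9],
    "flag": [True, False, True],
}
parts = chunk_columns(table, 2)
print([list(p) for p in parts])  # [['id', 'name'], ['score', 'flag']]
```

Each resulting chunk keeps every row but only a subset of the columns, whereas a row chunk keeps every column but only a subset of the rows.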

## Development

To install development dependencies:

```bash
pip install -e ".[dev]"
```

Or:

```bash
make install-dev
```

## Testing

To run the tests, run the following command from the root folder:

```bash
pytest
```

## Documentation

To install the documentation dependencies:

```bash
pip install -e ".[docs]"
```

Or:

```bash
make install-docs
```

To build the documentation:

```bash
make docs
```

To serve the documentation:

```bash
make docs-serve
```

Below is the full list of `make` commands that can be run from the root folder:

```bash
make benchmark
make clean
make docs
make docs-clean
make docs-serve
make install
make install-docs
make lint
make run
make test
make typecheck
```

## License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
