Metadata-Version: 2.4
Name: chunkipy
Version: 1.2.0
Summary: Chunkipy is an easy-to-use library for chunking text based on the size estimator function you provide.
Author-email: Gioele Crispo <crispogioele@gmail.com>
License-Expression: MIT
Project-URL: Repository, https://github.com/gioelecrispo/chunkipy
Keywords: text,chunking,NLP,tokenization
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: <3.14,>=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: typing-extensions>=4.14.1
Provides-Extra: langdetect
Requires-Dist: langdetect>=1.0.9; extra == "langdetect"
Provides-Extra: fasttext
Requires-Dist: fasttext>=0.9.3; extra == "fasttext"
Provides-Extra: spacy
Requires-Dist: spacy>=3.7.0; extra == "spacy"
Provides-Extra: stanza
Requires-Dist: stanza>=1.4.2; extra == "stanza"
Provides-Extra: openai
Requires-Dist: openai>=1.88.0; extra == "openai"
Provides-Extra: tiktoken
Requires-Dist: tiktoken>=0.9.0; extra == "tiktoken"
Provides-Extra: language-detection
Requires-Dist: langdetect>=1.0.9; extra == "language-detection"
Requires-Dist: fasttext>=0.9.3; extra == "language-detection"
Provides-Extra: nlp
Requires-Dist: spacy>=3.7.0; extra == "nlp"
Requires-Dist: stanza>=1.4.2; extra == "nlp"
Provides-Extra: ai
Requires-Dist: openai>=1.88.0; extra == "ai"
Requires-Dist: tiktoken>=0.9.0; extra == "ai"
Provides-Extra: all
Requires-Dist: langdetect>=1.0.9; extra == "all"
Requires-Dist: fasttext>=0.9.3; extra == "all"
Requires-Dist: spacy>=3.7.0; extra == "all"
Requires-Dist: stanza>=1.4.2; extra == "all"
Requires-Dist: openai>=1.88.0; extra == "all"
Requires-Dist: tiktoken>=0.9.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=6.2.1; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: black>=26.3.1; extra == "dev"
Requires-Dist: sphinx==8.1.3; extra == "dev"
Requires-Dist: sphinx-autodoc-typehints==3.0.1; extra == "dev"
Requires-Dist: sphinx-multiversion>=0.2.4; extra == "dev"
Requires-Dist: sphinx-rtd-theme==3.0.2; extra == "dev"
Requires-Dist: sphinx-inline-tabs==2023.4.21; extra == "dev"
Dynamic: license-file

# Chunkipy

[![Python 3.10–3.13](https://img.shields.io/badge/python-3.10%20|%203.11%20|%203.12%20|%203.13-blue.svg)](#)
[![PyPI version](https://badge.fury.io/py/chunkipy.svg)](https://badge.fury.io/py/chunkipy)
[![codecov](https://codecov.io/gh/gioelecrispo/chunkipy/graph/badge.svg?token=2A7KQ87Q62)](https://codecov.io/gh/gioelecrispo/chunkipy)
[![Docs](https://img.shields.io/badge/docs-online-success)](https://gioelecrispo.github.io/chunkipy/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

---

`chunkipy` is a modular and extensible text chunking library for Python, built for NLP and LLM pipelines.

## Why Chunkipy?

- ✅ Lightweight core with optional extras
- ✅ Configurable overlap support via `overlap_ratio`
- ✅ Composable architecture (chunkers + splitters + size estimators + language detectors)
- ✅ Practical defaults with customizable behavior

## Quick Example

```python
from chunkipy import FixedSizeTextChunker

text = "Chunkipy makes text processing modular, flexible, and powerful!"
chunker = FixedSizeTextChunker(chunk_size=20, overlap_ratio=0.2)
chunks = chunker.chunk(text)

for i, c in enumerate(chunks):
    print(f"Chunk {i + 1}: {c}")
```
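
To make `chunk_size` and `overlap_ratio` more concrete, here is a minimal, library-independent sketch of greedy fixed-size chunking with overlap. It is an illustration under stated assumptions, not chunkipy's implementation: the names `word_count` and `sketch_fixed_size_chunks` are hypothetical, size is measured in whitespace-separated words, and the overlap budget is assumed to be `chunk_size * overlap_ratio` rounded down.

```python
from typing import Callable, List


def word_count(text: str) -> int:
    """Hypothetical size estimator: number of whitespace-separated words."""
    return len(text.split())


def sketch_fixed_size_chunks(
    text: str,
    chunk_size: int,
    overlap_ratio: float = 0.0,
    size_estimator: Callable[[str], int] = word_count,
) -> List[str]:
    """Greedy illustration of fixed-size chunking; NOT chunkipy's own code."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0.0 <= overlap_ratio < 1.0:
        raise ValueError("overlap_ratio must be in [0, 1)")

    words = text.split()
    # Assumption: the overlap budget is a fraction of the chunk size.
    overlap_budget = int(chunk_size * overlap_ratio)
    chunks: List[str] = []
    start = 0

    while start < len(words):
        # Always take at least one word, then grow while the chunk still fits.
        end = start + 1
        while (
            end < len(words)
            and size_estimator(" ".join(words[start : end + 1])) <= chunk_size
        ):
            end += 1
        chunks.append(" ".join(words[start:end]))
        if end >= len(words):
            break

        # Step back from the chunk end to carry trailing words into the next
        # chunk, staying strictly ahead of `start` so the loop always advances.
        next_start = end
        while (
            next_start - 1 > start
            and size_estimator(" ".join(words[next_start - 1 : end])) <= overlap_budget
        ):
            next_start -= 1
        start = next_start

    return chunks


if __name__ == "__main__":
    sample = "one two three four five six seven eight nine ten"
    for i, chunk in enumerate(sketch_fixed_size_chunks(sample, 4, 0.5), start=1):
        print(f"Chunk {i}: {chunk}")
```

Passing the size estimator as a plain callable mirrors the idea of chunking by a user-provided size function: swapping `word_count` for a token counter changes how size is measured without touching the chunking loop.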

## Implemented vs Roadmap

| Status | Strategy |
| --- | --- |
| ✅ Implemented | `FixedSizeTextChunker` |
| ✅ Implemented | `RecursiveTextChunker` |
| 🚧 Roadmap | Document-based chunking |
| 🚧 Roadmap | Semantic chunker |
| 🚧 Roadmap | LLM-based chunker |

> Semantic sentence splitters and language detectors are already available and can be used today.

## Installation

Install the core package:

```bash
pip install chunkipy
```

Install optional feature groups:

```bash
pip install "chunkipy[language-detection]"  # Language detection (langdetect + fasttext)
pip install "chunkipy[nlp]"                  # NLP backends (spacy + stanza)
pip install "chunkipy[ai]"                   # LLM integration (openai + tiktoken)
pip install "chunkipy[all]"                  # All optional dependencies
```

Or install individual extras:

```bash
pip install "chunkipy[spacy]"
pip install "chunkipy[stanza]"
pip install "chunkipy[langdetect]"
pip install "chunkipy[fasttext]"
pip install "chunkipy[openai]"
pip install "chunkipy[tiktoken]"
```
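
After installing extras, it can be handy to confirm which optional backends are importable in the current environment. The sketch below uses only the standard library (`importlib.util.find_spec`). The helper names `OPTIONAL_BACKENDS`, `available_backends`, and `missing_backends` are hypothetical and not part of chunkipy, and the mapping assumes each extra's import name matches the package name listed above.

```python
from importlib.util import find_spec

# Assumption: each single-package extra maps to a top-level module of the same name.
OPTIONAL_BACKENDS = {
    "langdetect": "langdetect",
    "fasttext": "fasttext",
    "spacy": "spacy",
    "stanza": "stanza",
    "openai": "openai",
    "tiktoken": "tiktoken",
}


def available_backends() -> dict[str, bool]:
    """Report, per extra, whether its module can be found without importing it."""
    return {
        extra: find_spec(module) is not None
        for extra, module in OPTIONAL_BACKENDS.items()
    }


def missing_backends() -> list[str]:
    """List extras whose module is not installed, in sorted order."""
    return sorted(extra for extra, ok in available_backends().items() if not ok)


if __name__ == "__main__":
    for extra, ok in available_backends().items():
        print(f"{extra:<12} {'installed' if ok else 'missing'}")
```

Because `find_spec` only locates a module rather than importing it, this check stays fast even when heavy packages such as spaCy or Stanza are installed.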

## Documentation

Full guides and API reference:
👉 <https://gioelecrispo.github.io/chunkipy>

Examples:
👉 <https://github.com/gioelecrispo/chunkipy/tree/main/examples>

## Contributing

Issues and pull requests are welcome:
👉 <https://github.com/gioelecrispo/chunkipy/issues>

For local setup, see `CONTRIBUTING.md`.

## License

`chunkipy` is released under the [MIT License](https://opensource.org/license/MIT).
