Metadata-Version: 2.4
Name: jajula-chunking
Version: 0.1.0
Summary: A comprehensive text chunking library for RAG applications with multiple strategies
Author-email: Jajula <contact@jajula.com>
License: MIT
Project-URL: Homepage, https://github.com/jajula/jajula-chunking
Project-URL: Repository, https://github.com/jajula/jajula-chunking
Project-URL: Documentation, https://jajula-chunking.readthedocs.io
Project-URL: Issues, https://github.com/jajula/jajula-chunking/issues
Keywords: chunking,rag,nlp,text-processing,ai,machine-learning
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: nltk>=3.8
Requires-Dist: beautifulsoup4>=4.11.0
Requires-Dist: numpy>=1.21.0
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: sentence-transformers>=2.2.0
Requires-Dist: tiktoken>=0.4.0
Requires-Dist: transformers>=4.21.0
Requires-Dist: torch>=1.12.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=22.0.0; extra == "dev"
Requires-Dist: flake8>=5.0.0; extra == "dev"
Requires-Dist: mypy>=0.991; extra == "dev"
Requires-Dist: isort>=5.10.0; extra == "dev"
Requires-Dist: pre-commit>=2.20.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=5.0.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.0.0; extra == "docs"
Requires-Dist: myst-parser>=0.18.0; extra == "docs"
Dynamic: license-file

# Jajula Chunking

A comprehensive Python library for text chunking strategies optimized for RAG (Retrieval-Augmented Generation) applications.

## Features

- **9 Chunking Strategies** - From simple fixed-size to advanced semantic chunking
- **RAG-Optimized** - Designed specifically for retrieval-augmented generation workflows
- **Easy to Use** - Simple, consistent API across all chunkers
- **Extensible** - Base classes for creating custom chunking strategies
- **Production Ready** - Comprehensive error handling and validation

## Installation

```bash
pip install jajula-chunking
```

## Quick Start

```python
from jajula_chunking import FixedSizeChunker, SemanticChunker

# Fixed-size chunking
chunker = FixedSizeChunker(chunk_size=500, overlap=50)
chunks = chunker.chunk("Your long text here...")

# Semantic chunking
semantic_chunker = SemanticChunker(similarity_threshold=0.6)
semantic_chunks = semantic_chunker.chunk("Your text here...")

# Inspect the fixed-size chunks (semantic_chunks can be iterated the same way)
for chunk in chunks:
    print(f"ID: {chunk.chunk_id}")
    print(f"Content: {chunk.content}")
    print(f"Metadata: {chunk.metadata}")
    print("---")
```
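
To make the `chunk_size` and `overlap` parameters concrete, here is a minimal, library-independent sketch of character-based fixed-size chunking. It is not the `FixedSizeChunker` implementation: the function name `fixed_size_chunks` is hypothetical, and it assumes both parameters are measured in characters and that consecutive chunks share exactly `overlap` characters.

```python
from typing import List


def fixed_size_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of chunk_size characters that share `overlap` characters.

    Illustrative sketch only; the library's own chunker may behave differently.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each iteration
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    for piece in fixed_size_chunks("abcdefghij", chunk_size=4, overlap=1):
        print(piece)
```

The overlap repeats the tail of each chunk at the head of the next, so a sentence that straddles a boundary still appears whole in at least one chunk when it is shorter than the overlap.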

## Available Chunkers

1. **FixedSizeChunker** - Fixed-size chunking by character or word count
2. **SentenceBasedChunker** - Sentence boundary-based chunking
3. **ParagraphBasedChunker** - Paragraph boundary-based chunking
4. **SemanticChunker** - AI-powered semantic chunking
5. **HierarchicalChunker** - Multi-level hierarchical chunking
6. **StructureBasedChunker** - Document structure-based chunking (HTML/Markdown)
7. **TokenBasedChunker** - Token-count based chunking
8. **RecursiveChunker** - Recursive text splitting with multiple separators
9. **AdaptiveChunker** - Adaptive chunking based on content analysis
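
As an illustration of the idea behind recursive splitting with multiple separators, the sketch below tries coarse separators first and falls back to finer ones only for pieces that are still too long. It is a simplified stand-in, not the `RecursiveChunker` implementation: the function name `recursive_split`, the `max_len` parameter, and the default separator order are all assumptions made for this example.

```python
from typing import List, Sequence


def recursive_split(
    text: str,
    max_len: int,
    separators: Sequence[str] = ("\n\n", "\n", " "),
) -> List[str]:
    """Split text into chunks of at most max_len characters.

    Illustrative sketch only; names and defaults are hypothetical.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if len(text) <= max_len:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to hard character cuts.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    sep, rest = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for part in text.split(sep):
        # Greedily pack neighbouring parts into one chunk while they fit.
        candidate = part if not current else current + sep + part
        if len(candidate) <= max_len:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(part) <= max_len:
            current = part
        else:
            # The part alone is too long: retry with the finer separators.
            chunks.extend(recursive_split(part, max_len, rest))
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    print(recursive_split("aa bb cc dd", max_len=5))
```

Trying paragraph breaks before line breaks before spaces keeps chunks aligned with the largest natural boundary that still fits the size limit.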

## Documentation

For detailed documentation and examples, visit the [documentation](https://jajula-chunking.readthedocs.io).

## Contributing

Contributions are welcome! Please read our contributing guidelines for details.

## License

MIT License - see the LICENSE file for details.
