Metadata-Version: 2.2
Name: chunking4rag
Version: 0.0.4
Summary: A small library to chunk large files into smaller arrays that can be used for generating RAG embeddings
Home-page: https://github.com/harpreetset1/chunking4rag
Author: Harpreet Sethi
Author-email: Harpreet Sethi <harpreetset@gmail.com>
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: gensim>=4.3.3
Requires-Dist: html2text>=2024.2.26
Requires-Dist: nltk>=3.9.1
Requires-Dist: pydantic>=2.10.6
Requires-Dist: pypdf2>=3.0.1
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Provides-Extra: formatting
Requires-Dist: black; extra == "formatting"
Requires-Dist: flake8; extra == "formatting"
Requires-Dist: isort; extra == "formatting"
Provides-Extra: type-checking
Requires-Dist: mypy; extra == "type-checking"
Provides-Extra: docs
Requires-Dist: pydocstyle; extra == "docs"
Provides-Extra: publishing
Requires-Dist: twine; extra == "publishing"
Requires-Dist: wheel; extra == "publishing"

# chunking4rag
This repository collects chunking strategies you can use to get the best performance out of a RAG (retrieval-augmented generation) pipeline.
It covers the following strategies:
1. [Fixed length chunking](./chunkingmethods/fixed_length_chunking.py)
  
2. [Keyword chunking](./chunkingmethods/keyword_chunking.py)
  
3. [Adaptive chunking](./chunkingmethods/adaptive_chunking.py)
  
4. [Sliding window](./chunkingmethods/sliding_window_chunking.py)
    
5. [Paragraph chunking](./chunkingmethods/paragraph_chunking.py)
  
6. [Sentence chunking](./chunkingmethods/sentence_chunking.py)
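
To give a feel for what three of these strategies do, here is a minimal, standalone sketch in plain Python. It does not use this library's API: the function names (`fixed_length_chunks`, `sliding_window_chunks`, `paragraph_chunks`) and their parameters are hypothetical and chosen only for illustration, and the modules linked above may behave differently (for example, by splitting on words rather than characters).

```python
import re


def fixed_length_chunks(text: str, chunk_size: int) -> list[str]:
    """Split text into consecutive, non-overlapping chunks of chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def sliding_window_chunks(text: str, window_size: int, step: int) -> list[str]:
    """Split text into overlapping windows; consecutive windows start step characters apart."""
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + window_size])
        # Stop once a window has reached the end of the text.
        if start + window_size >= len(text):
            break
    return chunks


def paragraph_chunks(text: str) -> list[str]:
    """Split text on blank lines and drop empty paragraphs."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


if __name__ == "__main__":
    sample = "abcdefghij"
    print(fixed_length_chunks(sample, 4))       # ['abcd', 'efgh', 'ij']
    print(sliding_window_chunks(sample, 4, 2))  # ['abcd', 'cdef', 'efgh', 'ghij']
    print(paragraph_chunks("First paragraph.\n\nSecond paragraph."))
```

Overlapping windows keep some shared context between neighbouring chunks at the cost of producing more chunks, and therefore more embeddings, than fixed-length splitting.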
  
# To install this library
Run the following command:
```
pip install chunking4rag
```

# To start contributing to the project
1. Clone the repository using git
  
2. Create a virtual environment using uv (this creates a `.venv` directory in the repository root)
  ```
  uv venv
  ```
3. Activate the virtual environment
  ```
  source .venv/bin/activate
  ```
4. Install the dependencies by running
  ```
  uv pip install -r requirements.txt
  ```
5. Run the tests to make sure everything works
  ```
  python chuking_tests.py
  ```
