Metadata-Version: 2.4
Name: pk-ai-tools
Version: 0.2.2
Summary: Simple RAG pipeline + document ingestion utilities (LangChain + Chroma + Ollama).
Project-URL: Homepage, https://github.com/paulkv905/pk-ai-tools
Project-URL: Issues, https://github.com/paulkv905/pk-ai-tools/issues
Author: Paul Kviding
License: MIT
License-File: LICENSE
Keywords: chroma,embeddings,langchain,ollama,rag,retrieval
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: <3.15,>=3.10
Requires-Dist: chromadb<2,>=1.4
Requires-Dist: langchain-chroma<2,>=1.1
Requires-Dist: langchain-community<1,>=0.4
Requires-Dist: langchain-core<2,>=1.2
Requires-Dist: langchain-ollama<2,>=1.0
Requires-Dist: langchain<2,>=1.2
Requires-Dist: ollama<1,>=0.6
Requires-Dist: pandas<3,>=2.0
Requires-Dist: requests<3,>=2.31
Requires-Dist: zstandard>=0.25.0
Description-Content-Type: text/markdown

# pk-ai-tools

`pk-ai-tools` is a small Python library that provides a reusable **RAG (Retrieval-Augmented Generation) pipeline** and a flexible **document ingestion system**.

It is built on top of:
- **LangChain**
- **Chroma**
- **Ollama**

The main goal of this project is to make it easy to:
- Ingest documents from many formats (PDF, Word, Excel, CSV, Markdown, HTML, etc.)
- Build or update a Chroma vector database
- Ask questions using a simple RAG pipeline backed by **local Ollama models**

This library was originally created for personal use and has since been generalized to be reusable across projects.

---

## Installation

```bash
pip install pk-ai-tools
```

## Requirements

- Python 3.10 – 3.13 (Python 3.14+ is currently not supported due to upstream dependencies: Chroma / onnxruntime)
- Ollama installed and running: https://ollama.com
- At least one local LLM downloaded (for example: `llama3`)
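
The last two requirements can be checked from Python before building a pipeline. The following is a minimal sketch, not part of `pk-ai-tools`: it assumes Ollama is listening on its default local address (`http://localhost:11434`) and uses Ollama's `/api/tags` endpoint, which lists locally downloaded models. The helper names (`installed_models`, `has_model`, `fetch_tags`, `check_ollama`) are hypothetical.

```python
import json
import urllib.request

# Assumption: Ollama runs on its default local address.
OLLAMA_URL = "http://localhost:11434"


def installed_models(tags_payload: dict) -> list[str]:
    """Extract model names from a parsed /api/tags response."""
    return [m["name"] for m in tags_payload.get("models", [])]


def has_model(tags_payload: dict, wanted: str) -> bool:
    """Check whether `wanted` is downloaded.

    A name with an explicit tag ("llama3:latest") must match exactly;
    a bare name ("llama3") matches any tag of that model.
    """
    names = installed_models(tags_payload)
    if ":" in wanted:
        return wanted in names
    return any(name.split(":", 1)[0] == wanted for name in names)


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> dict:
    """Ask the local Ollama server which models are downloaded."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return json.load(resp)


def check_ollama(wanted: str = "llama3") -> str:
    """Return a human-readable status line instead of raising."""
    try:
        tags = fetch_tags()
    except OSError as exc:  # connection refused, timeout, DNS failure, ...
        return f"Ollama does not appear to be running: {exc}"
    if has_model(tags, wanted):
        return f"Ollama is running and '{wanted}' is available."
    return f"Ollama is running, but '{wanted}' is not downloaded."
```

Calling `print(check_ollama("llama3"))` before constructing the pipeline gives a clearer message than a connection error raised in the middle of a query.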

## Quick example

```python
from pk_ai_tools import RAGPipeline

rag = RAGPipeline(
    doc_folder="./data",   # folder containing the documents to ingest
    language="en",
    uuid="demo-user",
    model_name="llama3"    # a local Ollama model that is already downloaded
)

answer = rag.ask("What is this documentation about?")
print(answer)
```
## How it works (high level)

1. Documents in `doc_folder` are ingested and split into chunks.
2. Chunks are embedded and stored in a Chroma vector database.
3. Queries are expanded and used to retrieve the most relevant chunks via LangChain.
4. Answers are generated by a local Ollama model from the retrieved chunks.
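
To make the flow above concrete, here is a toy, standard-library-only sketch of the same stages. It is an illustration under loud assumptions, not the library's implementation: word-count vectors stand in for a real embedding model, an in-memory list stands in for Chroma, query expansion is omitted, and the final prompt is only printed rather than sent to an Ollama model. All names (`split_into_chunks`, `ToyVectorStore`, `build_prompt`) are hypothetical.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into windows of `chunk_size` words that overlap by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, last_start, step)]


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self._items: list[tuple[str, Counter]] = []

    def add(self, chunks: list[str]) -> None:
        self._items.extend((chunk, embed(chunk)) for chunk in chunks)

    def search(self, query: str, k: int = 2) -> list[str]:
        """Return the k chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question: str, context_chunks: list[str]) -> str:
    """Combine retrieved chunks and the question into a single prompt."""
    context = "\n---\n".join(context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


docs = [
    "Chroma stores embeddings in a local vector database.",
    "Ollama runs large language models on your own machine.",
]
store = ToyVectorStore()
for doc in docs:
    store.add(split_into_chunks(doc))      # steps 1 and 2: chunk, embed, store

question = "Where are embeddings stored?"
top = store.search(question, k=1)          # step 3: retrieve
print(build_prompt(question, top))         # step 4 would send this to the model
```

The overlap between neighbouring chunks is a common design choice: a sentence that straddles a chunk boundary still appears whole in at least one chunk, so retrieval is less likely to miss it.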

## Supported document types

The document ingestion system supports:

- PDF
- Word (.docx)
- Excel (.xlsx)
- CSV
- Markdown
- HTML
- Plain text
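
Before pointing the pipeline at a folder, it can help to see which files fall under the types listed above. The sketch below is an assumption-laden illustration, not the library's loader logic: the extension mapping (including `.md`, `.htm` and `.txt`) is hypothetical and may not match what `pk-ai-tools` actually recognises, and `group_supported_files` is an invented helper name.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical mapping for illustration; the library's real extension list may differ.
EXTENSION_TO_TYPE = {
    ".pdf": "PDF",
    ".docx": "Word",
    ".xlsx": "Excel",
    ".csv": "CSV",
    ".md": "Markdown",
    ".html": "HTML",
    ".htm": "HTML",
    ".txt": "Plain text",
}


def group_supported_files(doc_folder: str) -> dict[str, list[Path]]:
    """Walk doc_folder recursively and group recognised files by document type."""
    grouped: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(Path(doc_folder).rglob("*")):
        doc_type = EXTENSION_TO_TYPE.get(path.suffix.lower())
        if doc_type and path.is_file():
            grouped[doc_type].append(path)
    return dict(grouped)
```

For example, `group_supported_files("./data")` returns a dictionary keyed by document type, which makes it easy to spot files that would be skipped because their extension is not in the mapping.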

## Notes

- This library is designed for local, private RAG setups
- No cloud APIs are required
- Vector databases are stored locally
- Ollama must be running before querying

## Roadmap (informal)

- Improve Python 3.14+ compatibility when dependencies allow
- Optional dependency groups (lighter installs)
- Better configuration presets
- More examples and docs