Metadata-Version: 2.4
Name: ragit
Version: 0.8.1
Summary: Automatic RAG Pattern Optimization Engine
Author: RODMENA LIMITED
Maintainer-email: RODMENA LIMITED <info@rodmena.co.uk>
License-Expression: Apache-2.0
Project-URL: Homepage, https://github.com/rodmena-limited/ragit
Project-URL: Repository, https://github.com/rodmena-limited/ragit
Project-URL: Issues, https://github.com/rodmena-limited/ragit/issues
Keywords: AI,RAG,LLM,GenAI,Optimization,Ollama
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Operating System :: MacOS :: MacOS X
Classifier: Operating System :: POSIX :: Linux
Requires-Python: <3.14,>=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.31.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: pandas>=2.2.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: scikit-learn>=1.5.0
Requires-Dist: tqdm>=4.66.0
Requires-Dist: trio>=0.24.0
Requires-Dist: httpx>=0.27.0
Provides-Extra: dev
Requires-Dist: ragit[test]; extra == "dev"
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: issuedb[web]; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Requires-Dist: pytest-mock; extra == "test"
Provides-Extra: transformers
Requires-Dist: sentence-transformers>=2.2.0; extra == "transformers"
Provides-Extra: docs
Requires-Dist: sphinx>=7.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=2.0; extra == "docs"
Requires-Dist: sphinx-copybutton>=0.5; extra == "docs"
Dynamic: license-file

# ragit

A RAG (retrieval-augmented generation) toolkit for Python, covering document loading, chunking, vector search, and LLM integration.

## Installation

```bash
pip install ragit

# For offline embedding (quoted so shells such as zsh do not expand the brackets)
pip install "ragit[transformers]"
```

## Quick Start

You must provide an embedding source: a custom function, the SentenceTransformers provider, or any other provider.

### Custom Embedding Function

```python
from ragit import RAGAssistant

def my_embed(text: str) -> list[float]:
    # Use any embedding API: OpenAI, Cohere, HuggingFace, etc.
    # `embedding_vector` is a placeholder for the vector that API returns.
    return embedding_vector

assistant = RAGAssistant("docs/", embed_fn=my_embed)
results = assistant.retrieve("search query")
```
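For trying the wiring without an external embedding API, a toy function with the same `(str) -> list[float]` signature can help. The sketch below is not part of ragit: `hashed_embed` and its `dim` parameter are hypothetical names, and the hashing trick it uses captures only word overlap, so retrieval quality will be poor.

```python
import hashlib
import math


def hashed_embed(text: str, dim: int = 64) -> list[float]:
    """Toy embedding: count lowercase tokens in `dim` hash buckets."""
    vec = [0.0] * dim
    for token in text.lower().split():
        # A stable hash keeps the mapping identical across runs and machines.
        digest = hashlib.sha256(token.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:4], "big") % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    # L2-normalize so dot products between vectors equal cosine similarity.
    return [v / norm for v in vec] if norm else vec
```

Assuming the constructor shown above, it could be passed as `embed_fn=hashed_embed` while prototyping, then swapped for a real embedding model.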

### With LLM for Q&A

```python
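from ragit import RAGAssistant

def my_embed(text: str) -> list[float]:
    return embedding_vector  # placeholder: vector from your embedding API

def my_generate(prompt: str, system_prompt: str = "") -> str:
    return llm_response  # placeholder: text returned by your LLM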

assistant = RAGAssistant("docs/", embed_fn=my_embed, generate_fn=my_generate)
answer = assistant.ask("How does authentication work?")
```
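A stand-in with the same `(prompt, system_prompt) -> str` signature can be useful for checking the pipeline before a real model is connected. This is only a sketch: `echo_generate` is a hypothetical helper, not something ragit provides, and it returns the start of the prompt it received instead of calling an LLM.

```python
def echo_generate(prompt: str, system_prompt: str = "") -> str:
    """Stand-in for an LLM call: echo the first 200 characters of the prompt."""
    # Prefix the system prompt, when given, so both arguments are visible.
    header = f"[system: {system_prompt}] " if system_prompt else ""
    return header + prompt[:200]
```

Passing it as `generate_fn=echo_generate` should make it easy to see what prompt the assistant builds, assuming the prompt is handed to the function unchanged.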

### Offline Embedding (SentenceTransformers)

Models are downloaded automatically on first use (about 90 MB for the default model).

```python
from ragit import RAGAssistant
from ragit.providers import SentenceTransformersProvider

# Uses all-MiniLM-L6-v2 by default
assistant = RAGAssistant("docs/", provider=SentenceTransformersProvider())

# Or specify a model
assistant = RAGAssistant(
    "docs/",
    provider=SentenceTransformersProvider(model_name="all-mpnet-base-v2")
)
```

Available models: `all-MiniLM-L6-v2` (384d), `all-mpnet-base-v2` (768d), `paraphrase-MiniLM-L6-v2` (384d)

## Core API

```python
assistant = RAGAssistant(
    documents,           # Path, list of Documents, or list of Chunks
    embed_fn=...,        # Embedding function: (str) -> list[float]
    generate_fn=...,     # LLM function: (prompt, system_prompt) -> str
    provider=...,        # Or use a provider instead of functions
    chunk_size=512,
    chunk_overlap=50
)

results = assistant.retrieve(query, top_k=3)      # [(Chunk, score), ...]
context = assistant.get_context(query, top_k=3)   # Formatted string
answer = assistant.ask(question, top_k=3)         # Requires generate_fn/LLM
code = assistant.generate_code(request)           # Requires generate_fn/LLM
```
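To make the `[(Chunk, score), ...]` return shape concrete, here is a minimal sketch of top-k retrieval over pre-computed vectors. It assumes cosine similarity as the score and plain strings as chunks. Neither is confirmed by this README, and `cosine` and `top_k` are illustrative names, not ragit functions.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query_vec: list[float],
          indexed: list[tuple[str, list[float]]],
          k: int = 3) -> list[tuple[str, float]]:
    """Return the k (chunk, score) pairs most similar to the query vector."""
    scored = [(chunk, cosine(query_vec, vec)) for chunk, vec in indexed]
    # Highest similarity first; the sort is stable, so ties keep index order.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


index = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
results = top_k([1.0, 0.0], index, k=2)
```

Here `results` contains chunk `"a"` first (the identical direction) followed by `"c"`.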

## Document Loading

```python
from ragit import load_text, load_directory, chunk_text

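doc = load_text("file.md")                # load a single file
docs = load_directory("docs/", "*.md")    # load files matching the pattern

text = "Any long string to split into chunks..."
chunks = chunk_text(text, chunk_size=512, chunk_overlap=50, doc_id="id")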
```
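The interplay of `chunk_size` and `chunk_overlap` can be illustrated with a sliding window. The sketch below assumes sizes are counted in characters, which this README does not state (ragit may count words or tokens). `sliding_chunks` is a hypothetical function, not the library's `chunk_text`.

```python
def sliding_chunks(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list[str]:
    """Split text into windows of chunk_size characters sharing chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, to avoid a redundant tail chunk.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With `chunk_size=4` and `chunk_overlap=1`, the string `"abcdefghij"` splits into `"abcd"`, `"defg"`, and `"ghij"`: each chunk repeats the last character of the previous one.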

## Hyperparameter Optimization

```python
from ragit import RagitExperiment, Document, BenchmarkQuestion

def my_embed(text: str) -> list[float]:
    return embedding_vector

def my_generate(prompt: str, system_prompt: str = "") -> str:
    return llm_response

docs = [Document(id="1", content="...")]
benchmark = [BenchmarkQuestion(question="...", ground_truth="...")]

experiment = RagitExperiment(
    docs, benchmark,
    embed_fn=my_embed,
    generate_fn=my_generate
)
results = experiment.run(max_configs=20)
print(results[0])  # Best config
```
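Conceptually, an experiment like this scores a set of candidate configurations and ranks them. The sketch below shows one possible approach, a capped grid search. It is not ragit's actual search strategy, which this README does not describe, and the search space, `grid_search`, and `toy_score` are all hypothetical.

```python
import itertools
from typing import Callable


def grid_search(space: dict[str, list],
                score_fn: Callable[[dict], float],
                max_configs: int = 20) -> list[tuple[dict, float]]:
    """Score up to max_configs combinations from space, best first."""
    keys = list(space)
    configs = (dict(zip(keys, values))
               for values in itertools.product(*space.values()))
    # islice caps the number of configurations that get evaluated.
    scored = [(cfg, score_fn(cfg)) for cfg in itertools.islice(configs, max_configs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


def toy_score(cfg: dict) -> float:
    """Made-up objective that peaks at chunk_size=512, chunk_overlap=50, top_k=3."""
    return -(abs(cfg["chunk_size"] - 512)
             + abs(cfg["chunk_overlap"] - 50)
             + abs(cfg["top_k"] - 3))


space = {"chunk_size": [256, 512], "chunk_overlap": [0, 50], "top_k": [3, 5]}
ranked = grid_search(space, toy_score)
best_config, best_score = ranked[0]
```

In a real experiment the score would come from running the benchmark questions through the pipeline instead of a closed-form function.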

## License

Apache-2.0 - RODMENA LIMITED
