Metadata-Version: 2.4
Name: animuz-core
Version: 0.1.4
Summary: Core shared utilities for Animuz RAG system - LLM clients, pipelines, vector DB, and document ingestion
Author-email: Animuz Team <dev@animuz.com>
License: MIT
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: httpx>=0.25.0
Requires-Dist: aiohttp>=3.9.0
Requires-Dist: requests>=2.31.0
Requires-Dist: langchain-text-splitters>=0.0.1
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == "openai"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.39.0; extra == "anthropic"
Provides-Extra: ollama
Requires-Dist: ollama>=0.1.0; extra == "ollama"
Provides-Extra: qdrant
Requires-Dist: qdrant-client>=1.7.0; extra == "qdrant"
Provides-Extra: aws
Requires-Dist: boto3>=1.28.0; extra == "aws"
Requires-Dist: aiobotocore>=2.7.0; extra == "aws"
Requires-Dist: watchtower>=3.0.0; extra == "aws"
Requires-Dist: sagemaker>=2.200.0; extra == "aws"
Provides-Extra: azure
Requires-Dist: azure-ai-documentintelligence>=1.0.0b2; extra == "azure"
Provides-Extra: ingest
Requires-Dist: unstructured-client>=0.11.0; extra == "ingest"
Requires-Dist: PyMuPDF>=1.23.0; extra == "ingest"
Provides-Extra: fastapi
Requires-Dist: fastapi>=0.104.0; extra == "fastapi"
Provides-Extra: all
Requires-Dist: animuz-core[anthropic,aws,azure,fastapi,ingest,ollama,openai,qdrant]; extra == "all"
Provides-Extra: dev
Requires-Dist: animuz-core[all]; extra == "dev"
Requires-Dist: pytest>=7.4.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Dynamic: license-file

# animuz-core

Core shared utilities for the Animuz RAG (Retrieval-Augmented Generation) system.

## Features

- **LLM Clients**: OpenAI, Anthropic Claude, Ollama
- **RAG Pipelines**: Simple and Agentic RAG implementations
- **Vector Database**: Qdrant integration with hybrid search (dense + sparse)
- **Embedding Clients**: Multiple providers (local server, Modal, S3/SageMaker)
- **Document Ingestion**: Azure Document Intelligence, Unstructured, PDF extraction, structured text parsing
- **CloudWatch Logging**: Structured JSON logging with watchtower

## Requirements

- Python >= 3.10

## Installation

Install the core package (minimal dependencies only):

```bash
pip install animuz-core
```

Then install only the extras you need. Quote the requirement so shells such as zsh do not try to expand the brackets:

```bash
# Single extra
pip install "animuz-core[openai]"

# Multiple extras
pip install "animuz-core[openai,qdrant,aws]"

# Everything
pip install "animuz-core[all]"
```

Works with uv too:

```bash
uv pip install "animuz-core[openai,qdrant]"
```

### Available Extras

| Extra       | What it installs                                  | Use when you need                                   |
| ----------- | ------------------------------------------------- | --------------------------------------------------- |
| `openai`    | `openai`                                          | OpenAI GPT models                                   |
| `anthropic` | `anthropic`                                       | Anthropic Claude models                             |
| `ollama`    | `ollama`                                          | Local LLMs via Ollama                               |
| `qdrant`    | `qdrant-client`                                   | Qdrant vector database                              |
| `aws`       | `boto3`, `aiobotocore`, `watchtower`, `sagemaker` | S3, SageMaker embeddings, CloudWatch logging        |
| `azure`     | `azure-ai-documentintelligence`                   | Azure Document Intelligence for PDF ingestion       |
| `ingest`    | `unstructured-client`, `PyMuPDF`                  | Document parsing (Unstructured API, PDF extraction) |
| `fastapi`   | `fastapi`                                         | Streaming SSE endpoints                             |
| `all`       | All of the above                                  | Everything                                          |
| `dev`       | `all` + `pytest`, `pytest-asyncio`, `black`, `ruff`, `mypy` | Development and testing                   |
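To check at runtime which extras are present in the current environment, you can probe for their import names without importing them. This is a minimal standard-library sketch; the extra-to-module mapping is an assumption derived from the table above (PyMuPDF, for instance, is imported as `fitz`) and is not an API exposed by `animuz-core`.

```python
import importlib.util

# Assumed mapping from each extra to the top-level modules its packages provide.
EXTRA_MODULES: dict[str, list[str]] = {
    "openai": ["openai"],
    "anthropic": ["anthropic"],
    "ollama": ["ollama"],
    "qdrant": ["qdrant_client"],
    "aws": ["boto3", "aiobotocore", "watchtower", "sagemaker"],
    "azure": ["azure.ai.documentintelligence"],
    "ingest": ["unstructured_client", "fitz"],  # PyMuPDF imports as `fitz`
    "fastapi": ["fastapi"],
}


def module_available(name: str) -> bool:
    """Return True if `name` could be imported, without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # find_spec raises when a parent package (e.g. `azure`) is missing
        return False


def installed_extras() -> dict[str, bool]:
    """Map each extra to whether all of its modules are importable."""
    return {
        extra: all(module_available(m) for m in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


if __name__ == "__main__":
    for extra, ok in installed_extras().items():
        print(f"{extra:10} {'installed' if ok else 'missing'}")
```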

## Usage

### Unified RAG API

```python
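import asyncio

from animuz_core import RAG, RAGConfig, tool

@tool(description="Lookup a user by ID")
def lookup_user(user_id: str) -> str:
    return f"User {user_id}"

async def main() -> None:
    # Load settings from environment variables and apply the package defaults
    config = RAGConfig.from_env().with_defaults()
    rag = RAG(config=config, tools=[lookup_user])

    # Index a document for one tenant, then ask a question scoped to that tenant
    await rag.add_doc("docs/intro.md", user_chat_id="demo")
    response = await rag.chat("What is this project?", user_chat_id="demo")
    print(response)

asyncio.run(main())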
```

The repository also includes an example script:

```bash
python scripts/quickstart_rag.py
```

### LLM Clients

```python
from animuz_core.genai import OpenAIAgentClient, AnthropicAgentClient, OLlamaClient

# Run these calls inside an async function; `my_tools` and `messages` come from your code

# OpenAI agent with tool use
agent = OpenAIAgentClient(tools=my_tools)
response = await agent.chat(messages, model="gpt-4o")

# Anthropic agent with tool use
agent = AnthropicAgentClient(tools=my_tools)
response = await agent.chat(messages, model="claude-sonnet-4-20250514")

# Ollama (local)
client = OLlamaClient()
response = await client.chat(messages, model="llama3")
```
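Provider tool-use APIs expect tool definitions in different shapes: OpenAI's Chat Completions API nests a JSON Schema under `function.parameters`, while Anthropic's Messages API takes it as a top-level `input_schema`. The agent clients presumably handle this translation for you; the sketch below only illustrates the difference, and the neutral `ToolSpec` type is hypothetical rather than part of `animuz-core`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolSpec:
    """Hypothetical provider-neutral tool description."""
    name: str
    description: str
    parameters: dict[str, Any] = field(
        default_factory=lambda: {"type": "object", "properties": {}}
    )


def to_openai(tool: ToolSpec) -> dict[str, Any]:
    # OpenAI Chat Completions: {"type": "function", "function": {...}}
    return {
        "type": "function",
        "function": {
            "name": tool.name,
            "description": tool.description,
            "parameters": tool.parameters,
        },
    }


def to_anthropic(tool: ToolSpec) -> dict[str, Any]:
    # Anthropic Messages API: flat object with an `input_schema` key
    return {
        "name": tool.name,
        "description": tool.description,
        "input_schema": tool.parameters,
    }


lookup = ToolSpec(
    name="lookup_user",
    description="Lookup a user by ID",
    parameters={
        "type": "object",
        "properties": {"user_id": {"type": "string"}},
        "required": ["user_id"],
    },
)
print(to_openai(lookup))
print(to_anthropic(lookup))
```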

### RAG Pipelines

```python
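from animuz_core.pipelines import AgenticRAG, SimpleRAG

# `agent` and `client` are LLM clients from the previous example; `embedding_client`
# is an embedding client (see Embedding) and `qdrant_client` a QdrantDBClient
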
# Agentic RAG - LLM decides when to call the retriever
pipeline = AgenticRAG(
    llm=agent,
    embedding_client=embedding_client,
    qdrant_client=qdrant_client,
)
result = await pipeline.run(query="What is RAG?", user_chat_id="tenant-123")

# Simple RAG - always retrieves then generates
pipeline = SimpleRAG(
    llm=client,
    embedding_client=embedding_client,
    qdrant_client=qdrant_client,
)
result = await pipeline.run(query="What is RAG?", user_chat_id="tenant-123")
```
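The difference between the two pipelines is who decides whether retrieval happens. Below is a minimal, dependency-free sketch of the Simple RAG flow, with stub `retrieve` and `generate` callables standing in for the embedding/Qdrant lookup and the LLM call. The function names and prompt template are illustrative assumptions, not the library's internals.

```python
import asyncio
from collections.abc import Awaitable, Callable

Retrieve = Callable[[str, str, int], Awaitable[list[str]]]
Generate = Callable[[str], Awaitable[str]]


async def simple_rag(
    query: str,
    user_chat_id: str,
    retrieve: Retrieve,
    generate: Generate,
    top_k: int = 3,
) -> str:
    """Always retrieve first, then generate an answer grounded in the hits."""
    chunks = await retrieve(query, user_chat_id, top_k)
    context = "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(chunks))
    prompt = (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    return await generate(prompt)


# Stubs so the sketch runs without external services.
DOCS = {"tenant-123": ["RAG combines retrieval with generation."]}


async def fake_retrieve(query: str, user_chat_id: str, k: int) -> list[str]:
    # A real retriever would embed `query` and search Qdrant filtered by tenant.
    return DOCS.get(user_chat_id, [])[:k]


async def fake_generate(prompt: str) -> str:
    # A real generator would call an LLM; echo the prompt so it can be inspected.
    return prompt


print(asyncio.run(simple_rag("What is RAG?", "tenant-123", fake_retrieve, fake_generate)))
```

In the agentic variant, the retriever would instead be exposed to the LLM as a tool, and the model's tool calls decide whether `retrieve` runs at all.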

### Vector Database

```python
from animuz_core.vectordb import QdrantDBClient

client = QdrantDBClient()
# Hybrid (dense + sparse) search restricted to one tenant's documents;
# dense_vec and sparse_vec come from an embedding client's embed() call
results = await client.search(
    dense_vector=dense_vec,
    sparse_vector=sparse_vec,
    user_chat_id="tenant-123",
    limit=5,
)
```
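Hybrid search runs a dense (semantic) query and a sparse (keyword-style) query, then merges the two ranked lists. A common fusion method, which Qdrant's Query API can also apply server-side, is Reciprocal Rank Fusion (RRF): each document scores the sum of 1/(k + rank) over every list it appears in, with k = 60 as the usual constant. Whether `QdrantDBClient.search` uses RRF internally is not stated here, so treat this as an illustration of the technique only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    ranked_lists: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked ID lists; ranks are 1-based and a higher fused score is better."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, breaking ties by ID for deterministic output
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


dense_hits = ["doc-a", "doc-b", "doc-c"]   # e.g. results of the dense query
sparse_hits = ["doc-c", "doc-a", "doc-d"]  # e.g. results of the sparse query
for doc_id, score in reciprocal_rank_fusion([dense_hits, sparse_hits]):
    print(doc_id, round(score, 4))
```

Documents found by both retrievers (`doc-a`, `doc-c`) rise above those found by only one, which is the point of fusing the lists.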

### Embedding

```python
from animuz_core.embedding import EmbeddingClient, ModalEmbeddingClient, S3EmbeddingClient

# Local embedding server
client = EmbeddingClient()
dense, sparse = await client.embed("Some text to embed")

# Modal-hosted embeddings
client = ModalEmbeddingClient()

# S3/SageMaker embeddings
client = S3EmbeddingClient()
```
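`EmbeddingClient.embed` returns a dense and a sparse representation. Their exact types are not documented here; the sketch below assumes a dense vector is a list of floats and a sparse vector is a mapping from token index to weight (the indices/values pair Qdrant stores for sparse vectors), and shows how each kind is typically scored.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity for dense vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("dense vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def sparse_dot(a: dict[int, float], b: dict[int, float]) -> float:
    """Dot product over the indices two sparse vectors share."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b[i] for i, w in a.items() if i in b)


query_dense, doc_dense = [0.1, 0.3, 0.5], [0.2, 0.1, 0.4]
query_sparse = {17: 0.8, 42: 0.5}  # assumed format: token index -> weight
doc_sparse = {17: 0.6, 99: 0.9}
print(cosine(query_dense, doc_dense), sparse_dot(query_sparse, doc_sparse))
```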

### Document Ingestion

```python
from animuz_core.ingest import AzureDocAiClient, MyUnstructuredClient, Structured

# Azure Document Intelligence (PDFs)
azure_client = AzureDocAiClient()
text = await azure_client.extract("document.pdf")

# Unstructured API
unstructured = MyUnstructuredClient()
chunks = await unstructured.ingest("document.docx")

# Structured text (txt, md, csv)
structured = Structured()
chunks = structured.split("document.txt")
```
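Text ingestion ultimately means splitting documents into overlapping chunks before embedding. `langchain-text-splitters` is a core dependency, so `Structured` likely builds on one of its splitters (such as `RecursiveCharacterTextSplitter`, which prefers paragraph and sentence boundaries), but that is an assumption. The dependency-free sketch below shows only the basic idea of fixed-size character windows with overlap; it is not the splitter the package uses.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split `text` into windows of `chunk_size` chars that share `overlap` chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

The overlap keeps a sentence that straddles a boundary partly visible in both neighbouring chunks, at the cost of storing some text twice.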

### Top-level Imports

The main classes are also re-exported from the package root:

```python
from animuz_core import (
    RAG, RAGConfig, tool,
    AgenticRAG, SimpleRAG,
    OpenAIAgentClient, OpenAILLMClient, AnthropicClient, AnthropicAgentClient, OLlamaClient,
    QdrantDBClient,
    EmbeddingClient,
    AzureDocAiClient, MyUnstructuredClient, Structured,
)
```

## Development

```bash
# Clone and install in editable mode with dev dependencies
git clone <repo-url>
cd animuz-core
pip install -e ".[dev]"

# Run tests
pytest tests/

# Run integration tests (requires external services + env vars)
pytest -m integration tests/integration/
pytest -m integration tests/integration/test_e2e_rag_wrapper_simple.py

# Format and lint
black src/
ruff check src/
```

## Publishing to PyPI

1. Bump the version in `pyproject.toml` and `__init__.py`.

2. Build the package:

```bash
uv pip install --upgrade build  # or: python -m pip install --upgrade build
uv run python -m build          # or: python -m build
```

3. (Optional) Verify the artifacts:

```bash
uv pip install --upgrade twine        # or: python -m pip install --upgrade twine
uv run python -m twine check dist/*   # or: python -m twine check dist/*
```

4. Upload to TestPyPI first:

```bash
uv run python -m twine upload -r testpypi dist/*  # or: python -m twine upload -r testpypi dist/*
```

5. Upload to PyPI:

```bash
uv run python -m twine upload dist/*  # or: python -m twine upload dist/*
```

Notes:

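- Create a PyPI API token and set `TWINE_USERNAME=__token__` and `TWINE_PASSWORD=<your-token>`. TestPyPI is a separate service and needs its own account and token.
- After uploading to TestPyPI, verify with `pip install -i https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple/ animuz-core`. The extra index lets pip resolve dependencies that are not published on TestPyPI.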
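Because step 1 requires keeping the version in `pyproject.toml` and `__init__.py` in sync, a small pre-build check can catch a missed bump. This sketch assumes the usual layouts, a `version = "..."` line under `[project]` (the first `version` key in the file) and `__version__ = "..."` in `__init__.py`; adjust the patterns if the files differ.

```python
import re

PYPROJECT_VERSION = re.compile(r'^version\s*=\s*["\']([^"\']+)["\']', re.MULTILINE)
INIT_VERSION = re.compile(r'^__version__\s*=\s*["\']([^"\']+)["\']', re.MULTILINE)


def extract(pattern: re.Pattern[str], text: str, source: str) -> str:
    """Return the first captured version string, or raise if none is found."""
    match = pattern.search(text)
    if match is None:
        raise ValueError(f"no version string found in {source}")
    return match.group(1)


def check_versions(pyproject_text: str, init_text: str) -> str:
    """Return the shared version, or raise ValueError if the two files disagree."""
    project = extract(PYPROJECT_VERSION, pyproject_text, "pyproject.toml")
    package = extract(INIT_VERSION, init_text, "__init__.py")
    if project != package:
        raise ValueError(f"version mismatch: pyproject={project} __init__={package}")
    return project
```

To run it against the real files, pass their contents, e.g. `check_versions(Path("pyproject.toml").read_text(), Path("src/animuz_core/__init__.py").read_text())`; that `__init__.py` path is a hypothetical guess based on the `src/` layout used in the Development commands.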

### Integration Test Setup (Qdrant)

Use Docker Compose to run Qdrant locally:

```bash
docker compose -f docker-compose-qdrant.yml up -d qdrant
```

Then point the Qdrant client at the container with these environment variables:

```bash
export QDRANT_HOST=localhost
export QDRANT_PORT=6333
```
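Before running `pytest -m integration`, it can help to confirm that the container is reachable using the same variables. This sketch reads `QDRANT_HOST`/`QDRANT_PORT` (falling back to the values above when unset) and opens a plain TCP connection; the helper names and defaults are illustrative, not read from the package.

```python
import os
import socket
from collections.abc import Mapping


def qdrant_address(env: Mapping[str, str] = os.environ) -> tuple[str, int]:
    """Read host and port from env vars, defaulting to the local Docker setup."""
    host = env.get("QDRANT_HOST", "localhost")
    port = int(env.get("QDRANT_PORT", "6333"))
    return host, port


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    host, port = qdrant_address()
    status = "reachable" if port_open(host, port) else "not reachable"
    print(f"Qdrant at http://{host}:{port} is {status}")
```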

## Environment Variables

The package reads configuration from environment variables (loaded via `python-dotenv`):

| Variable                                               | Used by                     |
| ------------------------------------------------------ | --------------------------- |
| `OPENAI_API_KEY`                                       | OpenAI client               |
| `ANTHROPIC_API_KEY`                                    | Anthropic client            |
| `QDRANT_HOST`, `QDRANT_PORT`, `QDRANT_COLLECTION_NAME` | Qdrant client               |
| `QDRANT_CLOUD_API_KEY`                                 | Qdrant Cloud                |
| `EMBEDDING_HOST`, `EMBEDDING_PORT`                     | Embedding client            |
| `AZURE_DOCAI_KEY`, `AZURE_DOCAI_ENDPOINT`              | Azure Document Intelligence |
| `UNSTRUCTURED_ENDPOINT`, `UNSTRUCTURED_API_KEY`        | Unstructured client         |
| `S3_BUCKET_NAME`, `S3_DOWNLOAD_DIR`                    | S3 operations               |
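Because each component reads its own variables, a quick preflight check can report which ones are unset before a run fails deep inside a client. The grouping below mirrors the table; if you keep the values in a `.env` file, call `load_dotenv()` from `python-dotenv` before checking. The component keys and the `missing_env` helper are illustrative names, not part of the package.

```python
import os
from collections.abc import Mapping

# Variables per component, mirroring the Environment Variables table.
REQUIRED_ENV: dict[str, list[str]] = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "qdrant": ["QDRANT_HOST", "QDRANT_PORT", "QDRANT_COLLECTION_NAME"],
    "qdrant_cloud": ["QDRANT_CLOUD_API_KEY"],
    "embedding": ["EMBEDDING_HOST", "EMBEDDING_PORT"],
    "azure_docai": ["AZURE_DOCAI_KEY", "AZURE_DOCAI_ENDPOINT"],
    "unstructured": ["UNSTRUCTURED_ENDPOINT", "UNSTRUCTURED_API_KEY"],
    "s3": ["S3_BUCKET_NAME", "S3_DOWNLOAD_DIR"],
}


def missing_env(
    components: list[str], env: Mapping[str, str] = os.environ
) -> dict[str, list[str]]:
    """Return the unset or empty variables for each requested component."""
    report = {}
    for component in components:
        missing = [name for name in REQUIRED_ENV[component] if not env.get(name)]
        if missing:
            report[component] = missing
    return report


if __name__ == "__main__":
    # Check only the components this deployment actually uses.
    for component, names in missing_env(["openai", "qdrant", "embedding"]).items():
        print(f"{component}: missing {', '.join(names)}")
```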

## `@tool` Decorator API

```python
from animuz_core import tool

@tool(description="Search documents")
async def rag(query: str, user_chat_id: str) -> str:
    ...
```

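Pass decorated functions to an agent as tools:

```python
# `Agent` stands in for an agent client such as OpenAIAgentClient or AnthropicAgentClient;
# run inside an async function, with `messages` supplied by your code
agent = Agent(model="gpt-4o", tools=[rag])
response = await agent.chat(messages, system_prompt="You are helpful.")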
```
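The decorator's job is to turn a plain function into a tool definition an LLM can call. How `animuz_core.tool` does this internally is not documented here; the sketch below is a hypothetical standard-library version (`simple_tool` is an invented name) that attaches the description and derives a JSON Schema for the parameters from the type hints. It reuses the `rag` example with an added `limit` parameter to show how defaults become optional arguments.

```python
import inspect
from typing import Any, Callable, get_type_hints

# Map Python annotations to JSON Schema types (a deliberately small subset).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def simple_tool(description: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
    """Hypothetical stand-in for @tool: attach a tool schema to the function."""

    def decorate(fn: Callable[..., Any]) -> Callable[..., Any]:
        hints = get_type_hints(fn)
        properties: dict[str, dict[str, str]] = {}
        required: list[str] = []
        for name, param in inspect.signature(fn).parameters.items():
            # Unknown or missing annotations fall back to "string"
            properties[name] = {"type": JSON_TYPES.get(hints.get(name), "string")}
            if param.default is inspect.Parameter.empty:
                required.append(name)
        fn.tool_schema = {
            "name": fn.__name__,
            "description": description,
            "parameters": {"type": "object", "properties": properties, "required": required},
        }
        return fn  # the function itself is unchanged and still callable/awaitable

    return decorate


@simple_tool(description="Search documents")
async def rag(query: str, user_chat_id: str, limit: int = 5) -> str:
    return f"results for {query!r}"


print(rag.tool_schema)
```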

## License

MIT
