Metadata-Version: 2.4
Name: ragpackai
Version: 0.1.4
Summary: Portable Retrieval-Augmented Generation Library
Author-email: ragpackai Team <aistudentlearn4@gmail.com>
Maintainer-email: ragpackai Team <aistudentlearn4@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/AIMLDev726/ragpackai
Project-URL: Documentation, https://AIMLDev726.readthedocs.io/
Project-URL: Repository, https://github.com/AIMLDev726/ragpackai
Project-URL: Bug Reports, https://github.com/AIMLDev726/ragpackai/issues
Project-URL: Changelog, https://github.com/AIMLDev726/ragpackai/blob/main/CHANGELOG.md
Keywords: rag,retrieval,augmented,generation,llm,embeddings,vectorstore,ai,nlp,machine-learning,langchain
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic<3.0.0,>=2.0.0
Requires-Dist: tqdm<5.0.0,>=4.60.0
Requires-Dist: cryptography<50.0.0,>=3.4.0
Provides-Extra: core
Requires-Dist: langchain<0.3.0,>=0.2.0; extra == "core"
Requires-Dist: langchain-community<0.3.0,>=0.2.0; extra == "core"
Requires-Dist: chromadb<0.6.0,>=0.5.0; extra == "core"
Requires-Dist: langchain-chroma<0.3.0,>=0.1.0; extra == "core"
Provides-Extra: documents
Requires-Dist: PyPDF2<4.0.0,>=3.0.0; extra == "documents"
Provides-Extra: embeddings
Requires-Dist: sentence-transformers<3.0.0,>=2.0.0; extra == "embeddings"
Provides-Extra: faiss
Requires-Dist: faiss-cpu<2.0.0,>=1.7.0; extra == "faiss"
Provides-Extra: openai
Requires-Dist: langchain-openai<0.2.0,>=0.1.0; extra == "openai"
Requires-Dist: openai<2.0.0,>=1.30.0; extra == "openai"
Provides-Extra: google
Requires-Dist: langchain-google-genai<2.0.0,>=1.0.0; extra == "google"
Requires-Dist: langchain-google-vertexai<2.0.0,>=1.0.0; extra == "google"
Provides-Extra: groq
Requires-Dist: groq<1.0.0,>=0.4.0; extra == "groq"
Requires-Dist: langchain-groq<1.0.0,>=0.1.0; extra == "groq"
Provides-Extra: cerebras
Requires-Dist: cerebras-cloud-sdk<2.0.0,>=1.0.0; extra == "cerebras"
Requires-Dist: langchain-cerebras<1.0.0,>=0.1.0; extra == "cerebras"
Provides-Extra: nvidia
Requires-Dist: langchain-nvidia-ai-endpoints<1.0.0,>=0.1.0; extra == "nvidia"
Provides-Extra: standard
Requires-Dist: langchain<0.3.0,>=0.2.0; extra == "standard"
Requires-Dist: langchain-community<0.3.0,>=0.2.0; extra == "standard"
Requires-Dist: langchain-openai<0.2.0,>=0.1.0; extra == "standard"
Requires-Dist: openai<2.0.0,>=1.30.0; extra == "standard"
Requires-Dist: chromadb<0.6.0,>=0.5.0; extra == "standard"
Requires-Dist: langchain-chroma<0.3.0,>=0.1.0; extra == "standard"
Requires-Dist: sentence-transformers<3.0.0,>=2.0.0; extra == "standard"
Requires-Dist: PyPDF2<4.0.0,>=3.0.0; extra == "standard"
Provides-Extra: all
Requires-Dist: langchain<0.3.0,>=0.2.0; extra == "all"
Requires-Dist: langchain-community<0.3.0,>=0.2.0; extra == "all"
Requires-Dist: langchain-openai<0.2.0,>=0.1.0; extra == "all"
Requires-Dist: openai<2.0.0,>=1.30.0; extra == "all"
Requires-Dist: chromadb<0.6.0,>=0.5.0; extra == "all"
Requires-Dist: langchain-chroma<0.3.0,>=0.1.0; extra == "all"
Requires-Dist: sentence-transformers<3.0.0,>=2.0.0; extra == "all"
Requires-Dist: PyPDF2<4.0.0,>=3.0.0; extra == "all"
Requires-Dist: faiss-cpu<2.0.0,>=1.7.0; extra == "all"
Requires-Dist: langchain-google-genai<2.0.0,>=1.0.0; extra == "all"
Requires-Dist: langchain-google-vertexai<2.0.0,>=1.0.0; extra == "all"
Requires-Dist: groq<1.0.0,>=0.4.0; extra == "all"
Requires-Dist: langchain-groq<1.0.0,>=0.1.0; extra == "all"
Requires-Dist: cerebras-cloud-sdk<2.0.0,>=1.0.0; extra == "all"
Requires-Dist: langchain-cerebras<1.0.0,>=0.1.0; extra == "all"
Requires-Dist: langchain-nvidia-ai-endpoints<1.0.0,>=0.1.0; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest<8.0.0,>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov<5.0.0,>=4.0.0; extra == "dev"
Requires-Dist: black<25.0.0,>=22.0.0; extra == "dev"
Requires-Dist: flake8<7.0.0,>=5.0.0; extra == "dev"
Requires-Dist: mypy<2.0.0,>=1.0.0; extra == "dev"
Requires-Dist: pre-commit<4.0.0,>=2.20.0; extra == "dev"
Requires-Dist: build<1.0.0,>=0.8.0; extra == "dev"
Requires-Dist: twine<5.0.0,>=4.0.0; extra == "dev"
Dynamic: license-file

# ragpackai 📦

**Portable Retrieval-Augmented Generation Library**

ragpackai is a Python library for creating, saving, loading, and querying portable RAG (Retrieval-Augmented Generation) packs. It allows you to bundle documents, embeddings, vectorstores, and configuration into a single `.rag` file that can be easily shared and deployed across different environments.

## ✨ Features

- 🚀 **Portable RAG Packs**: Bundle everything into a single `.rag` file
- 🔄 **Provider Flexibility**: Support for OpenAI, Google, Groq, Cerebras, and HuggingFace
- 🔒 **Encryption Support**: Optional AES-GCM encryption for sensitive data
- 🎯 **Runtime Overrides**: Change embedding/LLM providers without rebuilding
- 📚 **Multiple Formats**: Support for PDF, TXT, MD, and more
- 🛠️ **CLI Tools**: Command-line interface for easy pack management
- 🔧 **Lazy Loading**: Efficient dependency management with lazy imports

## 🚀 Quick Start

### Installation

```bash
# Basic installation (minimal dependencies)
pip install ragpackai

# Recommended for most users (quotes keep shells such as zsh from expanding the brackets)
pip install "ragpackai[standard]"

# Specific features
pip install "ragpackai[core]"        # Core RAG functionality
pip install "ragpackai[openai]"      # OpenAI integration
pip install "ragpackai[documents]"   # PDF processing
pip install "ragpackai[embeddings]"  # Sentence transformers

# Provider-specific
pip install "ragpackai[google]"      # Google/Gemini
pip install "ragpackai[groq]"        # Groq
pip install "ragpackai[cerebras]"    # Cerebras
pip install "ragpackai[nvidia]"      # NVIDIA

# Everything, including faiss-cpu (may fail to install on some systems)
pip install "ragpackai[all]"
```

### 🚨 Installation Issues?

If you encounter installation problems (especially with `faiss-cpu`):

```bash
# Try the standard installation first (it does not pull in faiss-cpu)
pip install "ragpackai[standard]"

# For FAISS issues, install faiss-cpu with conda, then add the other extras with pip
conda install -c conda-forge faiss-cpu
pip install "ragpackai[core,openai,documents,embeddings]"

# Get installation help
python -c "import ragpackai; ragpackai.install_guide()"
```
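
To see which optional dependencies are already importable before choosing an extra, a small check like the following can help. It is a sketch, not part of ragpackai: the `OPTIONAL_MODULES` mapping and both function names are hypothetical, and the import names are assumptions inferred from the extras listed above.

```python
from importlib.util import find_spec

# Hypothetical mapping from extras to import names (assumed, not read from ragpackai).
# A distribution name (faiss-cpu) can differ from its import name (faiss).
OPTIONAL_MODULES = {
    "core": ["langchain", "chromadb"],
    "openai": ["openai"],
    "documents": ["PyPDF2"],
    "embeddings": ["sentence_transformers"],
    "faiss": ["faiss"],
}


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located, without importing it."""
    return find_spec(module_name) is not None


def missing_modules(extras: dict) -> dict:
    """Map each extra to the modules that could not be located."""
    report = {}
    for extra, modules in extras.items():
        missing = [name for name in modules if not is_installed(name)]
        if missing:
            report[extra] = missing
    return report


if __name__ == "__main__":
    for extra, modules in missing_modules(OPTIONAL_MODULES).items():
        print(f"{extra}: missing {', '.join(modules)}")
```

Because `find_spec` only locates a module, the check stays cheap even for heavy packages such as `sentence_transformers`.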

### Basic Usage

```python
from ragpackai import ragpackai

# Create a pack from documents
pack = ragpackai.from_files([
    "docs/manual.pdf", 
    "notes.txt",
    "knowledge_base/"
])

# Save the pack
pack.save("my_knowledge.rag")

# Load and query
pack = ragpackai.load("my_knowledge.rag")

# Simple retrieval (no LLM)
results = pack.query("How do I install this?", top_k=3)
print(results)

# Question answering with LLM
answer = pack.ask("What are the main features?")
print(answer)
```

### Provider Overrides

```python
# Load with different providers
pack = ragpackai.load(
    "my_knowledge.rag",
    embedding_config={
        "provider": "google", 
        "model_name": "textembedding-gecko"
    },
    llm_config={
        "provider": "groq", 
        "model_name": "mixtral-8x7b-32768"
    }
)

answer = pack.ask("Explain the architecture")
```

## 🛠️ Command Line Interface

### Create a RAG Pack

```bash
# From files and directories
ragpackai create docs/ notes.txt --output knowledge.rag

# With custom settings
ragpackai create docs/ \
  --embedding-provider openai \
  --embedding-model text-embedding-3-large \
  --chunk-size 1024 \
  --encrypt-key mypassword
```

### Query and Ask

```bash
# Simple retrieval
ragpackai query knowledge.rag "How to install?"

# Question answering
ragpackai ask knowledge.rag "What are the requirements?" \
  --llm-provider openai \
  --llm-model gpt-4o

# With provider overrides
ragpackai ask knowledge.rag "Explain the API" \
  --embedding-provider google \
  --embedding-model textembedding-gecko \
  --llm-provider groq \
  --llm-model mixtral-8x7b-32768
```

### Pack Information

```bash
ragpackai info knowledge.rag
```

## 🏗️ Architecture

### .rag File Structure

A `.rag` file is a structured zip archive:

```
mypack.rag
├── metadata.json         # Pack metadata
├── config.json           # Default configurations
├── documents/            # Original documents
│   ├── doc1.txt
│   └── doc2.pdf
└── vectorstore/          # Chroma vectorstore
    ├── chroma.sqlite3
    └── ...
```
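
Because the pack is described as a zip archive, its contents can be listed with Python's standard `zipfile` module. The sketch below assumes an unencrypted pack that is a plain zip file; `list_pack_entries` and `build_demo_pack` are hypothetical helpers, and the demo archive only mimics the layout shown above.

```python
import io
import zipfile


def list_pack_entries(source) -> list:
    """Return the sorted member names of a zip archive (path or file-like object)."""
    with zipfile.ZipFile(source) as archive:
        return sorted(archive.namelist())


def build_demo_pack() -> io.BytesIO:
    """Build an in-memory archive that mimics the documented layout (demo data only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("metadata.json", "{}")
        archive.writestr("config.json", "{}")
        archive.writestr("documents/doc1.txt", "hello")
        archive.writestr("vectorstore/chroma.sqlite3", b"")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name in list_pack_entries(build_demo_pack()):
        print(name)
```

Passing a real path such as `"mypack.rag"` to `list_pack_entries` should work the same way, provided the file is an unencrypted zip archive.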

### Supported Providers

**Embedding Providers:**
- `openai`: text-embedding-3-small, text-embedding-3-large
- `huggingface`: all-MiniLM-L6-v2, all-mpnet-base-v2 (offline)
- `google`: textembedding-gecko

**LLM Providers:**
- `openai`: gpt-4o, gpt-4o-mini, gpt-3.5-turbo
- `google`: gemini-pro, gemini-1.5-flash
- `groq`: mixtral-8x7b-32768, llama2-70b-4096
- `cerebras`: llama3.1-8b, llama3.1-70b

## 📖 API Reference

### ragpackai Class

#### `ragpackai.from_files(files, embed_model="openai:text-embedding-3-small", **kwargs)`

Create a RAG pack from files.

**Parameters:**
- `files`: List of file paths or directories
- `embed_model`: Embedding model in format "provider:model"
- `chunk_size`: Text chunk size (default: 512)
- `chunk_overlap`: Chunk overlap (default: 50)
- `name`: Pack name
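
The `"provider:model"` format and the `chunk_size` / `chunk_overlap` parameters can be illustrated with a small sketch. Both functions below are hypothetical and are not ragpackai's implementation: the chunker is character-based for simplicity, whereas the library may split text differently (for example by tokens or separators).

```python
def parse_embed_model(spec: str) -> tuple:
    """Split a 'provider:model' string into (provider, model)."""
    provider, sep, model = spec.partition(":")
    if not sep or not provider or not model:
        raise ValueError(f"expected 'provider:model', got {spec!r}")
    return provider, model


def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> list:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window reaches the end, so the tail is not emitted twice.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


if __name__ == "__main__":
    print(parse_embed_model("openai:text-embedding-3-small"))
    print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
```

A larger overlap keeps more context shared between neighbouring chunks at the cost of storing more text.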

#### `ragpackai.load(path, embedding_config=None, llm_config=None, **kwargs)`

Load a RAG pack from a `.rag` file.

**Parameters:**
- `path`: Path to .rag file
- `embedding_config`: Override embedding configuration
- `llm_config`: Override LLM configuration
- `reindex_on_mismatch`: Rebuild vectorstore if dimensions mismatch
- `decrypt_key`: Decryption password

#### `pack.save(path, encrypt_key=None)`

Save the pack to a `.rag` file, optionally encrypting it with `encrypt_key`.

#### `pack.query(question, top_k=3)`

Retrieve the `top_k` most relevant chunks without calling an LLM.

#### `pack.ask(question, top_k=4, temperature=0.0)`

Ask a question and get an LLM-generated answer based on the `top_k` retrieved chunks.
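
Conceptually, `top_k` retrieval ranks stored chunk vectors by their similarity to the query vector and keeps the best few. The sketch below illustrates that idea with cosine similarity in plain Python. It is an assumption-laden illustration, not how ragpackai or Chroma performs the search, and `cosine_similarity` and `top_k_chunks` are hypothetical names.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunk_vecs: dict, top_k: int = 3) -> list:
    """Return the ids of the top_k chunks most similar to the query vector."""
    ranked = sorted(
        chunk_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Toy 2-dimensional vectors; real embeddings have hundreds of dimensions.
    vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
    print(top_k_chunks([1.0, 0.1], vectors, top_k=2))
```

In this picture, `ask` would pass the retrieved chunks to the LLM as context, which is why it also takes a `top_k` argument.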

### Provider Wrappers

```python
# Direct provider access
from ragpackai.embeddings import OpenAI, HuggingFace, Google
from ragpackai.llms import OpenAIChat, GoogleChat, GroqChat

# Create embedding provider
embeddings = OpenAI(model_name="text-embedding-3-large")
vectors = embeddings.embed_documents(["Hello world"])

# Create LLM provider
llm = OpenAIChat(model_name="gpt-4o", temperature=0.7)
response = llm.invoke("What is AI?")
```

## 🔧 Configuration

### Environment Variables

```bash
# API keys and project settings
export OPENAI_API_KEY="your-key"
export GOOGLE_CLOUD_PROJECT="your-project"
export GROQ_API_KEY="your-key"
export CEREBRAS_API_KEY="your-key"

# Optional
export GOOGLE_APPLICATION_CREDENTIALS="path/to/service-account.json"
```
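
A quick pre-flight check can report which of these variables are missing before a pack is loaded. This is a sketch only: the provider-to-variable mapping is taken from the exports above, and `REQUIRED_ENV` and `missing_env_vars` are hypothetical names rather than ragpackai API.

```python
import os

# Assumed mapping, based on the environment variables listed above.
REQUIRED_ENV = {
    "openai": "OPENAI_API_KEY",
    "google": "GOOGLE_CLOUD_PROJECT",
    "groq": "GROQ_API_KEY",
    "cerebras": "CEREBRAS_API_KEY",
}


def missing_env_vars(providers, env=None) -> list:
    """Return the variables that are unset or empty for the given providers."""
    env = os.environ if env is None else env
    return [
        REQUIRED_ENV[provider]
        for provider in providers
        if provider in REQUIRED_ENV and not env.get(REQUIRED_ENV[provider])
    ]


if __name__ == "__main__":
    missing = missing_env_vars(["openai", "groq"])
    if missing:
        print("Missing environment variables:", ", ".join(missing))
```

Providers without an entry in the mapping, such as the offline `huggingface` embeddings, are treated as needing no variable.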

### Configuration Files

Provider settings are plain dictionaries that can be passed to `ragpackai.load` as `embedding_config` and `llm_config`:

```python
# Custom embedding config
embedding_config = {
    "provider": "huggingface",
    "model_name": "all-mpnet-base-v2",
    "device": "cuda"  # Use GPU
}

# Custom LLM config
llm_config = {
    "provider": "openai",
    "model_name": "gpt-4o",
    "temperature": 0.7,
    "max_tokens": 2000
}
```

## 🔒 Security

### Encryption

ragpackai supports AES-GCM encryption for sensitive data:

```python
# Save with encryption
pack.save("sensitive.rag", encrypt_key="strong-password")

# Load encrypted pack
pack = ragpackai.load("sensitive.rag", decrypt_key="strong-password")
```
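
For readers unfamiliar with password-based AES-GCM, the following sketch shows the general technique using the `cryptography` package, which ragpackai already depends on. It is not ragpackai's actual on-disk format: the PBKDF2 key derivation, the iteration count, the `salt + nonce + ciphertext` layout, and all function names are assumptions made for illustration.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.pbkdf2 import PBKDF2HMAC

SALT_LEN = 16   # random salt for key derivation (assumed size)
NONCE_LEN = 12  # 96-bit nonce, the usual size for AES-GCM


def _derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 256-bit key from a password with PBKDF2-HMAC-SHA256."""
    kdf = PBKDF2HMAC(algorithm=hashes.SHA256(), length=32, salt=salt, iterations=200_000)
    return kdf.derive(password.encode("utf-8"))


def encrypt_blob(data: bytes, password: str) -> bytes:
    """Encrypt data and return salt + nonce + ciphertext (ciphertext includes the tag)."""
    salt = os.urandom(SALT_LEN)
    nonce = os.urandom(NONCE_LEN)
    ciphertext = AESGCM(_derive_key(password, salt)).encrypt(nonce, data, None)
    return salt + nonce + ciphertext


def decrypt_blob(blob: bytes, password: str) -> bytes:
    """Reverse encrypt_blob; raises InvalidTag on a wrong password or tampered data."""
    salt = blob[:SALT_LEN]
    nonce = blob[SALT_LEN:SALT_LEN + NONCE_LEN]
    ciphertext = blob[SALT_LEN + NONCE_LEN:]
    return AESGCM(_derive_key(password, salt)).decrypt(nonce, ciphertext, None)


def can_decrypt(blob: bytes, password: str) -> bool:
    """Return True if the password decrypts the blob and the tag verifies."""
    try:
        decrypt_blob(blob, password)
        return True
    except InvalidTag:
        return False
```

AES-GCM is authenticated, so a wrong password or a modified file is rejected during decryption instead of silently producing corrupted data.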

### Best Practices

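- Use strong, unique passwords for encryption
- Store API keys in environment variables or a secrets manager, never inside a pack or in source control
- Load `.rag` files only from sources you trust, and validate them before loading in production
- Share packs over trusted channels; an unencrypted pack contains the original documents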
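
One way to validate a pack before loading it is to check that it is a well-formed zip archive, that no member path escapes the extraction directory, and that the expected entries are present. The sketch below assumes an unencrypted pack with the layout described under Architecture; `REQUIRED_ENTRIES` and `validate_pack` are hypothetical and not part of ragpackai.

```python
import io
import zipfile
from pathlib import PurePosixPath

# Assumed required entries, based on the documented .rag layout.
REQUIRED_ENTRIES = {"metadata.json", "config.json"}


def validate_pack(source) -> list:
    """Return a list of problems found in the archive (empty list means it looks fine)."""
    if not zipfile.is_zipfile(source):
        return ["not a zip archive"]

    problems = []
    with zipfile.ZipFile(source) as archive:
        names = archive.namelist()
        for name in names:
            path = PurePosixPath(name)
            # Reject absolute paths and parent-directory references (zip-slip).
            if path.is_absolute() or ".." in path.parts:
                problems.append(f"unsafe path: {name}")
        for required in sorted(REQUIRED_ENTRIES - set(names)):
            problems.append(f"missing entry: {required}")
    return problems


if __name__ == "__main__":
    demo = io.BytesIO()
    with zipfile.ZipFile(demo, "w") as archive:
        archive.writestr("metadata.json", "{}")
        archive.writestr("../evil.txt", "oops")
    print(validate_pack(demo))
```

A check like this only covers the archive structure; it does not verify that the documents or vectorstore inside are trustworthy.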

## 🧪 Examples

See the `examples/` directory for complete examples:

- `basic_usage.py` - Simple pack creation and querying
- `provider_overrides.py` - Using different providers
- `encryption_example.py` - Working with encrypted packs
- `cli_examples.sh` - Command-line usage examples

## 🤝 Contributing

We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🆘 Support

- 📖 [Documentation](https://aimldev726.github.io/ragpackai/)
- 🐛 [Issue Tracker](https://github.com/AIMLDev726/ragpackai/issues)
- 💬 [Discussions](https://github.com/AIMLDev726/ragpackai/discussions)

## 🙏 Acknowledgments

Built with:
- [LangChain](https://langchain.com/) - LLM framework
- [ChromaDB](https://www.trychroma.com/) - Vector database
- [Sentence Transformers](https://www.sbert.net/) - Embedding models
