Metadata-Version: 2.4
Name: lexora
Version: 0.1.0
Summary: A production-ready, plug-and-play Python SDK for building intelligent RAG systems
Home-page: https://github.com/VesperAkshay/lexora
Author: VesperAkshay
Author-email: VesperAkshay <vesperakshay@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/VesperAkshay/lexora
Project-URL: Documentation, https://vesperakshay.github.io/lexora
Project-URL: Repository, https://github.com/VesperAkshay/lexora
Project-URL: Bug Tracker, https://github.com/VesperAkshay/lexora/issues
Keywords: rag,llm,vector-database,ai,machine-learning,nlp
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Requires-Dist: pydantic>=2.0.0
Requires-Dist: litellm>=1.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: faiss-cpu>=1.7.0
Requires-Dist: pinecone>=5.0.0
Requires-Dist: chromadb>=0.4.0
Requires-Dist: openai>=1.0.0
Requires-Dist: tiktoken>=0.5.0
Requires-Dist: asyncio-throttle>=1.0.0
Requires-Dist: tenacity>=8.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-mock>=3.10.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: isort>=5.12.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=6.0.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme>=1.2.0; extra == "docs"
Requires-Dist: myst-parser>=1.0.0; extra == "docs"
Provides-Extra: gpu
Requires-Dist: faiss-gpu>=1.7.0; extra == "gpu"
Dynamic: author
Dynamic: home-page
Dynamic: requires-python

# Lexora Agentic RAG SDK

<div align="center">

**Production-ready Agentic RAG SDK with minimal configuration**

[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Tests](https://img.shields.io/badge/tests-passing-brightgreen.svg)](./tests)

[Quick Start](#quick-start) • [Documentation](#documentation) • [Examples](#examples) • [API Reference](#api-reference)

</div>

---

## 🚀 What is Lexora?

Lexora is a production-ready Agentic RAG (Retrieval-Augmented Generation) SDK that makes it easy to build intelligent applications with semantic search and AI-powered reasoning. With just a few lines of code, you can:

- 📚 Create and manage document corpora
- 🔍 Perform semantic search across your documents
- 🤖 Build AI agents that reason over your data
- 🛠️ Extend functionality with custom tools
- 🎯 Deploy to production with confidence

## ✨ Key Features

- **Zero-Config Setup**: Get started in minutes with sensible defaults
- **Multiple Vector Databases**: Support for FAISS, Pinecone, and Chroma
- **Flexible Embeddings**: OpenAI, HuggingFace, Gemini, or custom providers
- **Flexible LLM Integration**: Works with any LLM via LiteLLM
- **Built-in RAG Tools**: 10+ pre-built tools for document management
- **Custom Tool Support**: Easily add your own tools
- **Production-Ready**: Comprehensive error handling, logging, and testing
- **Type-Safe**: Full type hints and Pydantic validation
- **Cost-Effective**: Free embedding options available

---

## 📦 Installation

### Prerequisites

- Python 3.11 or higher
- pip or conda package manager

### Install from Source

```bash
# Clone the repository
git clone https://github.com/VesperAkshay/lexora.git
cd lexora

# Create a virtual environment (recommended)
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install in development mode
pip install -e .
```

### Install from PyPI (Coming Soon)

```bash
pip install lexora
```

---

## 🎯 Quick Start

### Basic Usage

```python
import asyncio

from lexora import RAGAgent


async def main() -> None:
    # Initialize the agent with defaults
    agent = RAGAgent()

    # Create a document corpus
    await agent.tool_registry.get_tool("create_corpus").run(
        corpus_name="my_docs",
        description="My document collection"
    )

    # Add documents
    documents = [
        {"content": "Python is a programming language.", "metadata": {"topic": "python"}},
        {"content": "Machine learning is a subset of AI.", "metadata": {"topic": "ml"}}
    ]

    await agent.tool_registry.get_tool("add_data").run(
        corpus_name="my_docs",
        documents=documents
    )

    # Query your documents
    result = await agent.tool_registry.get_tool("rag_query").run(
        corpus_name="my_docs",
        query="What is Python?",
        top_k=5
    )

    print(result.data["results"])


# The tools are async, so run them inside an event loop
asyncio.run(main())
```

### Using the Agent for Reasoning

```python
# Run inside an async function, reusing the `agent` from Basic Usage
# Ask questions and get AI-powered answers
response = await agent.query("Explain machine learning in simple terms")

print(f"Answer: {response.answer}")
print(f"Confidence: {response.confidence}")
print(f"Sources: {len(response.sources)}")
```

---

## 📖 Documentation

### Table of Contents

1. [Installation](#installation)
2. [Configuration](#configuration)
3. [Embedding Options](#embedding-options)
4. [Core Concepts](#core-concepts)
5. [RAG Tools](#rag-tools)
6. [Custom Tools](#custom-tools)
7. [Vector Databases](#vector-databases)
8. [LLM Integration](#llm-integration)
9. [Error Handling](#error-handling)
10. [Examples](#examples)

---

## ⚙️ Configuration

Lexora supports multiple configuration methods:

### 1. Default Configuration (Easiest)

```python
from lexora import RAGAgent

# Uses mock LLM and FAISS vector database
agent = RAGAgent()
```

### 2. YAML Configuration

```yaml
# config.yaml
llm:
  provider: "openai"
  model: "gpt-4"
  api_key: "${OPENAI_API_KEY}"
  temperature: 0.7

vector_db:
  provider: "faiss"
  embedding_model: "text-embedding-ada-002"
  dimension: 1536
  connection_params:
    storage_path: "./vector_storage"

agent:
  max_iterations: 5
  enable_reasoning: true
  log_level: "INFO"
```

```python
from lexora import RAGAgent

agent = RAGAgent.from_yaml("config.yaml")
```
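The YAML above uses a `${OPENAI_API_KEY}` placeholder. This README doesn't state whether `RAGAgent.from_yaml` expands such placeholders itself. If your version doesn't, a small pre-processing step works. The sketch below uses a hypothetical helper (`expand_env_placeholders` is not part of Lexora) that substitutes `${VAR}` from the environment and fails loudly on unset variables instead of silently leaving the literal text in the config.

```python
from __future__ import annotations

import os
import re
from collections.abc import Mapping

# Matches ${VAR_NAME} where the name is a conventional environment-variable identifier.
_PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env_placeholders(text: str, env: Mapping[str, str] | None = None) -> str:
    """Replace every ${VAR} in `text` with its value from `env` (default: os.environ).

    Hypothetical helper, not part of Lexora. Raises KeyError if a variable is unset,
    so a missing API key is caught before the config is loaded.
    """
    source = os.environ if env is None else env

    def _substitute(match: re.Match[str]) -> str:
        name = match.group(1)
        if name not in source:
            raise KeyError(f"environment variable {name!r} is not set")
        return source[name]

    return _PLACEHOLDER.sub(_substitute, text)


if __name__ == "__main__":
    raw = 'llm:\n  api_key: "${OPENAI_API_KEY}"\n'
    print(expand_env_placeholders(raw, {"OPENAI_API_KEY": "sk-test"}))
```

You could write the expanded text to a temporary file and pass that path to `RAGAgent.from_yaml`. Failing on a missing variable is deliberate: an empty API key usually surfaces later as a confusing authentication error.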

### 3. Environment Variables

```bash
# .env file
LEXORA_LLM_PROVIDER=openai
LEXORA_LLM_MODEL=gpt-4
LEXORA_LLM_API_KEY=your-api-key
LEXORA_VECTORDB_PROVIDER=faiss
LEXORA_VECTORDB_EMBEDDING_MODEL=text-embedding-ada-002
```

```python
from lexora import RAGAgent

agent = RAGAgent.from_env()
```
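Python does not read a `.env` file on its own. This README doesn't say whether `RAGAgent.from_env()` loads the file or only reads variables already exported to the process environment. A safe option is to load them yourself before calling it. The sketch below is a minimal, hypothetical stdlib loader (`parse_dotenv` and `load_dotenv_file` are not Lexora functions; the `python-dotenv` package offers a fuller implementation). It handles `KEY=VALUE` lines, full-line `#` comments, and optional surrounding quotes, and by default it does not overwrite variables that are already set.

```python
from __future__ import annotations

import os
from pathlib import Path


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; skips blank lines and full-line # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_dotenv_file(path: str | Path = ".env", override: bool = False) -> dict[str, str]:
    """Copy variables from `path` into os.environ (hypothetical helper, not part of Lexora)."""
    values = parse_dotenv(Path(path).read_text(encoding="utf-8"))
    for key, value in values.items():
        # Respect variables already exported in the shell unless override=True.
        if override or key not in os.environ:
            os.environ[key] = value
    return values
```

Call `load_dotenv_file()` once at startup, then `RAGAgent.from_env()`. Keeping already-exported values by default lets a deployment environment override the local file.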

### 4. Programmatic Configuration

```python
from lexora import RAGAgent, LLMConfig, VectorDBConfig, AgentConfig

agent = RAGAgent(
    llm_config=LLMConfig(
        provider="openai",
        model="gpt-4",
        api_key="your-api-key"
    ),
    vector_db_config=VectorDBConfig(
        provider="faiss",
        embedding_model="text-embedding-ada-002",
        dimension=1536
    ),
    agent_config=AgentConfig(
        max_iterations=5,
        enable_reasoning=True
    )
)
```

---

## 🎨 Embedding Options

**Important:** You are NOT limited to OpenAI embeddings! Lexora supports multiple embedding providers.

### Available Options

| Provider | Cost | Quality | Privacy | Best For |
|----------|------|---------|---------|----------|
| **HuggingFace** | Free | High | ✅ Local | Production (recommended) |
| **OpenAI** | Paid | Highest | ❌ Cloud | Enterprise |
| **Gemini** | Free tier | High | ❌ Cloud | Gemini users |
| **Mock** | Free | Low | ✅ Local | Testing |

### Quick Examples

#### 1. Free Local Embeddings (Recommended)

```python
# Install sentence-transformers
# pip install sentence-transformers

from lexora import RAGAgent
from lexora.models.config import VectorDBConfig

agent = RAGAgent(
    vector_db_config=VectorDBConfig(
        provider="faiss",
        dimension=384,  # all-MiniLM-L6-v2 dimension
        connection_params={
            "index_type": "Flat",
            "persist_directory": "./vector_db"
        }
    )
)
```

#### 2. OpenAI Embeddings

```python
from lexora import RAGAgent
from lexora.models.config import VectorDBConfig

agent = RAGAgent(
    vector_db_config=VectorDBConfig(
        provider="faiss",
        dimension=1536,  # OpenAI dimension
        connection_params={
            "embedding_model": "text-embedding-ada-002",
            "openai_api_key": "your-key"
        }
    )
)
```

#### 3. Custom Embedding Provider

```python
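import asyncio

from lexora.utils.embeddings import BaseEmbeddingProvider
from sentence_transformers import SentenceTransformer


class HuggingFaceProvider(BaseEmbeddingProvider):
    """Embedding provider backed by a local sentence-transformers model."""

    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        self.model = SentenceTransformer(model_name)

    async def generate_embedding(self, text: str) -> list[float]:
        # encode() is synchronous and CPU-bound; run it in a worker thread
        # so it does not block the event loop.
        vector = await asyncio.to_thread(self.model.encode, text)
        return vector.tolist()

    def get_dimension(self) -> int:
        # 384 for all-MiniLM-L6-v2; must match VectorDBConfig.dimension
        return self.model.get_sentence_embedding_dimension()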
```
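For tests, the **Mock** row in the table above trades quality for speed and determinism. The sketch below is a hypothetical stand-in: the class name `HashEmbeddingProvider` is an assumption, and it is written without subclassing `BaseEmbeddingProvider` so it runs without Lexora installed. It mirrors the two methods used in the custom provider example. Each token is hashed into a fixed-size vector, which is then L2-normalized, so identical texts always get identical embeddings and texts sharing words score higher under cosine similarity. The vectors carry no real semantic meaning.

```python
from __future__ import annotations

import asyncio
import hashlib
import math


class HashEmbeddingProvider:
    """Deterministic, dependency-free embeddings for tests (hypothetical, not part of Lexora).

    Uses the "hashing trick": each lowercase token is hashed to one of `dimension`
    buckets and adds +1 or -1 there; the result is L2-normalized.
    """

    def __init__(self, dimension: int = 384):
        self.dimension = dimension

    async def generate_embedding(self, text: str) -> list[float]:
        vector = [0.0] * self.dimension
        for token in text.lower().split():
            # sha256 (unlike built-in hash()) is stable across processes.
            digest = hashlib.sha256(token.encode("utf-8")).digest()
            bucket = int.from_bytes(digest[:4], "big") % self.dimension
            sign = 1.0 if digest[4] % 2 == 0 else -1.0
            vector[bucket] += sign
        norm = math.sqrt(sum(x * x for x in vector))
        # An empty string yields the zero vector; avoid dividing by zero.
        return [x / norm for x in vector] if norm else vector

    def get_dimension(self) -> int:
        return self.dimension


if __name__ == "__main__":
    provider = HashEmbeddingProvider(dimension=8)
    embedding = asyncio.run(provider.generate_embedding("Python is a programming language."))
    print(len(embedding))  # 8
```

Using a cryptographic hash rather than Python's `hash()` matters here: `hash()` on strings is randomized per process, which would make embeddings differ between test runs.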

📚 **[Full Embedding Guide](./docs/EMBEDDINGS.md)** - Detailed documentation on all embedding options

---

## 🧩 Core Concepts

### Document Corpus

A corpus is a collection of documents that can be searched semantically.

```python
# Create a corpus
await agent.tool_registry.get_tool("create_corpus").run(
    corpus_name="knowledge_base",
    description="Company knowledge base",
    metadata={"department": "engineering"}
)
```

### Documents

Documents are the basic unit of information in Lexora.

```python
document = {
    "content": "Your document text here",
    "metadata": {
        "source": "documentation",
        "author": "John Doe",
        "date": "2024-01-01"
    }
}
```
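In practice you'll often build the document list from files. A minimal sketch, assuming plain-text files and the `content`/`metadata` shape shown above; `documents_from_directory` is a hypothetical helper, not part of Lexora, and the `source` metadata key simply follows the example.

```python
from __future__ import annotations

from pathlib import Path


def documents_from_directory(directory: str | Path, pattern: str = "*.txt") -> list[dict]:
    """Build Lexora-style document dicts from text files (hypothetical helper)."""
    documents = []
    # sorted() keeps the order stable across runs and platforms.
    for path in sorted(Path(directory).glob(pattern)):
        text = path.read_text(encoding="utf-8").strip()
        if not text:
            continue  # skip empty files rather than indexing blank content
        documents.append({"content": text, "metadata": {"source": path.name}})
    return documents
```

The result can be passed as the `documents` argument of the `add_data` tool. Recording the file name in `metadata` makes it easy to trace a search hit back to its origin.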

### Semantic Search

Search documents by meaning, not just keywords.

```python
results = await agent.tool_registry.get_tool("rag_query").run(
    corpus_name="knowledge_base",
    query="How do I deploy to production?",
    top_k=5,
    min_score=0.7
)
```
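To make `top_k` and `min_score` concrete: semantic search embeds the query, scores it against every stored document embedding, drops anything below `min_score`, and returns the best `top_k`. The sketch below shows that pipeline with cosine similarity over in-memory lists. It is an illustration, not Lexora's implementation. Actual scoring depends on the vector database and index type (FAISS, for example, can rank by L2 distance or inner product), so the scale that `min_score` applies to may differ. `rank_documents` is a hypothetical name.

```python
from __future__ import annotations

import heapq
import math
from collections.abc import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_documents(
    query_vec: Sequence[float],
    doc_vecs: dict[str, Sequence[float]],
    top_k: int = 5,
    min_score: float = 0.0,
) -> list[tuple[str, float]]:
    """Return up to `top_k` (doc_id, score) pairs with score >= min_score, best first."""
    scored = ((doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items())
    kept = [(doc_id, score) for doc_id, score in scored if score >= min_score]
    # heapq.nlargest avoids fully sorting when only the top_k results are needed.
    return heapq.nlargest(top_k, kept, key=lambda pair: pair[1])


if __name__ == "__main__":
    docs = {"python": [2.0, 0.0], "ml": [3.0, 4.0], "unrelated": [0.0, 1.0]}
    print(rank_documents([1.0, 0.0], docs, top_k=2, min_score=0.5))
    # [('python', 1.0), ('ml', 0.6)]
```

Note that `min_score` filters before `top_k` truncates, so a strict threshold can return fewer than `top_k` results, or none at all.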

---

## 🛠️ RAG Tools

Lexora comes with 10+ built-in tools:

### Core Tools

| Tool | Description |
|------|-------------|
| `create_corpus` | Create a new document corpus |
| `add_data` | Add documents to a corpus |
| `rag_query` | Search documents semantically |
| `list_corpora` | List all available corpora |
| `get_corpus_info` | Get detailed corpus information |
| `delete_corpus` | Delete a corpus |
| `delete_document` | Delete a specific document |
| `update_document` | Update an existing document |
| `bulk_add_data` | Add large batches of documents |
| `health_check` | Check system health |

### Tool Usage Examples

See the [examples/](./examples/) directory for detailed examples of each tool.

---

## 🔧 Custom Tools

Extend Lexora with your own tools:

```python
from lexora import BaseTool, ToolParameter

class WeatherTool(BaseTool):
    @property
    def name(self) -> str:
        return "get_weather"
    
    @property
    def description(self) -> str:
        return "Get current weather for a location"
    
    @property
    def version(self) -> str:
        return "1.0.0"
    
    def _setup_parameters(self) -> None:
        self._parameters = [
            ToolParameter(
                name="location",
                type="string",
                description="City name",
                required=True
            )
        ]
    
    async def _execute(self, location: str, **kwargs):
        # Your implementation here
        return {"temperature": 72, "condition": "sunny"}

# Register the tool
agent.add_tool(WeatherTool())
```

---

## 💾 Vector Databases

### FAISS (Default)

```python
from lexora import RAGAgent, VectorDBConfig

agent = RAGAgent(
    vector_db_config=VectorDBConfig(
        provider="faiss",
        embedding_model="text-embedding-ada-002",
        dimension=1536,
        connection_params={"storage_path": "./faiss_storage"}
    )
)
```

### Pinecone

```python
agent = RAGAgent(
    vector_db_config=VectorDBConfig(
        provider="pinecone",
        embedding_model="text-embedding-ada-002",
        dimension=1536,
        connection_params={
            "api_key": "your-pinecone-key",
            "environment": "us-west1-gcp"
        }
    )
)
```

### Chroma

```python
agent = RAGAgent(
    vector_db_config=VectorDBConfig(
        provider="chroma",
        embedding_model="text-embedding-ada-002",
        dimension=1536,
        connection_params={"persist_directory": "./chroma_storage"}
    )
)
```

---

## 🤖 LLM Integration

Lexora uses LiteLLM for universal LLM support:

### OpenAI

```python
from lexora import LLMConfig

llm_config = LLMConfig(
    provider="openai",
    model="gpt-4",
    api_key="your-api-key",
    temperature=0.7
)
```

### Anthropic Claude

```python
llm_config = LLMConfig(
    provider="anthropic",
    model="claude-3-opus-20240229",
    api_key="your-api-key"
)
```

### Azure OpenAI

```python
llm_config = LLMConfig(
    provider="azure",
    model="gpt-4",
    api_key="your-api-key",
    api_base="https://your-resource.openai.azure.com/"
)
```
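Because requests go through LiteLLM, it helps to know how LiteLLM names models. It identifies many non-OpenAI models with a provider prefix, such as `anthropic/claude-3-opus-20240229`, and Azure models as `azure/<your-deployment-name>`. This README does not state whether `LLMConfig` adds that prefix for you from `provider`. The hypothetical helper below (`to_litellm_model` is not part of Lexora) shows the mapping so you can check which string would reach LiteLLM. For Azure, the part after the prefix is your deployment name, which may differ from `gpt-4`.

```python
def to_litellm_model(provider: str, model: str) -> str:
    """Build a LiteLLM-style model string (hypothetical helper, not part of Lexora).

    OpenAI models can be passed bare; other providers get a "<provider>/" prefix
    unless the model string already carries one.
    """
    provider = provider.strip().lower()
    if provider == "openai" or model.startswith(f"{provider}/"):
        return model
    return f"{provider}/{model}"
```

For example, `to_litellm_model("anthropic", "claude-3-opus-20240229")` gives `"anthropic/claude-3-opus-20240229"`, and an already-prefixed string is returned unchanged.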

---

## 🚨 Error Handling

Lexora provides structured error handling:

```python
result = await agent.tool_registry.get_tool("rag_query").run(
    corpus_name="nonexistent",
    query="test"
)

if result.status == "error":
    print(f"Error: {result.error}")
    # Error includes context and suggestions
```

All errors include:
- Error code
- Descriptive message
- Context information
- Helpful suggestions
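
If you prefer exceptions over checking `result.status` after every call, a small wrapper can turn error results into raised exceptions. The sketch assumes only the attributes used above (`status` and `error`, plus `data` on success). `ToolCallError` and `unwrap` are hypothetical names, not part of Lexora.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any


class ToolCallError(RuntimeError):
    """Raised when a tool result reports status == "error" (hypothetical, not part of Lexora)."""

    def __init__(self, error: Any):
        super().__init__(str(error))
        # Keep the original structured error (code, context, suggestions) for callers.
        self.error = error


def unwrap(result: Any) -> Any:
    """Return result.data on success; raise ToolCallError if the tool reported an error."""
    # Any status other than "error" is treated as success.
    if getattr(result, "status", None) == "error":
        raise ToolCallError(getattr(result, "error", "unknown error"))
    return result.data


if __name__ == "__main__":
    # SimpleNamespace stands in for a real tool result; the "success" value is assumed.
    ok = SimpleNamespace(status="success", data={"results": []}, error=None)
    print(unwrap(ok))  # {'results': []}
```

With Lexora this would read `data = unwrap(await tool.run(...))`, and one `try`/`except ToolCallError` can wrap a whole sequence of tool calls.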

---

## 📚 Examples

Check out the [examples/](./examples/) directory for complete examples:

- `01_quick_start.py` - Basic usage
- `02_custom_configuration.py` - Configuration options
- `03_corpus_management.py` - Managing corpora
- `04_custom_tools.py` - Creating custom tools
- `rag_tools_demo.py` - All RAG tools
- `rag_agent_with_real_embeddings.py` - Production setup

---

## 🧪 Testing

Run the test suite:

```bash
# Run all tests
python run_tests.py

# Run specific test file
python tests/test_error_handling.py

# Run with pytest
pytest tests/ -v
```

---

## 📊 Performance

- **Query Speed**: < 1ms for small corpora
- **Batch Processing**: 12,000+ documents/second
- **Concurrent Queries**: 10 queries in 5ms
- **Memory Efficient**: Handles 200+ documents in batches
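
Figures like these depend on hardware, corpus size, index type, and whether embeddings are computed locally or through an API, so it is worth measuring on your own setup. The sketch below times N concurrent calls with `asyncio.gather`. `time_concurrent` and `run_query` are hypothetical names, and `run_query` is a stand-in you would replace with a real call such as `agent.tool_registry.get_tool("rag_query").run(...)`.

```python
from __future__ import annotations

import asyncio
import time
from collections.abc import Awaitable, Callable


async def time_concurrent(
    make_call: Callable[[int], Awaitable[object]], n: int = 10
) -> tuple[list[object], float]:
    """Run `n` calls concurrently and return (results in call order, elapsed seconds)."""
    start = time.perf_counter()
    results = await asyncio.gather(*(make_call(i) for i in range(n)))
    elapsed = time.perf_counter() - start
    return list(results), elapsed


async def run_query(i: int) -> str:
    # Hypothetical stand-in for a real query; replace with your rag_query call.
    await asyncio.sleep(0.001)
    return f"result-{i}"


if __name__ == "__main__":
    results, elapsed = asyncio.run(time_concurrent(run_query, n=10))
    print(f"{len(results)} queries in {elapsed * 1000:.1f} ms")
```

`time.perf_counter` is used because it is monotonic and high-resolution, which matters when timings are in the millisecond range.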

---

## 🤝 Contributing

We welcome contributions! Please see [CONTRIBUTING.md](./CONTRIBUTING.md) for guidelines.

---

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](./LICENSE) file for details.

---

## 🆘 Support

- **Documentation**: [Full docs](./docs/)
- **Examples**: [Example code](./examples/)
- **Issues**: [GitHub Issues](https://github.com/VesperAkshay/lexora/issues)
- **Discussions**: [GitHub Discussions](https://github.com/VesperAkshay/lexora/discussions)

---

## 🗺️ Roadmap

- [ ] PyPI package distribution
- [ ] Additional vector database support
- [ ] Streaming responses
- [ ] Multi-modal support (images, audio)
- [ ] Advanced caching strategies
- [ ] Distributed deployment support

---

## 🙏 Acknowledgments

Built with:
- [LiteLLM](https://github.com/BerriAI/litellm) - Universal LLM interface
- [FAISS](https://github.com/facebookresearch/faiss) - Vector similarity search
- [Pydantic](https://github.com/pydantic/pydantic) - Data validation
- [Chroma](https://www.trychroma.com/) - Vector database

---

<div align="center">

**Made with ❤️ by the Lexora Team**

[⭐ Star us on GitHub](https://github.com/VesperAkshay/lexora) • [📖 Read the Docs](./docs/) • [🐦 Follow us on Twitter](https://twitter.com/lexora)

</div>
