Metadata-Version: 2.4
Name: fltr_core
Version: 0.1.0
Summary: Open-source RAG infrastructure toolkit with pluggable drivers for storage, vector databases, embeddings, and document processing
Author-email: FLTR Team <team@tryfltr.com>
License: MIT
Project-URL: Homepage, https://tryfltr.com
Project-URL: Documentation, https://docs.tryfltr.com
Project-URL: Repository, https://github.com/tryfltr/fltr
Project-URL: Issues, https://github.com/tryfltr/fltr/issues
Project-URL: Changelog, https://github.com/tryfltr/fltr/blob/main/CHANGELOG.md
Keywords: rag,retrieval-augmented-generation,document-processing,vector-database,embeddings,milvus,openai,cohere,s3,r2,document-parsing,semantic-search
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing :: General
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pydantic-settings>=2.0.0
Requires-Dist: typing-extensions>=4.0.0
Provides-Extra: storage-s3
Requires-Dist: boto3>=1.28.0; extra == "storage-s3"
Provides-Extra: storage-r2
Requires-Dist: boto3>=1.28.0; extra == "storage-r2"
Provides-Extra: storage-local
Provides-Extra: storage-all
Requires-Dist: fltr_core[storage-r2,storage-s3]; extra == "storage-all"
Provides-Extra: vectorstore-milvus
Requires-Dist: pymilvus>=2.3.0; extra == "vectorstore-milvus"
Provides-Extra: vectorstore-all
Requires-Dist: fltr_core[vectorstore-milvus]; extra == "vectorstore-all"
Provides-Extra: embeddings-openai
Requires-Dist: openai>=1.0.0; extra == "embeddings-openai"
Requires-Dist: tenacity>=8.0.0; extra == "embeddings-openai"
Provides-Extra: embeddings-cohere
Requires-Dist: cohere>=5.0.0; extra == "embeddings-cohere"
Requires-Dist: tenacity>=8.0.0; extra == "embeddings-cohere"
Provides-Extra: embeddings-voyageai
Requires-Dist: voyageai>=0.2.0; extra == "embeddings-voyageai"
Requires-Dist: tenacity>=8.0.0; extra == "embeddings-voyageai"
Provides-Extra: embeddings-all
Requires-Dist: fltr_core[embeddings-cohere,embeddings-openai,embeddings-voyageai]; extra == "embeddings-all"
Provides-Extra: connectors
Requires-Dist: aiolimiter>=1.1.0; extra == "connectors"
Requires-Dist: tenacity>=8.0.0; extra == "connectors"
Requires-Dist: jsonschema>=4.0.0; extra == "connectors"
Provides-Extra: connectors-reddit
Requires-Dist: fltr_core[connectors]; extra == "connectors-reddit"
Requires-Dist: asyncpraw>=7.7.0; extra == "connectors-reddit"
Provides-Extra: connectors-all
Requires-Dist: fltr_core[connectors-reddit]; extra == "connectors-all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: pytest-mock>=3.12.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Provides-Extra: all
Requires-Dist: fltr_core[connectors-all,embeddings-all,storage-all,vectorstore-all]; extra == "all"
Dynamic: license-file

# FLTR - Open-Source RAG Infrastructure Toolkit

[![PyPI version](https://badge.fury.io/py/fltr.svg)](https://badge.fury.io/py/fltr)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Tests](https://github.com/tryfltr/fltr/workflows/tests/badge.svg)](https://github.com/tryfltr/fltr/actions)

**FLTR** is a production-ready, open-source toolkit for building Retrieval-Augmented Generation (RAG) applications with pluggable drivers for storage, vector databases, embeddings, and document processing.

## 🎯 Why FLTR?

- **🔌 Pluggable Architecture**: Swap storage backends, vector databases, and embedding providers without changing your code
- **⚡ Production-Tested**: Extracted from [tryfltr.com](https://tryfltr.com) - battle-tested in production
- **🧪 Tested**: 79+ tests with 53%+ coverage
- **📦 Zero Lock-In**: Abstract drivers mean you're never locked into a single provider
- **🚀 Async-First**: Built with modern async/await patterns for high performance
- **🎨 Type-Safe**: Full type hints and Pydantic validation

## 🚀 Quick Start

### Installation

```bash
# Install core library
pip install fltr

# Install with specific providers (quotes stop shells such as zsh from expanding the brackets)
pip install "fltr[storage-s3,vectorstore-milvus,embeddings-openai]"

# Install everything
pip install "fltr[all]"
```

### Basic Usage

```python
import asyncio
from fltr.drivers.storage import LocalFileDriver
from fltr.drivers.vectorstore import MilvusDriver
from fltr.drivers.embeddings import OpenAIEmbeddingProvider

async def main():
    # Initialize drivers
    storage = LocalFileDriver(base_path="./data")
    vectorstore = MilvusDriver(uri="./milvus.db")  # Milvus Lite for local dev
    embeddings = OpenAIEmbeddingProvider(api_key="sk-...")

    # Upload a document
    await storage.upload(
        key="documents/readme.txt",
        data=b"FLTR is awesome!",
        content_type="text/plain"
    )

    # Generate embeddings
    text = "What is FLTR?"
    embedding = await embeddings.embed_text(text)

    # Create collection and insert
    await vectorstore.create_collection("docs", dimension=1536)
    await vectorstore.insert(
        collection_name="docs",
        vectors=[embedding],
        texts=[text],
        metadata=[{"source": "readme.txt"}]
    )

    # Search
    results = await vectorstore.search(
        collection_name="docs",
        query_vector=embedding,
        limit=5
    )

    print(results)

asyncio.run(main())
```

## 📚 Available Drivers

### Storage Drivers

| Driver | Install | Use Case |
|--------|---------|----------|
| **LocalFileDriver** | `pip install fltr` | Local development, testing |
| **S3StorageDriver** | `pip install fltr[storage-s3]` | AWS S3, MinIO, DigitalOcean Spaces |
| **R2StorageDriver** | `pip install fltr[storage-r2]` | Cloudflare R2 (S3-compatible) |

### Vector Store Drivers

| Driver | Install | Use Case |
|--------|---------|----------|
| **MilvusDriver** | `pip install fltr[vectorstore-milvus]` | Milvus Lite (local) or Zilliz Cloud (managed Milvus) |

### Embedding Providers

| Provider | Install | Models |
|----------|---------|--------|
| **OpenAI** | `pip install fltr[embeddings-openai]` | text-embedding-3-small, text-embedding-3-large |
| **Cohere** | `pip install fltr[embeddings-cohere]` | embed-english-v3.0, embed-multilingual-v3.0 |
| **Voyage AI** | `pip install fltr[embeddings-voyageai]` | voyage-3, voyage-code-2, voyage-law-2 |

## 🔧 Configuration

All drivers support both direct initialization and environment-based configuration.

### Environment Variables

```bash
# Storage (S3/R2)
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
export S3_BUCKET="my-bucket"
export S3_REGION="us-east-1"

# Vector Store (Milvus)
export MILVUS_URI="https://your-milvus-instance.com"
export MILVUS_TOKEN="your-token"
export VECTOR_METRIC_TYPE="COSINE"

# Embeddings (OpenAI)
export OPENAI_API_KEY="sk-..."
export OPENAI_EMBEDDING_MODEL="text-embedding-3-small"
export OPENAI_BATCH_SIZE="100"
```

### Factory Methods

```python
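# S3StorageDriver's import path is assumed to match LocalFileDriver's
from fltr.drivers.storage import S3StorageDriver
from fltr.drivers.vectorstore import MilvusDriver
from fltr.drivers.embeddings import OpenAIEmbeddingProvider

# Create each driver from the environment variables listed above
storage = S3StorageDriver.from_env()
vectorstore = MilvusDriver.from_env()
embeddings = OpenAIEmbeddingProvider.from_env()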
```
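
The factory pattern itself can be sketched without FLTR installed. The following is a minimal illustration, not the library's implementation: `MilvusSettings` is a hypothetical class, the variable names come from the list above, and both the `COSINE` default and the requirement that `MILVUS_URI` be set are assumptions.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class MilvusSettings:
    """Hypothetical settings holder; not FLTR's actual class."""

    uri: str
    token: Optional[str] = None
    metric_type: str = "COSINE"  # assumed default

    @classmethod
    def from_env(cls, env: Optional[Mapping[str, str]] = None) -> "MilvusSettings":
        # Accept an explicit mapping so the factory can be tested
        # without touching the real process environment.
        env = os.environ if env is None else env
        uri = env.get("MILVUS_URI")
        if not uri:
            raise ValueError("MILVUS_URI must be set")
        return cls(
            uri=uri,
            token=env.get("MILVUS_TOKEN"),
            metric_type=env.get("VECTOR_METRIC_TYPE", "COSINE"),
        )


settings = MilvusSettings.from_env({"MILVUS_URI": "./milvus.db"})
print(settings)
```

Passing the mapping explicitly keeps the factory easy to unit test; calling `MilvusSettings.from_env()` with no argument reads `os.environ`.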

## 🎨 Advanced Features

### Retry Logic

All embedding providers include automatic retry logic with exponential backoff:

```python
embeddings = OpenAIEmbeddingProvider(
    api_key="sk-...",
    max_retries=5,
    retry_min_wait=1,
    retry_max_wait=60
)
```
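
To make the three parameters concrete, here is a standard-library sketch of exponential backoff. It is not FLTR's implementation (the embedding extras depend on `tenacity`, so the library's own behavior may differ). The helper `retry_with_backoff` is hypothetical, and it assumes `max_retries` counts retries after the first attempt and that the wait doubles on each failure.

```python
import asyncio


async def retry_with_backoff(func, *, max_retries=5, min_wait=1.0,
                             max_wait=60.0, sleep=asyncio.sleep):
    """Await func() until it succeeds or the retries are exhausted."""
    for attempt in range(max_retries + 1):
        try:
            return await func()
        except Exception:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Double the wait after each failure, capped at max_wait.
            wait = min(max_wait, min_wait * 2 ** attempt)
            await sleep(wait)


async def demo():
    calls = {"n": 0}
    waits = []

    async def flaky():
        # Fails twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient failure")
        return "ok"

    async def fake_sleep(seconds):
        # Record the wait instead of sleeping so the demo runs instantly.
        waits.append(seconds)

    result = await retry_with_backoff(flaky, min_wait=1, max_wait=60,
                                      sleep=fake_sleep)
    return result, waits


print(asyncio.run(demo()))
```

A production version would usually retry only on transient errors (rate limits, timeouts) and add random jitter to the wait; both are left out here to keep the sketch deterministic.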

### Batch Processing

Efficient batching for embedding large datasets:

```python
texts = ["document 1", "document 2", ...]  # 1000s of texts
embeddings_list = await embeddings.embed_batch(
    texts=texts,
    batch_size=100  # Process 100 at a time
)
```
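
How `embed_batch` splits its input internally is not shown here. The chunking idea behind `batch_size` can be sketched as follows; `iter_batches` is a hypothetical helper, not part of FLTR.

```python
from typing import Iterator, List, Sequence


def iter_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


texts = [f"document {i}" for i in range(250)]
sizes = [len(batch) for batch in iter_batches(texts, batch_size=100)]
print(sizes)  # [100, 100, 50]
```

Each batch would then go to the provider in one request, with the results concatenated in input order.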

### Metadata Filtering

Powerful metadata filtering in vector search:

```python
results = await vectorstore.search(
    collection_name="docs",
    query_vector=embedding,
    filter_expr='dataset_id == "my-dataset" && chunk_type == "text"',
    limit=10
)
```
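
Building `filter_expr` by hand gets error-prone once values come from user input. The sketch below assembles an equality-only expression in the same style as the example above; `build_filter` is a hypothetical helper rather than an FLTR function, and it only handles string and numeric values.

```python
import json


def build_filter(conditions: dict) -> str:
    """Join field == value pairs with && into one filter expression."""
    parts = []
    for field, value in conditions.items():
        if not field.isidentifier():
            raise ValueError(f"invalid field name: {field!r}")
        if isinstance(value, bool) or not isinstance(value, (str, int, float)):
            raise TypeError(f"unsupported value for {field!r}: {value!r}")
        if isinstance(value, str):
            # json.dumps adds double quotes and escapes embedded quotes.
            literal = json.dumps(value, ensure_ascii=False)
        else:
            literal = repr(value)
        parts.append(f"{field} == {literal}")
    return " && ".join(parts)


expr = build_filter({"dataset_id": "my-dataset", "chunk_type": "text"})
print(expr)  # dataset_id == "my-dataset" && chunk_type == "text"
```

The result can be passed as `filter_expr` to `vectorstore.search`. Range comparisons, `in` lists, and nested fields would need extra handling.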

### Input Types (Cohere & Voyage AI)

Optimize embeddings for different use cases:

```python
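# Import path assumed to mirror OpenAIEmbeddingProvider
from fltr.drivers.embeddings import CohereEmbeddingProvider, VoyageAIEmbeddingProvider

# Cohere
cohere_embeddings = CohereEmbeddingProvider(api_key="...")

# For indexing documents (`documents` is a list of strings)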
doc_embeddings = await cohere_embeddings.embed_batch(
    texts=documents,
    input_type="search_document"
)

# For search queries
query_embedding = await cohere_embeddings.embed_text(
    text="What is RAG?",
    input_type="search_query"
)

# Voyage AI - Domain-specific models
voyage_embeddings = VoyageAIEmbeddingProvider(
    api_key="...",
    model="voyage-law-2"  # Optimized for legal text
)
```

## 🏗️ Architecture

FLTR follows a clean architecture with abstract base classes:

```
fltr/
├── drivers/
│   ├── storage/
│   │   ├── base.py          # StorageDriver abstract class
│   │   ├── local.py         # Local filesystem
│   │   ├── s3.py            # AWS S3
│   │   └── r2.py            # Cloudflare R2
│   ├── vectorstore/
│   │   ├── base.py          # VectorStoreDriver abstract class
│   │   └── milvus.py        # Milvus implementation
│   └── embeddings/
│       ├── base.py          # EmbeddingProvider abstract class
│       ├── openai.py        # OpenAI
│       ├── cohere.py        # Cohere
│       └── voyageai.py      # Voyage AI
├── config/
│   └── schema.py            # Pydantic configuration schemas
└── parsers/
    ├── base.py              # DocumentParser abstract class
    └── registry.py          # Parser discovery system
```
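
The value of the abstract base classes is that application code depends only on the interface. The sketch below illustrates this with stand-in classes rather than FLTR's own: `upload` mirrors the call in Basic Usage, while `download` and `InMemoryDriver` are assumptions made for the example.

```python
import asyncio
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Stand-in for the abstract class in drivers/storage/base.py."""

    @abstractmethod
    async def upload(self, key: str, data: bytes, content_type: str) -> None:
        ...

    @abstractmethod
    async def download(self, key: str) -> bytes:  # assumed method
        ...


class InMemoryDriver(StorageDriver):
    """Hypothetical backend that keeps objects in a dict."""

    def __init__(self) -> None:
        self._objects = {}

    async def upload(self, key, data, content_type="application/octet-stream"):
        self._objects[key] = data

    async def download(self, key):
        return self._objects[key]


async def round_trip(storage: StorageDriver) -> bytes:
    # Only the abstract interface is used here, so any concrete
    # driver can be passed in without changing this function.
    await storage.upload("documents/readme.txt", b"FLTR is awesome!", "text/plain")
    return await storage.download("documents/readme.txt")


print(asyncio.run(round_trip(InMemoryDriver())))
```

Swapping backends then means changing only the object passed to `round_trip`.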

## 🧪 Testing

Run the test suite with pytest:

```bash
# Run all tests
pytest

# Run with coverage
pytest --cov=fltr --cov-report=html

# Run specific test file
pytest tests/test_embeddings.py -v
```

## 🤝 Contributing

We welcome contributions! FLTR is extracted from production code at [tryfltr.com](https://tryfltr.com).

### Adding a New Driver

1. Inherit from the appropriate base class
2. Implement all abstract methods
3. Add comprehensive tests
4. Register the driver in the `pyproject.toml` entry points
5. Submit a PR

Example:

```python
from fltr.drivers.embeddings.base import EmbeddingProvider

class MyEmbeddingProvider(EmbeddingProvider):
    async def embed_text(self, text: str) -> list[float]:
        # Return one embedding vector for `text`
        raise NotImplementedError

    async def embed_batch(self, texts: list[str]) -> list[list[float]]:
        # Return one embedding vector per input text
        raise NotImplementedError

    def get_dimension(self) -> int:
        return 1536
```
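
For a complete, offline-runnable illustration of the same three methods, here is a toy provider that derives vectors from a hash. The `EmbeddingProvider` class below is a stand-in mirroring the example above, not an import from FLTR, and `HashEmbeddingProvider` is hypothetical; its vectors carry no semantic meaning, so it is only suitable for tests.

```python
import asyncio
import hashlib
from abc import ABC, abstractmethod
from typing import List


class EmbeddingProvider(ABC):
    """Stand-in for the base class shown in the example above."""

    @abstractmethod
    async def embed_text(self, text: str) -> List[float]:
        ...

    @abstractmethod
    async def embed_batch(self, texts: List[str]) -> List[List[float]]:
        ...

    @abstractmethod
    def get_dimension(self) -> int:
        ...


class HashEmbeddingProvider(EmbeddingProvider):
    """Deterministic toy provider: same text, same vector."""

    def __init__(self, dimension: int = 8) -> None:
        if not 1 <= dimension <= 32:  # SHA-256 yields 32 bytes
            raise ValueError("dimension must be between 1 and 32")
        self._dimension = dimension

    async def embed_text(self, text: str) -> List[float]:
        digest = hashlib.sha256(text.encode("utf-8")).digest()
        # Scale each byte into the range [0.0, 1.0].
        return [byte / 255 for byte in digest[: self._dimension]]

    async def embed_batch(self, texts: List[str]) -> List[List[float]]:
        return [await self.embed_text(text) for text in texts]

    def get_dimension(self) -> int:
        return self._dimension


provider = HashEmbeddingProvider(dimension=8)
vectors = asyncio.run(provider.embed_batch(["What is FLTR?", "What is RAG?"]))
print(len(vectors), len(vectors[0]))  # 2 8
```

A test for a real driver could check the same properties: one vector per input, and each vector's length equal to `get_dimension()`.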

## 📖 Documentation

- [Full Documentation](https://docs.tryfltr.com)
- [API Reference](https://docs.tryfltr.com/api)
- [Examples](https://github.com/tryfltr/fltr/tree/main/examples)
- [Changelog](https://github.com/tryfltr/fltr/blob/main/CHANGELOG.md)

## 🔒 License

MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

FLTR is built and maintained by the team at [tryfltr.com](https://tryfltr.com). We're on a mission to make RAG infrastructure accessible to everyone.

Special thanks to:
- [Milvus](https://milvus.io/) for the excellent vector database
- [OpenAI](https://openai.com/), [Cohere](https://cohere.com/), and [Voyage AI](https://www.voyageai.com/) for embedding APIs
- The Python community for amazing tools and libraries

## 🔗 Links

- **Website**: [tryfltr.com](https://tryfltr.com)
- **Documentation**: [docs.tryfltr.com](https://docs.tryfltr.com)
- **GitHub**: [github.com/tryfltr/fltr](https://github.com/tryfltr/fltr)
- **PyPI**: [pypi.org/project/fltr](https://pypi.org/project/fltr)
- **Discord**: [Join our community](https://discord.gg/fltr)

---

**Built with ❤️ by the FLTR team**
