Metadata-Version: 2.3
Name: rakam-systems-vectorstore
Version: 0.1.3
Summary: Utility package for interacting with vectorstores
Keywords: vector-store,embeddings,rag,semantic-search,pgvector,faiss
Author: Mohamed Hilel, Peng Zheng
Author-email: Mohamed Hilel <mohammedjassemhlel@gmail.com>, Peng Zheng <pengzheng990630@outlook.com>
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Intended Audience :: Developers
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Dist: rakam-systems-core>=0.1.2
Requires-Dist: rakam-systems-tools>=0.2.1
Requires-Dist: pyyaml>=6.0,<7.0
Requires-Dist: numpy>=1.24.0,<3.0.0
Requires-Dist: tqdm>=4.66.0,<5.0.0
Requires-Dist: rakam-systems-vectorstore[postgres] ; extra == 'all'
Requires-Dist: rakam-systems-vectorstore[faiss] ; extra == 'all'
Requires-Dist: rakam-systems-vectorstore[local-embeddings] ; extra == 'all'
Requires-Dist: rakam-systems-vectorstore[openai] ; extra == 'all'
Requires-Dist: rakam-systems-vectorstore[cohere] ; extra == 'all'
Requires-Dist: rakam-systems-vectorstore[loaders] ; extra == 'all'
Requires-Dist: cohere>=4.0.0 ; extra == 'cohere'
Requires-Dist: pytest-django>=4.5.0 ; extra == 'dev'
Requires-Dist: pytest-asyncio>=1.3.0 ; extra == 'dev'
Requires-Dist: pytest-cov>=5.0.0 ; extra == 'dev'
Requires-Dist: black>=23.0.0 ; extra == 'dev'
Requires-Dist: ruff>=0.1.0 ; extra == 'dev'
Requires-Dist: pytest>=7.0.0 ; extra == 'dev'
Requires-Dist: faiss-cpu>=1.12.0 ; extra == 'faiss'
Requires-Dist: python-magic>=0.4.27 ; extra == 'loaders'
Requires-Dist: beautifulsoup4>=4.12.0 ; extra == 'loaders'
Requires-Dist: python-docx>=1.2.0 ; extra == 'loaders'
Requires-Dist: pymupdf>=1.24.0 ; extra == 'loaders'
Requires-Dist: pymupdf4llm>=0.0.17 ; extra == 'loaders'
Requires-Dist: docling==2.62.0 ; extra == 'loaders'
Requires-Dist: chonkie==1.4.2 ; extra == 'loaders'
Requires-Dist: odfpy==1.4.1 ; extra == 'loaders'
Requires-Dist: sentence-transformers>=5.1.0 ; extra == 'local-embeddings'
Requires-Dist: torch>=2.0.0 ; extra == 'local-embeddings'
Requires-Dist: openai>=1.0.0 ; extra == 'openai'
Requires-Dist: psycopg2-binary>=2.9.9,<3.0.0 ; extra == 'postgres'
Requires-Dist: pgvector>=0.3.0,<1.0.0 ; extra == 'postgres'
Requires-Dist: django>=4.0.0,<6.0.0 ; extra == 'postgres'
Requires-Python: >=3.10
Project-URL: Documentation, https://github.com/Rakam-AI/rakam_systems-inhouse
Project-URL: Homepage, https://github.com/Rakam-AI/rakam_systems-inhouse
Project-URL: Issues, https://github.com/Rakam-AI/rakam_systems-inhouse/issues
Project-URL: Repository, https://github.com/Rakam-AI/rakam_systems-inhouse
Provides-Extra: all
Provides-Extra: cohere
Provides-Extra: dev
Provides-Extra: faiss
Provides-Extra: loaders
Provides-Extra: local-embeddings
Provides-Extra: openai
Provides-Extra: postgres
Description-Content-Type: text/markdown

# Rakam Systems Vectorstore

The vectorstore package of Rakam Systems provides vector database backends and document processing capabilities.

## Overview

`rakam-systems-vectorstore` provides vector storage, embedding models, and document loading. It depends on `rakam-systems-core` and `rakam-systems-tools`.

## Features

- **Configuration-First Design**: Change your entire vector store setup via YAML — no code changes
- **Multiple Backends**: PostgreSQL with pgvector and FAISS in-memory storage
- **Flexible Embeddings**: SentenceTransformers, OpenAI, and Cohere
- **Document Loaders**: PDF, DOCX, HTML, Markdown, CSV, and more
- **Search Capabilities**: Vector search, keyword search (BM25), and hybrid search
- **Chunking**: Intelligent text chunking with context preservation
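
To illustrate what "hybrid search" means in the list above, here is a minimal, generic sketch that blends a vector-similarity score with a keyword (BM25-style) score per document. It does not show how this package fuses the two signals, which this README does not describe; `hybrid_rank` and `alpha` are hypothetical names introduced only for this example.

```python
def hybrid_rank(vector_scores, keyword_scores, alpha=0.5):
    """Rank documents by a weighted sum of min-max normalised scores.

    Hypothetical helper for illustration; not part of rakam-systems-vectorstore.
    alpha=1.0 uses only the vector scores, alpha=0.0 only the keyword scores.
    """

    def normalise(scores):
        # Rescale to [0, 1] so that the two score types are comparable.
        if not scores:
            return {}
        lo, hi = min(scores.values()), max(scores.values())
        if hi == lo:
            return {doc: 1.0 for doc in scores}
        return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

    v = normalise(vector_scores)
    k = normalise(keyword_scores)
    # A document missing from one result set contributes 0 for that signal.
    combined = {
        doc: alpha * v.get(doc, 0.0) + (1 - alpha) * k.get(doc, 0.0)
        for doc in set(v) | set(k)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


ranking = hybrid_rank(
    vector_scores={"a": 0.9, "b": 0.5, "c": 0.1},   # e.g. cosine similarities
    keyword_scores={"a": 1.0, "b": 8.0, "c": 2.0},  # e.g. BM25 scores
)
print([doc for doc, _ in ranking])
```

Normalising before mixing matters because BM25 scores are unbounded while cosine similarities are not, so a raw sum would let the keyword signal dominate.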

## Installation

```bash
pip install rakam-systems-vectorstore

# With specific backends (quoted so that shells such as zsh do not expand the brackets)
pip install "rakam-systems-vectorstore[postgres]"
pip install "rakam-systems-vectorstore[faiss]"
pip install "rakam-systems-vectorstore[all]"
```

Available extras:

| Extra              | What it adds                                                                                             |
| ------------------ | -------------------------------------------------------------------------------------------------------- |
| `postgres`         | `psycopg2-binary`, `pgvector`, `django`                                                                  |
| `faiss`            | `faiss-cpu`                                                                                              |
| `local-embeddings` | `sentence-transformers`, `torch`                                                                         |
| `openai`           | `openai` (for OpenAI embeddings)                                                                         |
| `cohere`           | `cohere` (for Cohere embeddings)                                                                         |
| `loaders`          | `python-magic`, `beautifulsoup4`, `python-docx`, `pymupdf`, `pymupdf4llm`, `docling`, `chonkie`, `odfpy` |
| `all`              | Everything above                                                                                         |
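
Because each extra only pulls in third-party packages, one way to see which extras are usable in the current environment is to check whether those packages can be imported. The sketch below is an illustration, not part of the package: `EXTRA_MODULES` and `available_extras` are hypothetical names, the import names are assumed from the upstream projects, and the `loaders` extra is left out for brevity.

```python
from importlib.util import find_spec

# Import names for the packages each extra installs. These are assumptions
# based on the upstream projects (e.g. faiss-cpu is imported as "faiss").
EXTRA_MODULES = {
    "postgres": ["psycopg2", "pgvector", "django"],
    "faiss": ["faiss"],
    "local-embeddings": ["sentence_transformers", "torch"],
    "openai": ["openai"],
    "cohere": ["cohere"],
}


def available_extras(modules_by_extra=EXTRA_MODULES):
    """Map each extra to True if all of its modules can be found.

    find_spec() locates a top-level module without importing it, so heavy
    packages such as torch are not loaded by this check.
    """
    return {
        extra: all(find_spec(name) is not None for name in names)
        for extra, names in modules_by_extra.items()
    }


for extra, ok in available_extras().items():
    print(f"{extra}: {'installed' if ok else 'missing'}")
```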

## Quick Start

```python
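# Assumption: the `faiss` and `local-embeddings` extras are installed.
from rakam_systems_vectorstore import FaissStore, Node, NodeMetadata

# Create a FAISS-backed store that embeds text with the given model.
store = FaissStore(
    name="my_store",
    base_index_path="./indexes",
    embedding_model="Snowflake/snowflake-arctic-embed-m",
    initialising=True
)

# Wrap each piece of text in a Node, with metadata identifying its source.
nodes = [
    Node(
        content="Python is great for AI",
        metadata=NodeMetadata(source_file_uuid="doc1", position=0)
    )
]

# Index the nodes as a collection, then query it for up to 5 matches.
store.create_collection_from_nodes("my_collection", nodes)
results, _ = store.search(collection_name="my_collection", query="AI programming", number=5)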
```

## Core Components

- **ConfigurablePgVectorStore** — PostgreSQL with pgvector, hybrid search, keyword search
- **FaissStore** — In-memory FAISS-based vector search
- **ConfigurableEmbeddings** — SentenceTransformers, OpenAI, Cohere backends
- **AdaptiveLoader** — Auto-detects and loads PDF, DOCX, HTML, Markdown, CSV, email, code
- **TextChunker / AdvancedChunker** — Sentence-based and context-aware chunking
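
As a rough picture of what sentence-based chunking with preserved context can look like, the sketch below groups sentences into chunks of bounded size and repeats the last sentence of each chunk at the start of the next. It is a standalone illustration under those assumptions and does not reproduce the behaviour of `TextChunker` or `AdvancedChunker`; `chunk_sentences`, `max_chars`, and `overlap` are hypothetical names.

```python
import re


def chunk_sentences(text, max_chars=200, overlap=1):
    """Group sentences into chunks of roughly max_chars characters.

    Hypothetical helper for illustration. The last `overlap` sentences of a
    chunk are repeated at the start of the next one to carry context over.
    A chunk can exceed max_chars when a sentence or the overlap is long.
    """
    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current = [], []
    for sentence in sentences:
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            # The current chunk is full: emit it and start the next one.
            chunks.append(" ".join(current))
            current = current[-overlap:] if overlap else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


for chunk in chunk_sentences("One. Two. Three. Four.", max_chars=10):
    print(chunk)
```

Splitting on sentence boundaries rather than at fixed character offsets keeps each chunk readable on its own, which generally helps both embedding quality and keyword matching.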

## Environment Variables

| Variable | Description |
|----------|-------------|
| `POSTGRES_HOST` | PostgreSQL host (default: localhost) |
| `POSTGRES_PORT` | PostgreSQL port (default: 5432) |
| `POSTGRES_DB` | Database name (default: vectorstore_db) |
| `POSTGRES_USER` | Database user (default: postgres) |
| `POSTGRES_PASSWORD` | Database password |
| `OPENAI_API_KEY` | For OpenAI embeddings |
| `COHERE_API_KEY` | For Cohere embeddings |
| `HUGGINGFACE_TOKEN` | For private HuggingFace models |
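
The PostgreSQL defaults in the table can be mirrored in application code when you need the same values outside the package. This is a minimal sketch: `postgres_settings` is a hypothetical helper, and the README does not describe how the package itself reads these variables.

```python
import os


def postgres_settings(env=None):
    """Collect PostgreSQL settings, falling back to the documented defaults.

    Hypothetical helper for illustration; not part of rakam-systems-vectorstore.
    """
    env = os.environ if env is None else env
    return {
        "host": env.get("POSTGRES_HOST", "localhost"),
        "port": int(env.get("POSTGRES_PORT", "5432")),
        "dbname": env.get("POSTGRES_DB", "vectorstore_db"),
        "user": env.get("POSTGRES_USER", "postgres"),
        # No default is documented for the password, so it may be None.
        "password": env.get("POSTGRES_PASSWORD"),
    }


# Passing a plain dict instead of os.environ makes the helper easy to test.
settings = postgres_settings({"POSTGRES_HOST": "db.internal", "POSTGRES_PORT": "6543"})
print(settings["host"], settings["port"], settings["dbname"])
```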

## Documentation

For PostgreSQL setup, search examples, YAML configuration, and full API reference, see the [official documentation](https://rakam-ai.github.io/rakam-systems-docs/).

## License

Apache 2.0
