Metadata-Version: 2.4
Name: dot-search
Version: 0.3.0
Summary: Augment existing database tables with vector and BM25 search
Project-URL: Homepage, https://gitlab.com/deepika6190303/deepika-open-toolbox/dot-search
Project-URL: Repository, https://gitlab.com/deepika6190303/deepika-open-toolbox/dot-search
Project-URL: Issues, https://gitlab.com/deepika6190303/deepika-open-toolbox/dot-search/-/issues
Author-email: deepika Team <contact@deepika.ai>
License: TODO: TO BE COMPLETED
License-File: LICENSE
Keywords: bm25,deepika,embeddings,hybrid,open-toolbox,search,vector
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.12
Requires-Dist: aiosqlite>=0.20
Requires-Dist: asyncpg>=0.29
Requires-Dist: httpx>=0.27
Requires-Dist: pgvector>=0.3
Requires-Dist: sqlalchemy[asyncio]>=2.0
Description-Content-Type: text/markdown

# dot-search

![Python Version](https://img.shields.io/badge/python-3.12%2B-blue)

**dot-search** augments existing database tables with vector search, BM25 keyword search, and exact substring matching. Results from multiple strategies are fused via Reciprocal Rank Fusion (RRF).

## Prerequisites

### PostgreSQL (production)

Requires the [pgvector](https://github.com/pgvector/pgvector) and [ParadeDB pg_search](https://github.com/paradedb/paradedb) extensions. Enable both in the target database:

```sql
-- IF NOT EXISTS keeps the statements safe to re-run
CREATE EXTENSION IF NOT EXISTS vector;
CREATE EXTENSION IF NOT EXISTS pg_search;
```

### SQLite (testing only)

Uses [sqlite-vec](https://github.com/asg017/sqlite-vec) for vector search. BM25 is not supported on SQLite.

## Install

```bash
pip install dot-search
```

## Environment variables

| Variable | Value or example |
|---|---|
| `DOT__EMBED_API_KEY` | Bearer token |
| `DOT__EMBED_BASE_URL` | `https://api.openai.com/v1` |
| `DOT__EMBED_MODEL` | `text-embedding-3-small` |
| `DOT__EMBED_DIMENSION` | `1536` |

Works with any OpenAI-compatible API (OpenAI, vLLM, TGI, Ollama, etc.).
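As a rough illustration of what "OpenAI-compatible" means here, the sketch below builds (but does not send) a request to the standard `/embeddings` endpoint from the four variables above. It is not dot-search's own client code: the helper names `build_embedding_request` and `check_dimension` are hypothetical, and the standard library's `urllib` is used only to keep the example dependency-free.

```python
import json
import os
import urllib.request


def build_embedding_request(texts, env=os.environ):
    """Build a POST request for an OpenAI-compatible /embeddings endpoint.

    Hypothetical helper for illustration; not part of dot-search.
    """
    base_url = env["DOT__EMBED_BASE_URL"].rstrip("/")
    payload = {"model": env["DOT__EMBED_MODEL"], "input": list(texts)}
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            # The API key is sent as a Bearer token.
            "Authorization": f"Bearer {env['DOT__EMBED_API_KEY']}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def check_dimension(vector, env=os.environ):
    """Return True if a vector has the length given by DOT__EMBED_DIMENSION.

    Hypothetical helper; assumes the variable holds an integer string.
    """
    return len(vector) == int(env["DOT__EMBED_DIMENSION"])


# Example configuration, passed explicitly instead of reading os.environ.
example_env = {
    "DOT__EMBED_API_KEY": "test-key",
    "DOT__EMBED_BASE_URL": "https://api.openai.com/v1",
    "DOT__EMBED_MODEL": "text-embedding-3-small",
    "DOT__EMBED_DIMENSION": "1536",
}
request = build_embedding_request(["neural networks"], example_env)
```

Pointing `DOT__EMBED_BASE_URL` at a different server changes only the URL prefix; the request body and header shape stay the same.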

## Usage

```python
import asyncio
from dot_search import (
    SearchEngine, TableConfig, EmbeddingConfig,
    BM25Config, ExactConfig, SearchConfig,
)

engine = SearchEngine(db_url="postgresql+asyncpg://user:pass@localhost/mydb")

async def main():
    # --- 1. Index a table with vector search ---
    await engine.index(TableConfig(
        table="articles",
        embeddings=[
            EmbeddingConfig(key="body", source_column="body"),
        ],
    ))

    # --- 2. Search ---
    results = await engine.search("neural networks", "articles")
    for r in results:
        print(r.id, r.score)

    # --- 3. Multi-strategy index ---
    await engine.index(TableConfig(
        table="articles",
        embeddings=[
            EmbeddingConfig(key="body", source_column="body"),
            EmbeddingConfig(key="title", source_column="title"),
        ],
        bm25=[
            BM25Config(key="title_bm25", source_column="title"),
            BM25Config(key="body_bm25", source_column="body"),
        ],
        exact=[ExactConfig(key="name_exact", source_column="name")],
    ))

    # --- 4. Hybrid search with SQL filter and weight overrides ---
    results = await engine.search(
        "fermentation and gut health",
        "articles",
        SearchConfig(
            limit=10,
            where="published_year >= 2022 AND topic = 'health'",
            weights={"body": 1.0, "title": 0.3, "title_bm25": 0.5, "body_bm25": 2.0},
        ),
    )

    # --- 5. Single-strategy search ---
    results = await engine.search("fermentation", "articles", SearchConfig(strategy="bm25"))
    results = await engine.search("Dupont", "articles", SearchConfig(strategy="exact"))

    # --- 6. Multiple indexes on the same table ---
    await engine.index(TableConfig(
        table="articles",
        index_id="article_titles",
        embeddings=[EmbeddingConfig(key="title_only", source_column="title")],
    ))
    results = await engine.search("gut health", "article_titles")

asyncio.run(main())
```

## Search strategies

| Strategy | What it uses |
|----------|-------------|
| `"hybrid"` | Vector + BM25 + exact (any configured), fused via RRF (default) |
| `"vector"` | Vector similarity only |
| `"bm25"` | BM25 keyword search only |
| `"exact"` | Substring (`LIKE`) search only |
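To make the hybrid fusion step concrete, here is a minimal sketch of weighted Reciprocal Rank Fusion. It assumes each configured key produces its own ranked list of row IDs and that a weight multiplies that key's contribution; the function name `rrf_fuse`, the constant `k=60` (the value commonly used for RRF), and the weighting scheme are illustrative assumptions, not dot-search's confirmed implementation.

```python
from collections import defaultdict


def rrf_fuse(rankings, weights=None, k=60):
    """Fuse ranked ID lists with weighted Reciprocal Rank Fusion.

    rankings: dict mapping a strategy key to a list of IDs, best first.
    weights:  optional dict mapping a key to a multiplier (default 1.0).
    k:        smoothing constant; larger values flatten rank differences.
    """
    weights = weights or {}
    scores = defaultdict(float)
    for key, ids in rankings.items():
        w = weights.get(key, 1.0)
        for rank, doc_id in enumerate(ids, start=1):
            # Each list contributes w / (k + rank) to the ID's total score.
            scores[doc_id] += w / (k + rank)
    # Highest score first; ties are broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], str(item[0])))


# Two hypothetical result lists for the same query.
rankings = {
    "body": [1, 2, 3],       # vector search over the body column
    "body_bm25": [3, 1],     # BM25 search over the body column
}
unweighted = rrf_fuse(rankings)
weighted = rrf_fuse(rankings, weights={"body_bm25": 2.0})

for doc_id, score in weighted:
    print(doc_id, round(score, 5))
```

With equal weights, row 1 ranks first because it is near the top of both lists. Doubling the `body_bm25` weight moves row 3 ahead, which is the kind of effect the `weights` override in `SearchConfig` is meant to control.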

## Contributing & Development

See [docs/CONTRIBUTING.md](docs/CONTRIBUTING.md) and [docs/DEVELOPMENT.md](docs/DEVELOPMENT.md).

## License

See [LICENSE](LICENSE) for details.

## Contact

- deepika Team: contact@deepika.ai
- Project: [gitlab.com/deepika6190303/deepika-open-toolbox/dot-search](https://gitlab.com/deepika6190303/deepika-open-toolbox/dot-search)
