Metadata-Version: 2.4
Name: vecta
Version: 0.1.1
Summary: A lightweight SDK for benchmarking RAG agents
Author-email: Emmett <emmett@runvecta.com>
Maintainer-email: Emmett <emmett@runvecta.com>
License: MIT
Project-URL: Homepage, https://github.com/ctrlfplus/vecta
Project-URL: Documentation, https://vecta.readthedocs.io
Project-URL: Repository, https://github.com/ctrlfplus/vecta.git
Project-URL: Bug Tracker, https://github.com/ctrlfplus/vecta/issues
Keywords: rag,retrieval,vector-database,benchmark,ai,llm
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: jsonpath-ng
Requires-Dist: pydantic>=2.0
Requires-Dist: tqdm
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: requests
Requires-Dist: openai
Requires-Dist: datasets
Provides-Extra: chroma
Requires-Dist: chromadb; extra == "chroma"
Provides-Extra: pinecone
Requires-Dist: pinecone; extra == "pinecone"
Provides-Extra: pgvector
Requires-Dist: psycopg>=3.2.0; extra == "pgvector"
Requires-Dist: pgvector; extra == "pgvector"
Provides-Extra: weaviate
Requires-Dist: weaviate-client; extra == "weaviate"
Provides-Extra: databricks
Requires-Dist: databricks-sdk; extra == "databricks"
Requires-Dist: databricks-vectorsearch; extra == "databricks"
Provides-Extra: azure
Requires-Dist: azure-cosmos; extra == "azure"
Provides-Extra: faiss
Requires-Dist: faiss-cpu; extra == "faiss"
Provides-Extra: langchain
Requires-Dist: langchain-core; extra == "langchain"
Requires-Dist: langchain-community; extra == "langchain"
Requires-Dist: langchain-chroma; extra == "langchain"
Provides-Extra: llamaindex
Requires-Dist: llama_index; extra == "llamaindex"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pandas-stubs; extra == "dev"
Requires-Dist: types-tqdm; extra == "dev"
Requires-Dist: types-requests; extra == "dev"
Requires-Dist: types-reportlab; extra == "dev"
Requires-Dist: python-dotenv; extra == "dev"
Provides-Extra: all
Requires-Dist: pytest; extra == "all"
Requires-Dist: pandas-stubs; extra == "all"
Requires-Dist: types-tqdm; extra == "all"
Requires-Dist: types-requests; extra == "all"
Requires-Dist: types-reportlab; extra == "all"
Requires-Dist: python-dotenv; extra == "all"
Requires-Dist: chromadb; extra == "all"
Requires-Dist: pinecone; extra == "all"
Requires-Dist: psycopg>=3.2.0; extra == "all"
Requires-Dist: pgvector; extra == "all"
Requires-Dist: weaviate-client; extra == "all"
Requires-Dist: databricks-sdk; extra == "all"
Requires-Dist: databricks-vectorsearch; extra == "all"
Requires-Dist: azure-cosmos; extra == "all"
Requires-Dist: faiss-cpu; extra == "all"
Requires-Dist: langchain-core; extra == "all"
Requires-Dist: langchain-community; extra == "all"
Requires-Dist: langchain-chroma; extra == "all"
Requires-Dist: llama_index; extra == "all"
Dynamic: license-file

# 🔻 Vecta

# A lightweight SDK for benchmarking RAG agents.

Vecta helps you improve (and ultimately trust) your RAG (Retrieval-Augmented Generation) agents. Evaluate your system against human-made or synthetic benchmarks grounded in your knowledge base.

The benchmarks are built around the idea of "full test coverage": synthetic benchmarks generated by Vecta include multi-hop retrieval questions, edge cases, and adversarial queries.

Evaluations run at the chunk, page, and document levels, and can target each part of the pipeline individually: retrieval-only, generation-only, or retrieval-augmented generation (the full RAG pipeline).

## What types of evaluations can I measure?

Evaluations can be run at different semantic levels and for different components of your agentic system.

| Semantic Level     | Retrieval         | Generation           | Retrieval + Generation                  |
| ------------------ | ----------------- | -------------------- | --------------------------------------- |
| **Chunk-level**    | Recall, Precision | Accuracy, Factuality | Recall, Precision, Accuracy, Factuality |
| **Page-level**     | Recall, Precision | Accuracy, Factuality | Recall, Precision, Accuracy, Factuality |
| **Document-level** | Recall, Precision | Accuracy, Factuality | Recall, Precision, Accuracy, Factuality |
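For intuition, here is a minimal sketch of set-based recall and precision at the three semantic levels. It assumes each chunk ID can be mapped to a `(doc_name, page_num)` pair through a hypothetical `meta` lookup and uses the textbook definitions (recall = hits / relevant, precision = hits / retrieved). Vecta's actual scoring may differ in details such as how multi-page chunks are counted.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical lookup: chunk_id -> (doc_name, page_num)
ChunkMeta = Dict[str, Tuple[str, int]]


def recall_precision(retrieved: Set, relevant: Set) -> Tuple[float, float]:
    """Standard set-based recall and precision."""
    hits = len(retrieved & relevant)
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / len(retrieved) if retrieved else 0.0
    return recall, precision


def project(chunk_ids: Iterable[str], meta: ChunkMeta, level: str) -> Set:
    """Map chunk IDs to the unit compared at the given semantic level."""
    if level == "chunk":
        return set(chunk_ids)
    if level == "page":
        # A page is identified by its (document, page number) pair
        return {meta[c] for c in chunk_ids}
    if level == "document":
        return {meta[c][0] for c in chunk_ids}
    raise ValueError(f"unknown level: {level}")


def score_all_levels(
    retrieved_ids: List[str], relevant_ids: List[str], meta: ChunkMeta
) -> Dict[str, Tuple[float, float]]:
    """Return (recall, precision) at chunk, page, and document level."""
    return {
        level: recall_precision(
            project(retrieved_ids, meta, level), project(relevant_ids, meta, level)
        )
        for level in ("chunk", "page", "document")
    }


meta = {
    "c1": ("guide.pdf", 1),
    "c2": ("guide.pdf", 1),
    "c3": ("guide.pdf", 2),
    "c4": ("faq.pdf", 1),
}
scores = score_all_levels(["c2", "c4"], ["c1", "c3"], meta)
print(scores)  # {'chunk': (0.0, 0.0), 'page': (0.5, 0.5), 'document': (1.0, 0.5)}
```

The same retrieval scores zero at chunk level but full recall at document level: coarser levels give credit for retrieving a neighboring chunk from the right page or document.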

## Making a benchmark

A **benchmark** in Vecta is a list of `vecta.core.schema.BenchmarkEntry` records containing:

- a synthetic **question**
- a canonical **answer**
- the set of **chunk_ids** that can answer it
- the **page_nums** and **doc_names** where those chunks live

Vecta builds this automatically from your knowledge base by:

1. Sampling real chunks from the knowledge base.
2. Asking an LLM to generate a question that the sampled chunk can answer.
3. Discovering other chunks that could also answer it (via semantic search plus an LLM-as-a-judge check).
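The three steps above can be sketched as follows. This is an illustrative outline, not Vecta's implementation: the two LLM calls are replaced by hypothetical callables (`ask_llm_for_question`, `llm_judge`), and semantic search uses a toy bag-of-words cosine similarity instead of real embeddings. The `similarity_threshold` and `similarity_top_k` arguments are named after the `generate_benchmark` parameters shown below, but how Vecta applies them internally is an assumption here.

```python
import math
import random
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Chunk:
    id: str
    content: str
    doc_name: str
    page_num: int


@dataclass
class Entry:
    # Mirrors the benchmark entry fields listed above (illustrative only)
    question: str
    answer: str
    chunk_ids: List[str] = field(default_factory=list)
    page_nums: List[int] = field(default_factory=list)
    doc_names: List[str] = field(default_factory=list)


def cosine(a: str, b: str) -> float:
    """Toy bag-of-words cosine similarity, standing in for real embeddings."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(count * vb[word] for word, count in va.items())
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0


def generate_entries(
    chunks: List[Chunk],
    n_questions: int,
    ask_llm_for_question: Callable[[Chunk], Tuple[str, str]],  # hypothetical: chunk -> (question, answer)
    llm_judge: Callable[[str, Chunk], bool],  # hypothetical: can this chunk answer the question?
    similarity_threshold: float = 0.7,
    similarity_top_k: int = 5,
    random_seed: int = 42,
) -> List[Entry]:
    rng = random.Random(random_seed)
    entries = []
    # 1. Sample real chunks
    for seed in rng.sample(chunks, k=min(n_questions, len(chunks))):
        # 2. Ask the LLM for a question that the sampled chunk can answer
        question, answer = ask_llm_for_question(seed)
        # 3. Rank the other chunks by similarity to the question...
        scored = sorted(
            ((cosine(question, c.content), c) for c in chunks if c.id != seed.id),
            key=lambda pair: pair[0],
            reverse=True,
        )
        # ...and keep top-k candidates that pass both the threshold and the judge
        supporting = [seed] + [
            c for score, c in scored[:similarity_top_k]
            if score >= similarity_threshold and llm_judge(question, c)
        ]
        entries.append(Entry(
            question=question,
            answer=answer,
            chunk_ids=[c.id for c in supporting],
            page_nums=sorted({c.page_num for c in supporting}),
            doc_names=sorted({c.doc_name for c in supporting}),
        ))
    return entries


# Usage with stubbed LLM calls
chunks = [
    Chunk("c1", "the warranty lasts two years", "manual.pdf", 3),
    Chunk("c2", "warranty lasts two years for all devices", "faq.pdf", 1),
    Chunk("c3", "battery charging takes four hours", "manual.pdf", 5),
]
entries = generate_entries(
    chunks,
    n_questions=3,
    ask_llm_for_question=lambda c: (c.content, "stub answer"),  # stub: reuse chunk text as the question
    llm_judge=lambda q, c: True,  # stub: accept every candidate
    similarity_threshold=0.5,
)
```

Raising the threshold trades recall of supporting chunks for precision: at the default `0.7`, the near-duplicate warranty chunks in this toy corpus no longer link to each other.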

#### 1) Connect to your vector DB and load the KB

```python
from chromadb import Client
from vecta.connectors.chroma_local_connector import ChromaLocalConnector
from vecta.core.benchmark import VectaClient
from vecta.core.schema_helpers import SchemaTemplates

chroma = Client()
collection_name = "my_docs"

# Define schema for your data structure
schema = SchemaTemplates.chroma_default()

# Connect Chroma to Vecta
connector = ChromaLocalConnector(
    client=chroma,
    collection_name=collection_name,
    schema=schema
)

# Initialize VectaClient
vecta = VectaClient(
    vector_db_connector=connector,
    openai_api_key="<YOUR_OPENROUTER_API_KEY>"
)

# Load the knowledge base into Vecta
vecta.load_knowledge_base()
```

> ✅ **Schema requirements:** Each connector requires a schema that defines how to extract `id`, `content`, `source_path` and `page_nums` from your data. Use our schema helpers or create custom ones with syntax like `"metadata.source_path"` or `"json(metadata.provenance).doc_name"`.

#### 2) Generate the benchmark

```python
# Create N synthetic Q&A pairs and align them to correct chunks/pages/docs
entries = vecta.generate_benchmark(
    n_questions=10,
    similarity_threshold=0.7,
    similarity_top_k=5,
    random_seed=42,
)
```

#### 3) Save / Load the benchmark (CSV)

```python
# Save to CSV
vecta.save_benchmark("my_benchmark.csv")

# Later (or in another script), load it back:
vecta.load_benchmark("my_benchmark.csv")
```

## Running an evaluation

Vecta lets you evaluate three things against an existing benchmark:

- **Retrieval** → you provide a function: `query: str -> chunk_ids: List[str]`
- **Generation** → you provide: `query: str -> generated_text: str`
- **Retrieval + Generation** → you provide: `query: str -> Tuple[chunk_ids: List[str], generated_text: str]`
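The three callable shapes can be written down as type aliases, which makes it easy to type-check your own functions before handing them to Vecta. The alias names, the toy corpus, and the keyword retriever below are illustrative assumptions, not part of Vecta's API.

```python
from typing import Callable, Dict, List, Tuple

# Illustrative aliases for the three callable shapes (not Vecta names)
RetrievalFn = Callable[[str], List[str]]
GenerationFn = Callable[[str], str]
RagFn = Callable[[str], Tuple[List[str], str]]

# Hypothetical toy corpus: chunk_id -> text
CORPUS: Dict[str, str] = {
    "c1": "the warranty lasts two years",
    "c2": "battery charging takes four hours",
}


def keyword_retriever(query: str) -> List[str]:
    """Return IDs of chunks sharing at least one word with the query."""
    words = set(query.lower().split())
    return [cid for cid, text in CORPUS.items() if words & set(text.split())]


def echo_generator(query: str) -> str:
    """Stand-in generator that just restates the query."""
    return f"You asked: {query}"


def toy_rag(query: str) -> Tuple[List[str], str]:
    """Retrieve, then 'generate' from the retrieved text; returns (chunk_ids, text)."""
    ids = keyword_retriever(query)
    context = " ".join(CORPUS[i] for i in ids)
    return ids, f"Based on: {context}"


# Annotating with the aliases lets a type checker confirm each shape
retriever: RetrievalFn = keyword_retriever
generator: GenerationFn = echo_generator
rag: RagFn = toy_rag
```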

#### Retrieval-only evaluation

Provide a function that returns the **IDs** of your retrieved chunks for a given query.

```python
from typing import List

def my_retriever(query: str) -> List[str]:
    top = connector.semantic_search(query_str=query, k=10)
    # return chunk ids
    return [c.id for c in top]

retrieval_results = vecta.evaluate_retrieval(my_retriever, evaluation_name="baseline @ k=10")
```

#### Generation-only evaluation

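Provide a function that returns your model's generated answer for a given query.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
system_prompt = "Answer the user's question concisely."  # example prompt; use your own

def my_llm_call(query: str) -> str:
    resp = client.chat.completions.create(
        model="your-model",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content

gen_only = vecta.evaluate_generation_only(my_llm_call, evaluation_name="my llm call")
```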

#### Retrieval-augmented Generation (RAG) evaluation

Provide a function that returns both retrieved chunk IDs **and** your generated answer.

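```python
from typing import List, Tuple

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def my_rag_pipeline(query: str) -> Tuple[List[str], str]:
    # retrieve, reusing the connector from step 1
    retrieved = connector.semantic_search(query_str=query, k=5)
    chunk_ids = [c.id for c in retrieved]

    # build the prompt context from the retrieved chunk texts
    context = "\n\n".join(c.content for c in retrieved)

    # generate
    completion = client.chat.completions.create(
        model="your-model",
        messages=[
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"}
        ],
    )
    llm_response = completion.choices[0].message.content

    # must return a (chunk_ids, generated_text) tuple
    return chunk_ids, llm_response

rag_results = vecta.evaluate_retrieval_and_generation(my_rag_pipeline, evaluation_name="rag @ k=5")
```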

## Connecting to custom databases

Don't see a connector for your vector DB? No problem!
Subclass `vecta.connectors.base.BaseVectorDBConnector`, implement these three methods, and pass a schema to the constructor:

```python
from typing import List

from vecta.connectors.base import BaseVectorDBConnector
from vecta.core.schemas import ChunkData, VectorDBSchema

# Define how to extract data from your database results
custom_schema = VectorDBSchema(
    id_accessor="id",  # Direct field access
    content_accessor="document",  # Field containing text
    source_path_accessor="metadata.source_path",  # Nested field access
    page_nums_accessor="json(metadata.provenance).page_nums",  # JSON parsing
)

class CustomConnector(BaseVectorDBConnector):
    def __init__(self, your_db_client, schema: VectorDBSchema):
        super().__init__(schema)
        self.db = your_db_client

    def get_all_chunks(self) -> List[ChunkData]:
        # Fetch every record and convert it using the schema accessors
        results = self.db.get_all()
        return [self._create_chunk_data_from_raw(r) for r in results]

    def semantic_search(self, query_str: str, k: int) -> List[ChunkData]:
        # Return the k records most similar to the query
        # (named query_str to match the semantic_search(query_str=..., k=...) calls above)
        results = self.db.search(query_str, limit=k)
        return [self._create_chunk_data_from_raw(r) for r in results]

    def get_chunk_by_id(self, chunk_id: str) -> ChunkData:
        # Fetch a single record by its chunk ID
        result = self.db.get_by_id(chunk_id)
        return self._create_chunk_data_from_raw(result)
```
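To try a connector like this without a real database, a small in-memory stand-in for `your_db_client` can expose the three methods the example calls (`get_all`, `search`, `get_by_id`). Everything here is hypothetical: the record layout is chosen to match `custom_schema` (`id`, `document`, `metadata.source_path`, and a JSON string at `metadata.provenance` containing `page_nums`), and `search` ranks records by naive word overlap rather than vector similarity.

```python
import json
from typing import Any, Dict, List


class InMemoryDB:
    """Hypothetical stand-in for `your_db_client` in the CustomConnector example."""

    def __init__(self, records: List[Dict[str, Any]]):
        self._records = {r["id"]: r for r in records}

    def get_all(self) -> List[Dict[str, Any]]:
        return list(self._records.values())

    def search(self, query: str, limit: int) -> List[Dict[str, Any]]:
        # Rank records by how many query words appear in their text (no embeddings)
        words = set(query.lower().split())

        def overlap(record: Dict[str, Any]) -> int:
            return len(words & set(record["document"].lower().split()))

        return sorted(self._records.values(), key=overlap, reverse=True)[:limit]

    def get_by_id(self, chunk_id: str) -> Dict[str, Any]:
        return self._records[chunk_id]


db = InMemoryDB([
    {
        "id": "c1",
        "document": "The warranty lasts two years.",
        "metadata": {
            "source_path": "docs/manual.pdf",
            # Stored as a JSON string, so the schema reads it with json(...)
            "provenance": json.dumps({"page_nums": [3]}),
        },
    },
    {
        "id": "c2",
        "document": "Battery charging takes four hours.",
        "metadata": {
            "source_path": "docs/manual.pdf",
            "provenance": json.dumps({"page_nums": [5]}),
        },
    },
])

# With the example above: connector = CustomConnector(db, custom_schema)
```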

**Schema accessor syntax:** Use `"field"`, `"metadata.nested_field"`, `"[0]"` for arrays, `"json(field).subfield"` for JSON parsing, or `"json(json(field).sub).final"` for nested JSON.
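To make the accessor syntax concrete, here is a minimal resolver for the forms listed above: plain fields, dotted paths, `[n]` list indexing, and (nested) `json(...)` parsing. It is a sketch written for illustration, assuming dotted segments index dicts and `[n]` indexes lists; Vecta's own parser and its error handling may behave differently.

```python
import json
import re
from typing import Any, List

# A segment is an optional key followed by zero or more [n] indices
_SEGMENT = re.compile(r"^([^\[\]]*)((?:\[\d+\])*)$")


def _split_top_level(path: str) -> List[str]:
    """Split on dots that are not inside json(...) parentheses."""
    parts: List[str] = []
    depth, current = 0, ""
    for ch in path:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "." and depth == 0:
            parts.append(current)
            current = ""
        else:
            current += ch
    parts.append(current)
    return [p for p in parts if p]


def resolve(record: Any, accessor: str) -> Any:
    """Resolve accessors like 'metadata.source_path' or 'json(metadata.provenance).page_nums'."""
    value = record
    for part in _split_top_level(accessor):
        if part.startswith("json(") and part.endswith(")"):
            # Resolve the inner accessor, then parse the resulting JSON string
            value = json.loads(resolve(value, part[len("json("):-1]))
            continue
        match = _SEGMENT.match(part)
        if match is None:
            raise ValueError(f"unsupported accessor segment: {part!r}")
        name, indices = match.groups()
        if name:
            value = value[name]  # dict key lookup
        for idx in re.findall(r"\[(\d+)\]", indices):
            value = value[int(idx)]  # list indexing
    return value


record = {
    "id": "c1",
    "tags": ["intro", "warranty"],
    "metadata": {
        "source_path": "docs/manual.pdf",
        "provenance": json.dumps({"page_nums": [3, 4], "extra": json.dumps({"final": "ok"})}),
    },
}
print(resolve(record, "json(metadata.provenance).page_nums"))  # [3, 4]
```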

## Importing existing datasets

Import popular evaluation datasets like GPQA Diamond or MS MARCO:

```python
from vecta.core.benchmark import VectaClient
from vecta.core.dataset_importer import BenchmarkDatasetImporter

importer = BenchmarkDatasetImporter()

# Import GPQA Diamond for generation-only evaluation
gpqa_chunks, gpqa_entries = importer.import_gpqa_diamond(split="train", max_items=50)

# Import MS MARCO for retrieval + generation evaluation
msmarco_chunks, msmarco_entries = importer.import_msmarco(split="test", max_items=100)

# Use the imported entries as the benchmark for a VectaClient
vecta = VectaClient()
vecta.benchmark_entries = msmarco_entries
```

The importer handles dataset schema mapping automatically, converting various field structures into Vecta's standardized format.
