Metadata-Version: 2.4
Name: vecta
Version: 0.1.7
Summary: A lightweight SDK for benchmarking RAG agents
Author-email: CtrlFPlus <support@runvecta.com>
Maintainer-email: CtrlFPlus <support@runvecta.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/ctrlfplus/vecta
Project-URL: Documentation, https://vecta.readthedocs.io
Project-URL: Repository, https://github.com/ctrlfplus/vecta.git
Project-URL: Bug Tracker, https://github.com/ctrlfplus/vecta/issues
Keywords: rag,retrieval,vector-database,benchmark,ai,llm
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: jsonpath-ng
Requires-Dist: pydantic>=2.0
Requires-Dist: tqdm
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: requests
Requires-Dist: openai
Provides-Extra: datasets
Requires-Dist: datasets; extra == "datasets"
Provides-Extra: chroma
Requires-Dist: chromadb; extra == "chroma"
Provides-Extra: pinecone
Requires-Dist: pinecone; extra == "pinecone"
Provides-Extra: pgvector
Requires-Dist: psycopg>=3.2.0; extra == "pgvector"
Requires-Dist: pgvector; extra == "pgvector"
Provides-Extra: weaviate
Requires-Dist: weaviate-client; extra == "weaviate"
Provides-Extra: databricks
Requires-Dist: databricks-sdk; extra == "databricks"
Requires-Dist: databricks-vectorsearch; extra == "databricks"
Provides-Extra: azure
Requires-Dist: azure-cosmos; extra == "azure"
Provides-Extra: faiss
Requires-Dist: faiss-cpu; extra == "faiss"
Provides-Extra: langchain
Requires-Dist: langchain-core; extra == "langchain"
Requires-Dist: langchain-community; extra == "langchain"
Requires-Dist: langchain-chroma; extra == "langchain"
Provides-Extra: llamaindex
Requires-Dist: llama_index; extra == "llamaindex"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pandas-stubs; extra == "dev"
Requires-Dist: types-tqdm; extra == "dev"
Requires-Dist: types-requests; extra == "dev"
Requires-Dist: types-reportlab; extra == "dev"
Requires-Dist: python-dotenv; extra == "dev"
Provides-Extra: all
Requires-Dist: pytest; extra == "all"
Requires-Dist: pandas-stubs; extra == "all"
Requires-Dist: types-tqdm; extra == "all"
Requires-Dist: types-requests; extra == "all"
Requires-Dist: types-reportlab; extra == "all"
Requires-Dist: python-dotenv; extra == "all"
Requires-Dist: chromadb; extra == "all"
Requires-Dist: pinecone; extra == "all"
Requires-Dist: psycopg>=3.2.0; extra == "all"
Requires-Dist: pgvector; extra == "all"
Requires-Dist: weaviate-client; extra == "all"
Requires-Dist: databricks-sdk; extra == "all"
Requires-Dist: databricks-vectorsearch; extra == "all"
Requires-Dist: azure-cosmos; extra == "all"
Requires-Dist: faiss-cpu; extra == "all"
Requires-Dist: langchain-core; extra == "all"
Requires-Dist: langchain-community; extra == "all"
Requires-Dist: langchain-chroma; extra == "all"
Requires-Dist: llama_index; extra == "all"
Dynamic: license-file

# 🔻 Vecta

# The lightest, fastest, easiest SDK for evaluating AI agents.

Vecta helps you improve (and ultimately trust) your AI agents, with a particular focus on RAG (Retrieval-Augmented Generation) agents.

Its main strength is creating and curating high-quality synthetic benchmarks grounded in your knowledge base, including multi-hop retrievals, edge cases, and adversarial queries.

You can also import manually curated datasets and load benchmarks from well-known public datasets.

For every query, Vecta aims to answer the following questions as fast as possible:

> "Did it get the right chunk?", "Did it cite the right page?", "Did it cite the right document?", "Was the response accurate?", "Was the response grounded in the retrieved chunks?"

Evaluations run at the chunk, page, and document levels, and can target each part of the pipeline separately: retrieval-only, generation-only, or retrieval-augmented generation (the full RAG pipeline).

## Installation

`pip install vecta`

## Quickstart

Let's say you've built a RAG pipeline:

```python
from typing import List, Tuple

def my_rag(query: str) -> Tuple[List[str], str]:
    # ...
    return retrieved_chunk_ids, answer
```

You can evaluate it in just a few lines:

```python
from vecta import VectaAPIClient

client = VectaAPIClient() 

data_source = client.upload_local_files(
    file_paths=["knowledge_base.pdf", "faq.docx"],
) # you can also connect your vector db for more granular results

benchmark = client.create_benchmark(
    data_source_id=data_source["id"],
    questions_count=10,
) # you can also load custom benchmarks, including from huggingface

results = client.evaluate_retrieval_and_generation(
    benchmark_id=benchmark["id"],
    retrieval_generation_function=my_rag,
)

print(f"Retriever F1: {results.document_level.f1_score}")
print(f"Response Accuracy: {results.generation_metrics.accuracy}")
print(f"Groundedness: {results.generation_metrics.groundedness}")
```


## What types of evaluations can I run?

Evaluations can be run at different semantic granularities and for different components of your agentic system.

| Semantic Level     | Retrieval         | Generation             | Retrieval + Generation                    |
| ------------------ | ----------------- | ---------------------- | ----------------------------------------- |
| **Chunk-level**    | Recall, Precision, F1 | Accuracy, Groundedness | Recall, Precision, F1, Accuracy, Groundedness |
| **Page-level**     | Recall, Precision, F1 | Accuracy, Groundedness | Recall, Precision, F1, Accuracy, Groundedness |
| **Document-level** | Recall, Precision, F1 | Accuracy, Groundedness | Recall, Precision, F1, Accuracy, Groundedness |
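
As a rough illustration of how the retrieval columns can be computed, the sketch below scores a single query with set-based precision, recall, and F1, then repeats the calculation at document level by mapping chunk IDs to their source files. The helper `retrieval_metrics` and the `chunk_to_doc` mapping are hypothetical and not part of the Vecta API; Vecta's own aggregation may differ.

```python
from typing import Dict, Iterable


def retrieval_metrics(retrieved: Iterable[str], relevant: Iterable[str]) -> Dict[str, float]:
    """Set-based precision, recall, and F1 for one query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


retrieved_chunks = ["c1", "c2", "c9"]
relevant_chunks = ["c1", "c2", "c3", "c4"]

# Chunk level: compare chunk IDs directly.
chunk_scores = retrieval_metrics(retrieved_chunks, relevant_chunks)

# Document level: map each chunk to its source document first (hypothetical mapping).
chunk_to_doc = {"c1": "a.pdf", "c2": "a.pdf", "c3": "b.pdf", "c4": "b.pdf", "c9": "c.pdf"}
doc_scores = retrieval_metrics(
    {chunk_to_doc[c] for c in retrieved_chunks},
    {chunk_to_doc[c] for c in relevant_chunks},
)

print(chunk_scores)
print(doc_scores)
```

The same retrieval can score differently at each level: here the retriever found two of four relevant chunks but one of two relevant documents.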

## Making a benchmark

A **benchmark** in Vecta is a list of `vecta.core.schemas.BenchmarkEntry` records containing:

- a synthetic **question**
- a canonical **answer**
- the set of **chunk_ids** that can answer it
- the **page_nums** and **source_paths** where those chunks live

Vecta builds this automatically from your data source by:

1. Sampling real chunks.
2. Asking an LLM to generate a question that the sampled chunk can answer.
3. Discovering other chunks that could also answer it (via semantic search plus an LLM-as-a-judge check).

> 🔬 **Quality check:** For every synthetic Q&A pair we generate, the SDK (and
> the hosted platform) performs a wide similarity sweep and then runs a panel of
> parallel LLM-as-a-judge calls. Any chunk that those judges deem relevant is
> automatically merged into the benchmark's ground-truth citations, so a
> retriever is not penalized for returning a chunk that also answers the question.
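
The merge step in the quality check can be pictured as a small set operation. The sketch below is not Vecta's implementation: the `expand_ground_truth` helper, the `Judge` signature, the keyword-based stand-in judges, and the "accept if any judge accepts" rule are all assumptions made for illustration. A real panel would call an LLM and might use a different voting rule.

```python
from typing import Callable, Dict, List, Set

# Hypothetical judge signature: (question, chunk_text) -> is the chunk relevant?
Judge = Callable[[str, str], bool]


def expand_ground_truth(
    question: str,
    seed_chunk_id: str,
    candidates: Dict[str, str],
    judges: List[Judge],
) -> Set[str]:
    """Start from the sampled chunk and add every candidate a judge accepts."""
    ground_truth = {seed_chunk_id}
    for chunk_id, text in candidates.items():
        if any(judge(question, text) for judge in judges):
            ground_truth.add(chunk_id)
    return ground_truth


def keyword_judge(keyword: str) -> Judge:
    """Toy stand-in for an LLM judge: relevant if the keyword appears in the chunk."""
    return lambda question, text: keyword in text.lower()


# Candidates as they might come back from a similarity sweep (invented data).
candidates = {
    "c2": "Refunds are processed within 14 days.",
    "c3": "Our office is closed on public holidays.",
}
truth = expand_ground_truth(
    "How long do refunds take?",
    "c1",
    candidates,
    [keyword_judge("refund"), keyword_judge("14 days")],
)
print(sorted(truth))
```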

#### 1) Connect to your data source

Every vector-database connector in the SDK expects a **`VectorDBSchema`** that tells Vecta how to pull fields such as `id`, `content`, `source_path`, and `page_nums` from the raw results returned by your data source.

##### Creating a schema

Define a `VectorDBSchema` with accessor paths that match your database's field structure:

```python
from vecta import VectorDBSchema

# Example: ChromaDB with default metadata structure
schema = VectorDBSchema(
    id_accessor="id",
    content_accessor="document",
    metadata_accessor="metadata",
    source_path_accessor="metadata.source_path",
    page_nums_accessor="metadata.page_nums",
)
```

```python
# Example: Pinecone
schema = VectorDBSchema(
    id_accessor=".id",
    content_accessor="metadata.content",
    metadata_accessor="metadata",
    source_path_accessor="metadata.source_path",
    page_nums_accessor="metadata.page_nums",
)
```

```python
# Example: Custom nested JSON metadata
schema = VectorDBSchema(
    id_accessor="chunk_id",
    content_accessor="payload.document_text",
    source_path_accessor="json(metadata.provenance).doc_name",
    page_nums_accessor="json(metadata.provenance).pages",
)
```

Accessor strings support dotted paths (`"metadata.source_path"`), array indexes (`"[0]"`), property access (`".id"`), and `json()` traversal for nested JSON strings.
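
To make the accessor forms concrete, here is a minimal resolver for a simplified subset of the syntax: dotted paths, `[n]` indexes, a leading `.property`, and a leading `json(...)` wrapper. The function `resolve` is a hypothetical illustration written for this README, not the parser Vecta uses, and it may disagree with the real one on edge cases.

```python
import json
import re
from typing import Any

# Matches either an index like [0] or a plain key/property name.
_TOKEN = re.compile(r"\[(\d+)\]|([^.\[\]]+)")


def resolve(record: Any, accessor: str) -> Any:
    """Resolve a simplified accessor string against a nested record."""
    if accessor.startswith("json("):
        # Find the parenthesis that closes the leading json( call.
        depth, end = 0, -1
        for i, ch in enumerate(accessor):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    end = i
                    break
        if end == -1:
            raise ValueError(f"unbalanced json() in {accessor!r}")
        # Resolve the inner path to a JSON string, parse it, then continue.
        record = json.loads(resolve(record, accessor[5:end]))
        accessor = accessor[end + 1:]
    for index, key in _TOKEN.findall(accessor):
        if index:
            record = record[int(index)]
        elif isinstance(record, dict):
            record = record[key]
        else:
            record = getattr(record, key)
    return record


# Invented record whose provenance is stored as a JSON string.
record = {
    "chunk_id": "c-17",
    "payload": {"document_text": "Refunds take 14 days."},
    "metadata": {"provenance": json.dumps({"doc_name": "faq.docx", "pages": [3, 4]})},
}

print(resolve(record, "chunk_id"))
print(resolve(record, "payload.document_text"))
print(resolve(record, "json(metadata.provenance).doc_name"))
print(resolve(record, "json(metadata.provenance).pages[0]"))
```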

> 💡 **Tip:** When building a schema, log or inspect one record from your data source so you can map each field directly to a schema accessor. See the [Accessor Syntax docs](https://www.runvecta.com/docs/accessor-syntax) for the full reference.

##### Example: Vector Database

```python
import chromadb
from vecta import VectaClient, ChromaLocalConnector, VectorDBSchema

chroma = chromadb.Client()
collection_name = "my_docs"

# Define schema for your data structure
schema = VectorDBSchema(
    id_accessor="id",
    content_accessor="document",
    metadata_accessor="metadata",
    source_path_accessor="metadata.source_path",
    page_nums_accessor="metadata.page_nums",
)

# Connect Chroma to Vecta
connector = ChromaLocalConnector(
    client=chroma,
    collection_name=collection_name,
    schema=schema,
)

# Initialize VectaClient
vecta = VectaClient(
    data_source_connector=connector,
    openai_api_key="sk-...",  # required for benchmark generation & generation metrics
)

# Load the knowledge base into Vecta
vecta.load_knowledge_base()
```

##### Example: File Store

```python
from vecta import VectaClient, FileStoreConnector

# Define file paths to ingest
file_paths = ["document1.pdf", "document2.docx", "document3.txt"]

# FileStoreConnector does NOT require a schema — Vecta manages the chunk format internally
connector = FileStoreConnector(
    file_paths=file_paths,
    base_path="/path/to/files",
)

# Initialize VectaClient
vecta = VectaClient(
    data_source_connector=connector,
    openai_api_key="sk-...",
)

# Load the knowledge base (this will ingest files using markitdown)
vecta.load_knowledge_base()
```

> ✅ **Schema requirements:** Each **vector-database** connector requires a `VectorDBSchema` that defines how to extract `id`, `content`, `source_path` and `page_nums` from your data. **File store** connectors do not require a schema — Vecta controls the chunk format internally.

#### 2) Generate the benchmark

```python
# Create N synthetic Q&A pairs and align them to correct chunks/pages/docs
entries = vecta.generate_benchmark(
    n_questions=10,
    random_seed=42,
)
```

#### 3) Save / Load the benchmark (CSV)

```python
# Save to CSV
vecta.save_benchmark("my_benchmark.csv")

# Later (or in another script), load it back:
vecta.load_benchmark("my_benchmark.csv")
```

### Running an evaluation

Vecta lets you evaluate three things against an existing benchmark:

- **Retrieval** → you provide a function: `query: str -> chunk_ids: List[str]`
- **Generation** → you provide: `query: str -> generated_text: str`
- **Retrieval + Generation** → you provide: `query: str -> Tuple[chunk_ids: List[str], generated_text: str]`
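
The three shapes above can be written down as type aliases, which makes it easy to check your own functions against them before running an evaluation. The alias names and the `compose` helper are hypothetical conveniences, not Vecta exports, and the toy functions return canned values.

```python
from typing import Callable, List, Tuple

RetrievalFn = Callable[[str], List[str]]
GenerationFn = Callable[[str], str]
RagFn = Callable[[str], Tuple[List[str], str]]


def compose(retrieve: RetrievalFn, generate: GenerationFn) -> RagFn:
    """Build a retrieval + generation callable from the two simpler ones."""
    def rag(query: str) -> Tuple[List[str], str]:
        return retrieve(query), generate(query)
    return rag


# Toy stand-ins that return canned values.
def toy_retriever(query: str) -> List[str]:
    return ["chunk-1", "chunk-2"]


def toy_generator(query: str) -> str:
    return f"Answer to: {query}"


toy_rag = compose(toy_retriever, toy_generator)
chunk_ids, generated_text = toy_rag("What is the refund window?")
print(chunk_ids, generated_text)
```

In a real pipeline the generator would normally consume the retrieved chunks as context; `compose` keeps them independent only to show the return shape.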

#### Retrieval-only evaluation

Provide a function that returns the **IDs** of your retrieved chunks for a given query.

```python
from typing import List

def my_retriever(query: str) -> List[str]:
    top = connector.semantic_search(query_str=query, k=10)
    return [c.id for c in top]

retrieval_results = vecta.evaluate_retrieval(my_retriever, evaluation_name="baseline @ k=10")
```

#### Generation-only evaluation

```python
from openai import OpenAI

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment

def my_llm_call(query: str) -> str:
    resp = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer the question."},
            {"role": "user", "content": query}
        ]
    )
    return resp.choices[0].message.content

gen_only = vecta.evaluate_generation_only(my_llm_call, evaluation_name="gpt-4o")
```

#### Retrieval-augmented Generation (RAG) evaluation

Provide a function that returns both retrieved chunk IDs **and** your generated answer.

```python
from typing import List, Tuple

def my_rag_pipeline(query: str) -> Tuple[List[str], str]:
    # retrieve
    retrieved = connector.semantic_search(query_str=query, k=5)
    chunk_ids = [c.id for c in retrieved]

    # generate
    context = "\n".join([c.content for c in retrieved])
    completion = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Context:\n{context}"},
            {"role": "user", "content": query}
        ]
    )
    llm_response = completion.choices[0].message.content

    # must return tuple (chunk_ids, generated_text)
    return chunk_ids, llm_response

rag_results = vecta.evaluate_retrieval_and_generation(my_rag_pipeline, evaluation_name="rag @ k=5")
```

### Using the API Client

The `VectaAPIClient` connects to the hosted Vecta platform. Your evaluation function runs **locally**, and results are automatically uploaded to the dashboard.

```python
from vecta import VectaAPIClient

client = VectaAPIClient(api_key="your-vecta-api-key")
# Or set VECTA_API_KEY environment variable and call VectaAPIClient()
```

#### Creating benchmarks via the API

```python
# Generate a synthetic benchmark on the server
benchmark = client.create_benchmark(
    data_source_id="your-data-source-id",
    questions_count=100,
    random_seed=42,
)

# Download entries for local evaluation
entries = client.download_benchmark(benchmark["id"])
```

#### Running evaluations via the API

```python
# Retrieval only
results = client.evaluate_retrieval(
    benchmark_id="your-benchmark-id",
    retrieval_function=my_retriever,
    evaluation_name="baseline @ k=10",
    metadata={"top_k": 10, "model": "text-embedding-3-small"},
)

# Retrieval + Generation
results = client.evaluate_retrieval_and_generation(
    benchmark_id="your-benchmark-id",
    retrieval_generation_function=my_rag_pipeline,
    evaluation_name="rag-v1",
    metadata={"top_k": 5, "model": "gpt-4o"},
)

# Generation only
results = client.evaluate_generation_only(
    benchmark_id="your-benchmark-id",
    generation_function=my_llm_call,
    evaluation_name="gpt-4o baseline",
    metadata={"model": "gpt-4o", "temperature": 0.0},
)
```

### Experiments & Metadata

Attach arbitrary metadata to evaluation runs and group them into experiments for systematic comparison.

#### Attaching metadata to runs

Pass a `metadata` dictionary to any evaluation call. This is stored alongside the results and can be used to compare runs with different configurations.

```python
results = client.evaluate_retrieval(
    benchmark_id="...",
    retrieval_function=my_retriever,
    evaluation_name="baseline @ k=10",
    metadata={"top_k": 10, "chunk_size": 512, "model": "text-embedding-3-small"},
)
```

#### Creating experiments

Group evaluations into experiments using the `VectaAPIClient`:

```python
from vecta import VectaAPIClient

client = VectaAPIClient()

# Create an experiment
experiment = client.create_experiment(
    name="Chunk Size Sweep",
    description="Testing 256/512/1024 chunk sizes",
)

# Run evaluations within the experiment
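# make_retriever is your own factory (not part of Vecta) that returns
# a retrieval function configured for the given chunk size
for chunk_size in [256, 512, 1024]: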
    results = client.evaluate_retrieval(
        benchmark_id="...",
        retrieval_function=make_retriever(chunk_size),
        evaluation_name=f"chunk_size_{chunk_size}",
        experiment_id=experiment["id"],
        metadata={"chunk_size": chunk_size, "model": "text-embedding-3-small"},
    )
```
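
The loop relies on a `make_retriever` factory that you supply yourself. One possible shape is a closure that captures the chunk size and returns a `query -> chunk_ids` function. Everything in the sketch below (the `CORPUS` string, the `chunk-<offset>` ID scheme, and the keyword-overlap ranking) is invented for illustration; a real factory would query an index built with that chunk size and return IDs that match your benchmark.

```python
from typing import Callable, Dict, List

# Invented toy corpus standing in for a knowledge base.
CORPUS = (
    "Refunds are processed within 14 days. "
    "Shipping takes 3 to 5 business days. "
    "Support is available on weekdays."
)


def make_retriever(chunk_size: int, k: int = 2) -> Callable[[str], List[str]]:
    """Return a retriever over CORPUS re-chunked at `chunk_size` characters."""
    chunks: Dict[str, str] = {
        f"chunk-{start}": CORPUS[start:start + chunk_size]
        for start in range(0, len(CORPUS), chunk_size)
    }

    def retriever(query: str) -> List[str]:
        terms = set(query.lower().split())
        # Rank chunks by how many query terms they share (ties keep corpus order).
        ranked = sorted(
            chunks,
            key=lambda cid: len(terms & set(chunks[cid].lower().split())),
            reverse=True,
        )
        return ranked[:k]

    return retriever


retriever_40 = make_retriever(40)
print(retriever_40("refunds"))
```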

#### Visualizing experiments

Use the built-in plotting module to compare evaluations across metadata values:

```python
from vecta import plot_experiment, get_metadata_keys

# Get experiment with evaluations
exp_detail = client.get_experiment(experiment["id"])
evaluations = exp_detail["evaluations"]

# See what metadata keys are available
keys = get_metadata_keys(evaluations)
# e.g. ["chunk_size", "model"]

# Plot — auto-detects string vs numeric values:
#   numeric values → line chart
#   string values  → grouped bar chart
plot_experiment(evaluations, metadata_key="chunk_size")
plot_experiment(evaluations, metadata_key="model")
```
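
The numeric-versus-string distinction can be approximated in a few lines. The sketch below assumes each evaluation is a dict with a `"metadata"` dict, which is a guess about the payload shape rather than something documented here. `collect_metadata_keys` and `chart_kind` are hypothetical helpers, not the logic inside `get_metadata_keys` or `plot_experiment`.

```python
from numbers import Number
from typing import Any, Dict, List


def collect_metadata_keys(evaluations: List[Dict[str, Any]]) -> List[str]:
    """Union of metadata keys across all evaluations, sorted for stable output."""
    keys = set()
    for evaluation in evaluations:
        keys.update((evaluation.get("metadata") or {}).keys())
    return sorted(keys)


def chart_kind(evaluations: List[Dict[str, Any]], key: str) -> str:
    """Pick 'line' when every value for `key` is numeric, else 'grouped_bar'."""
    values = [
        evaluation["metadata"][key]
        for evaluation in evaluations
        if key in (evaluation.get("metadata") or {})
    ]
    numeric = all(isinstance(v, Number) and not isinstance(v, bool) for v in values)
    return "line" if values and numeric else "grouped_bar"


# Invented evaluation payloads for illustration.
sample_evaluations = [
    {"metadata": {"chunk_size": 256, "model": "text-embedding-3-small"}},
    {"metadata": {"chunk_size": 512, "model": "text-embedding-3-small"}},
    {"metadata": None},
]

print(collect_metadata_keys(sample_evaluations))
print(chart_kind(sample_evaluations, "chunk_size"))
print(chart_kind(sample_evaluations, "model"))
```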

### Connecting to custom data sources

Don't see a connector for your data source? Subclass `vecta.connectors.base.BaseVectorDBConnector`, pass it a schema, and implement three methods:

```python
from __future__ import annotations  # lets list[...] hints work on Python 3.8

from vecta.connectors.base import BaseVectorDBConnector
from vecta import ChunkData, VectorDBSchema

# Define how to extract data from your data source results
custom_schema = VectorDBSchema(
    id_accessor="id",
    content_accessor="document",
    source_path_accessor="metadata.source_path",
    page_nums_accessor="metadata.page_nums",
)

class CustomConnector(BaseVectorDBConnector):
    def __init__(self, your_db_client, schema: VectorDBSchema):
        super().__init__(schema)
        self.db = your_db_client

    def get_all_chunks(self) -> list[ChunkData]:
        results = self.db.get_all()
        return [self._create_chunk_data_from_raw(r) for r in results]

    def semantic_search(self, query_str: str, k: int = 10) -> list[ChunkData]:
        results = self.db.search(query_str, limit=k)
        return [self._create_chunk_data_from_raw(r) for r in results]

    def get_chunk_by_id(self, chunk_id: str) -> ChunkData:
        result = self.db.get_by_id(chunk_id)
        return self._create_chunk_data_from_raw(result)
```

The inherited `_create_chunk_data_from_raw()` method uses your schema to extract fields automatically.
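
When developing a connector, it can help to test against a tiny in-memory backend first. The class below is a hypothetical stand-in for `your_db_client`: it exposes the three calls the example connector makes (`get_all`, `search`, `get_by_id`) and returns raw dicts shaped to fit `custom_schema`. The keyword-overlap search is a placeholder for real semantic search, and the records are invented.

```python
from typing import Any, Dict, List


class InMemoryDB:
    """Hypothetical stand-in for `your_db_client`, for local testing only."""

    def __init__(self, records: List[Dict[str, Any]]):
        self._records = {record["id"]: record for record in records}

    def get_all(self) -> List[Dict[str, Any]]:
        return list(self._records.values())

    def search(self, query: str, limit: int = 10) -> List[Dict[str, Any]]:
        terms = set(query.lower().split())
        # Placeholder ranking: count of query terms that appear in the document.
        ranked = sorted(
            self._records.values(),
            key=lambda r: len(terms & set(r["document"].lower().split())),
            reverse=True,
        )
        return ranked[:limit]

    def get_by_id(self, chunk_id: str) -> Dict[str, Any]:
        return self._records[chunk_id]


db = InMemoryDB([
    {
        "id": "c1",
        "document": "Our refund policy lasts 14 days.",
        "metadata": {"source_path": "faq.docx", "page_nums": [1]},
    },
    {
        "id": "c2",
        "document": "Shipping takes 3 to 5 days.",
        "metadata": {"source_path": "shipping.pdf", "page_nums": [2]},
    },
])

print(db.search("refund policy", limit=1)[0]["id"])
```

An instance like `db` could then be passed as `your_db_client` when constructing `CustomConnector`.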

**Schema accessor syntax:** Use `"field"`, `"metadata.nested_field"`, `".property"`, `"[0]"` for arrays, `"json(field).subfield"` for JSON parsing, or `"json(json(field).sub).final"` for nested JSON.

### Importing existing datasets

Vecta ships with dataset importers so you can start from curated retrieval or generation benchmarks instead of generating your own from scratch.

```python
from vecta import BenchmarkDatasetImporter

importer = BenchmarkDatasetImporter()

# MS MARCO — retrieval + generation benchmark
chunks, entries = importer.import_msmarco(split="test", max_items=100)

# GPQA Diamond — generation-only benchmark (no chunks)
chunks, entries = importer.import_gpqa_diamond(split="train", max_items=60)
```

See the [Hugging Face docs](https://www.runvecta.com/docs/benchmark-huggingface) for details on each dataset.

### Available connectors

All connectors are lazy-loaded and only imported when accessed:

| Connector | Import | Requires Schema |
|---|---|---|
| `ChromaLocalConnector` | `from vecta import ChromaLocalConnector` | ✅ |
| `ChromaCloudConnector` | `from vecta import ChromaCloudConnector` | ✅ |
| `PineconeConnector` | `from vecta import PineconeConnector` | ✅ |
| `PgVectorConnector` | `from vecta import PgVectorConnector` | ✅ |
| `WeaviateConnector` | `from vecta import WeaviateConnector` | ✅ |
| `AzureCosmosConnector` | `from vecta import AzureCosmosConnector` | ✅ |
| `DatabricksConnector` | `from vecta import DatabricksConnector` | ✅ |
| `LangChainVectorStoreConnector` | `from vecta import LangChainVectorStoreConnector` | ✅ |
| `LlamaIndexConnector` | `from vecta import LlamaIndexConnector` | ✅ |
| `FileStoreConnector` | `from vecta import FileStoreConnector` | ❌ |

### Links

- [Documentation](https://www.runvecta.com/docs)
- [GitHub](https://github.com/runvecta/vecta)
