Metadata-Version: 2.4
Name: vecta
Version: 0.1.4
Summary: A lightweight SDK for benchmarking RAG agents
Author-email: Emmett <emmett@runvecta.com>
Maintainer-email: Emmett <emmett@runvecta.com>
License: MIT
Project-URL: Homepage, https://github.com/ctrlfplus/vecta
Project-URL: Documentation, https://vecta.readthedocs.io
Project-URL: Repository, https://github.com/ctrlfplus/vecta.git
Project-URL: Bug Tracker, https://github.com/ctrlfplus/vecta/issues
Keywords: rag,retrieval,vector-database,benchmark,ai,llm
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: jsonpath-ng
Requires-Dist: pydantic>=2.0
Requires-Dist: tqdm
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: requests
Requires-Dist: openai
Requires-Dist: datasets
Provides-Extra: chroma
Requires-Dist: chromadb; extra == "chroma"
Provides-Extra: pinecone
Requires-Dist: pinecone; extra == "pinecone"
Provides-Extra: pgvector
Requires-Dist: psycopg>=3.2.0; extra == "pgvector"
Requires-Dist: pgvector; extra == "pgvector"
Provides-Extra: weaviate
Requires-Dist: weaviate-client; extra == "weaviate"
Provides-Extra: databricks
Requires-Dist: databricks-sdk; extra == "databricks"
Requires-Dist: databricks-vectorsearch; extra == "databricks"
Provides-Extra: azure
Requires-Dist: azure-cosmos; extra == "azure"
Provides-Extra: faiss
Requires-Dist: faiss-cpu; extra == "faiss"
Provides-Extra: langchain
Requires-Dist: langchain-core; extra == "langchain"
Requires-Dist: langchain-community; extra == "langchain"
Requires-Dist: langchain-chroma; extra == "langchain"
Provides-Extra: llamaindex
Requires-Dist: llama_index; extra == "llamaindex"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pandas-stubs; extra == "dev"
Requires-Dist: types-tqdm; extra == "dev"
Requires-Dist: types-requests; extra == "dev"
Requires-Dist: types-reportlab; extra == "dev"
Requires-Dist: python-dotenv; extra == "dev"
Provides-Extra: all
Requires-Dist: pytest; extra == "all"
Requires-Dist: pandas-stubs; extra == "all"
Requires-Dist: types-tqdm; extra == "all"
Requires-Dist: types-requests; extra == "all"
Requires-Dist: types-reportlab; extra == "all"
Requires-Dist: python-dotenv; extra == "all"
Requires-Dist: chromadb; extra == "all"
Requires-Dist: pinecone; extra == "all"
Requires-Dist: psycopg>=3.2.0; extra == "all"
Requires-Dist: pgvector; extra == "all"
Requires-Dist: weaviate-client; extra == "all"
Requires-Dist: databricks-sdk; extra == "all"
Requires-Dist: databricks-vectorsearch; extra == "all"
Requires-Dist: azure-cosmos; extra == "all"
Requires-Dist: faiss-cpu; extra == "all"
Requires-Dist: langchain-core; extra == "all"
Requires-Dist: langchain-community; extra == "all"
Requires-Dist: langchain-chroma; extra == "all"
Requires-Dist: llama_index; extra == "all"
Dynamic: license-file

# 🔻 Vecta

**A lightweight SDK for benchmarking RAG agents.**

Vecta helps you improve (and ultimately trust) your RAG (Retrieval-Augmented Generation) agents. Evaluate your system against human-made or synthetic benchmarks grounded in your knowledge base. You can also bootstrap evaluations from well-known public datasets without writing custom import scripts.

The benchmarks are built on the concept of "Full test coverage". Synthetic benchmarks generated by Vecta include multi-hop retrievals, edge cases, and adversarial queries.

Evaluations run at the chunk, page, and document levels, and can target each part of the pipeline: retrieval only, generation only, or retrieval-augmented generation (the full RAG pipeline).

## What types of evaluations can I run?

Evaluations can be run at different semantic levels and for different components of your agentic system.

| Semantic Level     | Retrieval         | Generation             | Retrieval + Generation                    |
| ------------------ | ----------------- | ---------------------- | ----------------------------------------- |
| **Chunk-level**    | Recall, Precision | Accuracy, Groundedness | Recall, Precision, Accuracy, Groundedness |
| **Page-level**     | Recall, Precision | Accuracy, Groundedness | Recall, Precision, Accuracy, Groundedness |
| **Document-level** | Recall, Precision | Accuracy, Groundedness | Recall, Precision, Accuracy, Groundedness |
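
As a rough illustration of the retrieval columns, recall and precision can be computed as set overlaps between retrieved and ground-truth identifiers. This is a minimal sketch under the usual definitions, not Vecta's internal implementation; the function name `precision_recall` and the sample IDs are made up for the example.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one query, treating both inputs as sets."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Chunk level: two of the four retrieved chunks appear in a ground truth of three.
chunk_precision, chunk_recall = precision_recall(
    retrieved=["c1", "c2", "c7", "c9"],
    relevant=["c1", "c2", "c3"],
)

# Document level: the same call works on document names instead of chunk IDs.
doc_precision, doc_recall = precision_recall(
    retrieved=["guide.pdf", "faq.pdf"],
    relevant=["guide.pdf"],
)
```

Passing page numbers or document names instead of chunk IDs gives the page-level and document-level variants of the same two metrics.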

## Making a benchmark

A **benchmark** in Vecta is a list of `vecta.core.schemas.BenchmarkEntry` records containing:

- a synthetic **question**
- a canonical **answer**
- the set of **chunk_ids** that can answer it
- the **page_nums** and **doc_names** where those chunks live
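
To make the shape of one record concrete, here is a hypothetical stand-in with the fields listed above. `EntrySketch` is not the real `BenchmarkEntry` class, and the field types (for example, integer page numbers) are assumptions; check the SDK for the actual definition.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EntrySketch:
    """Hypothetical stand-in mirroring the fields of a benchmark entry."""

    question: str
    answer: str
    chunk_ids: List[str] = field(default_factory=list)  # chunks that can answer the question
    page_nums: List[int] = field(default_factory=list)  # assumed to be integers
    doc_names: List[str] = field(default_factory=list)


entry = EntrySketch(
    question="What is the refund window?",
    answer="30 days from delivery.",
    chunk_ids=["c12", "c48"],
    page_nums=[3, 7],
    doc_names=["policy.pdf"],
)
```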

Vecta builds this automatically from your data source by:

1. Sampling real chunks
2. Asking an LLM to generate a question that the chunk can answer
3. Discovering other chunks that could also answer it (via semantic search + an LLM-as-a-judge check)

> 🔬 **Quality check:** For every synthetic Q&A pair we generate, the SDK (and
> the hosted platform) performs a wide similarity sweep and then runs a panel of
> parallel LLM-as-a-judge calls. Any chunk that those judges deem relevant is
> automatically merged into the benchmark's ground-truth citations, so your
> downstream recall/precision numbers are scored against all chunks the judges
> found relevant rather than only the sampled chunk.
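
The three steps above can be sketched as a single function. This is an illustration only: `build_entry` and the `fake_*` helpers are hypothetical, the LLM, search, and judge are stubbed out, and the way `similarity_threshold` and `similarity_top_k` are applied here is an assumption about what those parameters mean.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple


def build_entry(
    chunks: Dict[str, str],
    ask_llm: Callable[[str], Tuple[str, str]],
    search: Callable[[str, int], List[Tuple[str, float]]],
    judge: Callable[[str, str], bool],
    similarity_threshold: float = 0.7,
    similarity_top_k: int = 5,
    rng: Optional[random.Random] = None,
) -> dict:
    rng = rng or random.Random()
    # 1. Sample a real chunk.
    seed_id = rng.choice(sorted(chunks))
    # 2. Ask an LLM for a question/answer pair that the chunk can answer.
    question, answer = ask_llm(chunks[seed_id])
    # 3. Keep similar chunks that the judge also accepts as able to answer it.
    chunk_ids = {seed_id}
    for chunk_id, score in search(question, similarity_top_k):
        if score >= similarity_threshold and judge(question, chunks[chunk_id]):
            chunk_ids.add(chunk_id)
    return {"question": question, "answer": answer, "chunk_ids": sorted(chunk_ids)}


chunks = {
    "c1": "Refunds are accepted within 30 days.",
    "c2": "The refund window is 30 days from delivery.",
    "c3": "Shipping takes five business days.",
}


def fake_llm(chunk_text: str) -> Tuple[str, str]:
    return "How long is the refund window?", "30 days"


def fake_search(question: str, k: int) -> List[Tuple[str, float]]:
    return [("c1", 0.91), ("c2", 0.88), ("c3", 0.42)][:k]


def fake_judge(question: str, chunk_text: str) -> bool:
    return "refund" in chunk_text.lower()


entry = build_entry(chunks, fake_llm, fake_search, fake_judge, rng=random.Random(42))
```

Whichever chunk is sampled, both refund chunks end up in `entry["chunk_ids"]`, which is the point of step 3: ground truth should not depend on which chunk happened to seed the question.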

#### 1) Connect to your data source

Every connector in the SDK expects a **schema** that tells Vecta how to pull fields such as `id`, `content`, `source_path`, and `page_nums` from the raw results returned by your data source.

##### Choosing a schema template

- Use the helpers in `vecta.core.schema_helpers.SchemaTemplates` for popular databases (Chroma, Weaviate, Pinecone, LanceDB, etc.).
- Pass the schema instance to your connector. Each helper documents the required metadata fields so you can match them with the way your data is stored.

```python
from vecta.core.schema_helpers import SchemaTemplates

# Example: for Chroma collections with the default metadata structure
schema = SchemaTemplates.chroma_default()
```

##### Creating a custom schema

Define a `VectorDBSchema` manually when your database returns non-standard field names or nested metadata:

```python
from vecta.core.schemas import VectorDBSchema

custom_schema = VectorDBSchema(
    id_accessor="chunk_id",
    content_accessor="payload.document_text",
    source_path_accessor="metadata.source",
    page_nums_accessor="json(metadata.provenance).pages",
)
```

Accessor strings support dotted paths, array indexes (e.g., `"chunks[0].id"`), and `json()` traversal for nested JSON structures.
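
To show what such accessor strings mean, here is a small resolver for the three forms described above. It is a sketch of the idea, not Vecta's parser: the `resolve` function and the sample record are hypothetical, and the real implementation may handle errors and edge cases differently.

```python
import json
import re
from typing import Any


def resolve(record: Any, accessor: str) -> Any:
    """Resolve an accessor string against one raw record (illustrative only)."""
    value, rest = record, accessor.strip()
    if rest.startswith("json("):
        # Find the parenthesis that closes this json( ... ) call.
        depth = 0
        for close, ch in enumerate(rest):
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth == 0:
                    break
        inner, rest = rest[5:close], rest[close + 1:]
        raw = resolve(record, inner)
        # Parse JSON text; pass through values that are already structured.
        value = json.loads(raw) if isinstance(raw, str) else raw
    # Walk the remaining path: names separated by dots, indexes written as [n].
    for token in re.findall(r"\[\d+\]|[^.\[\]]+", rest):
        value = value[int(token[1:-1])] if token.startswith("[") else value[token]
    return value


record = {
    "chunk_id": "c1",
    "payload": {"document_text": "hello"},
    "metadata": {"source": "a.pdf", "provenance": '{"pages": [1, 2]}'},
    "chunks": [{"id": "first"}],
}

text = resolve(record, "payload.document_text")
first_id = resolve(record, "chunks[0].id")
pages = resolve(record, "json(metadata.provenance).pages")
```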

> 💡 **Tip:** When building a schema, log or inspect one record from your data source so you can map each field directly to a schema accessor.

##### Example: Vector Database

```python
from chromadb import Client
from vecta.connectors.chroma_local_connector import ChromaLocalConnector
from vecta.core.benchmark import VectaClient
from vecta.core.schema_helpers import SchemaTemplates

chroma = Client()
collection_name = "my_docs"

# Define schema for your data structure
schema = SchemaTemplates.chroma_default()

# Connect Chroma to Vecta
connector = ChromaLocalConnector(
    client=chroma,
    collection_name=collection_name,
    schema=schema
)

# Initialize VectaClient
vecta = VectaClient(vector_db_connector=connector)

# Load the knowledge base into Vecta
vecta.load_knowledge_base()
```

##### Example: File Store

```python
from vecta.connectors.file_store_connector import FileStoreConnector
from vecta.core.benchmark import VectaClient
from vecta.core.schema_helpers import SchemaTemplates

# Define file paths to ingest
file_paths = ["document1.pdf", "document2.docx", "document3.txt"]

connector = FileStoreConnector(
    file_paths=file_paths,
    schema=SchemaTemplates.chroma_default(),
    base_path="/path/to/files"
)

# Initialize VectaClient
vecta = VectaClient(vector_db_connector=connector)

# Load the knowledge base (this will ingest files using thepipe)
vecta.load_knowledge_base()
```

> ✅ **Schema requirements:** Each connector requires a schema that defines how to extract `id`, `content`, `source_path` and `page_nums` from your data. Use our schema helpers or create custom ones with syntax like `"metadata.source_path"` or `"json(metadata.provenance).doc_name"`.

#### 2) Generate the benchmark

```python
# Create N synthetic Q&A pairs and align them to correct chunks/pages/docs
entries = vecta.generate_benchmark(
    n_questions=10,
    similarity_threshold=0.7,
    similarity_top_k=5,
    random_seed=42,
)
```

#### 3) Save / Load the benchmark (CSV)

```python
# Save to CSV
vecta.save_benchmark("my_benchmark.csv")

# Later (or in another script), load it back:
vecta.load_benchmark("my_benchmark.csv")
```
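
If you want to inspect or post-process a benchmark outside the SDK, list-valued fields need some encoding to fit in CSV cells. The sketch below JSON-encodes them; the column names and the encoding are assumptions for illustration, and the file written by `save_benchmark` may be laid out differently.

```python
import csv
import io
import json
from typing import Dict, List

LIST_FIELDS = ("chunk_ids", "page_nums", "doc_names")
COLUMNS = ("question", "answer") + LIST_FIELDS


def entries_to_csv(entries: List[Dict]) -> str:
    """Serialize entries to CSV text, JSON-encoding the list-valued fields."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    for entry in entries:
        row = dict(entry)
        for key in LIST_FIELDS:
            row[key] = json.dumps(entry[key])
        writer.writerow(row)
    return buf.getvalue()


def entries_from_csv(text: str) -> List[Dict]:
    """Inverse of entries_to_csv: decode the JSON cells back into lists."""
    entries = []
    for row in csv.DictReader(io.StringIO(text)):
        entry = dict(row)
        for key in LIST_FIELDS:
            entry[key] = json.loads(entry[key])
        entries.append(entry)
    return entries


sample = [
    {
        "question": "What is the refund window, in days?",
        "answer": "30 days",
        "chunk_ids": ["c12", "c48"],
        "page_nums": [3, 7],
        "doc_names": ["policy.pdf"],
    }
]
round_tripped = entries_from_csv(entries_to_csv(sample))
```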

### Running an evaluation

Vecta lets you evaluate three things against an existing benchmark:

- **Retrieval** → you provide a function: `query: str -> chunk_ids: List[str]`
- **Generation** → you provide: `query: str -> generated_text: str`
- **Retrieval + Generation** → you provide: `query: str -> Tuple[chunk_ids: List[str], generated_text: str]`
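
The three callable shapes can be written down as type aliases, which makes it easy to check your own functions with a type checker. The alias names and toy functions below are illustrative and are not part of the SDK.

```python
from typing import Callable, List, Tuple

# Hypothetical aliases for the three callable shapes listed above.
Retriever = Callable[[str], List[str]]
Generator = Callable[[str], str]
RagPipeline = Callable[[str], Tuple[List[str], str]]


def toy_retriever(query: str) -> List[str]:
    # A real retriever would query your vector store here.
    return ["c1", "c2"]


def toy_generator(query: str) -> str:
    # A real generator would call your LLM here.
    return f"Answer to: {query}"


def toy_rag(query: str) -> Tuple[List[str], str]:
    # Retrieval + generation returns both pieces as one tuple.
    return toy_retriever(query), toy_generator(query)


retriever: Retriever = toy_retriever
generator: Generator = toy_generator
pipeline: RagPipeline = toy_rag
```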

#### Retrieval-only evaluation

Provide a function that returns the **IDs** of your retrieved chunks for a given query.

```python
from typing import List

def my_retriever(query: str) -> List[str]:
    top = connector.semantic_search(query, k=10)
    # return chunk ids
    return [c.id for c in top]

retrieval_results = vecta.evaluate_retrieval(my_retriever, evaluation_name="baseline @ k=10")
```

#### Generation-only evaluation

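```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
system_prompt = "Answer the question concisely."  # placeholder prompt

def my_llm_call(query: str) -> str:
    resp = client.chat.completions.create(
        model="your-model",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query},
        ],
    )
    # content can be None, so fall back to an empty string
    return resp.choices[0].message.content or ""

gen_only = vecta.evaluate_generation_only(my_llm_call, evaluation_name="my llm call")
```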

#### Retrieval-augmented Generation (RAG) evaluation

Provide a function that returns both retrieved chunk IDs **and** your generated answer.

```python
from typing import List, Tuple

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def my_rag_pipeline(query: str) -> Tuple[List[str], str]:
    # retrieve (uses the connector created earlier)
    retrieved = connector.semantic_search(query, k=5)
    chunk_ids = [c.id for c in retrieved]

    # generate
    completion = client.chat.completions.create(
        model="your-model",
        messages=[
            {"role": "user", "content": f"{retrieved}\n{query}"}
        ],
    )
    llm_response = completion.choices[0].message.content or ""

    # must return a (chunk_ids, generated_text) tuple
    return chunk_ids, llm_response

rag_results = vecta.evaluate_retrieval_and_generation(my_rag_pipeline, evaluation_name="rag @ k=5")
```

### Connecting to custom data sources

Don't see a connector for your data source? No problem!
Inherit from `vecta.connectors.base.BaseVectorDBConnector`, pass your schema to the base class, and implement these three methods:

```python
from typing import List

from vecta.connectors.base import BaseVectorDBConnector
from vecta.core.schemas import ChunkData, VectorDBSchema

# Define how to extract data from your data source results
custom_schema = VectorDBSchema(
    id_accessor="id",  # Direct field access
    content_accessor="document",  # Field containing text
    source_path_accessor="metadata.source_path",  # Nested field access
    page_nums_accessor="json(metadata.provenance).page_nums",  # JSON parsing
)

class CustomConnector(BaseVectorDBConnector):
    def __init__(self, your_db_client, schema: VectorDBSchema):
        super().__init__(schema)
        self.db = your_db_client

    def get_all_chunks(self) -> List[ChunkData]:
        results = self.db.get_all()
        return [self._create_chunk_data_from_raw(r) for r in results]

    def semantic_search(self, query: str, k: int) -> List[ChunkData]:
        results = self.db.search(query, limit=k)
        return [self._create_chunk_data_from_raw(r) for r in results]

    def get_chunk_by_id(self, chunk_id: str) -> ChunkData:
        result = self.db.get_by_id(chunk_id)
        return self._create_chunk_data_from_raw(result)
```

**Schema accessor syntax:** Use `"field"`, `"metadata.nested_field"`, `"[0]"` for arrays, `"json(field).subfield"` for JSON parsing, or `"json(json(field).sub).final"` for nested JSON.
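
To try a connector like `CustomConnector` without a real database, you can hand it a toy client that exposes the `get_all`, `search`, and `get_by_id` methods it calls. `InMemoryDB` below is a hypothetical stand-in: its records are shaped to match `custom_schema` above, and its keyword-overlap ranking is only a placeholder for real semantic search.

```python
from typing import Dict, List


class InMemoryDB:
    """Toy client with the get_all / search / get_by_id methods used above."""

    def __init__(self, records: List[Dict]):
        self._records = {r["id"]: r for r in records}

    def get_all(self) -> List[Dict]:
        return list(self._records.values())

    def search(self, query: str, limit: int) -> List[Dict]:
        # Rank by the number of shared lowercase words (not real semantic search).
        terms = set(query.lower().split())
        scored = [
            (len(terms & set(r["document"].lower().split())), r)
            for r in self._records.values()
        ]
        scored = [(score, r) for score, r in scored if score > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [r for _, r in scored[:limit]]

    def get_by_id(self, chunk_id: str) -> Dict:
        return self._records[chunk_id]


db = InMemoryDB([
    {
        "id": "c1",
        "document": "Refunds are accepted within 30 days",
        "metadata": {"source_path": "policy.pdf", "provenance": '{"page_nums": [3]}'},
    },
    {
        "id": "c2",
        "document": "Shipping takes five business days",
        "metadata": {"source_path": "shipping.pdf", "provenance": '{"page_nums": [1]}'},
    },
])
hits = db.search("how many days for refunds", limit=1)
```

Under these assumptions, an instance like `db` can be passed as `your_db_client` when constructing the connector.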

### Importing existing datasets

Vecta ships with dataset importers so you can start from curated retrieval or generation benchmarks instead of generating your own from scratch. Import popular evaluation datasets like GPQA Diamond or MS MARCO:
