Metadata-Version: 2.4
Name: vecta
Version: 0.1.0
Summary: A lightweight SDK for benchmarking RAG agents
Author-email: Emmett <emmett@runvecta.com>
Maintainer-email: Emmett <emmett@runvecta.com>
License: MIT
Project-URL: Homepage, https://github.com/yourusername/vecta
Project-URL: Documentation, https://vecta.readthedocs.io
Project-URL: Repository, https://github.com/yourusername/vecta.git
Project-URL: Bug Tracker, https://github.com/yourusername/vecta/issues
Keywords: rag,retrieval,vector-database,benchmark,ai,llm
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: jsonpath-ng
Requires-Dist: pydantic>=2.0
Requires-Dist: tqdm
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: requests
Requires-Dist: openai
Requires-Dist: datasets
Provides-Extra: chroma
Requires-Dist: chromadb; extra == "chroma"
Provides-Extra: pinecone
Requires-Dist: pinecone; extra == "pinecone"
Provides-Extra: pgvector
Requires-Dist: psycopg>=3.2.0; extra == "pgvector"
Requires-Dist: pgvector; extra == "pgvector"
Provides-Extra: weaviate
Requires-Dist: weaviate-client; extra == "weaviate"
Provides-Extra: databricks
Requires-Dist: databricks-sdk; extra == "databricks"
Requires-Dist: databricks-vectorsearch; extra == "databricks"
Provides-Extra: azure
Requires-Dist: azure-cosmos; extra == "azure"
Provides-Extra: faiss
Requires-Dist: faiss-cpu; extra == "faiss"
Provides-Extra: langchain
Requires-Dist: langchain-core; extra == "langchain"
Requires-Dist: langchain-community; extra == "langchain"
Requires-Dist: langchain-chroma; extra == "langchain"
Provides-Extra: llamaindex
Requires-Dist: llama_index; extra == "llamaindex"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pandas-stubs; extra == "dev"
Requires-Dist: types-tqdm; extra == "dev"
Requires-Dist: types-requests; extra == "dev"
Requires-Dist: types-reportlab; extra == "dev"
Requires-Dist: python-dotenv; extra == "dev"
Provides-Extra: all
Requires-Dist: pytest; extra == "all"
Requires-Dist: pandas-stubs; extra == "all"
Requires-Dist: types-tqdm; extra == "all"
Requires-Dist: types-requests; extra == "all"
Requires-Dist: types-reportlab; extra == "all"
Requires-Dist: python-dotenv; extra == "all"
Requires-Dist: chromadb; extra == "all"
Requires-Dist: pinecone; extra == "all"
Requires-Dist: psycopg>=3.2.0; extra == "all"
Requires-Dist: pgvector; extra == "all"
Requires-Dist: weaviate-client; extra == "all"
Requires-Dist: databricks-sdk; extra == "all"
Requires-Dist: databricks-vectorsearch; extra == "all"
Requires-Dist: azure-cosmos; extra == "all"
Requires-Dist: faiss-cpu; extra == "all"
Requires-Dist: langchain-core; extra == "all"
Requires-Dist: langchain-community; extra == "all"
Requires-Dist: langchain-chroma; extra == "all"
Requires-Dist: llama_index; extra == "all"
Dynamic: license-file

# 🔻 Vecta

# A lightweight SDK for benchmarking RAG agents.

Vecta helps you improve (and ultimately trust) your RAG (Retrieval-Augmented Generation) agents. Easily evaluate your system against human-made or synthetic benchmarks grounded in your knowledge base.

The benchmarks are built around the idea of full test coverage: synthetic benchmarks generated by Vecta include multi-hop retrievals, edge cases, and adversarial queries.

Evaluations run at the chunk, page, and document levels, and can target each part of the pipeline: retrieval only, generation only, or the full retrieval-augmented generation (RAG) pipeline.

## What types of evaluations can I run?

Evaluations can be run at different semantic levels and for different components of your agentic system.

| Semantic Level     | Retrieval         | Generation           | Retrieval + Generation                  |
| ------------------ | ----------------- | -------------------- | --------------------------------------- |
| **Chunk-level**    | Recall, Precision | Accuracy, Factuality | Recall, Precision, Accuracy, Factuality |
| **Page-level**     | Recall, Precision | Accuracy, Factuality | Recall, Precision, Accuracy, Factuality |
| **Document-level** | Recall, Precision | Accuracy, Factuality | Recall, Precision, Accuracy, Factuality |
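
To make the retrieval columns concrete, here is a minimal sketch of set-based precision and recall over chunk IDs. It uses the standard textbook definitions, which is an assumption about how the scores are computed; Vecta's own implementation may differ in details such as deduplication or empty-set handling. The function name `precision_recall` and the sample IDs are illustrative only.

```python
from typing import Iterable, Tuple


def precision_recall(retrieved: Iterable[str], relevant: Iterable[str]) -> Tuple[float, float]:
    """Set-based precision and recall for a single query."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = len(retrieved_set & relevant_set)
    # Guard against division by zero when either set is empty
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Hypothetical query: 4 chunks retrieved, 2 of the 3 relevant chunks found
p, r = precision_recall(["c1", "c2", "c7", "c9"], ["c1", "c2", "c3"])
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

The same calculation applies at the page or document level once the chunk IDs are mapped to page numbers or document names.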

## Making a benchmark

A **benchmark** in Vecta is a list of `vecta.core.schemas.BenchmarkEntry` records containing:

- a synthetic **question**
- a canonical **answer**
- the set of **chunk_ids** that can answer it
- the **page_nums** and **doc_names** where those chunks live

Vecta builds this automatically from your knowledge base by:

1. Sampling real chunks
2. Asking an LLM to generate a question that the sampled chunk can answer
3. Discovering other chunks that could also answer it (via semantic search plus an LLM-as-a-judge check)

#### 1) Connect to your vector DB and load the KB

```python
from chromadb import Client
from vecta.connectors.chroma_connector import ChromaConnector
from vecta.core.benchmark import VectaClient

chroma = Client()
collection_name = "my_docs"

# Connect Chroma to Vecta
connector = ChromaConnector(client=chroma, collection_name=collection_name)

# Initialize VectaClient
vecta = VectaClient(
    vector_db_connector=connector, openai_api_key="<YOUR_OPENROUTER_API_KEY>"
)

# Load the knowledge base into Vecta
vecta.load_knowledge_base()
```

> ✅ **Metadata requirements:** each chunk in your vector database must include a metadata dictionary with the keys `page_nums: List[int]` and `doc_name: str`.
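
Before loading a large knowledge base, it can help to check that your chunk metadata has this shape. The helper below is a hypothetical pre-flight check written for this README, not part of Vecta's API; it operates on plain dictionaries and makes no assumption about how a connector reads them from your store.

```python
from typing import Any, Dict, List


def metadata_problems(metadata: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the metadata looks valid."""
    problems = []
    page_nums = metadata.get("page_nums")
    # bool is a subclass of int in Python, so exclude it explicitly
    if not isinstance(page_nums, list) or not all(
        isinstance(p, int) and not isinstance(p, bool) for p in page_nums
    ):
        problems.append("page_nums must be a list of ints")
    if not isinstance(metadata.get("doc_name"), str):
        problems.append("doc_name must be a str")
    return problems


print(metadata_problems({"page_nums": [3, 4], "doc_name": "handbook.pdf"}))  # []
print(metadata_problems({"page_nums": "3", "doc_name": "handbook.pdf"}))
# ['page_nums must be a list of ints']
```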

#### 2) Generate the benchmark

```python
# Create N synthetic Q&A pairs and align them to correct chunks/pages/docs
entries = vecta.generate_benchmark(
    n_questions=10,
    similarity_threshold=0.7,
    similarity_top_k=5,
    random_seed=42,
)
```

#### 3) Save / Load the benchmark (CSV)

```python
# Save to CSV
vecta.save_benchmark("my_benchmark.csv")

# Later (or in another script), load it back:
vecta.load_benchmark("my_benchmark.csv")
```

## Running an evaluation

Vecta lets you evaluate three things against an existing benchmark:

- **Retrieval** → you provide a function: `query: str -> chunk_ids: List[str]`
- **Generation** → you provide: `query: str -> generated_text: str`
- **Retrieval + Generation** → you provide: `query: str -> Tuple[chunk_ids: List[str], generated_text: str]`
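
Since a full evaluation calls your function once per benchmark entry, it may be worth smoke-testing the return shape on a single query first. The sketch below checks the retrieval + generation shape; `check_rag_output` and `toy_pipeline` are hypothetical names introduced here, not Vecta functions.

```python
from typing import List, Tuple


def check_rag_output(result: object) -> None:
    """Raise TypeError unless result is a (List[str], str) tuple."""
    if not (isinstance(result, tuple) and len(result) == 2):
        raise TypeError("expected a (chunk_ids, generated_text) tuple")
    chunk_ids, text = result
    if not (isinstance(chunk_ids, list) and all(isinstance(c, str) for c in chunk_ids)):
        raise TypeError("chunk_ids must be a list of str")
    if not isinstance(text, str):
        raise TypeError("generated_text must be a str")


def toy_pipeline(query: str) -> Tuple[List[str], str]:
    # Placeholder pipeline with fixed output, used only to show the shape
    return ["chunk-1", "chunk-2"], f"Answer to: {query}"


check_rag_output(toy_pipeline("What is the refund policy?"))  # passes silently
```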

#### Retrieval-only evaluation

Provide a function that returns the **IDs** of your retrieved chunks for a given query.

```python
from typing import List


def my_retriever(query: str) -> List[str]:
    top = connector.semantic_search(query, k=10)

    # return chunk ids
    return [c.id for c in top]

retrieval_results = vecta.evaluate_retrieval(my_retriever, evaluation_name="baseline @ k=10")
```

#### Generation-only evaluation

```python
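from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
system_prompt = "Answer the question concisely."  # placeholder prompt


def my_llm_call(query: str) -> str:
    # Answer from the model alone; no retrieved context is supplied
    resp = client.chat.completions.create(
        model="your-model",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": query},
        ],
    )
    return resp.choices[0].message.content or ""
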

gen_only = vecta.evaluate_generation_only(my_llm_call, evaluation_name="my llm call")
```

#### Retrieval-augmented Generation (RAG) evaluation

Provide a function that returns both retrieved chunk IDs **and** your generated answer.

```python
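from typing import List, Tuple

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def my_rag_pipeline(query: str) -> Tuple[List[str], str]:
    # retrieve (connector is the ChromaConnector created earlier)
    retrieved = connector.semantic_search(query, k=5)
    chunk_ids = [c.id for c in retrieved]

    # generate, passing the chunk text (not the chunk objects) as context
    context = "\n\n".join(c.content for c in retrieved)
    completion = client.chat.completions.create(
        model="your-model",
        messages=[
            {"role": "user", "content": f"{context}\n\n{query}"}
        ],
    )
    llm_response = completion.choices[0].message.content or ""

    # must return a (chunk_ids, generated_text) tuple
    return chunk_ids, llm_response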

rag_results = vecta.evaluate_retrieval_and_generation(my_rag_pipeline, evaluation_name="rag @ k=5")
```

## Connecting to custom retrievers

Don't see a connector for your vector DB? No problem!
Inherit from `vecta.connectors.base.BaseVectorDBConnector` and implement these three methods to connect Vecta to your data retrieval pipeline:

```python
from typing import List

from vecta.connectors.base import BaseVectorDBConnector
from vecta.core.schemas import ChunkData


class CustomConnector(BaseVectorDBConnector):
    def get_all_chunks_and_metadata(self) -> List[ChunkData]:
        # Return every chunk in the knowledge base
        return [...]

    def semantic_search(self, query: str, k: int) -> List[ChunkData]:
        # Return the k chunks most similar to the query
        return [...]

    def get_chunk_by_id(self, chunk_id: str) -> ChunkData:
        # Return the single chunk with this ID
        return ...

To implement these methods, first familiarize yourself with the `vecta.core.schemas.ChunkData` class.
Every chunk returned to Vecta must include:

- `id: str`, a unique identifier for the chunk
- `content: str`, the text of the chunk
- `metadata: Dict[str, Any]`, which must include:
  - `page_nums`: one-indexed page numbers spanned by this chunk
  - `doc_name`: a unique file name within your corpus
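
As a rough illustration of the three methods working together, here is an in-memory sketch. To keep it runnable without Vecta installed, it defines a stand-in `ChunkData` dataclass with the three fields listed above and does not inherit from the base class; in real use you would import `ChunkData` from `vecta.core.schemas` and subclass `BaseVectorDBConnector`. The class name `InMemoryConnector` and the word-overlap ranking, which stands in for real embedding similarity, are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ChunkData:  # stand-in for vecta.core.schemas.ChunkData
    id: str
    content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class InMemoryConnector:  # in real use: subclass BaseVectorDBConnector
    def __init__(self, chunks: List[ChunkData]) -> None:
        self._chunks = {c.id: c for c in chunks}

    def get_all_chunks_and_metadata(self) -> List[ChunkData]:
        return list(self._chunks.values())

    def semantic_search(self, query: str, k: int) -> List[ChunkData]:
        # Word overlap stands in for real embedding similarity
        words = set(query.lower().split())
        ranked = sorted(
            self._chunks.values(),
            key=lambda c: len(words & set(c.content.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def get_chunk_by_id(self, chunk_id: str) -> ChunkData:
        return self._chunks[chunk_id]


connector = InMemoryConnector([
    ChunkData("c1", "refunds are issued within 30 days", {"page_nums": [1], "doc_name": "policy.pdf"}),
    ChunkData("c2", "shipping takes five business days", {"page_nums": [2, 3], "doc_name": "policy.pdf"}),
])
print([c.id for c in connector.semantic_search("when are refunds issued", k=1)])  # ['c1']
```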
