Metadata-Version: 2.4
Name: sf-vector-sdk
Version: 0.6.0
Summary: Python SDK for the Vector Gateway service (embeddings and vector search)
Requires-Python: >=3.11
Requires-Dist: redis[hiredis]>=5.0.0
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# Vector SDK for Python

A lightweight Python client for submitting embedding requests and vector search queries to the Vector Gateway service.

## Overview

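The Vector SDK provides a simple interface for generating embeddings and running vector searches via the centralized Vector Gateway service. Embedding and search requests go directly through Redis Streams, which keeps the SDK efficient and suitable for any Python service that can reach the shared Redis VM. Direct database operations (`client.db`) use HTTP instead and need `http_url` to be set.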

**Key Features:**
- Simple, Pythonic API with namespace-based organization
- Intuitive namespaces: `client.embeddings`, `client.search`, `client.db`, `client.structured_embeddings`
- Asynchronous request submission with optional waiting
- Full type hints and documentation
- Multiple embedding model support (Google Vertex AI and OpenAI)
- Client-side model validation before submission
- Minimal dependencies (just Redis)

## Installation

### From Source (Monorepo)

```bash
cd packages/py/vector-sdk
pip install -e .
# Or with uv
uv pip install -e .
```

### From Package Registry (when published)

```bash
pip install sf-vector-sdk
```

## Authentication

All SDK operations require a valid API key. Contact your administrator to obtain an API key.

```python
from vector_sdk import VectorClient

client = VectorClient(
    redis_url="redis://your-redis-host:6379",
    http_url="http://localhost:8080",
    api_key="vsk_v1_your_api_key_here",  # Required
)
```

**API Key Format:** `vsk_v1_{32_random_chars}`
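
A quick local check can catch a malformed key before any request is sent. The sketch below is illustrative only: `looks_like_api_key` is a hypothetical helper, not part of the SDK, and it assumes the 32 trailing characters are ASCII letters or digits, which the format string above does not specify.

```python
import re

# Assumption: the 32 random characters are ASCII letters or digits.
_API_KEY_PATTERN = re.compile(r"vsk_v1_[A-Za-z0-9]{32}")


def looks_like_api_key(value: str) -> bool:
    """Return True if value has the documented vsk_v1_{32 chars} shape."""
    return _API_KEY_PATTERN.fullmatch(value) is not None


print(looks_like_api_key("vsk_v1_" + "a" * 32))  # True
print(looks_like_api_key("vsk_v2_" + "a" * 32))  # False
```

A shape check like this says nothing about whether the gateway will accept the key; it only rules out obvious copy-and-paste mistakes.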

**Unauthenticated Usage:** Some utility functions work without a `VectorClient` or an API key:

```python
from vector_sdk import compute_content_hash, extract_tool_text

# These work offline - no API key required
hash_val = compute_content_hash(
    "FlashCard",
    {"type": "BASIC", "term": "ATP", "definition": "Adenosine triphosphate"}
)
```

## Quick Start

### Basic Usage

```python
from vector_sdk import VectorClient
import os

# Create client with API key
client = VectorClient(
    redis_url="redis://your-redis-host:6379",
    http_url="http://localhost:8080",
    api_key=os.environ["VECTOR_API_KEY"],  # Required
)

# Create embeddings
result = client.embeddings.create_and_wait(
    texts=[
        {"id": "doc1", "text": "Introduction to machine learning"},
        {"id": "doc2", "text": "Deep neural networks explained"},
    ],
    content_type="topic",
)
print(f"Processed: {result.processed_count}, Failed: {result.failed_count}")

# Vector search
search_result = client.search.query_and_wait(
    query_text="What is machine learning?",
    database="turbopuffer",
    namespace="topics",
    top_k=10,
)
for match in search_result.matches:
    print(f"{match.id}: {match.score}")

# Direct database lookup (no embedding)
docs = client.db.get_by_ids(
    ids=["doc1"],
    database="turbopuffer",
    namespace="topics",
)

client.close()
```

### With Storage Configuration

```python
from vector_sdk import VectorClient, StorageConfig, MongoDBStorage, TurboPufferStorage

client = VectorClient(redis_url="redis://your-redis-host:6379", api_key="vsk_v1_your_api_key")

# Create embeddings with storage configuration
result = client.embeddings.create_and_wait(
    texts=[
        {
            "id": "tool123",
            "text": "Term: Photosynthesis. Definition: The process by which plants convert sunlight into energy.",
            "document": {
                "toolId": "tool123",
                "toolCollection": "FlashCard",
                "userId": "user456",
                "contentHash": "abc123",
            }
        }
    ],
    content_type="flashcard",
    priority="high",
    storage=StorageConfig(
        mongodb=MongoDBStorage(
            database="events_new",
            collection="tool_vectors",
            embedding_field="toolEmbedding",
            upsert_key="contentHash",
        ),
        turbopuffer=TurboPufferStorage(
            namespace="tool_vectors",
            id_field="_id",
            metadata=["toolId", "toolCollection", "userId"],
        ),
    ),
    metadata={"source": "my-service"},
)

client.close()
```

### Context Manager

```python
from vector_sdk import VectorClient

with VectorClient(redis_url="redis://localhost:6379", api_key="vsk_v1_your_api_key") as client:
    result = client.embeddings.create_and_wait(
        texts=[{"id": "doc1", "text": "Hello world"}],
        content_type="document",
    )
# Connection automatically closed
```

## API Reference

### VectorClient

The main client class providing namespaced access to all SDK functionality.

#### Constructor

**Standalone Redis:**
```python
client = VectorClient(
    redis_url="redis://localhost:6379",
    redis_password="your-password",        # Optional
    http_url="http://localhost:8080",      # Optional, required for db operations
    api_key="vsk_v1_your_api_key",         # Required
    redis_cluster_mode=False,              # Default
)
```

**Redis Cluster:**
```python
client = VectorClient(
    redis_url="node1:6379,node2:6379,node3:6379",
    redis_password="your-password",
    redis_cluster_mode=True,               # Required for cluster
    http_url="http://localhost:8080",
    api_key="vsk_v1_your_api_key",
)
```

**Parameters:**
- `redis_url` (str, required): Redis connection URL or comma-separated cluster nodes
- `http_url` (str, optional): HTTP URL for db operations
- `api_key` (str, required): API key for authentication
- `redis_password` (str, optional): Redis password
- `redis_cluster_mode` (bool, optional): Enable cluster mode (default: False)
- `environment` (str, optional): Environment prefix for Redis queue names (e.g., `"staging"`, `"production"`). When set, all stream names are prefixed to isolate environments sharing the same Redis instance. Must match `QUEUE_ENV` on the gateways.

See [REDIS-CONFIGURATION.md](../../docs/REDIS-CONFIGURATION.md) for Redis setup details.

### Namespaces

#### `client.embeddings`

Embedding generation operations.

| Method | Description |
|--------|-------------|
| `create(texts, content_type, ...)` | Submit embedding request, return request ID |
| `wait_for(request_id, timeout)` | Wait for request completion |
| `create_and_wait(texts, content_type, ...)` | Submit and wait for result |
| `get_queue_depth()` | Get current queue depth for each priority |

```python
# Two-step: submit now, wait for the result later
request_id = client.embeddings.create(texts, content_type)
result = client.embeddings.wait_for(request_id)

# One-step: submit and block until the result is ready
result = client.embeddings.create_and_wait(texts, content_type)

# Check queue depth
depths = client.embeddings.get_queue_depth()
```

#### `client.search`

Vector similarity search operations.

| Method | Description |
|--------|-------------|
| `query(query_text, database, ...)` | Submit search query, return request ID |
| `wait_for(request_id, timeout)` | Wait for query completion |
| `query_and_wait(query_text, database, ...)` | Submit and wait for result |

```python
# Vector search with semantic similarity
result = client.search.query_and_wait(
    query_text="What is machine learning?",
    database="turbopuffer",
    namespace="topics",
    top_k=10,
    include_metadata=True,
)
```

**Vector Passthrough** -- generate one embedding and search multiple namespaces:

```python
# Generate embedding without storage (returns raw vectors)
embed_result = client.embeddings.create_and_wait(
    texts=[{"id": "query", "text": "What is machine learning?"}],
    content_type="query",
)
query_vector = embed_result.embeddings[0]

# Search multiple namespaces with the same vector (skips re-embedding)
topics = client.search.query_and_wait(
    query_text="What is machine learning?",
    database="turbopuffer",
    namespace="topic_vectors",
    query_vector=query_vector,
)
flashcards = client.search.query_and_wait(
    query_text="What is machine learning?",
    database="turbopuffer",
    namespace="flashcard_vectors",
    query_vector=query_vector,
)
```
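
The passthrough pattern generalizes to any number of namespaces. As a rough sketch, the hypothetical helper `search_namespaces` below (not part of the SDK) loops over namespaces and reuses one vector. It takes the search function as an argument, so a stand-in `fake_search` is used here to keep the example runnable without a gateway.

```python
from typing import Any, Callable


def search_namespaces(
    search: Callable[..., Any],
    query_text: str,
    query_vector: list[float],
    namespaces: list[str],
    database: str = "turbopuffer",
) -> dict[str, Any]:
    """Run the same pre-computed vector against each namespace in turn."""
    results = {}
    for namespace in namespaces:
        results[namespace] = search(
            query_text=query_text,
            database=database,
            namespace=namespace,
            query_vector=query_vector,
        )
    return results


# Stand-in for client.search.query_and_wait so the sketch runs offline.
def fake_search(**kwargs: Any) -> str:
    return f"searched {kwargs['namespace']} with {len(kwargs['query_vector'])} dims"


found = search_namespaces(
    fake_search,
    "What is machine learning?",
    [0.1, 0.2, 0.3],
    ["topic_vectors", "flashcard_vectors"],
)
print(found)
```

With a real client, pass `client.search.query_and_wait` as `search`; the keyword arguments match the calls shown above.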

#### `client.db`

Direct database operations (no embedding required). Requires `http_url`.

| Method | Description |
|--------|-------------|
| `get_by_ids(ids, database, ...)` | Lookup documents by ID |
| `find_by_metadata(filters, database, ...)` | Search by metadata filters |
| `clone(id, source_namespace, destination_namespace)` | Clone document between namespaces |
| `delete(id, namespace)` | Delete document from namespace |
| `get_vectors_in_namespace(namespace, ...)` | Export all documents in a namespace |

#### `client.structured_embeddings`

Type-safe embedding for known tool types (FlashCard, TestQuestion, etc.) with automatic text extraction, content hash computation, and database routing.

| Method | Description |
|--------|-------------|
| `embed_flashcard(data, metadata)` | Embed a flashcard, return request ID |
| `embed_flashcard_and_wait(data, metadata, timeout)` | Embed and wait for result |
| `embed_flashcard_batch(items)` | Embed batch of flashcards, return request ID |
| `embed_flashcard_batch_and_wait(items, timeout)` | Embed batch and wait for result |
| `embed_test_question(data, metadata)` | Embed a test question, return request ID |
| `embed_test_question_and_wait(data, metadata, timeout)` | Embed and wait for result |
| `embed_test_question_batch(items)` | Embed batch of test questions, return request ID |
| `embed_test_question_batch_and_wait(items, timeout)` | Embed batch and wait for result |
| `embed_spaced_test_question(data, metadata)` | Embed a spaced test question, return request ID |
| `embed_spaced_test_question_and_wait(data, metadata, timeout)` | Embed and wait for result |
| `embed_spaced_test_question_batch(items)` | Embed batch of spaced test questions, return request ID |
| `embed_spaced_test_question_batch_and_wait(items, timeout)` | Embed batch and wait for result |
| `embed_audio_recap(data, metadata)` | Embed an audio recap section, return request ID |
| `embed_audio_recap_and_wait(data, metadata, timeout)` | Embed and wait for result |
| `embed_audio_recap_batch(items)` | Embed batch of audio recaps, return request ID |
| `embed_audio_recap_batch_and_wait(items, timeout)` | Embed batch and wait for result |
| `embed_topic(data, metadata)` | Embed a topic (uses `TopicMetadata`), return request ID |
| `embed_topic_and_wait(data, metadata, timeout)` | Embed and wait for result (uses `TopicMetadata`) |
| `embed_topic_batch(items)` | Embed batch of topics (uses `TopicMetadata`), return request ID |
| `embed_topic_batch_and_wait(items, timeout)` | Embed batch and wait for result (uses `TopicMetadata`) |

**Metadata Types:**

- `ToolMetadata` - For tools (FlashCard, TestQuestion, etc.) - requires `tool_id`
- `TopicMetadata` - For topics only - all fields optional (`user_id`, `topic_id`)

```python
from vector_sdk import VectorClient, ToolMetadata, TopicMetadata, TestQuestionInput

client = VectorClient(redis_url="redis://localhost:6379", api_key="vsk_v1_your_api_key")

# Embed a flashcard - uses ToolMetadata (tool_id required)
result = client.structured_embeddings.embed_flashcard_and_wait(
    data={"type": "BASIC", "term": "Mitochondria", "definition": "The powerhouse of the cell"},
    metadata=ToolMetadata(tool_id="tool123", user_id="user456", topic_id="topic789"),
)

# Embed a test question - uses ToolMetadata (tool_id required)
result = client.structured_embeddings.embed_test_question_and_wait(
    data=TestQuestionInput(
        question="What is the capital?",
        answers=[...],
        question_type="multiplechoice",
    ),
    metadata=ToolMetadata(tool_id="tool456"),
)

# Embed a topic - uses TopicMetadata (all fields optional)
# Note: Topic data requires an "id" field which becomes the TurboPuffer document ID
result = client.structured_embeddings.embed_topic_and_wait(
    data={"id": "topic-123", "topic": "Photosynthesis", "description": "The process by which plants convert sunlight to energy"},
    metadata=TopicMetadata(user_id="user123", topic_id="topic456"),  # No tool_id needed
)

# Batch embedding - embed multiple topics in a single request
from vector_sdk import TopicBatchItem

batch_result = client.structured_embeddings.embed_topic_batch_and_wait(
    items=[
        TopicBatchItem(data={"id": "topic-1", "topic": "Topic 1", "description": "Description 1"}, metadata=TopicMetadata(user_id="user1")),
        TopicBatchItem(data={"id": "topic-2", "topic": "Topic 2", "description": "Description 2"}, metadata=TopicMetadata(topic_id="topic2")),
        TopicBatchItem(data={"id": "topic-3", "topic": "Topic 3", "description": "Description 3"}, metadata=TopicMetadata()),  # All optional
    ],
)
```

**Database Routing:**

Set the `STRUCTURED_EMBEDDING_DATABASE_ROUTER` environment variable:

| Value | Behavior |
|-------|----------|
| `dual` | Write to both TurboPuffer AND Pinecone if both have `enabled: True` |
| `turbopuffer` | Only write to TurboPuffer |
| `pinecone` | Only write to Pinecone |
| undefined | Defaults to `turbopuffer` |

**`client.db` examples** (methods are listed under `client.db` above):

```python
# Lookup by IDs
result = client.db.get_by_ids(
    ids=["doc1", "doc2"],
    database="turbopuffer",
    namespace="topics",
)

# Find by metadata
result = client.db.find_by_metadata(
    filters={"userId": "user123"},
    database="mongodb",
    collection="vectors",
    database_name="mydb",
)

# Clone between namespaces
result = client.db.clone("doc1", "ns1", "ns2")

# Delete
result = client.db.delete("doc1", "ns1")

# Export entire namespace
export_result = client.db.get_vectors_in_namespace(
    namespace="tool_vectors",
    include_vectors=True,
)
print(f"Exported {len(export_result.documents)} documents")
```

### Types

#### Result Types

```python
@dataclass
class EmbeddingResult:
    request_id: str
    status: str  # "success", "partial", "failed"
    processed_count: int
    failed_count: int
    skipped_count: int  # Items skipped by deduplication
    errors: list[EmbeddingError]
    timing: Optional[TimingBreakdown]
    completed_at: datetime

    @property
    def is_success(self) -> bool: ...
    @property
    def is_partial(self) -> bool: ...
    @property
    def is_failed(self) -> bool: ...

@dataclass
class QueryResult:
    request_id: str
    status: str  # "success", "failed"
    matches: list[VectorMatch]
    error: Optional[str]
    timing: Optional[QueryTiming]
    completed_at: datetime

@dataclass
class VectorMatch:
    id: str
    score: float  # Similarity score (0-1, higher is more similar)
    metadata: Optional[dict]
    vector: Optional[list[float]]
```
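
To make the status properties concrete, here is a minimal stand-in. `ResultSketch` and `describe` are hypothetical names, and the sketch assumes each `is_*` property compares `status` against the string values listed in the comment above; the SDK's real implementation may differ.

```python
from dataclasses import dataclass


@dataclass
class ResultSketch:
    """Stand-in carrying only the fields this example needs."""

    status: str  # "success", "partial", or "failed"
    processed_count: int
    failed_count: int

    @property
    def is_success(self) -> bool:
        return self.status == "success"

    @property
    def is_partial(self) -> bool:
        return self.status == "partial"

    @property
    def is_failed(self) -> bool:
        return self.status == "failed"


def describe(result: ResultSketch) -> str:
    """Turn a result into a one-line log message."""
    if result.is_success:
        return f"ok: {result.processed_count} processed"
    if result.is_partial:
        return f"partial: {result.processed_count} processed, {result.failed_count} failed"
    return f"failed: {result.failed_count} failed"


print(describe(ResultSketch("partial", processed_count=8, failed_count=2)))
# partial: 8 processed, 2 failed
```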

## Priority Levels

| Priority | Use Case | Description |
|----------|----------|-------------|
| `critical` | Real-time user requests | Reserved quota, processed first |
| `high` | New content embeddings | Standard processing priority |
| `normal` | Updates, re-embeddings | Default priority |
| `low` | Backfill, batch jobs | Processed when capacity available |

```python
result = client.embeddings.create_and_wait(texts, content_type="topic", priority="critical")
```
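
One way to keep priorities consistent across a codebase is to derive them from the kind of work instead of hard-coding strings at each call site. The helper below is a hypothetical sketch of the table above; `choose_priority` and its flags are not part of the SDK, and the precedence between flags is an assumption.

```python
def choose_priority(
    *,
    user_facing: bool = False,
    new_content: bool = False,
    backfill: bool = False,
) -> str:
    """Pick a priority string following the use cases in the table above."""
    if user_facing:
        return "critical"  # Real-time user requests
    if backfill:
        return "low"  # Backfill and batch jobs
    if new_content:
        return "high"  # New content embeddings
    return "normal"  # Updates and re-embeddings (the default)


print(choose_priority(user_facing=True))  # critical
print(choose_priority())  # normal
```

The returned string can be passed as `priority=` to `client.embeddings.create` or `create_and_wait`.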

## Embedding Models

### Supported Models

| Model | Provider | Dimensions | Custom Dims |
|-------|----------|------------|-------------|
| `gemini-embedding-001` | Google | 3072 | No |
| `text-embedding-004` | Google | 768 | No |
| `text-multilingual-embedding-002` | Google | 768 | No |
| `text-embedding-3-small` | OpenAI | 1536 | Yes |
| `text-embedding-3-large` | OpenAI | 3072 | Yes |

### Using a Specific Model

```python
result = client.embeddings.create_and_wait(
    texts=[{"id": "doc1", "text": "Hello world"}],
    content_type="document",
    embedding_model="text-embedding-3-small",
    embedding_dimensions=512,  # Custom dimensions (only for models that support it)
)
```

## Content Hash

The SDK provides deterministic content hashing for learning tools.

```python
from vector_sdk import compute_content_hash, extract_tool_text

# Compute hash for a FlashCard
content_hash = compute_content_hash(
    "FlashCard",
    {"type": "BASIC", "term": "Mitochondria", "definition": "The powerhouse of the cell"}
)

# Extract text for embedding
text = extract_tool_text(
    "FlashCard",
    {"type": "BASIC", "term": "Mitochondria", "definition": "The powerhouse of the cell"}
)
```

## Migration from EmbeddingClient

The SDK now uses a namespace-based API with `VectorClient`. The old `EmbeddingClient` is preserved for backward compatibility.

### Method Mapping

| Old (EmbeddingClient) | New (VectorClient) |
|----------------------|-------------------|
| `submit()` | `client.embeddings.create()` |
| `wait_for_result()` | `client.embeddings.wait_for()` |
| `submit_and_wait()` | `client.embeddings.create_and_wait()` |
| `get_queue_depth()` | `client.embeddings.get_queue_depth()` |
| `query()` | `client.search.query()` |
| `wait_for_query_result()` | `client.search.wait_for()` |
| `query_and_wait()` | `client.search.query_and_wait()` |
| `lookup_by_ids()` | `client.db.get_by_ids()` |
| `search_by_metadata()` | `client.db.find_by_metadata()` |
| `clone_from_namespace()` | `client.db.clone()` |
| `delete_from_namespace()` | `client.db.delete()` |

### Migration Example

```python
# Old API (still works, emits deprecation warnings)
from vector_sdk import EmbeddingClient

client = EmbeddingClient("redis://localhost:6379")
result = client.submit_and_wait(texts, content_type)
client.close()

# New API (recommended)
from vector_sdk import VectorClient

client = VectorClient(redis_url="redis://localhost:6379", api_key="vsk_v1_your_api_key")
result = client.embeddings.create_and_wait(texts, content_type)
client.close()
```

## Error Handling

```python
from vector_sdk import VectorClient, ModelValidationError

try:
    with VectorClient(redis_url="redis://localhost:6379", api_key="vsk_v1_your_api_key") as client:
        result = client.embeddings.create_and_wait(
            texts=[{"id": "doc1", "text": "Hello"}],
            content_type="test",
            embedding_model="text-embedding-3-small",
            timeout=30,
        )
        
        if result.is_success:
            print("Success!")
        elif result.is_partial:
            print("Partial success. Errors:")
            for err in result.errors:
                print(f"  - {err.id}: {err.error}")

except ModelValidationError as e:
    print(f"Model validation failed: {e}")
except TimeoutError as e:
    print(f"Request timed out: {e}")
except ValueError as e:
    print(f"Invalid input: {e}")
```

## Testing Redis Connection

### Verify Connection on Startup

Always test the Redis connection after creating the client, especially in serverless environments:

```python
import os

from vector_sdk import VectorClient

client = VectorClient(
    redis_url=os.environ["REDIS_URL"],
    redis_password=os.environ["REDIS_PASSWORD"],
    redis_cluster_mode=os.environ.get("REDIS_CLUSTER_MODE") == "true",
    api_key=os.environ["VECTOR_API_KEY"],
    http_url=os.environ.get("HTTP_URL"),
)

# Test connection before using
try:
    client.test_connection()
    print("✓ Connected to Redis")
except Exception as e:
    print(f"Cannot connect to Redis: {e}")
    # Common causes:
    # - Wrong Redis URL/hostname
    # - Network isolation (VPC access required)
    # - Wrong password
    # - Redis not running
    raise
```

**Why this matters:**

- Redis connections are lazy (don't connect until first command)
- Network issues won't be discovered until operations time out
- **Critical for serverless** (Vercel, Lambda) where network access may be restricted
- Provides immediate feedback if Redis is unreachable

## Best Practices

### 1. Test Connection on Startup (Recommended for Serverless)

```python
client = VectorClient(
    redis_url="redis://...",
    redis_cluster_mode=True,
    api_key="vsk_...",
)

# Test connection immediately - raises if unreachable
try:
    client.test_connection()
    print("Connected to Redis")
except Exception as e:
    print(f"Redis connection failed: {e}")
    # Handle connection failure (retry, fallback, etc.)
    raise
```

**Important for serverless environments:** Test connection on startup to fail fast if Redis is unreachable.
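
Where a transient network failure is plausible, the startup check can be wrapped in a small retry loop. The following is a sketch under stated assumptions: `connect_with_retry` is a hypothetical helper, the attempt count and delays are arbitrary, and it accepts any zero-argument callable so it can be demonstrated without a Redis server.

```python
import time
from typing import Callable


def connect_with_retry(
    check: Callable[[], object],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call check() until it succeeds; return the number of attempts used."""
    for attempt in range(1, attempts + 1):
        try:
            check()
            return attempt
        except Exception:
            if attempt == attempts:
                raise  # Out of retries: surface the last error to the caller
            sleep(base_delay * 2 ** (attempt - 1))  # Exponential backoff
    raise ValueError("attempts must be at least 1")


# Stand-in for client.test_connection that fails twice, then succeeds.
calls = {"n": 0}


def flaky_check() -> None:
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("Redis unreachable")


delays: list[float] = []
used = connect_with_retry(flaky_check, attempts=5, sleep=delays.append)
print(used, delays)  # 3 [0.5, 1.0]
```

With a real client the call would be `connect_with_retry(client.test_connection)`. In serverless functions, keep the total wait well under the platform's startup timeout.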

### 2. Use Appropriate Priority

```python
# Use appropriate priority levels
client.embeddings.create(texts, content_type="backfill", priority="low")
client.embeddings.create(texts, content_type="userRequest", priority="critical")
```

### 3. Batch Your Requests

```python
# Batch multiple texts per request for efficiency
texts = [{"id": doc.id, "text": doc.text} for doc in documents]
client.embeddings.create(texts, content_type)
```

### 4. Use Context Managers

```python
with VectorClient(redis_url="redis://...", api_key="vsk_...") as client:
    # Client automatically closed on exit
    pass
```

### 5. Deduplication

The gateway automatically deduplicates embedding requests using the `contentHash` metadata field. If a vector with the same `contentHash` already exists in the target namespace, the embedding generation is skipped to reduce costs.

- **Structured embeddings**: Deduplication is enabled by default for all tool types except Topics (which always re-embed since content may change for the same ID).
- **Raw embeddings**: Pass `allow_duplicates=True` to skip deduplication when needed.

```python
# Default: deduplication enabled (contentHash checked before embedding)
client.embeddings.create(texts, content_type="flashcard", storage=storage_config)

# Opt out of deduplication
client.embeddings.create(texts, content_type="topic", allow_duplicates=True)
```

The `EmbeddingResult` includes a `skipped_count` field showing how many items were deduplicated:

```python
result = client.embeddings.create_and_wait(texts, content_type="flashcard")
print(f"Processed: {result.processed_count}, Skipped: {result.skipped_count}")
```
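
Gateway deduplication compares against vectors already stored in the target namespace. A service can additionally drop repeated texts within a single batch before submitting it. The sketch below shows that idea only: `drop_duplicate_texts` is a hypothetical helper, and it hashes the raw text with SHA-256 as a stand-in, which is not necessarily how `compute_content_hash` derives `contentHash`.

```python
import hashlib


def drop_duplicate_texts(texts: list[dict]) -> tuple[list[dict], int]:
    """Keep the first item for each distinct text; return (unique, skipped)."""
    seen: set[str] = set()
    unique: list[dict] = []
    skipped = 0
    for item in texts:
        # Stand-in hash of the raw text, not the SDK's contentHash.
        digest = hashlib.sha256(item["text"].encode("utf-8")).hexdigest()
        if digest in seen:
            skipped += 1
            continue
        seen.add(digest)
        unique.append(item)
    return unique, skipped


batch = [
    {"id": "a", "text": "ATP is adenosine triphosphate"},
    {"id": "b", "text": "ATP is adenosine triphosphate"},
    {"id": "c", "text": "Mitochondria produce ATP"},
]
unique, skipped = drop_duplicate_texts(batch)
print([item["id"] for item in unique], skipped)  # ['a', 'c'] 1
```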

## License

Proprietary - All rights reserved.
