Metadata-Version: 2.4
Name: sf-vector-sdk
Version: 0.2.4
Summary: Python SDK for the Vector Gateway service (embeddings and vector search)
Requires-Python: >=3.11
Requires-Dist: redis>=5.0.0
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# Vector SDK for Python

A lightweight Python client for submitting embedding requests and vector search queries to the Vector Gateway service.

## Overview

The Vector SDK provides a simple interface for generating embeddings and running vector searches via the centralized Vector Gateway service. Embedding and search requests are written directly to Redis Streams, so any Python service that can reach the shared Redis VM can use the SDK. Direct database operations (`client.db`) additionally need the gateway's HTTP URL.

**Key Features:**
- Simple, Pythonic API with namespace-based organization
- Intuitive namespaces: `client.embeddings`, `client.search`, `client.db`, `client.structured_embeddings`
- Asynchronous request submission with optional waiting
- Full type hints and documentation
- Multiple embedding model support (Google Vertex AI and OpenAI)
- Client-side model validation before submission
- Minimal dependencies (just Redis)

## Installation

### From Source (Monorepo)

```bash
cd packages/py/vector-sdk
pip install -e .
# Or with uv
uv pip install -e .
```

### From Package Registry (when published)

```bash
pip install sf-vector-sdk
```

## Quick Start

### Basic Usage

```python
from vector_sdk import VectorClient

# Create client
client = VectorClient(
    redis_url="redis://your-redis-host:6379",
    http_url="http://localhost:8080",  # Required for db operations
)

# Create embeddings
result = client.embeddings.create_and_wait(
    texts=[
        {"id": "doc1", "text": "Introduction to machine learning"},
        {"id": "doc2", "text": "Deep neural networks explained"},
    ],
    content_type="topic",
)
print(f"Processed: {result.processed_count}, Failed: {result.failed_count}")

# Vector search
search_result = client.search.query_and_wait(
    query_text="What is machine learning?",
    database="turbopuffer",
    namespace="topics",
    top_k=10,
)
for match in search_result.matches:
    print(f"{match.id}: {match.score}")

# Direct database lookup (no embedding)
docs = client.db.get_by_ids(
    ids=["doc1"],
    database="turbopuffer",
    namespace="topics",
)

client.close()
```

### With Storage Configuration

```python
from vector_sdk import VectorClient, StorageConfig, MongoDBStorage, TurboPufferStorage

client = VectorClient(redis_url="redis://your-redis-host:6379")

# Create embeddings with storage configuration
result = client.embeddings.create_and_wait(
    texts=[
        {
            "id": "tool123",
            "text": "Term: Photosynthesis. Definition: The process by which plants convert sunlight into energy.",
            "document": {
                "toolId": "tool123",
                "toolCollection": "FlashCard",
                "userId": "user456",
                "contentHash": "abc123",
            }
        }
    ],
    content_type="flashcard",
    priority="high",
    storage=StorageConfig(
        mongodb=MongoDBStorage(
            database="events_new",
            collection="tool_vectors",
            embedding_field="toolEmbedding",
            upsert_key="contentHash",
        ),
        turbopuffer=TurboPufferStorage(
            namespace="tool_vectors",
            id_field="_id",
            metadata=["toolId", "toolCollection", "userId"],
        ),
    ),
    metadata={"source": "my-service"},
)

client.close()
```

### Context Manager

```python
from vector_sdk import VectorClient

with VectorClient(redis_url="redis://localhost:6379") as client:
    result = client.embeddings.create_and_wait(
        texts=[{"id": "doc1", "text": "Hello world"}],
        content_type="document",
    )
# Connection automatically closed
```

## API Reference

### VectorClient

The main client class providing namespaced access to all SDK functionality.

#### Constructor

```python
client = VectorClient(
    redis_url="redis://localhost:6379",
    http_url="http://localhost:8080",  # Optional; required only for client.db operations
)
```

### Namespaces

#### `client.embeddings`

Embedding generation operations.

| Method | Description |
|--------|-------------|
| `create(texts, content_type, ...)` | Submit embedding request, return request ID |
| `wait_for(request_id, timeout)` | Wait for request completion |
| `create_and_wait(texts, content_type, ...)` | Submit and wait for result |
| `get_queue_depth()` | Get current queue depth for each priority |

```python
# Two-step: submit now (returns a request ID immediately), wait later
request_id = client.embeddings.create(texts, content_type)
result = client.embeddings.wait_for(request_id)

# One-step: submit and block until the result arrives
result = client.embeddings.create_and_wait(texts, content_type)

# Check queue depth
depths = client.embeddings.get_queue_depth()
```
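
The two-step form is useful when several requests should be in flight at once. The sketch below submits every batch first and only then waits for results. `submit_then_collect` is a hypothetical helper, not part of the SDK; it assumes only the `create(texts, content_type)` and `wait_for(request_id)` calls shown above and works with any object that exposes them.

```python
from typing import Any


def submit_then_collect(
    client: Any, batches: list[list[dict]], content_type: str
) -> list[Any]:
    """Submit every batch first, then wait for each result in submission order.

    Hypothetical helper: `client` is expected to expose
    `embeddings.create(texts, content_type)` and `embeddings.wait_for(request_id)`.
    """
    # Phase 1: enqueue all requests without blocking on any of them.
    request_ids = [
        client.embeddings.create(batch, content_type) for batch in batches
    ]
    # Phase 2: block on each request ID in turn.
    return [client.embeddings.wait_for(request_id) for request_id in request_ids]
```

Results come back in the same order as `batches`, so they can be paired with their inputs using `zip`.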

#### `client.search`

Vector similarity search operations.

| Method | Description |
|--------|-------------|
| `query(query_text, database, ...)` | Submit search query, return request ID |
| `wait_for(request_id, timeout)` | Wait for query completion |
| `query_and_wait(query_text, database, ...)` | Submit and wait for result |

```python
# Vector search with semantic similarity
result = client.search.query_and_wait(
    query_text="What is machine learning?",
    database="turbopuffer",
    namespace="topics",
    top_k=10,
    include_metadata=True,
)
```
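
Search results often need a relevance cutoff before they are shown to a user. The following is a minimal sketch of post-filtering by score. `Match` is a stand-in with the same field names as the SDK's `VectorMatch`, and `top_matches` and the threshold value are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Match:
    """Stand-in with the same field names as the SDK's VectorMatch."""

    id: str
    score: float  # higher means more similar
    metadata: Optional[dict] = None


def top_matches(matches: list[Match], min_score: float) -> list[Match]:
    """Keep matches at or above min_score, best first (hypothetical helper)."""
    kept = [m for m in matches if m.score >= min_score]
    return sorted(kept, key=lambda m: m.score, reverse=True)
```

With a real client this would be called as `top_matches(result.matches, min_score=0.5)`; the right threshold depends on the model and data.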

#### `client.db`

Direct database operations (no embedding required). Requires `http_url`.

| Method | Description |
|--------|-------------|
| `get_by_ids(ids, database, ...)` | Lookup documents by ID |
| `find_by_metadata(filters, database, ...)` | Search by metadata filters |
| `clone(id, source_namespace, destination_namespace)` | Clone document between namespaces |
| `delete(id, namespace)` | Delete document from namespace |

#### `client.structured_embeddings`

Type-safe embedding for known tool types (FlashCard, TestQuestion, etc.) with automatic text extraction, content hash computation, and database routing.

| Method | Description |
|--------|-------------|
| `embed_flashcard(data, metadata)` | Embed a flashcard, return request ID |
| `embed_flashcard_and_wait(data, metadata, timeout)` | Embed and wait for result |
| `embed_flashcard_batch(items)` | Embed batch of flashcards, return request ID |
| `embed_flashcard_batch_and_wait(items, timeout)` | Embed batch and wait for result |
| `embed_test_question(data, metadata)` | Embed a test question, return request ID |
| `embed_test_question_and_wait(data, metadata, timeout)` | Embed and wait for result |
| `embed_test_question_batch(items)` | Embed batch of test questions, return request ID |
| `embed_test_question_batch_and_wait(items, timeout)` | Embed batch and wait for result |
| `embed_spaced_test_question(data, metadata)` | Embed a spaced test question, return request ID |
| `embed_spaced_test_question_and_wait(data, metadata, timeout)` | Embed and wait for result |
| `embed_spaced_test_question_batch(items)` | Embed batch of spaced test questions, return request ID |
| `embed_spaced_test_question_batch_and_wait(items, timeout)` | Embed batch and wait for result |
| `embed_audio_recap(data, metadata)` | Embed an audio recap section, return request ID |
| `embed_audio_recap_and_wait(data, metadata, timeout)` | Embed and wait for result |
| `embed_audio_recap_batch(items)` | Embed batch of audio recaps, return request ID |
| `embed_audio_recap_batch_and_wait(items, timeout)` | Embed batch and wait for result |
| `embed_topic(data, metadata)` | Embed a topic (uses `TopicMetadata`), return request ID |
| `embed_topic_and_wait(data, metadata, timeout)` | Embed and wait for result (uses `TopicMetadata`) |
| `embed_topic_batch(items)` | Embed batch of topics (uses `TopicMetadata`), return request ID |
| `embed_topic_batch_and_wait(items, timeout)` | Embed batch and wait for result (uses `TopicMetadata`) |

**Metadata Types:**

- `ToolMetadata` - For tools (FlashCard, TestQuestion, etc.) - requires `tool_id`
- `TopicMetadata` - For topics only - all fields optional (`user_id`, `topic_id`)

```python
from vector_sdk import VectorClient, ToolMetadata, TopicMetadata, TestQuestionInput

client = VectorClient(redis_url="redis://localhost:6379")

# Embed a flashcard - uses ToolMetadata (tool_id required)
result = client.structured_embeddings.embed_flashcard_and_wait(
    data={"type": "BASIC", "term": "Mitochondria", "definition": "The powerhouse of the cell"},
    metadata=ToolMetadata(tool_id="tool123", user_id="user456", topic_id="topic789"),
)

# Embed a test question - uses ToolMetadata (tool_id required)
result = client.structured_embeddings.embed_test_question_and_wait(
    data=TestQuestionInput(
        question="What is the capital?",
        answers=[...],
        question_type="multiplechoice",
    ),
    metadata=ToolMetadata(tool_id="tool456"),
)

# Embed a topic - uses TopicMetadata (all fields optional)
# Note: Topic data requires an "id" field which becomes the TurboPuffer document ID
result = client.structured_embeddings.embed_topic_and_wait(
    data={"id": "topic-123", "topic": "Photosynthesis", "description": "The process by which plants convert sunlight to energy"},
    metadata=TopicMetadata(user_id="user123", topic_id="topic456"),  # No tool_id needed
)

# Batch embedding - embed multiple topics in a single request
from vector_sdk import TopicBatchItem

batch_result = client.structured_embeddings.embed_topic_batch_and_wait(
    items=[
        TopicBatchItem(data={"id": "topic-1", "topic": "Topic 1", "description": "Description 1"}, metadata=TopicMetadata(user_id="user1")),
        TopicBatchItem(data={"id": "topic-2", "topic": "Topic 2", "description": "Description 2"}, metadata=TopicMetadata(topic_id="topic2")),
        TopicBatchItem(data={"id": "topic-3", "topic": "Topic 3", "description": "Description 3"}, metadata=TopicMetadata()),  # All optional
    ],
)
```

**Database Routing:**

Set the `STRUCTURED_EMBEDDING_DATABASE_ROUTER` environment variable:

| Value | Behavior |
|-------|----------|
| `dual` | Write to both TurboPuffer AND Pinecone if both have `enabled: True` |
| `turbopuffer` | Only write to TurboPuffer |
| `pinecone` | Only write to Pinecone |
| undefined | Defaults to `turbopuffer` |
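
The table can be read as a small lookup with a default. The sketch below mirrors it in plain Python; `resolve_router` is a hypothetical helper, not the SDK's code, and raising on unrecognized values is an assumption because that case is not documented here.

```python
import os
from typing import Mapping, Optional

VALID_ROUTERS = {"dual", "turbopuffer", "pinecone"}


def resolve_router(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the routing mode described by the table (hypothetical helper)."""
    env = os.environ if env is None else env
    value = env.get("STRUCTURED_EMBEDDING_DATABASE_ROUTER")
    if value is None:
        # Documented default when the variable is undefined.
        return "turbopuffer"
    if value not in VALID_ROUTERS:
        # Assumption: how the SDK treats unrecognized values is not documented.
        raise ValueError(f"unrecognized router value: {value!r}")
    return value
```

In a deployment the variable would normally be set in the process environment before the client is created, for example `STRUCTURED_EMBEDDING_DATABASE_ROUTER=dual`.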

**`client.db` Examples:**

```python
# Lookup by IDs
result = client.db.get_by_ids(
    ids=["doc1", "doc2"],
    database="turbopuffer",
    namespace="topics",
)

# Find by metadata
result = client.db.find_by_metadata(
    filters={"userId": "user123"},
    database="mongodb",
    collection="vectors",
    database_name="mydb",
)

# Clone between namespaces
result = client.db.clone("doc1", "ns1", "ns2")

# Delete
result = client.db.delete("doc1", "ns1")
```

### Types

#### Result Types

```python
@dataclass
class EmbeddingResult:
    request_id: str
    status: str  # "success", "partial", "failed"
    processed_count: int
    failed_count: int
    errors: list[EmbeddingError]
    timing: Optional[TimingBreakdown]
    completed_at: datetime

    @property
    def is_success(self) -> bool: ...
    @property
    def is_partial(self) -> bool: ...
    @property
    def is_failed(self) -> bool: ...

@dataclass
class QueryResult:
    request_id: str
    status: str  # "success", "failed"
    matches: list[VectorMatch]
    error: Optional[str]
    timing: Optional[QueryTiming]
    completed_at: datetime

@dataclass
class VectorMatch:
    id: str
    score: float  # Similarity score (0-1, higher is more similar)
    metadata: Optional[dict]
    vector: Optional[list[float]]
```
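
As a rough illustration of how the `is_*` properties relate to `status`, the sketch below uses a stand-in class. `ResultSketch` and `describe` are hypothetical, and the mapping from `status` strings to properties is an assumption based on the field comment above, not the SDK's source.

```python
from dataclasses import dataclass, field


@dataclass
class ResultSketch:
    """Stand-in for EmbeddingResult with only the fields used here."""

    status: str  # "success", "partial", or "failed"
    processed_count: int = 0
    failed_count: int = 0
    errors: list = field(default_factory=list)

    # Assumption: each property is a plain comparison against `status`.
    @property
    def is_success(self) -> bool:
        return self.status == "success"

    @property
    def is_partial(self) -> bool:
        return self.status == "partial"

    @property
    def is_failed(self) -> bool:
        return self.status == "failed"


def describe(result: ResultSketch) -> str:
    """Turn a result into a one-line summary (hypothetical helper)."""
    if result.is_success:
        return f"ok: {result.processed_count} processed"
    if result.is_partial:
        return (
            f"partial: {result.processed_count} processed, "
            f"{result.failed_count} failed"
        )
    return f"failed: {result.failed_count} failed"
```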

## Priority Levels

| Priority | Use Case | Description |
|----------|----------|-------------|
| `critical` | Real-time user requests | Reserved quota, processed first |
| `high` | New content embeddings | Standard processing priority |
| `normal` | Updates, re-embeddings | Default priority |
| `low` | Backfill, batch jobs | Processed when capacity available |

```python
result = client.embeddings.create_and_wait(texts, content_type="topic", priority="critical")
```

## Embedding Models

### Supported Models

| Model | Provider | Dimensions | Custom Dims |
|-------|----------|------------|-------------|
| `gemini-embedding-001` | Google | 3072 | No |
| `text-embedding-004` | Google | 768 | No |
| `text-multilingual-embedding-002` | Google | 768 | No |
| `text-embedding-3-small` | OpenAI | 1536 | Yes |
| `text-embedding-3-large` | OpenAI | 3072 | Yes |
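
To make the "Custom Dims" column concrete, here is a minimal sketch of the kind of check client-side validation could perform. It is not the SDK's implementation: `check_model` is a hypothetical helper, it raises `ValueError` rather than the SDK's `ModelValidationError`, and the rule that a custom size must not exceed the default is an assumption.

```python
from typing import Optional

# Model name -> (default dimensions, supports custom dimensions), from the table.
SUPPORTED_MODELS = {
    "gemini-embedding-001": (3072, False),
    "text-embedding-004": (768, False),
    "text-multilingual-embedding-002": (768, False),
    "text-embedding-3-small": (1536, True),
    "text-embedding-3-large": (3072, True),
}


def check_model(model: str, dimensions: Optional[int] = None) -> int:
    """Return the effective dimension count or raise ValueError."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model}")
    default_dims, custom_ok = SUPPORTED_MODELS[model]
    if dimensions is None or dimensions == default_dims:
        return default_dims
    if not custom_ok:
        raise ValueError(f"{model} does not accept custom dimensions")
    # Assumption: custom sizes must be positive and no larger than the default.
    if not 0 < dimensions <= default_dims:
        raise ValueError(f"dimensions must be in 1..{default_dims}")
    return dimensions
```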

### Using a Specific Model

```python
result = client.embeddings.create_and_wait(
    texts=[{"id": "doc1", "text": "Hello world"}],
    content_type="document",
    embedding_model="text-embedding-3-small",
    embedding_dimensions=512,  # Custom dimensions (only for models that support it)
)
```

## Content Hash

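The SDK provides deterministic content hashing for learning tools: the same tool content always produces the same hash, so the hash can serve as a deduplication or upsert key (as in the `upsert_key="contentHash"` storage example above).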

```python
from vector_sdk import compute_content_hash, extract_tool_text

# Compute hash for a FlashCard
content_hash = compute_content_hash(
    "FlashCard",
    {"type": "BASIC", "term": "Mitochondria", "definition": "The powerhouse of the cell"}
)

# Extract text for embedding
text = extract_tool_text(
    "FlashCard",
    {"type": "BASIC", "term": "Mitochondria", "definition": "The powerhouse of the cell"}
)
```
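
For intuition about what "deterministic" means here, the sketch below hashes a canonical JSON encoding of the payload so that key order does not affect the result. This is not the algorithm used by `compute_content_hash`, and its output will not match the SDK's hashes; `sketch_content_hash` and the choice of SHA-256 are assumptions for illustration only.

```python
import hashlib
import json


def sketch_content_hash(tool_collection: str, data: dict) -> str:
    """Hash a tool payload so that key order does not change the result.

    Illustrative only: not the SDK's algorithm, and not interchangeable
    with hashes produced by compute_content_hash.
    """
    canonical = json.dumps(
        {"collection": tool_collection, "data": data},
        sort_keys=True,  # fixed key order makes the encoding canonical
        separators=(",", ":"),  # no whitespace variation
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```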

## Migration from EmbeddingClient

The SDK now uses a namespace-based API with `VectorClient`. The old `EmbeddingClient` is preserved for backward compatibility.

### Method Mapping

| Old (EmbeddingClient) | New (VectorClient) |
|----------------------|-------------------|
| `submit()` | `client.embeddings.create()` |
| `wait_for_result()` | `client.embeddings.wait_for()` |
| `submit_and_wait()` | `client.embeddings.create_and_wait()` |
| `get_queue_depth()` | `client.embeddings.get_queue_depth()` |
| `query()` | `client.search.query()` |
| `wait_for_query_result()` | `client.search.wait_for()` |
| `query_and_wait()` | `client.search.query_and_wait()` |
| `lookup_by_ids()` | `client.db.get_by_ids()` |
| `search_by_metadata()` | `client.db.find_by_metadata()` |
| `clone_from_namespace()` | `client.db.clone()` |
| `delete_from_namespace()` | `client.db.delete()` |

### Migration Example

```python
# Old API (still works, emits deprecation warnings)
from vector_sdk import EmbeddingClient

client = EmbeddingClient("redis://localhost:6379")
result = client.submit_and_wait(texts, content_type)
client.close()

# New API (recommended)
from vector_sdk import VectorClient

client = VectorClient(redis_url="redis://localhost:6379")
result = client.embeddings.create_and_wait(texts, content_type)
client.close()
```

## Error Handling

```python
from vector_sdk import VectorClient, ModelValidationError

try:
    with VectorClient(redis_url="redis://localhost:6379") as client:
        result = client.embeddings.create_and_wait(
            texts=[{"id": "doc1", "text": "Hello"}],
            content_type="test",
            embedding_model="text-embedding-3-small",
            timeout=30,
        )
        
        if result.is_success:
            print("Success!")
        elif result.is_partial:
            print("Partial success. Errors:")
            for err in result.errors:
                print(f"  - {err.id}: {err.error}")
        else:
            print(f"Failed: {result.failed_count} item(s) were not processed")

except ModelValidationError as e:
    print(f"Model validation failed: {e}")
except TimeoutError as e:
    print(f"Request timed out: {e}")
except ValueError as e:
    print(f"Invalid input: {e}")
```

## Best Practices

### 1. Use Appropriate Priority

```python
# Use appropriate priority levels
client.embeddings.create(texts, content_type="backfill", priority="low")
client.embeddings.create(texts, content_type="userRequest", priority="critical")
```

### 2. Batch Your Requests

```python
# Batch multiple texts per request for efficiency
# (`documents` is any iterable of objects with `id` and `text` attributes)
texts = [{"id": doc.id, "text": doc.text} for doc in documents]
client.embeddings.create(texts, content_type="document")
```
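
For very large inputs it may be preferable to split the list into several requests instead of sending one huge one. A minimal sketch of such a splitter follows; `chunked` is a hypothetical helper, and the gateway's actual per-request limit (if any) is not stated here.

```python
from typing import Iterator


def chunked(texts: list[dict], batch_size: int) -> Iterator[list[dict]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]
```

Each slice can then be submitted on its own, for example `for batch in chunked(texts, 100): client.embeddings.create(batch, content_type="document")`, where `100` is an illustrative value.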

### 3. Use Context Managers

```python
with VectorClient(redis_url="redis://...") as client:
    # Client automatically closed on exit
    pass
```

## License

Proprietary - All rights reserved.
