Metadata-Version: 2.4
Name: rehearse
Version: 0.1.2
Summary: Testing framework for AI Voice Agents
Keywords: voice,testing,twilio,voice-agent,pytest,ai
Author: thenullterminator
Author-email: thenullterminator <dazz2803@gmail.com>
License-Expression: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Testing
Requires-Dist: fastapi>=0.128.0
Requires-Dist: python-dotenv>=1.2.1
Requires-Dist: requests>=2.32.5
Requires-Dist: uvicorn>=0.40.0
Requires-Dist: websockets>=11
Requires-Dist: twilio>=9.0.0
Requires-Dist: openai>=1.0.0
Requires-Dist: elevenlabs>=1.0.0
Requires-Dist: litellm>=1.81.4
Requires-Dist: pytest>=9.0.2
Requires-Dist: pytest-asyncio>=1.3.0
Requires-Python: >=3.11
Project-URL: Homepage, https://github.com/thenullterminator/rehearse
Project-URL: Repository, https://github.com/thenullterminator/rehearse
Description-Content-Type: text/markdown

# rehearse

A testing framework for voice agents that aims to make testing voice AI as easy as testing web APIs.

## Features

- **Pytest Integration**: Write voice agent tests using familiar pytest patterns
- **Real Phone Calls**: Test your agents via actual Twilio calls
- **LLM-Powered Assertions**: Use semantic assertions to validate agent responses
- **Multi-Provider Support**: ElevenLabs for TTS/STT, LiteLLM for LLM judging (OpenAI, Azure, Anthropic, etc.)
- **Async-First**: Built with async/await for efficient call handling

## Installation

```bash
pip install rehearse
```

Or with uv:

```bash
uv add rehearse
```

## Quick Start

```python
import pytest
from rehearse import TwilioCall, LLMJudge, expect
from rehearse.audio.tts import ElevenLabsTTS
from rehearse.audio.stt import ElevenLabsSTT

# Configure providers
tts = ElevenLabsTTS(api_key="your-elevenlabs-key")
stt = ElevenLabsSTT(api_key="your-elevenlabs-key")
judge = LLMJudge(model="gpt-4o-mini", api_key="your-openai-key")

@pytest.mark.asyncio
async def test_agent_greeting():
    """Test that the agent greets the caller."""
    async with TwilioCall(
        to="+15551234567",           # Agent's phone number
        account_sid="ACxxxxx",        # Twilio Account SID
        auth_token="xxxxx",           # Twilio Auth Token
        from_number="+15559876543",   # Your Twilio number
        ngrok_url="abc123.ngrok.io",  # Your ngrok domain
        tts=tts,
        stt=stt,
    ) as call:
        # Listen for agent's greeting
        response = await call.listen(max_duration=20.0, silence_threshold=5.0)

        # Assert response is not empty
        expect(response).to_not_be_empty()

        # Assert response matches intent using LLM
        await expect(response).to_satisfy("a friendly greeting", llm=judge)
```

## Prerequisites

### 1. Environment Variables

Create a `.env` file with your credentials:

```bash
# Twilio (required)
TWILIO_ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TWILIO_AUTH_TOKEN=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
TWILIO_FROM_NUMBER=+15559876543

# ElevenLabs (required for TTS/STT)
ELEVENLABS_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# LLM Judge - OpenAI
OPENAI_API_KEY=sk-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

# Or Azure OpenAI
AZURE_OPENAI_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
AZURE_BASE_URL=https://your-resource.openai.azure.com

# Your voice agent's phone number
AGENT_PHONE=+15551234567
```
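
The variables above can be validated once, before any call is placed, so a missing credential fails fast instead of partway through a phone call. The following is a minimal sketch using only the standard library. `load_call_settings` is a hypothetical helper, not part of Rehearse, and it assumes the variables are already in the process environment (for example after loading `.env` with python-dotenv).

```python
import os
from typing import Mapping

# Variable names come from the .env example above; the helper is hypothetical.
REQUIRED_VARS = (
    "TWILIO_ACCOUNT_SID",
    "TWILIO_AUTH_TOKEN",
    "TWILIO_FROM_NUMBER",
    "ELEVENLABS_API_KEY",
    "AGENT_PHONE",
)


def load_call_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return the required settings, or raise listing every missing variable."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

The returned dictionary can then supply the arguments passed to `TwilioCall` and the ElevenLabs providers.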

### 2. ngrok Setup

Rehearse needs a public URL to receive Twilio webhooks. Start ngrok before running tests:

```bash
ngrok http 8765
```

Copy the domain from the forwarding URL (e.g., `abc123.ngrok-free.app`) and set it as `NGROK_URL` in your environment.
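
ngrok prints the forwarding address as a full URL such as `https://abc123.ngrok-free.app`, while the examples in this README pass a bare domain. Assuming Rehearse expects the bare domain (an inference from those examples, not a documented requirement), a small hypothetical helper like `ngrok_domain` can normalize either form:

```python
from urllib.parse import urlparse


def ngrok_domain(forwarding_url: str) -> str:
    """Strip the scheme, path, and trailing slash from an ngrok forwarding URL."""
    value = forwarding_url.strip()
    # urlparse only fills in netloc when a scheme (or a leading //) is present.
    parsed = urlparse(value if "://" in value else "//" + value)
    return parsed.netloc
```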

## Running Tests

### Basic Command

```bash
pytest examples/ -v
```

### Recommended Command

For better output during voice agent tests (real-time logs, shorter tracebacks, no warnings):

```bash
pytest examples/ -v -s --tb=short --log-cli-level=INFO --disable-warnings
```

### Command Options Explained

| Option | Description |
|--------|-------------|
| `-v` | Verbose output - shows pass/fail status for each test |
| `-s` | No capture - print statements and logs show in real time |
| `--tb=short` | Short tracebacks - less noise on failures |
| `--log-cli-level=INFO` | Show INFO level logs as tests run |
| `--disable-warnings` | Suppress the warnings summary (including deprecation warnings) |

### Run a Specific Test

```bash
pytest examples/test_asterisk_agent.py::test_agent_greeting -v -s --tb=short --log-cli-level=INFO --disable-warnings
```

## API Reference

### TwilioCall

The main interface for making test calls.

```python
async with TwilioCall(
    to="+15551234567",           # Phone number to call
    account_sid="ACxxxxx",        # Twilio Account SID
    auth_token="xxxxx",           # Twilio Auth Token
    from_number="+15559876543",   # Your Twilio phone number
    ngrok_url="abc123.ngrok.io",  # ngrok domain for webhooks
    tts=tts,                      # TTS provider instance
    stt=stt,                      # STT provider instance
    send_digits="www7",           # Optional: DTMF digits to send (w = 0.5s wait)
    audio_path="/tmp/debug.wav",  # Optional: Save call audio to WAV file for debugging
) as call:
    # Use call.listen() and call.say()
```
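
Since each `w` in `send_digits` stands for a 0.5 second wait, the string can be built from a delay in seconds rather than counted by hand. This is a small sketch; `dtmf_after_wait` is a hypothetical helper, not part of Rehearse.

```python
import math


def dtmf_after_wait(digits: str, wait_seconds: float) -> str:
    """Prefix DTMF digits with enough 'w' characters to cover wait_seconds."""
    if wait_seconds < 0:
        raise ValueError("wait_seconds must be non-negative")
    # Each 'w' is a 0.5 s pause, so round up to the next half second.
    pauses = math.ceil(wait_seconds / 0.5)
    return "w" * pauses + digits
```

For example, `dtmf_after_wait("7", 1.5)` produces the `"www7"` value used above.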

#### Saving Audio for Debugging

Use `audio_path` to save the call's audio to a WAV file for debugging:

```python
async with TwilioCall(
    to="+15551234567",
    audio_path="./recordings/test_greeting.wav",
    # ... other params
) as call:
    response = await call.listen()
    # Audio will be saved to ./recordings/test_greeting.wav when call ends
```

### call.listen()

Listen for the agent's response.

```python
response = await call.listen(
    max_duration=20.0,      # Maximum recording duration in seconds
    silence_threshold=5.0,  # Stop after this many seconds of silence
    timeout=20.0,           # Maximum wait time for response
)
print(response.text)  # Transcribed text
```

### call.say()

Speak to the agent.

```python
await call.say("What are your business hours?")
```

### expect()

Create assertions on responses. All assertions are chainable.

#### Text Assertions

```python
# Check response contains text (case-insensitive)
expect(response).to_contain("hello")

# Check response contains any of the options
expect(response).to_contain_any(["hello", "hi", "hey"])

# Check response matches regex pattern
expect(response).to_match(r"order #\d+")

# Check exact equality
expect(response.text).to_equal("Hello, how can I help you?")

# Check empty/not empty
expect(response).to_not_be_empty()
expect(response).to_be_empty()
```
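
"Chainable" generally means that each assertion returns the expectation object, so several checks can be strung together on one line. The toy class below illustrates that pattern only. `TextExpectation` is hypothetical and is not Rehearse's implementation, and the exact chaining behavior of `expect()` should be confirmed against the library.

```python
import re


class TextExpectation:
    """Toy stand-in for expect(); not Rehearse's actual implementation."""

    def __init__(self, text: str) -> None:
        self.text = text

    def to_contain(self, needle: str) -> "TextExpectation":
        # Case-insensitive substring check.
        assert needle.lower() in self.text.lower(), f"{needle!r} not found"
        return self  # returning self is what makes the call chainable

    def to_match(self, pattern: str) -> "TextExpectation":
        assert re.search(pattern, self.text), f"{pattern!r} did not match"
        return self


# Both checks run in order; the first failure raises AssertionError.
TextExpectation("Hello! Your order #42 shipped.").to_contain("hello").to_match(r"order #\d+")
```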

#### Semantic Assertions (LLM-Powered)

```python
# Single intent check
await expect(response).to_satisfy("a friendly greeting", llm=judge)

# Multiple intents (all must pass)
await expect(response).to_satisfy(
    "acknowledges the customer's request",
    "provides clear next steps",
    "maintains professional tone",
    llm=judge
)

# Synchronous version (for non-async contexts)
expect(response).to_satisfy_sync("a friendly greeting", llm=judge)
```

#### Numeric Assertions

```python
# Check response latency
expect(response.latency).to_be_less_than(2.0)
expect(response.latency).to_be_greater_than(0.5)
```

#### Tool Call Assertions

```python
# Check if agent made a tool call
expect(call.tool_calls).to_contain("transfer", department="sales")

# Check no tool calls were made
expect(call.tool_calls).to_be_empty()
```

### LLMJudge

Configure the LLM for semantic assertions. Powered by [LiteLLM](https://docs.litellm.ai/docs/providers), which means **all major LLM providers are supported** including OpenAI, Azure OpenAI, Anthropic, Google Gemini, AWS Bedrock, Mistral, Cohere, and more.

```python
# OpenAI
judge = LLMJudge(model="gpt-4o-mini", api_key="sk-xxx")

# Azure OpenAI
judge = LLMJudge(
    model="azure/your-deployment-name",
    api_key="xxx",
    api_base="https://your-resource.openai.azure.com",
)

# Anthropic
judge = LLMJudge(model="claude-3-haiku-20240307", api_key="sk-ant-xxx")

# Google Gemini
judge = LLMJudge(model="gemini/gemini-pro", api_key="xxx")

# AWS Bedrock
judge = LLMJudge(model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0")
```

See [LiteLLM providers](https://docs.litellm.ai/docs/providers) for the full list of supported models.

## Example Test Patterns

### Test Agent Greeting

```python
@pytest.mark.asyncio
async def test_agent_greeting():
    async with TwilioCall(...) as call:
        response = await call.listen()
        expect(response).to_not_be_empty()
        await expect(response).to_satisfy("a friendly greeting", llm=judge)
```

### Test Question and Answer

```python
@pytest.mark.asyncio
async def test_agent_answers_question():
    async with TwilioCall(...) as call:
        # Wait for greeting
        await call.listen()

        # Ask a question
        await call.say("What are your business hours?")

        # Validate response
        response = await call.listen()
        await expect(response).to_satisfy(
            "mentions business hours are Monday, Wednesday, and Friday from 10am to 6pm",
            llm=judge
        )
```

### Test Multi-Turn Conversation

```python
@pytest.mark.asyncio
async def test_multi_turn_conversation():
    async with TwilioCall(...) as call:
        await call.listen()  # Greeting

        await call.say("I need help with my account")
        response1 = await call.listen()

        await call.say("My account number is 12345")
        response2 = await call.listen()

        expect(response2).to_not_be_empty()
```
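
The listen/say/listen rhythm in the patterns above can be factored into a helper for longer scripts. This is a sketch that relies only on `call.say()` and `call.listen()` as documented in this README; `run_turns` is a hypothetical name, not a Rehearse function.

```python
from typing import Any, Sequence


async def run_turns(call: Any, prompts: Sequence[str]) -> list[Any]:
    """Listen for the greeting, then alternate say/listen for each prompt."""
    responses = [await call.listen()]  # index 0 is the greeting
    for prompt in prompts:
        await call.say(prompt)
        responses.append(await call.listen())
    return responses
```

Inside a test, `responses = await run_turns(call, ["I need help with my account", "My account number is 12345"])` would return three responses, and the last one can be passed to `expect()`.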

## Why ngrok?

Twilio needs a publicly reachable endpoint to deliver webhooks and stream call audio to your test runner. Since your local machine usually isn't publicly accessible, ngrok creates a secure tunnel that exposes Rehearse's local WebSocket server (port 8765) to the internet. This allows Twilio to send real-time audio data back to Rehearse during the call.
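
For context, Twilio's Media Streams feature is typically started by a `<Connect><Stream>` TwiML response that points at a `wss://` URL. The sketch below builds such a response with the standard library to show where the ngrok domain fits. Whether Rehearse generates exactly this TwiML is an assumption, and both the `/media` path and the `media_stream_twiml` helper are hypothetical.

```python
import xml.etree.ElementTree as ET


def media_stream_twiml(ngrok_domain: str, path: str = "/media") -> str:
    """Build TwiML asking Twilio to open a media stream to the tunnel."""
    response = ET.Element("Response")
    connect = ET.SubElement(response, "Connect")
    # Twilio connects to this WebSocket URL; ngrok forwards it to the local port.
    ET.SubElement(connect, "Stream", url=f"wss://{ngrok_domain}{path}")
    return ET.tostring(response, encoding="unicode")
```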

## Logging

Enable logging to see what's happening during tests:

```python
from rehearse import setup_logging

setup_logging("INFO")   # Standard logging
setup_logging("DEBUG")  # Verbose logging
```

## Roadmap

### Connectors
- [x] Twilio
- [ ] Direct WebSocket
- [ ] Vapi
- [ ] Retell
- [ ] Bland.ai

### Audio Providers
- [x] ElevenLabs (TTS/STT)
- [ ] Deepgram
- [ ] AssemblyAI
- [ ] Google Cloud Speech
- [ ] Azure Speech Services

### Audio Simulation
- [ ] Background noise injection (rain, traffic, crowd, office)
- [ ] Different accents and speaking styles
- [ ] Variable audio quality (simulate poor connections)

### Assertions
- [ ] Native audio assertions (volume, silence detection, audio quality)
- [ ] Emotion detection assertions (angry, happy, frustrated)
- [ ] Latency assertions (response time thresholds)
- [ ] Interruption handling assertions

### Advanced Testing
- [ ] Voice agent vs voice agent testing
- [ ] Load testing (concurrent calls)
- [ ] Conversation replay and debugging

## License

MIT
