Metadata-Version: 2.4
Name: rehearse
Version: 0.1.7
Summary: Testing framework for AI Voice Agents
Keywords: voice,testing,twilio,voice-agent,pytest,ai
Author: thenullterminator
Author-email: thenullterminator <dazz2803@gmail.com>
License-Expression: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Testing
Requires-Dist: fastapi>=0.128.0
Requires-Dist: python-dotenv>=1.2.1
Requires-Dist: requests>=2.32.5
Requires-Dist: uvicorn>=0.40.0
Requires-Dist: websockets>=11
Requires-Dist: twilio>=9.0.0
Requires-Dist: openai>=1.0.0
Requires-Dist: elevenlabs>=1.0.0
Requires-Dist: litellm>=1.81.4
Requires-Dist: pytest>=9.0.2
Requires-Dist: pytest-asyncio>=1.3.0
Requires-Dist: audioop-lts>=0.2.1 ; python_full_version >= '3.13'
Requires-Python: >=3.11
Project-URL: Homepage, https://github.com/thenullterminator/rehearse
Project-URL: Repository, https://github.com/thenullterminator/rehearse
Description-Content-Type: text/markdown

# rehearse

A testing framework for voice agents that aims to make testing voice AI as easy as testing web APIs.

## Features

- **Pytest Integration**: Write voice agent tests using familiar pytest patterns
- **Real Phone Calls**: Test your agents via actual Twilio calls
- **LLM-Powered Assertions**: Use semantic assertions to validate agent responses
- **Multi-Provider Support**: ElevenLabs for TTS/STT, LiteLLM for LLM judging (OpenAI, Azure, Anthropic, etc.)
- **Async-First**: Built with async/await for efficient call handling

## Installation

```bash
pip install rehearse
```

Or with uv:

```bash
uv add rehearse
```

## Quick Start

```python
import pytest
from rehearse import TwilioCall, LLMJudge, expect
from rehearse.audio.tts import ElevenLabsTTS
from rehearse.audio.stt import ElevenLabsSTT

# Configure providers
tts = ElevenLabsTTS(api_key="your-elevenlabs-key")
stt = ElevenLabsSTT(api_key="your-elevenlabs-key")
judge = LLMJudge(model="gpt-4o-mini", api_key="your-openai-key")

@pytest.mark.asyncio
async def test_restaurant_reservation():
    """Test booking a table through a voice agent."""
    async with TwilioCall(
        to="+15551234567",           # Restaurant agent's number
        account_sid="ACxxxxx",
        auth_token="xxxxx",
        from_number="+15559876543",
        ngrok_url="abc123.ngrok.io",
        tts=tts,
        stt=stt,
    ) as call:
        # Agent greets the caller
        greeting = await call.listen()
        expect(greeting).to_not_be_empty()
        await expect(greeting).to_satisfy("a polite greeting", llm=judge)

        # Try to book at an invalid time (midnight)
        await call.say("I'd like to book a table for 2 at midnight")
        response = await call.listen()

        # Agent should refuse and mention valid hours
        await expect(response).to_satisfy(
            "politely declines the midnight booking",
            "mentions operating hours are 11am to 10pm, closed on Mondays",
            llm=judge
        )

        # Book at a valid time
        await call.say("Okay, how about 7pm tomorrow?")
        response = await call.listen()

        await expect(response).to_satisfy(
            "confirms the day and time as 7pm",
            "asks for name or party size",
            llm=judge
        )

        # Provide name for reservation
        await call.say("It's for 2 people, name is John Smith")
        confirmation = await call.listen()

        # Agent confirms the booking
        expect(confirmation).to_contain("smith")
        await expect(confirmation).to_satisfy("confirms the reservation is complete", llm=judge)
```
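The Quick Start hardcodes credentials for brevity. A minimal sketch of reading the Twilio settings from environment variables instead is shown below. The variable names (`TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN`, `TWILIO_FROM_NUMBER`, `NGROK_URL`) and the helper `twilio_kwargs_from_env` are assumptions made for illustration and are not part of rehearse.

```python
import os

# Hypothetical variable names; rename them to fit your project.
REQUIRED_VARS = {
    "account_sid": "TWILIO_ACCOUNT_SID",
    "auth_token": "TWILIO_AUTH_TOKEN",
    "from_number": "TWILIO_FROM_NUMBER",
    "ngrok_url": "NGROK_URL",
}


def twilio_kwargs_from_env(env=None):
    """Build TwilioCall keyword arguments from environment variables.

    Raises RuntimeError listing every variable that is missing or empty,
    so a misconfigured test run fails before any call is placed.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS.values() if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {kwarg: env[name] for kwarg, name in REQUIRED_VARS.items()}
```

The result could then be unpacked into the call shown above, for example `TwilioCall(to="+15551234567", tts=tts, stt=stt, **twilio_kwargs_from_env())`. Because `python-dotenv` is already a dependency, calling `load_dotenv()` beforehand is one way to populate these variables from a `.env` file.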

## Prerequisites

### ngrok Setup

`TwilioCall` requires ngrok to receive Twilio webhooks. Start ngrok before running tests:

```bash
ngrok http 8765
```

Copy the domain from the forwarding URL (e.g., `abc123.ngrok-free.app`) and pass it as `ngrok_url` to `TwilioCall`.
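ngrok prints the forwarding address with an `https://` scheme, while the examples in this README pass a bare domain. Assuming `TwilioCall` expects the bare domain (an inference from those examples, not confirmed behavior), a small hypothetical helper such as `ngrok_domain` can normalize whatever was copied:

```python
from urllib.parse import urlparse


def ngrok_domain(forwarding_url: str) -> str:
    """Return the bare host from an ngrok forwarding URL or domain."""
    url = forwarding_url.strip()
    # urlparse only fills in the host when a scheme or leading // is present.
    if "//" not in url:
        url = "//" + url
    host = urlparse(url).hostname
    if not host:
        raise ValueError(f"Could not find a host in {forwarding_url!r}")
    return host
```

Both `ngrok_domain("https://abc123.ngrok-free.app/")` and `ngrok_domain("abc123.ngrok-free.app")` return the same bare domain.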

## Running Tests

### Basic Command

```bash
pytest tests/ -v
```

### Recommended Command

For better output during voice agent tests (real-time logs, shorter tracebacks, no warnings):

```bash
pytest tests/ -v -s --tb=short --log-cli-level=INFO --disable-warnings
```

### Command Options Explained

| Option | Description |
|--------|-------------|
| `-v` | Verbose output - shows pass/fail status for each test |
| `-s` | No capture - print statements and logs show in real-time |
| `--tb=short` | Short tracebacks - less noise on failures |
| `--log-cli-level=INFO` | Show INFO level logs as tests run |
| `--disable-warnings` | Hide the warnings summary (including deprecation warnings) |

### Run a Specific Test

```bash
pytest tests/test_asterisk_agent.py::test_agent_greeting -v -s --tb=short --log-cli-level=INFO --disable-warnings
```

## API Reference

### TwilioCall

The main interface for making test calls.

```python
async with TwilioCall(
    to="+15551234567",           # Phone number to call
    account_sid="ACxxxxx",        # Twilio Account SID
    auth_token="xxxxx",           # Twilio Auth Token
    from_number="+15559876543",   # Your Twilio phone number
    ngrok_url="abc123.ngrok.io",  # ngrok domain for webhooks
    tts=tts,                      # TTS provider instance
    stt=stt,                      # STT provider instance
    send_digits="www7",           # Optional: DTMF digits to send (w = 0.5s wait)
    audio_path="/tmp/debug.wav",  # Optional: Save call audio to WAV file for debugging
) as call:
    # Use call.listen() and call.say()
```
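Each `w` in `send_digits` adds a 0.5 second pause before the next digit, as noted in the comment above. When sizing timeouts for a test that navigates a phone menu, it can help to know how long those pauses add up to. The helper below is a hypothetical sketch, not part of rehearse, and it only accounts for the lowercase `w` described in this README.

```python
def dtmf_wait_seconds(send_digits: str) -> float:
    """Total pause contributed by 'w' characters, at 0.5 seconds each."""
    return 0.5 * send_digits.count("w")


def dtmf_keys(send_digits: str) -> str:
    """The keys that are actually pressed, with the pauses removed."""
    return send_digits.replace("w", "")


# "www7" waits 1.5 seconds in total, then presses 7.
print(dtmf_wait_seconds("www7"), dtmf_keys("www7"))
```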

#### Saving Audio for Debugging

Use `audio_path` to save the call's audio to a WAV file for debugging:

```python
async with TwilioCall(
    to="+15551234567",
    audio_path="./recordings/test_greeting.wav",
    # ... other params
) as call:
    response = await call.listen()
    # Audio will be saved to ./recordings/test_greeting.wav when call ends
```

### call.listen()

Listen for the agent's response.

```python
response = await call.listen(
    max_duration=20.0,      # Maximum recording duration in seconds
    silence_threshold=5.0,  # Stop after this many seconds of silence
    timeout=20.0,           # Maximum wait time for response
)
print(response.text)  # Transcribed text
```

### call.say()

Speak to the agent.

```python
await call.say("What are your business hours?")
```

### expect()

Create assertions on responses. All assertions are chainable.

#### Text Assertions

```python
# Check response contains text (case-insensitive)
expect(response).to_contain("hello")

# Check response contains any of the options
expect(response).to_contain_any(["hello", "hi", "hey"])

# Check response matches regex pattern
expect(response).to_match(r"order #\d+")

# Check exact equality
expect(response.text).to_equal("Hello, how can I help you?")

# Check empty/not empty
expect(response).to_not_be_empty()
expect(response).to_be_empty()
```
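To predict how these assertions will treat a given transcript, the plain-Python functions below approximate the checks described in the comments above. They are an illustrative sketch, not rehearse's implementation. In particular, treating `to_match` as a search anywhere in the text is an assumption, and the function names are hypothetical.

```python
import re


def contains(text: str, needle: str) -> bool:
    """Case-insensitive substring check, mirroring to_contain."""
    return needle.lower() in text.lower()


def contains_any(text: str, options: list[str]) -> bool:
    """True when at least one option is present, mirroring to_contain_any."""
    return any(contains(text, option) for option in options)


def matches(text: str, pattern: str) -> bool:
    """Regex check; assumes the pattern may match anywhere in the text."""
    return re.search(pattern, text) is not None


transcript = "Hello! Your order #4821 is ready."
print(contains(transcript, "hello"))
print(contains_any(transcript, ["hi", "hey"]))
print(matches(transcript, r"order #\d+"))
```

Speech-to-text output often varies in casing and punctuation, which is why substring and regex checks tend to be more robust than exact equality for transcribed responses.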

#### Semantic Assertions (LLM-Powered)

```python
# Single intent check
await expect(response).to_satisfy("a friendly greeting", llm=judge)

# Multiple intents (all must pass)
await expect(response).to_satisfy(
    "acknowledges the customer's request",
    "provides clear next steps",
    "maintains professional tone",
    llm=judge
)

# Synchronous version (for non-async contexts)
expect(response).to_satisfy_sync("a friendly greeting", llm=judge)
```

#### Numeric Assertions

```python
# Check response latency
expect(response.latency).to_be_less_than(2.0)
expect(response.latency).to_be_greater_than(0.5)
```

#### Tool Call Assertions

```python
# Check if agent made a tool call
expect(call.tool_calls).to_contain("transfer", department="sales")

# Check no tool calls were made
expect(call.tool_calls).to_be_empty()
```

### LLMJudge

Configure the LLM for semantic assertions. Powered by [LiteLLM](https://docs.litellm.ai/docs/providers), which means **all major LLM providers are supported** including OpenAI, Azure OpenAI, Anthropic, Google Gemini, AWS Bedrock, Mistral, Cohere, and more.

```python
# OpenAI
judge = LLMJudge(model="gpt-4o-mini", api_key="sk-xxx")

# Azure OpenAI
judge = LLMJudge(
    model="azure/your-deployment-name",
    api_key="xxx",
    api_base="https://your-resource.openai.azure.com",
)

# Anthropic
judge = LLMJudge(model="claude-3-haiku-20240307", api_key="sk-ant-xxx")

# Google Gemini
judge = LLMJudge(model="gemini/gemini-pro", api_key="xxx")

# AWS Bedrock
judge = LLMJudge(model="bedrock/anthropic.claude-3-sonnet-20240229-v1:0")
```

See [LiteLLM providers](https://docs.litellm.ai/docs/providers) for the full list of supported models.

## Example Test Patterns

### Test Agent Greeting

```python
@pytest.mark.asyncio
async def test_agent_greeting():
    async with TwilioCall(...) as call:
        response = await call.listen()
        expect(response).to_not_be_empty()
        await expect(response).to_satisfy("a friendly greeting", llm=judge)
```

### Test Question and Answer

```python
@pytest.mark.asyncio
async def test_agent_answers_question():
    async with TwilioCall(...) as call:
        # Wait for greeting
        await call.listen()

        # Ask a question
        await call.say("What are your business hours?")

        # Validate response
        response = await call.listen()
        await expect(response).to_satisfy(
            "mentions business hours are Monday, Wednesday, and Friday from 10am to 6pm",
            llm=judge
        )
```

### Test Multi-Turn Conversation

```python
@pytest.mark.asyncio
async def test_multi_turn_conversation():
    async with TwilioCall(...) as call:
        await call.listen()  # Greeting

        await call.say("I need help with my account")
        response1 = await call.listen()

        await call.say("My account number is 12345")
        response2 = await call.listen()

        expect(response2).to_not_be_empty()
```

## Logging

Enable logging to see what's happening during tests:

```python
from rehearse import setup_logging

setup_logging("INFO")   # Standard logging
setup_logging("DEBUG")  # Verbose logging
```

## Roadmap

### Connectors
- [x] Twilio
- [ ] Direct WebSocket
- [ ] Vapi
- [ ] Retell
- [ ] Bland.ai

### Audio Providers
- [x] ElevenLabs (TTS/STT)
- [ ] Deepgram
- [ ] AssemblyAI
- [ ] Google Cloud Speech
- [ ] Azure Speech Services

### Audio Simulation
- [ ] Background noise injection (rain, traffic, crowd, office)
- [ ] Different accents and speaking styles
- [ ] Variable audio quality (simulate poor connections)

### Assertions
- [ ] Native audio assertions (volume, silence detection, audio quality)
- [ ] Emotion detection assertions (angry, happy, frustrated)
- [ ] Latency assertions (response time thresholds)
- [ ] Interruption handling assertions

### Advanced Testing
- [ ] Voice agent vs voice agent testing
- [ ] Load testing (concurrent calls)
- [ ] Conversation replay and debugging

## Vision: Full-Featured Test

> **Note:** This example demonstrates planned capabilities that are not yet implemented. See the Roadmap above for current implementation status.

```python
@pytest.mark.asyncio
async def test_support_call_with_realistic_conditions():
    """Test support agent handles a frustrated customer in a noisy environment."""
    async with VapiCall(
        assistant_id="your-assistant-id",  # Vapi assistant to test
        api_key="your-vapi-key",
        background_noise=BackgroundNoise.COFFEE_SHOP, # caller is in a busy coffee shop
        noise_level=0.3,  # 30% background noise
        audio_quality=AudioQuality.POOR_CONNECTION, # Simulate poor cell connection
        speaking_style=SpeakingStyle( # accent and style
            accent="british",
            speed=1.3,  # 30% faster than normal
        ),
    ) as call:
        # Agent greets the caller
        greeting = await call.listen()

        # Check response latency is acceptable
        expect(greeting.latency).to_be_less_than(2.0)

        # Check audio quality metrics
        expect(greeting.audio).to_have_volume_above(0.3)
        expect(greeting.audio).to_have_no_clipping()

        # Express frustration about a billing issue
        await call.say(
            "I've been charged twice for my subscription and nobody is helping me!",
            emotion="frustrated",  # TTS renders with frustrated tone
        )
        response = await call.listen()

        # Agent should detect frustration and respond empathetically
        await expect(response).to_satisfy(
            "acknowledges the customer's frustration",
            "apologizes for the inconvenience",
            "does not sound robotic or dismissive",
            llm=judge
        )

        # Check agent's emotional tone
        expect(response.audio).to_have_emotion("empathetic")
        expect(response.audio).to_not_have_emotion("dismissive")

        # Interrupt the agent mid-sentence to test barge-in handling
        await call.say("Just fix it!", interrupt=True)
        response = await call.listen()

        # Agent should handle interruption gracefully
        expect(response).to_handle_interruption_gracefully()
        await expect(response).to_satisfy(
            "does not restart from the beginning",
            "acknowledges the urgency",
            llm=judge
        )

        # Provide account details
        await call.say("My account number is 1-2-3-4-5-6")
        response = await call.listen()

        # Check agent made the right tool call
        expect(call.tool_calls).to_contain(
            "lookup_account",
            account_number="123456"
        )

        # Final resolution
        await call.say("Yes, please process the refund")
        confirmation = await call.listen()

        await expect(confirmation).to_satisfy(
            "confirms refund will be processed",
            "provides timeline or reference number",
            "asks if there's anything else",
            llm=judge
        )


@pytest.mark.asyncio
async def test_agent_handles_profanity():
    """Test agent remains professional when user uses inappropriate language."""
    async with BlandCall(
        phone_number="+15551234567",
        api_key="your-bland-key",
        background_noise=BackgroundNoise.TRAFFIC,
        noise_level=0.4,
        speaking_style=SpeakingStyle(
            accent="american",
            speed=1.4,  # Speaking fast when angry
        ),
    ) as call:
        await call.listen()  # Greeting

        await call.say(
            "This is bullshit, I want to speak to a manager!",
            emotion="angry",
        )
        response = await call.listen()

        # Agent should remain professional and de-escalate
        await expect(response).to_satisfy(
            "remains calm and professional",
            "does not mirror the profanity",
            "offers to escalate or resolve the issue",
            llm=judge
        )
        expect(response.audio).to_not_have_emotion("angry")
        expect(response.latency).to_be_less_than(2.5)


@pytest.mark.asyncio
async def test_agent_handles_topic_deviation():
    """Test agent redirects when user goes off-topic."""
    async with WebSocketCall(
        url="wss://your-agent.example.com/ws",
        background_noise=BackgroundNoise.TV,
        noise_level=0.25,
        audio_quality=AudioQuality.GOOD,
        speaking_style=SpeakingStyle(
            accent="australian",
            speed=0.9,
        ),
    ) as call:
        await call.listen()  # Greeting

        await call.say("I need help with my order")
        await call.listen()

        # User suddenly goes off-topic
        await call.say("By the way, what's the weather like in New York?")
        response = await call.listen()

        # Agent should gracefully redirect
        await expect(response).to_satisfy(
            "politely declines or briefly acknowledges the off-topic question",
            "redirects conversation back to the original issue",
            llm=judge
        )


@pytest.mark.asyncio
async def test_agent_handles_ambiguous_input():
    """Test agent asks for clarification on vague responses."""
    async with WebSocketCall(
        url="wss://your-agent.example.com/ws",
        background_noise=BackgroundNoise.QUIET_ROOM,
        noise_level=0.05,
        audio_quality=AudioQuality.EXCELLENT,
        speaking_style=SpeakingStyle(
            accent="indian",
            speed=0.7,  # Speaking slowly, uncertain
        ),
    ) as call:
        await call.listen()  # Greeting

        await call.say("I have a problem", emotion="uncertain")
        await call.listen()

        # User gives ambiguous response
        await call.say("Hmm, I don't know, maybe, I guess?", emotion="hesitant")
        response = await call.listen()

        # Agent should ask for clarification
        await expect(response).to_satisfy(
            "asks a clarifying question",
            "does not make assumptions about user intent",
            llm=judge
        )
        expect(response.audio).to_have_emotion("patient")
```

## License

MIT
