Metadata-Version: 2.4
Name: roohai
Version: 0.1.8
Summary: Modular real-time voice agent framework with swappable STT, LLM, TTS, and VAD components
Author: Fraser Sequeira
License: Copyright 2026 RoohAI                                                                                                                                                     
                                                                                                                                                                                                    
          Licensed under the Apache License, Version 2.0 (the "License");                                                                                                                           
          you may not use this file except in compliance with the License.                                                                                                                          
          You may obtain a copy of the License at                                                                                                                                                   
                                                                                                                                                                                                    
              http://www.apache.org/licenses/LICENSE-2.0                                                                                                                                            
                                                                                                                                                                                                    
          Unless required by applicable law or agreed to in writing, software                                                                                                                       
          distributed under the License is distributed on an "AS IS" BASIS,                                                                                                                         
          WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.                                                                                                                  
          See the License for the specific language governing permissions and                                                                                                                       
          limitations under the License. 
        
Project-URL: Changelog, https://github.com/roohai/roohai-framework/blob/main/CHANGELOG.md
Project-URL: Documentation, https://github.com/roohai/roohai-framework#readme
Project-URL: Repository, https://github.com/roohai/roohai-framework
Keywords: voice,ai,stt,tts,llm,vad,real-time,agent
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastapi>=0.115.0
Requires-Dist: uvicorn[standard]>=0.32.0
Requires-Dist: transformers<5.0.0,>=4.46.0
Requires-Dist: accelerate>=0.26.0
Requires-Dist: torch>=2.1.0
Requires-Dist: soundfile>=0.12.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: boto3>=1.35.0
Requires-Dist: python-multipart>=0.0.12
Requires-Dist: sentencepiece>=0.1.99
Requires-Dist: aiortc>=1.9.0
Requires-Dist: av>=16.1.0
Requires-Dist: aiohttp>=3.9.0
Requires-Dist: huggingface-hub>=0.20.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: silero-vad>=5.1
Requires-Dist: websockets>=12.0
Requires-Dist: strands-agents>=1.0.0
Requires-Dist: strands-agents-tools>=0.1.0
Requires-Dist: openai>=1.0.0
Requires-Dist: google-genai>=1.0.0
Requires-Dist: ollama>=0.4.0
Requires-Dist: anthropic>=0.40.0
Requires-Dist: deepgram-sdk>=6.0
Requires-Dist: cartesia
Requires-Dist: piper-tts>=1.4
Requires-Dist: opentelemetry-exporter-otlp-proto-http>=1.20.0
Requires-Dist: pyjwt[crypto]>=2.8.0
Provides-Extra: nvidia
Requires-Dist: nemo_toolkit[asr]; extra == "nvidia"
Dynamic: license-file

<p align="center">
  <h1 align="center">RoohAI</h1>
  <p align="center">
    Open-source voice AI framework for building real-time voice agents.<br/>
    Swap STT, TTS, and LLM models with a single line of config.
  </p>
</p>

<p align="center">
  <a href="https://pypi.org/project/roohai/"><img src="https://img.shields.io/pypi/v/roohai?color=blue" alt="PyPI"></a>
  <a href="https://pypi.org/project/roohai/"><img src="https://img.shields.io/pypi/pyversions/roohai" alt="Python"></a>
  <a href="https://github.com/Fraser27/roohai-framework/blob/main/LICENSE"><img src="https://img.shields.io/github/license/Fraser27/roohai-framework" alt="License"></a>
</p>

<p align="center"><em>Teri awaaz sun kar, meri rooh ko sukoon milta hai.</em></p>
<p align="center"><a href="https://frasersequeira.medium.com/building-voice-agents-with-rooh-b4ece2abbb14"><h3>Why Rooh</h3></a></p>



## Features

- **Real-time voice** — WebRTC and WebSocket transports with sub-second latency
- **Swappable models** — Mix and match STT, TTS, LLM, and VAD providers via YAML config
- **Hot-swap at runtime** — Change models without restarting the server
- **Agent wizard UI** — Browser-based GUI to create agents, pick models, and start talking
- **LLM streaming** — Token-by-token responses with sentence-boundary TTS overlap
- **Barge-in** — Interrupt the AI mid-sentence by speaking
- **Hooks & extensibility** — Plug in custom LLM logic, tool use (Strands SDK), and observability
- **Built-in frontend** — Dark-themed vanilla HTML/CSS/JS UI, no build step required
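
The sentence-boundary overlap mentioned under **LLM streaming** can be illustrated with a small sketch. This is not RoohAI's implementation: the function name `sentences_from_tokens` and the punctuation rule are assumptions made for illustration. The idea is to buffer streamed tokens and release each sentence as soon as it is complete, so TTS can start speaking while the LLM is still generating.

```python
import re
from typing import Iterable, Iterator

# Deliberately simple rule (an assumption): a sentence ends at '.', '!' or '?'
# followed by whitespace. Abbreviations and decimals are not handled.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a token stream."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be growing, so keep it buffered.
        buffer = parts[-1]
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence could be handed to a TTS model immediately, which is what lets synthesis overlap with generation.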

## Quick Start

```bash
pip install roohai
```

Then start the server:

```bash
roohai
# Open http://localhost:8000
```

Use the web UI to create an agent, select your models, and start a conversation.

All STT, TTS, and LLM providers are included by default. For NVIDIA models, install the extra:

```bash
pip install "roohai[nvidia]"
```

## Supported Models

### Speech-to-Text

| Name | Provider | Notes |
|------|----------|-------|
| `deepgram` | Deepgram Nova | Cloud API, streaming support |
| `nvidia-parakeet` | NVIDIA | Local, high accuracy |
| `whisper-tiny` | HuggingFace | Local, fast, English-focused. **Default** |
| `whisper-base` | HuggingFace | Better accuracy, still lightweight |
| `whisper-small` | HuggingFace | Best local accuracy |


### Text-to-Speech

| Name | Provider | Notes |
|------|----------|-------|
| `cartesia` | Cartesia Sonic | Cloud API, natural voices |
| `deepgram` | Deepgram Aura | Cloud API, natural voices |
| `piper` | Piper TTS | Local ONNX, multiple voices. **Default** |
| `speecht5` | HuggingFace | Local, lightweight |
| `bark` | HuggingFace | Local, expressive |


### LLM

| Name | Provider | Notes |
|------|----------|-------|
| `bedrock` | AWS Bedrock | Claude Haiku/Sonnet/Opus. **Default** |
| `openai` | OpenAI | GPT-4o, GPT-4o-mini via Strands SDK |
| `anthropic` | Anthropic | Claude models via Strands SDK |
| `gemini` | Google | Gemini Flash/Pro via Strands SDK |
| `ollama` | Ollama | Any local model (Llama 3, Mistral, etc.) |
| `local` | HuggingFace | Any local causal LM (direct, no Strands) |

### VAD

| Name | Provider |
|------|----------|
| `silero` | Silero VAD |

## Configuration

### Environment Variables

| Variable | Required for |
|----------|-------------|
| `BEDROCK_API_KEY` or `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` | Bedrock LLM |
| `AWS_DEFAULT_REGION` | Bedrock (default: `us-east-1`) |
| `OPENAI_API_KEY` | OpenAI LLM |
| `ANTHROPIC_API_KEY` | Anthropic LLM |
| `GOOGLE_API_KEY` | Gemini LLM |
| `DEEPGRAM_API_KEY` | Deepgram STT/TTS |
| `CARTESIA_API_KEY` | Cartesia TTS |

API keys can also be set through the agent wizard UI — they're stored in `~/.roohai/secrets.yaml` with `0600` permissions.
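
Before starting the server, it can help to check which keys are missing for the providers you plan to use. The following is a minimal sketch based only on the table above; `REQUIRED_ENV` and `missing_env` are hypothetical helpers, not part of RoohAI. Bedrock is left out because it accepts alternative credentials, and keys saved through the wizard UI live in `~/.roohai/secrets.yaml`, which this check does not read.

```python
import os
from typing import Mapping

# Provider name -> environment variables, copied from the table above.
# This mapping is an illustration, not something RoohAI exposes.
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "gemini": ["GOOGLE_API_KEY"],
    "deepgram": ["DEEPGRAM_API_KEY"],
    "cartesia": ["CARTESIA_API_KEY"],
}


def missing_env(provider: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the variables for `provider` that are unset or empty in `env`."""
    return [name for name in REQUIRED_ENV.get(provider, []) if not env.get(name)]
```

Local providers such as `piper` or `whisper-tiny` are not in the mapping, so `missing_env` returns an empty list for them.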

### Agent Config

Agents are defined as YAML files in `~/.roohai/agents/`. Each agent specifies its models, system prompt, and transport:

```yaml
name: my-agent
system_prompt: "You are a helpful voice assistant."
llm_streaming: true
pipeline:
  stt: whisper-base
  tts: piper
  llm: bedrock-claude
  vad: silero
transport: websocket
```
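
If you generate agent configs programmatically, a quick structural check can catch typos before the file is written. This sketch assumes that `name` and the four `pipeline` entries shown above are all required; RoohAI may apply defaults for some of them, and `config_problems` is a hypothetical helper rather than a framework function.

```python
# Pipeline slots shown in the example config above.
REQUIRED_PIPELINE_KEYS = ("stt", "tts", "llm", "vad")


def config_problems(config: dict) -> list[str]:
    """Return human-readable problems found in an agent config dict."""
    problems = []
    if not config.get("name"):
        problems.append("missing 'name'")
    pipeline = config.get("pipeline") or {}
    for key in REQUIRED_PIPELINE_KEYS:
        # Assumption: every slot must be named explicitly.
        if not pipeline.get(key):
            problems.append(f"missing 'pipeline.{key}'")
    return problems
```

An empty list means the dict has the same shape as the example; it says nothing about whether the named models are actually registered.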

## Usage

### CLI

```bash
roohai                           # Start with defaults
roohai --port 3000               # Custom port
roohai --reload                  # Auto-reload for development
roohai --log-level debug         # Verbose logging
```

### Python API

#### Builder Pattern

```python
import asyncio

from roohai import Rooh

pipeline = (
    Rooh.builder()
    .stt("whisper-tiny")
    .tts("piper")
    .llm("bedrock", model_id="us.anthropic.claude-sonnet-4-20250514-v1:0")
    .vad("silero")
    .system_prompt("You are a helpful assistant.")
    .build()
)
pipeline.load()


async def main(audio_bytes: bytes) -> None:
    # Transcribe
    text = await pipeline.transcribe(audio_bytes)

    # Chat
    response = await pipeline.chat("Hello, how are you?")

    # Stream
    async for chunk in pipeline.chat_stream("Tell me a story"):
        print(chunk, end="")

    # Full pipeline: audio in -> text + audio out
    transcription, response, audio = await pipeline.process_audio(audio_bytes)


# The pipeline methods are coroutines, so run them inside an event loop,
# where audio_bytes holds the audio you captured or loaded:
# asyncio.run(main(audio_bytes))
```

#### From Config

```python
from roohai import Rooh

pipeline = Rooh.from_config({
    "pipeline": {"stt": "whisper-tiny", "tts": "piper", "llm": "bedrock-claude", "vad": "silero"},
    "system_prompt": "You are a helpful assistant.",
})
pipeline.load()
```

### REST API

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/health` | Health check with active model info |
| `POST` | `/api/transcribe` | Audio file -> transcription |
| `POST` | `/api/chat` | Text -> LLM response |
| `POST` | `/api/synthesize` | Text -> WAV audio |
| `POST` | `/api/voice-chat` | Audio in -> text + audio out |
| `POST` | `/api/webrtc/offer` | WebRTC SDP offer/answer |
| `GET` | `/api/models` | List available and active models |
| `POST` | `/api/models/swap` | Hot-swap a model at runtime |

## Examples

The [`examples/`](examples/) directory contains complete working apps:

- **[quickstart](examples/quickstart/)** — Minimal voice agent
- **[barge-in-hook](examples/barge-in-hook/)** — Custom barge-in handling
- **[session-memory-agent](examples/session-memory-agent/)** — Per-session conversation memory with Strands SDK
- **[voice-weather-agent](examples/voice-weather-agent/)** — Voice agent with tool use (weather API)
- **[skill-interview-agent](examples/skill-interview-agent/)** — Structured interview agent

## Extending RoohAI

### Custom Models

Create a class extending `STTModel`, `TTSModel`, or `LLMModel`:

```python
from roohai import STTModel, registry

class MySTT(STTModel):
    def load(self): ...
    def unload(self): ...
    @property
    def is_loaded(self) -> bool: ...
    def transcribe(self, audio, sample_rate) -> str: ...

registry.register_stt("my-stt", MySTT)
```

### LLM Hooks

Override LLM behavior with hooks for tool use, RAG, or custom logic:

```python
pipeline.set_llm_hooks(
    hook=my_batch_handler,
    stream_hook=my_streaming_handler,
)
```

See the [Strands SDK integration](examples/session-memory-agent/) for a full example with tool use and conversation memory.

## Documentation

Full docs are available at `http://localhost:8000/guide` when the server is running, including architecture details, advanced configuration, and the complete model catalog.

## Contributing

Contributions are welcome. Please open an issue first to discuss what you'd like to change.

```bash
git clone https://github.com/Fraser27/roohai-framework.git
cd roohai-framework
pip install -e .
pytest
```

## License

Apache 2.0 — see [LICENSE](LICENSE).
