Metadata-Version: 2.4
Name: roohai
Version: 0.1.2
Summary: Modular real-time voice agent framework with swappable STT, LLM, TTS, and VAD components
Author: Fraser Sequeira
License: Copyright 2024 Fraser Sequeira                                                                                                                                                            
                                                                                                                                                                                                    
          Licensed under the Apache License, Version 2.0 (the "License");                                                                                                                           
          you may not use this file except in compliance with the License.                                                                                                                          
          You may obtain a copy of the License at                                                                                                                                                   
                                                                                                                                                                                                    
              http://www.apache.org/licenses/LICENSE-2.0                                                                                                                                            
                                                                                                                                                                                                    
          Unless required by applicable law or agreed to in writing, software                                                                                                                       
          distributed under the License is distributed on an "AS IS" BASIS,                                                                                                                         
          WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.                                                                                                                  
          See the License for the specific language governing permissions and                                                                                                                       
          limitations under the License. 
Keywords: voice,ai,stt,tts,llm,vad,real-time,agent
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastapi>=0.115.0
Requires-Dist: uvicorn[standard]>=0.32.0
Requires-Dist: transformers<5.0.0,>=4.46.0
Requires-Dist: torch>=2.1.0
Requires-Dist: soundfile>=0.12.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: boto3>=1.35.0
Requires-Dist: python-multipart>=0.0.12
Requires-Dist: sentencepiece>=0.1.99
Requires-Dist: aiortc>=1.9.0
Requires-Dist: av>=16.1.0
Requires-Dist: aiohttp>=3.9.0
Requires-Dist: huggingface-hub>=0.20.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: silero-vad>=5.1
Requires-Dist: websockets>=12.0
Provides-Extra: deepgram
Requires-Dist: deepgram-sdk>=6.0; extra == "deepgram"
Provides-Extra: cartesia
Requires-Dist: cartesia; extra == "cartesia"
Provides-Extra: piper
Requires-Dist: piper-tts>=1.4; extra == "piper"
Provides-Extra: nvidia
Requires-Dist: nemo_toolkit[asr]; extra == "nvidia"
Provides-Extra: cloud
Requires-Dist: deepgram-sdk>=6.0; extra == "cloud"
Requires-Dist: cartesia; extra == "cloud"
Provides-Extra: all
Requires-Dist: deepgram-sdk>=6.0; extra == "all"
Requires-Dist: cartesia; extra == "all"
Requires-Dist: piper-tts>=1.4; extra == "all"
Dynamic: license-file

# RoohAI Framework

A modular voice AI framework with real-time WebRTC audio, swappable STT/TTS/LLM models, and a browser-based frontend.

*Teri awaaz sun kar, meri rooh ko sukoon milta hai.* ("Hearing your voice brings peace to my soul.")

## Quick Start

```bash
# 1. Create a Python 3.11 environment
conda create -n roohai python=3.11 -y
conda activate roohai

# 2. Install RoohAI
pip install "roohai[all]" # Deepgram, Cartesia, and Piper extras; NVIDIA NeMo needs "roohai[nvidia]"

# 3. Start the voice UI
roohai

# Open http://localhost:8000 in your browser
```

Use the web UI to create agents, pick models, and start talking.
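
If a backend does not show up as expected, it can help to check which optional dependencies are importable in the current environment. The sketch below is not part of RoohAI; the mapping from extra name to top-level module name is an assumption based on the usual import names of those packages, so adjust it if yours differ.

```python
from importlib.util import find_spec

# Assumed mapping: roohai extra -> top-level module installed by its dependency.
# These module names are not taken from the RoohAI docs; verify them locally.
EXTRA_MODULES = {
    "deepgram": "deepgram",  # from deepgram-sdk
    "cartesia": "cartesia",  # from cartesia
    "piper": "piper",        # from piper-tts
    "nvidia": "nemo",        # from nemo_toolkit[asr]
}


def installed_extras(modules: dict[str, str] = EXTRA_MODULES) -> dict[str, bool]:
    """Report, per extra, whether its module can be located (without importing it)."""
    return {extra: find_spec(module) is not None for extra, module in modules.items()}


if __name__ == "__main__":
    for extra, ok in installed_extras().items():
        print(f"{extra:10s} {'installed' if ok else 'missing'}")
```

`find_spec` only locates the module, so the check stays fast even for heavy packages such as NeMo.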

## Models

| Type | Name | Backend |
|------|------|---------|
| STT | `whisper-tiny` | HuggingFace `openai/whisper-tiny` — **default** |
| STT | `wav2vec2` | HuggingFace `facebook/wav2vec2-base-960h` |
| STT | `deepgram` | Deepgram Nova (cloud API) |
| STT | `nvidia-canary` | NVIDIA Canary (needs the `nvidia` extra) |
| STT | `nvidia-parakeet` | NVIDIA Parakeet (needs the `nvidia` extra) |
| TTS | `piper` | Piper TTS (local ONNX) — **default** |
| TTS | `speecht5` | HuggingFace `microsoft/speecht5_tts` |
| TTS | `bark` | HuggingFace `suno/bark-small` |
| TTS | `cartesia` | Cartesia (cloud API) |
| LLM | `bedrock-claude` | Amazon Bedrock (`anthropic.claude-3-haiku`) — **default** |
| LLM | `local` | HuggingFace local model |

Models can be hot-swapped at runtime via the REST API or the frontend UI.
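
As a rough illustration of a hot-swap through the REST API, the sketch below builds a `POST /api/models/swap` request with the standard library. The JSON field names `type` and `name` are assumptions, not the documented schema; check the RoohAI Guide for the actual request body.

```python
import json
import urllib.request


def build_swap_request(base_url: str, model_type: str, name: str) -> urllib.request.Request:
    """Build (but do not send) a model swap request."""
    # ASSUMPTION: the endpoint accepts JSON with "type" and "name" fields.
    body = json.dumps({"type": model_type, "name": name}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/models/swap",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example: switch the active STT model to wav2vec2.
req = build_swap_request("http://localhost:8000", "stt", "wav2vec2")
# With the server running, send it with: urllib.request.urlopen(req)
```

Building the request separately from sending it makes the payload easy to inspect before it reaches a running server.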

## Environment Variables

- `AWS_ACCESS_KEY_ID` / `AWS_SECRET_ACCESS_KEY` — For Bedrock LLM
- `AWS_DEFAULT_REGION` — AWS region (default: `us-east-1`)
- `DEEPGRAM_API_KEY` — For Deepgram STT
- `CARTESIA_API_KEY` — For Cartesia TTS

Cloud API keys can also be provided through the web UI wizard — they are stored securely in `~/.roohai/secrets.yaml`.
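
A small pre-flight check can report which of these variables are missing before starting the server. This is a sketch, not RoohAI code: the `REQUIRED_ENV` table and the `missing_env` helper are hypothetical names. It only inspects environment variables, so keys saved through the web UI wizard (or AWS credentials supplied through other boto3 sources) will not be seen by it.

```python
import os
from collections.abc import Mapping

# Hypothetical lookup table built from the variable list above.
REQUIRED_ENV = {
    "bedrock-claude": ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"],
    "deepgram": ["DEEPGRAM_API_KEY"],
    "cartesia": ["CARTESIA_API_KEY"],
}


def missing_env(model: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the variables for `model` that are unset or empty in `env`."""
    return [var for var in REQUIRED_ENV.get(model, []) if not env.get(var)]


if __name__ == "__main__":
    for model in REQUIRED_ENV:
        missing = missing_env(model)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{model}: {status}")
```

Local models such as `piper` or `whisper-tiny` have no entry in the table, so the helper returns an empty list for them.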

## Adding a Custom Model

1. Create a class extending `STTModel`, `TTSModel`, or `LLMModel` from `roohai.base`
2. Implement the required abstract methods (`load`, `transcribe`/`synthesize`/`chat`, `unload`, `is_loaded`)
3. Register it in `roohai/pipeline.py`, in both `_register_defaults()` and `_CLASS_MAP`
4. Add a config entry in `config.yaml` under `models:`
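
Steps 1 and 2 might look roughly like the sketch below. To keep it runnable without RoohAI installed, it defines a stand-in `STTModel` base class; in a real project you would import the class from `roohai.base` instead. The method signatures (for example `transcribe(audio: bytes) -> str`, and `is_loaded` as a method rather than a property) are assumptions, and `EchoSTT` is a hypothetical toy backend.

```python
from abc import ABC, abstractmethod


class STTModel(ABC):
    """Stand-in for roohai.base.STTModel; the real signatures may differ."""

    @abstractmethod
    def load(self) -> None: ...

    @abstractmethod
    def transcribe(self, audio: bytes) -> str: ...

    @abstractmethod
    def unload(self) -> None: ...

    @abstractmethod
    def is_loaded(self) -> bool: ...


class EchoSTT(STTModel):
    """Hypothetical toy backend that always returns a fixed transcript."""

    def __init__(self, transcript: str = "hello") -> None:
        self._transcript = transcript
        self._loaded = False

    def load(self) -> None:
        # A real backend would load model weights or open an API client here.
        self._loaded = True

    def transcribe(self, audio: bytes) -> str:
        if not self._loaded:
            raise RuntimeError("call load() before transcribe()")
        return self._transcript

    def unload(self) -> None:
        # A real backend would release model memory or close connections here.
        self._loaded = False

    def is_loaded(self) -> bool:
        return self._loaded
```

Tracking the loaded state explicitly lets the pipeline swap the model out at runtime and release its resources through `unload`.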

## API Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | `/api/health` | Health check with active model info |
| POST | `/api/transcribe` | Upload audio file, get transcription |
| POST | `/api/chat` | Send text, get LLM response |
| POST | `/api/synthesize` | Send text, get WAV audio |
| POST | `/api/voice-chat` | Full pipeline: audio in -> text + audio out |
| POST | `/api/webrtc/offer` | WebRTC SDP offer/answer exchange |
| GET | `/api/models` | List available and active models |
| POST | `/api/models/swap` | Hot-swap a model at runtime |
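
As a minimal sketch of calling one of these endpoints from Python, the helper below queries `/api/health` using only the standard library. It assumes the endpoint returns a JSON body (the response schema is not specified here) and returns `None` when the server is unreachable; `health_url` and `get_health` are illustrative names, not part of RoohAI.

```python
import json
import urllib.request


def health_url(base_url: str) -> str:
    """Join the server base URL with the health endpoint path."""
    return f"{base_url.rstrip('/')}/api/health"


def get_health(base_url: str = "http://localhost:8000", timeout: float = 2.0):
    """Return the parsed health response, or None if the call fails."""
    try:
        with urllib.request.urlopen(health_url(base_url), timeout=timeout) as resp:
            # ASSUMPTION: the response body is JSON.
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers connection errors, HTTP errors, and timeouts;
        # ValueError covers a body that is not valid JSON.
        return None


if __name__ == "__main__":
    print(get_health() or "server not reachable")
```

The same pattern extends to the POST endpoints by passing a `urllib.request.Request` with a body instead of a plain URL.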

## Documentation

For full documentation including architecture, configuration, and advanced usage, see the [RoohAI Guide](http://localhost:8000/guide) (available when the server is running).
