Metadata-Version: 2.3
Name: voice-agents
Version: 0.1.4
Summary: A comprehensive Python library for building production-ready voice agents with multi-provider support. Features real-time streaming TTS/STT, OpenAI, ElevenLabs, and Groq integration, audio processing, and seamless conversational AI capabilities.
License: MIT
Keywords: voice agents,voice ai,text-to-speech,speech-to-text,tts,stt,voice assistants,conversational ai,voice interfaces,audio processing,real-time streaming,streaming audio,openai,openai tts,openai api,groq,groq voice,elevenlabs,elevenlabs api,voice synthesis,speech synthesis,voice recognition,speech recognition,audio streaming,streaming tts,streaming stt,ai agents,artificial intelligence,llms,large language models,voice applications,voice bots,audio generation,multimodal ai,voice interaction,audio ai,production voice agents,enterprise voice agents
Author: Kye Gomez
Author-email: kye@swarms.world
Requires-Python: >=3.10,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: httpx
Requires-Dist: numpy
Requires-Dist: python-dotenv
Requires-Dist: setuptools
Requires-Dist: sounddevice
Requires-Dist: soundfile
Project-URL: Documentation, https://github.com/The-Swarm-Corporation/Voice-Agents
Project-URL: Homepage, https://github.com/The-Swarm-Corporation/Voice-Agents
Project-URL: Repository, https://github.com/The-Swarm-Corporation/Voice-Agents
Description-Content-Type: text/markdown

# Voice-Agents

<div align="center">

**Enterprise-Grade Voice Agent Infrastructure for Production-Ready AI Applications**

[![License](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
[![Python](https://img.shields.io/badge/Python-3.10%2B-blue.svg)](https://www.python.org/)
[![Made by Swarms](https://img.shields.io/badge/Made%20by-Swarms%20Corporation-black)](https://swarms.ai)

*Multi-provider voice synthesis and recognition with real-time streaming capabilities*

[Features](#-features) • [Installation](#-installation) • [Quick Start](#-quick-start) • [Core Functions](#-core-functions) • [API Reference](#-api-reference)

</div>

---

## 🌟 Overview

**Voice-Agents** is a production-ready Python library for building enterprise-grade voice-enabled AI applications. Built by [Swarms Corporation](https://swarms.ai), it provides seamless integration with multiple TTS/STT providers including OpenAI, ElevenLabs, and Groq, with real-time streaming capabilities optimized for agent-based architectures.

Whether you're building conversational AI assistants, voice-enabled agents, or real-time audio processing systems, Voice-Agents provides the infrastructure you need to go from prototype to production.

### 🏢 Built by Swarms Corporation

Voice-Agents is part of the [Swarms](https://github.com/kyegomez/swarms) ecosystem, an enterprise-grade multi-agent orchestration framework. Learn more at [swarms.ai](https://swarms.ai) and [docs.swarms.world](https://docs.swarms.world).

---

## ✨ Features

### 🎯 Core Capabilities

- **Multi-Provider TTS Support**: Seamlessly switch between OpenAI, ElevenLabs, and Groq
- **Real-Time Streaming**: Low-latency audio streaming for live agent interactions
- **Speech-to-Text**: High-accuracy transcription using OpenAI Whisper
- **Audio Processing**: Built-in utilities for recording, playback, and format conversion
- **Production-Ready**: Enterprise-grade error handling, authentication, and logging

### 🚀 Advanced Features

- **Streaming Callbacks**: Real-time TTS callbacks for agent streaming outputs
- **Multiple Audio Formats**: Support for PCM, MP3, Opus, AAC, FLAC, and more
- **Voice Customization**: 10+ OpenAI voices and 30+ ElevenLabs voices
- **Sentence Detection**: Intelligent text formatting for natural speech pauses
- **FastAPI Integration**: Generator-based streaming for web applications
- **Type Safety**: Full type hints and Literal types for better IDE support

---

## 📦 Installation

### Basic Installation

```bash
pip install voice-agents
```

### Development Installation

```bash
git clone https://github.com/The-Swarm-Corporation/Voice-Agents.git
cd Voice-Agents
pip install -e .
```

### Requirements

- Python 3.10+
- API keys for your chosen providers:
  - OpenAI API key (for TTS and Whisper STT)
  - ElevenLabs API key (optional, for ElevenLabs TTS)

---

## 🚀 Quick Start

### Environment Setup

Create a `.env` file or set environment variables:

```bash
export OPENAI_API_KEY="your-openai-api-key"
export ELEVENLABS_API_KEY="your-elevenlabs-api-key"  # Optional
```
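
A small startup check can surface a missing key before the first API call. The sketch below is not part of Voice-Agents: `missing_api_keys` is a hypothetical helper, and it relies only on the variable names shown above. If you keep keys in a `.env` file, load it first (for example with python-dotenv's `load_dotenv()`).

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_api_keys(
    required: Iterable[str],
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names in `required` that are unset or blank in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    # OPENAI_API_KEY covers OpenAI TTS and Whisper STT; add
    # ELEVENLABS_API_KEY to the list only if you use ElevenLabs.
    missing = missing_api_keys(["OPENAI_API_KEY"])
    if missing:
        print(f"Missing environment variables: {', '.join(missing)}")
```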

### Basic Text-to-Speech

```python
from voice_agents import stream_tts, format_text_for_speech

# Format text for natural speech
text = "Hello! This is a voice agent speaking. How can I help you today?"
chunks = format_text_for_speech(text)

# Convert to speech and play
stream_tts(chunks, voice="alloy", model="tts-1")
```

### Speech-to-Text

```python
from voice_agents import speech_to_text, record_audio

# Record audio from microphone
audio = record_audio(duration=5.0, sample_rate=16000)

# Transcribe to text
transcription = speech_to_text(audio_data=audio, sample_rate=16000)
print(f"Transcribed: {transcription}")
```

---

## 📚 Core Functions

### Text-to-Speech (OpenAI)

```python
from voice_agents import stream_tts, format_text_for_speech, VOICES

# Available voices: alloy, ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer
print(f"Available voices: {VOICES}")
text_chunks = format_text_for_speech("Your text here")

# Basic usage - plays audio
stream_tts(text_chunks, voice="nova", model="tts-1")

# Streaming mode for real-time processing
stream_tts(
    text_chunks,
    voice="alloy",
    model="tts-1",
    stream_mode=True,  # Process chunks as they arrive
    response_format="pcm"
)

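# For FastAPI/web streaming
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

@app.get("/audio")  # Example route path; choose your own
def audio_endpoint():
    generator = stream_tts(
        text_chunks,
        voice="alloy",
        response_format="pcm",  # Keep in sync with media_type below
        return_generator=True
    )
    return StreamingResponse(generator, media_type="audio/pcm")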
```

### Text-to-Speech (ElevenLabs)

```python
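from voice_agents import (
    ELEVENLABS_VOICE_NAMES,
    format_text_for_speech,
    stream_tts_elevenlabs,
)

# Available voices: rachel, domi, bella, antoni, elli, josh, and 25+ more
print(f"Available voices: {ELEVENLABS_VOICE_NAMES}")

# Split the input into speech-friendly chunks first
text_chunks = format_text_for_speech("Your text here")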

# Basic usage
stream_tts_elevenlabs(
    text_chunks,
    voice_id="rachel",  # Use friendly name or voice ID
    model_id="eleven_multilingual_v2",
    stability=0.5,
    similarity_boost=0.75
)

# High-quality streaming for web
generator = stream_tts_elevenlabs(
    text_chunks,
    voice_id="domi",
    output_format="mp3_44100_128",  # Recommended for web
    return_generator=True
)
```

### Speech-to-Text

```python
from voice_agents import speech_to_text, record_audio

# From audio file
transcription = speech_to_text(
    audio_file_path="recording.wav",
    model="whisper-1",
    language="en",  # Optional: auto-detect if None
    response_format="text"
)

# From numpy array (recorded audio)
audio = record_audio(duration=5.0, sample_rate=16000)
transcription = speech_to_text(
    audio_data=audio,
    sample_rate=16000,
    prompt="This is a technical conversation about AI"  # Optional context
)

# Get structured output
result = speech_to_text(
    audio_file_path="meeting.mp3",
    response_format="verbose_json"  # Returns detailed metadata
)
```

### Audio Recording

```python
from voice_agents import record_audio

# Record 5 seconds of audio
audio = record_audio(duration=5.0, sample_rate=16000, channels=1)

# Use with speech-to-text
from voice_agents import speech_to_text
text = speech_to_text(audio_data=audio, sample_rate=16000)
```

### Streaming TTS Callback for Agents

```python
from voice_agents import StreamingTTSCallback

# Create callback for real-time agent responses
tts_callback = StreamingTTSCallback(
    voice="alloy",
    model="tts-1",
    min_sentence_length=10  # Minimum chars before speaking
)

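# Use with any streaming text generator
def agent_stream(chunks):
    """Feed text chunks (e.g. LLM tokens) to the TTS callback as they arrive."""
    for chunk in chunks:
        tts_callback(chunk)  # Automatically speaks complete sentences
    tts_callback.flush()  # Speak any remaining text

# Stand-in for a real agent's streaming output
agent_stream(["Hello! This is ", "a streamed reply. ", "Goodbye."])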
```

### Audio Format Utilities

```python
from voice_agents import get_media_type_for_format

# Get MIME type for FastAPI
media_type = get_media_type_for_format("mp3_44100_128")
# Returns: "audio/mpeg"

media_type = get_media_type_for_format("pcm_44100")
# Returns: "audio/pcm"
```

---

## 🤖 Swarms Integration

Voice-Agents is designed to work seamlessly with [Swarms](https://github.com/kyegomez/swarms), the enterprise-grade multi-agent orchestration framework.

### Complete Example: Voice-Enabled Trading Agent

```python
from swarms import Agent
from voice_agents import StreamingTTSCallback, format_text_for_speech

# Initialize the Swarms agent
agent = Agent(
    agent_name="Quantitative-Trading-Agent",
    agent_description="Advanced quantitative trading and algorithmic analysis agent",
    model_name="gpt-4",
    dynamic_temperature_enabled=True,
    max_loops=1,
    dynamic_context_window=True,
    top_p=None,
    streaming_on=True,
    interactive=False,
)

# Create the streaming TTS callback
tts_callback = StreamingTTSCallback(voice="alloy", model="tts-1")

# Run the agent with streaming TTS callback
out = agent.run(
    task="What are the top five best energy stocks across nuclear, solar, gas, and other energy sources?",
    streaming_callback=tts_callback,
)

# Flush any remaining text in the buffer
tts_callback.flush()

print(out)
```

### Voice-Enabled Multi-Agent System

```python
from swarms import Agent, Swarm
from voice_agents import StreamingTTSCallback

# Create multiple voice-enabled agents
research_agent = Agent(
    agent_name="Research-Agent",
    model_name="gpt-4",
    streaming_on=True,
)

analysis_agent = Agent(
    agent_name="Analysis-Agent",
    model_name="gpt-4",
    streaming_on=True,
)

# Create voice callbacks for each agent
research_tts = StreamingTTSCallback(voice="nova", model="tts-1")
analysis_tts = StreamingTTSCallback(voice="alloy", model="tts-1")

# Create swarm
swarm = Swarm(
    agents=[research_agent, analysis_agent],
    # ... other swarm configuration
)

# Run with voice callbacks
results = swarm.run(
    task="Analyze market trends",
    callbacks=[research_tts, analysis_tts]
)
```

---

## 📖 API Reference

### Constants

- `SAMPLE_RATE`: Default sample rate (24000 Hz)
- `VOICES`: List of available OpenAI voices
- `ELEVENLABS_VOICES`: Dictionary mapping friendly names to ElevenLabs voice IDs
- `ELEVENLABS_VOICE_NAMES`: List of available ElevenLabs voice names
- `VoiceType`: Type alias for OpenAI voice options
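
Raw PCM carries no header, so most players need it wrapped in a container before they can open it. The sketch below uses only the standard library and assumes 16-bit signed mono samples at the default 24000 Hz, which is what OpenAI documents for its `pcm` response format. `pcm_to_wav_bytes` and `pcm_duration_seconds` are hypothetical helpers, not Voice-Agents functions.

```python
import io
import wave

SAMPLE_RATE = 24000  # Mirrors the documented default; adjust if yours differs


def pcm_to_wav_bytes(
    pcm: bytes,
    sample_rate: int = SAMPLE_RATE,
    channels: int = 1,
    sample_width: int = 2,  # Bytes per sample: 2 means 16-bit
) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


def pcm_duration_seconds(
    pcm: bytes,
    sample_rate: int = SAMPLE_RATE,
    channels: int = 1,
    sample_width: int = 2,
) -> float:
    """Playback length implied by the byte count."""
    return len(pcm) / (sample_rate * channels * sample_width)
```

If the generator returned with `return_generator=True` yields `bytes` chunks (an assumption about the library, so check before relying on it), the chunks can be joined with `b"".join(...)` and passed to `pcm_to_wav_bytes`.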

### Functions

#### `format_text_for_speech(text: str) -> List[str]`
Intelligently formats text into speech-friendly chunks by detecting sentence boundaries, handling abbreviations, and preserving natural pauses.
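
To illustrate the idea, here is a minimal sketch of sentence-boundary chunking with abbreviation handling. It is not the library's implementation: `split_sentences` and the short `ABBREVIATIONS` set are hypothetical, and the real function may apply different rules.

```python
import re
from typing import List

# Illustrative subset; a production list would be longer.
ABBREVIATIONS = {"dr", "mr", "mrs", "ms", "prof", "e.g", "i.e", "etc", "vs"}

# Candidate boundary: '.', '!' or '?' followed by whitespace.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text: str) -> List[str]:
    """Split text at sentence punctuation, skipping known abbreviations."""
    chunks: List[str] = []
    current = ""
    for piece in _BOUNDARY.split(text.strip()):
        current = f"{current} {piece}".strip()
        last_word = current.split()[-1].rstrip(".").lower() if current else ""
        if current.endswith(".") and last_word in ABBREVIATIONS:
            continue  # "Dr." does not end a sentence; keep accumulating
        if current:
            chunks.append(current)
        current = ""
    if current:
        chunks.append(current)
    return chunks
```

A lookup like this cannot tell an abbreviation that genuinely ends a sentence ("..., etc. Next sentence") from one in the middle, so it errs toward longer chunks.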

#### `stream_tts(text_chunks, voice, model, stream_mode, response_format, return_generator)`
OpenAI TTS with streaming support. Returns a generator for web streaming when `return_generator=True`; otherwise plays the audio directly.

#### `stream_tts_elevenlabs(text_chunks, voice_id, model_id, stability, similarity_boost, output_format, return_generator)`
ElevenLabs TTS with advanced voice control and multiple output formats.

#### `speech_to_text(audio_file_path, audio_data, sample_rate, model, language, prompt, response_format)`
OpenAI Whisper transcription with support for files or numpy arrays.
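
When `response_format="verbose_json"` is used, OpenAI's transcription response includes a `segments` list with `start`, `end`, and `text` fields. The sketch below turns such a payload into timestamped lines. It assumes the result is available as a plain `dict`, which this README does not confirm for `speech_to_text`; `format_timestamp` and `format_segments` are hypothetical helpers.

```python
from typing import Any, Dict, List


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as MM:SS.mmm."""
    minutes, secs = divmod(float(seconds), 60.0)
    return f"{int(minutes):02d}:{secs:06.3f}"


def format_segments(result: Dict[str, Any]) -> List[str]:
    """Build one '[start -> end] text' line per transcript segment."""
    lines: List[str] = []
    for segment in result.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} -> {end}] {segment['text'].strip()}")
    return lines
```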

#### `record_audio(duration, sample_rate, channels) -> np.ndarray`
Record audio from the default microphone. Returns a NumPy array.

#### `play_audio(audio_data: np.ndarray)`
Play audio data using sounddevice.

#### `get_media_type_for_format(output_format: str) -> str`
Get MIME type for audio format (useful for FastAPI).
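
A lookup of this kind can be sketched as a table keyed by the codec prefix of the format string. Only the `mp3` and `pcm` rows below are confirmed by the examples in this README; the other rows are commonly used MIME types added for illustration. `media_type_for` is a hypothetical stand-in, not the library function.

```python
# Codec prefix -> MIME type. Only "mp3" and "pcm" come from this README;
# the remaining rows are illustrative.
_MEDIA_TYPES = {
    "mp3": "audio/mpeg",
    "pcm": "audio/pcm",
    "opus": "audio/opus",
    "aac": "audio/aac",
    "flac": "audio/flac",
    "wav": "audio/wav",
}


def media_type_for(
    output_format: str,
    default: str = "application/octet-stream",
) -> str:
    """Map a format such as 'mp3_44100_128' to a MIME type via its prefix."""
    codec = output_format.lower().split("_", 1)[0]
    return _MEDIA_TYPES.get(codec, default)
```

Falling back to `application/octet-stream` keeps an unknown format from raising inside a request handler; raising a `ValueError` instead is a reasonable alternative if you prefer to fail fast.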

### Classes

#### `StreamingTTSCallback`
Real-time TTS callback for agent streaming outputs. Automatically detects complete sentences and converts them to speech.

**Methods:**
- `__call__(chunk: str)`: Process streaming text chunk
- `flush()`: Speak any remaining buffered text
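
The buffering behaviour described above can be sketched without any audio at all. `SentenceBuffer` below is a hypothetical class, not the library's implementation: it accumulates chunks, passes each complete sentence to a `speak` function you supply, and holds back fragments shorter than `min_sentence_length`.

```python
import re
from typing import Callable

# A sentence ends at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"[.!?]\s")


class SentenceBuffer:
    """Accumulate streamed text and hand complete sentences to `speak`."""

    def __init__(
        self,
        speak: Callable[[str], None],
        min_sentence_length: int = 10,
    ) -> None:
        self.speak = speak
        self.min_sentence_length = min_sentence_length
        self._buffer = ""

    def __call__(self, chunk: str) -> None:
        self._buffer += chunk
        search_from = 0
        while True:
            match = _SENTENCE_END.search(self._buffer, search_from)
            if match is None:
                return
            end = match.start() + 1  # Include the punctuation mark
            sentence = self._buffer[:end].strip()
            if len(sentence) < self.min_sentence_length:
                search_from = end  # Too short: extend to a later boundary
                continue
            self.speak(sentence)
            self._buffer = self._buffer[end:].lstrip()
            search_from = 0

    def flush(self) -> None:
        """Speak whatever is left, even if it is not a full sentence."""
        remaining = self._buffer.strip()
        self._buffer = ""
        if remaining:
            self.speak(remaining)
```

Punctuation at the very end of the buffer is not treated as a boundary until more text or `flush()` arrives, because a chunk ending in `3.` may continue as `3.14`.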

---

## 🎨 Use Cases

### Conversational AI Assistants
Build voice-enabled chatbots and virtual assistants with natural, real-time speech synthesis.

### Agent Narration
Provide audio feedback for long-running agent tasks, making agent behavior transparent and engaging.

### Voice-Enabled Analytics
Create voice interfaces for data analysis, trading systems, and business intelligence tools.

### Real-Time Transcription
Transcribe meetings, interviews, and conversations with high accuracy using Whisper.

### Multi-Modal Applications
Combine voice input/output with visual interfaces for rich, interactive experiences.

---

## 🔧 Configuration

### Environment Variables

```bash
# Required for OpenAI TTS and STT
OPENAI_API_KEY=your-key-here

# Required only if you use ElevenLabs TTS
ELEVENLABS_API_KEY=your-key-here
```

### Voice Selection

**OpenAI Voices:**
- `alloy`, `ash`, `ballad`, `coral`, `echo`, `fable`, `nova`, `onyx`, `sage`, `shimmer`

**ElevenLabs Voices:**
- Professional: `rachel`, `nicole`, `grace`
- Expressive: `domi`, `elli`, `bella`
- Deep: `antoni`, `josh`, `clyde`
- And 20+ more (see `ELEVENLABS_VOICE_NAMES`)

---

## 🤝 Contributing

Voice-Agents is built by the community, for the community. We welcome contributions!

### How to Contribute

1. **Fork the repository**
2. **Create a feature branch**: `git checkout -b feature/amazing-feature`
3. **Make your changes** and add tests
4. **Commit your changes**: `git commit -m 'Add amazing feature'`
5. **Push to the branch**: `git push origin feature/amazing-feature`
6. **Open a Pull Request**

### Development Setup

```bash
git clone https://github.com/The-Swarm-Corporation/Voice-Agents.git
cd Voice-Agents
pip install -e .
pip install pre-commit
pre-commit install
```

### Code Standards

- Follow PEP 8 style guidelines
- Add type hints to all functions
- Include docstrings for all public APIs
- Write tests for new features

---

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

---

## 🙏 Acknowledgments

- Built by [Swarms Corporation](https://swarms.ai)
- Part of the [Swarms](https://github.com/kyegomez/swarms) ecosystem
- Powered by OpenAI, ElevenLabs, and Groq APIs

---

## 📞 Support & Community

- **Documentation**: [GitHub Repository](https://github.com/The-Swarm-Corporation/Voice-Agents)
- **Swarms Documentation**: [docs.swarms.world](https://docs.swarms.world)
- **Swarms Community**: [Discord](https://discord.gg/EamjgSaEQf)
- **Issues**: [GitHub Issues](https://github.com/The-Swarm-Corporation/Voice-Agents/issues)

---

<div align="center">

**Made with ❤️ by [Swarms Corporation](https://swarms.ai)**

[Website](https://swarms.ai) • [Documentation](https://docs.swarms.world) • [GitHub](https://github.com/kyegomez/swarms)

</div>

