Metadata-Version: 2.4
Name: speechflow
Version: 0.1.0
Summary: TTS (Text-to-Speech) wrapper library for Python
Author-email: minamik <mia@sync.dev>
License-Expression: MIT
License-File: LICENSE
Keywords: fishaudio,gemini,kokoro,openai,speech-synthesis,style-bert-vits2,text-to-speech,tts
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: en-core-web-sm
Requires-Dist: fish-audio-sdk>=2025.6.3
Requires-Dist: google-genai>=1.18.0
Requires-Dist: kokoro>=0.9.4
Requires-Dist: misaki[ja]>=0.9.4
Requires-Dist: numba>=0.61
Requires-Dist: numpy>=1.26.4
Requires-Dist: openai>=1.84.0
Requires-Dist: pip>=25.2
Requires-Dist: pyaudio>=0.2.14
Requires-Dist: pydantic>=2.0
Requires-Dist: pyopenjtalk>=0.4.1
Requires-Dist: python-dotenv>=1.1.1
Requires-Dist: style-bert-vits2>=2.5.0
Requires-Dist: torch
Requires-Dist: torchaudio
Requires-Dist: torchvision
Description-Content-Type: text/markdown

# SpeechFlow

A unified Python TTS (Text-to-Speech) library that provides a simple interface for multiple TTS engines.

## Features

- **Multiple TTS Engine Support**:
  - OpenAI TTS
  - Google Gemini TTS
  - FishAudio TTS (Cloud-based, multi-voice)
  - Kokoro TTS (Multi-language, lightweight, local)
  - Style-Bert-VITS2 (Local, high-quality Japanese TTS)

- **Unified Interface**: Switch between different TTS engines without changing your code
- **Streaming Support**: Real-time audio streaming for supported engines
- **Decoupled Architecture**: Use TTS engines, audio players, and file writers independently
- **Audio Playback**: Synchronous audio player with streaming support
- **File Export**: Save synthesized speech to various audio formats
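
The unified-interface idea can be illustrated without any real engine. The following is a minimal sketch, not SpeechFlow's actual class hierarchy: `TTSEngine`, `FakeCloudEngine`, `FakeLocalEngine`, and `synthesize` are hypothetical names, and plain `bytes` stands in for the library's audio type. It only assumes what the examples in this README show, namely that engines expose `get()` and `stream()`.

```python
from typing import Iterator, Protocol


class TTSEngine(Protocol):
    """Hypothetical minimal interface; not SpeechFlow's actual base class."""

    def get(self, text: str) -> bytes: ...

    def stream(self, text: str) -> Iterator[bytes]: ...


class FakeCloudEngine:
    """Stand-in for a cloud engine; returns placeholder bytes, not audio."""

    def get(self, text: str) -> bytes:
        return b"cloud:" + text.encode("utf-8")

    def stream(self, text: str) -> Iterator[bytes]:
        # Emit one placeholder chunk per whitespace-separated word.
        for word in text.split():
            yield b"cloud:" + word.encode("utf-8")


class FakeLocalEngine:
    """Stand-in for a local engine that produces a single chunk."""

    def get(self, text: str) -> bytes:
        return b"local:" + text.encode("utf-8")

    def stream(self, text: str) -> Iterator[bytes]:
        yield self.get(text)


def synthesize(engine: TTSEngine, text: str) -> bytes:
    """Caller code depends only on the shared interface."""
    return engine.get(text)
```

Because `synthesize` depends only on the shared methods, swapping one engine for another is a one-line change at the call site.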

## Installation

```bash
pip install speechflow
```

For Style-Bert-VITS2 support:
```bash
# Install numba>=0.61 first for Python 3.12 compatibility.
# Quote the requirement so the shell does not treat ">" as a redirect.
pip install "numba>=0.61"
pip install "style-bert-vits2>=2.5.0"
```

## Quick Start

### Basic Usage (Decoupled Components)
```python
from speechflow import OpenAITTSEngine, AudioPlayer, AudioWriter

# Initialize components
engine = OpenAITTSEngine(api_key="your-api-key")
player = AudioPlayer()
writer = AudioWriter()

# Generate audio
audio = engine.get("Hello, world!")

# Play audio
player.play(audio)

# Save to file
writer.save(audio, "output.wav")
```
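
Rather than hard-coding `"your-api-key"`, the key can be read from the environment. This is a minimal sketch using only the standard library; `load_api_key` is a hypothetical helper, not part of SpeechFlow, and the default variable name `OPENAI_API_KEY` is an assumption about how you store the key.

```python
import os


def load_api_key(var_name: str = "OPENAI_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    Raises RuntimeError if the variable is missing or blank, so a
    misconfiguration fails early instead of at the first API call.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first.")
    return key
```

The result can then be passed to the constructor shown above, for example `OpenAITTSEngine(api_key=load_api_key())`.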

### Streaming Audio

**Important Notes on Streaming Behavior:**
- **OpenAI**: True streaming that yields multiple chunks. The first call may incur a 10-20 s cold-start delay. Uses the PCM format for simplicity.
- **Gemini**: Returns the complete audio in a single chunk (as of January 2025). This is a known limitation, not true streaming.
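
To see which behavior an engine actually exhibits, you can count chunks and time the first one. The sketch below is an illustration under assumptions: `profile_stream` and `fake_stream` are hypothetical helpers, and chunks are treated as `bytes`-like objects (SpeechFlow's real chunk type may differ, in which case replace `len(chunk)` accordingly).

```python
import time
from typing import Iterable, Iterator, Optional


def profile_stream(chunks: Iterable[bytes]) -> dict:
    """Consume a chunk iterator and report chunk count and timing."""
    start = time.perf_counter()
    first_chunk_seconds: Optional[float] = None
    count = 0
    total_bytes = 0
    for chunk in chunks:
        if first_chunk_seconds is None:
            # Time until the first chunk arrives (includes any cold start).
            first_chunk_seconds = time.perf_counter() - start
        count += 1
        total_bytes += len(chunk)
    return {
        "chunks": count,
        "bytes": total_bytes,
        "first_chunk_seconds": first_chunk_seconds,
        "total_seconds": time.perf_counter() - start,
    }


def fake_stream(n_chunks: int) -> Iterator[bytes]:
    """Placeholder generator standing in for engine.stream(...)."""
    for _ in range(n_chunks):
        yield b"\x00\x00" * 100  # 100 silent 16-bit samples
```

A result with `chunks == 1` and `first_chunk_seconds` close to `total_seconds` suggests the engine returned everything at once rather than streaming.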

```python
from speechflow import OpenAITTSEngine, AudioPlayer, AudioWriter

# Initialize components
engine = OpenAITTSEngine(api_key="your-api-key")
player = AudioPlayer()
writer = AudioWriter()

# Warmup for OpenAI (recommended for production)
_ = list(engine.stream("Warmup"))

# Stream and play audio (returns combined AudioData)
combined_audio = player.play_stream(engine.stream("This is a long text that will be streamed..."))

# Save the combined audio to file
writer.save(combined_audio, "output.wav")
```
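
If you collect raw chunks yourself instead of using `AudioWriter`, the standard library can wrap them in a WAV container. This is a minimal sketch, assuming the chunks are raw 16-bit mono PCM at 24 kHz (the layout OpenAI documents for its `pcm` output); `pcm_chunks_to_wav` is a hypothetical helper, and the defaults should be adjusted if your engine uses a different sample rate or width.

```python
import io
import wave
from typing import Iterable


def pcm_chunks_to_wav(
    chunks: Iterable[bytes],
    sample_rate: int = 24000,  # assumption: 24 kHz PCM
    channels: int = 1,         # assumption: mono
    sample_width: int = 2,     # assumption: 16-bit samples (2 bytes)
) -> bytes:
    """Join raw PCM chunks and return them as WAV file bytes."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(b"".join(chunks))
    return buffer.getvalue()
```

The returned bytes can be written to disk with `open("output.wav", "wb")`.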

## Engine-Specific Features

### OpenAI TTS
```python
from speechflow import OpenAITTSEngine

engine = OpenAITTSEngine(api_key="your-api-key")
audio = engine.get(
    "Hello",
    voice="alloy",  # or: ash, ballad, coral, echo, fable, nova, onyx, sage, shimmer
    model="gpt-4o-mini-tts",  # or: tts-1, tts-1-hd
    speed=1.0
)

# Streaming
for chunk in engine.stream("Long text..."):
    # Process audio chunks in real-time
    pass
```

### Google Gemini TTS
```python
from speechflow import GeminiTTSEngine

engine = GeminiTTSEngine(api_key="your-api-key")
audio = engine.get(
    "Hello",
    model="gemini-2.5-flash-preview-tts",  # or: gemini-2.5-pro-preview-tts
    voice="Leda",  # or: Puck, Charon, Kore, Fenrir, Aoede, and many more
    speed=1.0
)
```

### FishAudio TTS
```python
from speechflow import FishAudioTTSEngine

engine = FishAudioTTSEngine(api_key="your-api-key")
audio = engine.get(
    "Hello world",
    model="s1",  # or: s1-mini, speech-1.6, speech-1.5, agent-x0
    voice="your-voice-id"  # Use your FishAudio voice ID
)

# Streaming
for chunk in engine.stream("Streaming text..."):
    # Process audio chunks
    pass
```

### Kokoro TTS
```python
from speechflow import KokoroTTSEngine

# Default: American English
engine = KokoroTTSEngine()
audio = engine.get(
    "Hello world",
    voice="af_heart"  # Multiple voices available
)

# Japanese (requires additional setup)
engine = KokoroTTSEngine(lang_code="j")
audio = engine.get(
    "こんにちは、世界",
    voice="jf_alpha"  # Kokoro's Japanese voices use a "j" prefix; "af_*" voices are American English
)
```

**Note for Japanese support:**
The Japanese dictionary will be automatically downloaded on first use.
If you encounter errors, you can manually download it:
```bash
python -m unidic download
```

### Style-Bert-VITS2
```python
from speechflow import StyleBertTTSEngine

# Use pre-trained model (automatically downloads on first use)
engine = StyleBertTTSEngine(model_name="jvnv-F1-jp")  # Female Japanese voice
audio = engine.get(
    "こんにちは、世界",
    style="Happy",       # Emotion: Neutral, Happy, Sad, Angry, Fear, Surprise, Disgust
    style_weight=5.0,    # Emotion strength (0.0-10.0)
    speed=1.0,           # Speech speed
    pitch=0.0            # Pitch shift in semitones
)

# Available pre-trained models:
# - jvnv-F1-jp, jvnv-F2-jp: Female voices (JP-Extra version)
# - jvnv-M1-jp, jvnv-M2-jp: Male voices (JP-Extra version)
# - jvnv-F1, jvnv-F2, jvnv-M1, jvnv-M2: Legacy versions

# Use custom model
engine = StyleBertTTSEngine(model_path="/path/to/your/model")

# Sentence-by-sentence streaming (not true streaming)
for audio_chunk in engine.stream("長い文章を文ごとに生成します。"):
    # Process each sentence's audio
    pass
```

**Note:** Style-Bert-VITS2 is optimized for Japanese text. It runs best on a GPU.
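
Sentence-by-sentence streaming can be pictured as "split, then synthesize each piece". The sketch below is one way to do that and is not necessarily how `StyleBertTTSEngine.stream` is implemented: `split_sentences` and `stream_by_sentence` are hypothetical helpers, and the set of sentence-ending characters is an assumption.

```python
import re
from typing import Callable, Iterator

# A sentence is a run of non-terminators followed by any trailing terminators.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]*")


def split_sentences(text: str) -> Iterator[str]:
    """Yield sentences, keeping their closing punctuation."""
    for match in _SENTENCE.findall(text):
        sentence = match.strip()
        if sentence:
            yield sentence


def stream_by_sentence(
    text: str, synthesize: Callable[[str], bytes]
) -> Iterator[bytes]:
    """Yield one synthesized result per sentence.

    `synthesize` is any callable that turns a sentence into audio,
    for example a wrapper around an engine's get() method.
    """
    for sentence in split_sentences(text):
        yield synthesize(sentence)
```

Playback of the first sentence can start while later sentences are still being generated, which lowers perceived latency even though each sentence is produced in full.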

## Language Support

### Kokoro Languages
- 🇺🇸 American English (`a`)
- 🇬🇧 British English (`b`)
- 🇪🇸 Spanish (`e`)
- 🇫🇷 French (`f`)
- 🇮🇳 Hindi (`h`)
- 🇮🇹 Italian (`i`)
- 🇯🇵 Japanese (`j`) - requires unidic
- 🇧🇷 Brazilian Portuguese (`p`)
- 🇨🇳 Mandarin Chinese (`z`)
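
If you prefer language names over single-letter codes, a small lookup table is enough. This is a minimal sketch built from the list above; `KOKORO_LANG_CODES` and `kokoro_lang_code` are hypothetical names and not part of SpeechFlow or Kokoro.

```python
# Language name -> Kokoro lang_code, taken from the list above.
KOKORO_LANG_CODES = {
    "american english": "a",
    "british english": "b",
    "spanish": "e",
    "french": "f",
    "hindi": "h",
    "italian": "i",
    "japanese": "j",
    "brazilian portuguese": "p",
    "mandarin chinese": "z",
}


def kokoro_lang_code(language: str) -> str:
    """Return the lang_code for a language name (case-insensitive)."""
    key = language.strip().lower()
    if key not in KOKORO_LANG_CODES:
        supported = ", ".join(sorted(KOKORO_LANG_CODES))
        raise ValueError(f"Unsupported language {language!r}; choose from: {supported}")
    return KOKORO_LANG_CODES[key]
```

The result can be passed as `lang_code`, for example `KokoroTTSEngine(lang_code=kokoro_lang_code("Japanese"))`.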

## License

MIT