Metadata-Version: 2.4
Name: saltts
Version: 0.1.0
Summary: SalTTS - Real-time, controllable, adaptive neural TTS system for AI VTubers
License: MIT
License-File: LICENSE
Keywords: tts,text-to-speech,neural-tts,vtuber,real-time
Author: SalTTS Team
Requires-Python: >=3.9,<4.0
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Requires-Dist: gymnasium (>=0.28.0)
Requires-Dist: h5py (>=3.8.0)
Requires-Dist: hnswlib (>=0.7.0)
Requires-Dist: hydra-core (>=1.3.0)
Requires-Dist: librosa (>=0.10.0)
Requires-Dist: nmdb-py (>=0.1.1)
Requires-Dist: numpy (>=1.24.0)
Requires-Dist: omegaconf (>=2.3.0)
Requires-Dist: pandas (>=2.0.0)
Requires-Dist: prometheus-client (>=0.17.0)
Requires-Dist: psutil (>=5.9.0)
Requires-Dist: pyworld (>=0.3.0)
Requires-Dist: pyyaml (>=6.0)
Requires-Dist: resampy (>=0.4.2)
Requires-Dist: scipy (>=1.10.0)
Requires-Dist: soundfile (>=0.12.0)
Requires-Dist: stable-baselines3 (>=2.0.0)
Requires-Dist: tensorboard (>=2.12.0)
Requires-Dist: torch (>=2.0.0)
Requires-Dist: torchaudio (>=2.0.0)
Requires-Dist: tqdm (>=4.65.0)
Requires-Dist: transformers (>=4.30.0)
Project-URL: Homepage, https://github.com/yourusername/saltts
Project-URL: Repository, https://github.com/yourusername/saltts
Description-Content-Type: text/markdown

# SalTTS

**Real-time, controllable, adaptive neural TTS system for AI VTubers**

SalTTS is a text-to-speech system designed for real-time AI VTuber applications, targeting <300ms end-to-end latency. The package is currently in alpha (version 0.1.0).

## Features

- **Dual Generator Architecture**: Neural (Flow/Diffusion/GAN) + Parametric (WORLD vocoder)
- **Real-time Streaming**: <300ms end-to-end latency with KV caching
- **Adaptive Mixing**: RL-based (Soft Actor-Critic) with HNSW memory
- **Multi-modal Fusion**: 3-channel NMDB integration
- **High Quality**: Target MOS 4.0+, character consistency 0.85+

## Architecture

```
NMDB (3 channels) → Multi-Modal Fusion → Prosody Generation →
Dual Generators (A+B) → RL Adaptive Mixing → Post-Processing → Audio
```
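As a rough illustration of the mixing stage in the pipeline above, the sketch below blends the outputs of two generators with a single weight. The function name `mix_generators`, the scalar weight `alpha`, and the linear blend itself are assumptions made for illustration; the actual SalTTS mixer is RL-driven and may work quite differently.

```python
from typing import List, Sequence


def mix_generators(
    neural: Sequence[float], parametric: Sequence[float], alpha: float
) -> List[float]:
    """Linearly blend two equal-length waveforms.

    alpha = 1.0 keeps only the neural output, alpha = 0.0 only the
    parametric one. Hypothetical helper, not part of the SalTTS API.
    """
    if len(neural) != len(parametric):
        raise ValueError("waveforms must have the same length")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(neural, parametric)]


if __name__ == "__main__":
    # Two toy two-sample "waveforms", weighted 25% neural / 75% parametric.
    print(mix_generators([1.0, 0.0], [0.0, 1.0], 0.25))
```

In a setup like this, a learned policy would presumably choose `alpha` per utterance or per frame, while the blend itself stays cheap enough for streaming.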

## Installation

### Using Poetry (Recommended)

```bash
poetry install
```

### Using pip

```bash
pip install -e .
```

## Quick Start

```python
from saltts.inference import SalTTSEngine

# Initialize engine
engine = SalTTSEngine(config_dir="config")

# Generate speech
audio = engine.synthesize(
    text="Hello, this is SalTTS!",
    emotion="neutral",
    speaker_id=0
)

# Save audio
engine.save_audio(audio, "output.wav")
```

## Configuration

Configuration files are located in `config/`:
- `model.yaml` - Model architectures
- `runtime.yaml` - Runtime settings
- `training.yaml` - Training hyperparameters
- `nmdb_channels.yaml` - NMDB configuration
- `deployment.yaml` - Deployment settings
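Since the engine is pointed at a config directory, a small pre-flight check can catch a missing file before startup. The sketch below only verifies that the five files listed above exist. The helper `missing_configs` is hypothetical and not part of SalTTS, and it assumes the files sit directly inside the directory passed to it.

```python
from pathlib import Path
from typing import List

# File names taken from the list above.
EXPECTED_CONFIGS = (
    "model.yaml",
    "runtime.yaml",
    "training.yaml",
    "nmdb_channels.yaml",
    "deployment.yaml",
)


def missing_configs(config_dir: str = "config") -> List[str]:
    """Return the expected config files that are absent from config_dir."""
    root = Path(config_dir)
    return [name for name in EXPECTED_CONFIGS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_configs("config")
    if missing:
        print("Missing config files:", ", ".join(missing))
    else:
        print("All config files found.")
```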

## Training

```bash
# Stage 1: Train generators
python scripts/train_stage1.py

# Stage 2: Train transformer
python scripts/train_stage2.py

# Stage 3: Train integration
python scripts/train_stage3.py

# Stage 4: RL training
python scripts/train_stage4_rl.py
```
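The four stages above can also be chained from Python so that a failure stops the run early. This is a minimal sketch, assuming each stage depends on the previous one and that the scripts take no required arguments. `run_stages` and its `runner` parameter are hypothetical names introduced here, not part of the repository.

```python
import subprocess
import sys
from typing import Callable, List, Sequence

# Script paths as listed in the commands above.
STAGE_SCRIPTS = (
    "scripts/train_stage1.py",
    "scripts/train_stage2.py",
    "scripts/train_stage3.py",
    "scripts/train_stage4_rl.py",
)


def run_stages(
    scripts: Sequence[str] = STAGE_SCRIPTS,
    runner: Callable[[List[str]], int] = lambda cmd: subprocess.run(cmd).returncode,
) -> List[str]:
    """Run each stage in order and stop at the first non-zero exit code.

    Returns the scripts that completed successfully.
    """
    completed = []
    for script in scripts:
        # Use the current interpreter so the active environment is reused.
        if runner([sys.executable, script]) != 0:
            break
        completed.append(script)
    return completed


if __name__ == "__main__":
    done = run_stages()
    print(f"Completed {len(done)} of {len(STAGE_SCRIPTS)} stages")
```

Passing a custom `runner` makes the ordering logic easy to test without launching real training jobs.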

## Performance

- End-to-end latency: <300ms (target <200ms)
- Real-Time Factor: <0.5
- MOS: 4.0+ (target)
- Character consistency: 0.85+ (target)
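Real-Time Factor is the time spent synthesizing divided by the duration of the audio produced, so values below 1 mean faster than real time. The sketch below shows one way to measure it alongside latency. `real_time_factor` and `measure_rtf` are hypothetical helpers, the 22050 Hz sample rate is an assumed value, and the timing covers one full synthesis call rather than time-to-first-audio in streaming mode.

```python
import time
from typing import Callable, Sequence, Tuple


def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of the audio produced."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def measure_rtf(
    synthesize: Callable[[], Sequence[float]], sample_rate: int
) -> Tuple[float, float]:
    """Time one synthesis call and return (latency in ms, RTF)."""
    start = time.perf_counter()
    audio = synthesize()
    elapsed = time.perf_counter() - start
    return elapsed * 1000.0, real_time_factor(elapsed, len(audio) / sample_rate)


if __name__ == "__main__":
    # Stand-in synthesizer: one second of silence at an assumed 22050 Hz.
    latency_ms, rtf = measure_rtf(lambda: [0.0] * 22050, sample_rate=22050)
    print(f"latency={latency_ms:.2f} ms, RTF={rtf:.4f}")
```

To measure a real engine, the stand-in lambda could wrap a call such as `engine.synthesize(...)`, provided the returned audio supports `len()` and its sample rate is known.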

## License

MIT License. See the `LICENSE` file for details.

