Metadata-Version: 2.4
Name: pocket-tts-mlx
Version: 0.2.1
Summary: MLX backend for pocket-tts with Apple Silicon optimization
Author: jishnuvenugopal
License: MIT
Project-URL: Homepage, https://github.com/jishnuvenugopal/pocket-tts-mlx
Project-URL: Repository, https://github.com/jishnuvenugopal/pocket-tts-mlx
Project-URL: Issues, https://github.com/jishnuvenugopal/pocket-tts-mlx/issues
Keywords: tts,text-to-speech,mlx,apple-silicon,voice-cloning
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mlx>=0.20.0
Requires-Dist: numpy
Requires-Dist: safetensors
Requires-Dist: sentencepiece>=0.2.1
Requires-Dist: pydantic>=2
Requires-Dist: pyyaml>=6.0
Requires-Dist: requests>=2.20.0
Requires-Dist: huggingface_hub>=0.10
Requires-Dist: scipy>=1.5.0
Requires-Dist: soundfile>=0.12.0
Requires-Dist: typing-extensions
Provides-Extra: dev
Requires-Dist: torch>=2.5.0; extra == "dev"
Requires-Dist: pocket-tts>=1.0.3; extra == "dev"
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-xdist>=3.0; extra == "dev"
Dynamic: license-file

# pocket-tts-mlx

MLX backend for [pocket-tts](https://github.com/kyutai-labs/pocket-tts) optimized for Apple Silicon.

The runtime is torch-free. PyTorch is needed only for the optional parity tests against the reference `pocket-tts` package, and both are installed by the `dev` extra.

**Installation**

Install from PyPI:

```bash
pip install pocket-tts-mlx
```

For local development, install in editable mode from a clone of the repository:

```bash
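pip install -e .
# or, to also install the parity-test dependencies (torch, pocket-tts, pytest):
pip install -e ".[dev]"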
```

Model weights are downloaded from Hugging Face on first run. The voice-cloning
weights are gated: accept the model terms for `kyutai/pocket-tts` on Hugging Face, then authenticate:

```bash
hf auth login
# On huggingface_hub releases that predate the `hf` command:
# huggingface-cli login
```

**Quickstart**

```python
from pocket_tts_mlx import TTSModel

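model = TTSModel.load_model()  # downloads weights from Hugging Face on first run
state = model.get_state_for_audio_prompt("marius")  # "marius" is a predefined voice
audio = model.generate_audio(
    state,
    "Hello from MLX!",
    max_tokens=200,
    # Onset cleanup options (see "Onset Cleanup Options" below)
    warmup_frames=1,
    trim_start_ms=40,
    fade_in_ms=15,
)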
```
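The Quickstart stops once `audio` is generated. To write it to disk, a minimal sketch like the one below should work. It assumes `audio` is a mono float waveform in [-1.0, 1.0] that `np.array()` can convert, and it defaults to 24 kHz, which is an assumption based on the Mimi codec's output rate; upstream pocket-tts exposes a `sample_rate` attribute, and if this port does too, pass that instead. `save_wav` is a hypothetical helper built on the standard-library `wave` module; `soundfile.write`, from a package already listed as a dependency, would work equally well.

```python
import wave

import numpy as np


def save_wav(audio, path: str, sample_rate: int = 24_000) -> int:
    """Write a mono float waveform to a 16-bit PCM WAV file.

    Hypothetical helper. The 24 kHz default is an assumption (Mimi codec
    output rate); prefer the model's own rate if it exposes one.
    Returns the number of frames written.
    """
    # np.array() accepts both NumPy and MLX arrays.
    samples = np.array(audio, dtype=np.float32).squeeze()
    if samples.ndim != 1:
        raise ValueError(f"expected mono audio, got shape {samples.shape}")
    # Clip to [-1, 1], then scale to little-endian int16 as WAV expects.
    pcm = (np.clip(samples, -1.0, 1.0) * 32767).astype("<i2")
    with wave.open(path, "wb") as f:
        f.setnchannels(1)
        f.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        f.setframerate(sample_rate)
        f.writeframes(pcm.tobytes())
    return len(pcm)
```

Usage: `save_wav(audio, "output.wav")`.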

**CLI**

Basic usage:

```bash
pocket-tts-mlx "Hello, world!" --voice marius --output output.wav
```

For a cleaner onset (recommended if you hear artifacts at the start of the clip):

```bash
pocket-tts-mlx "Hello, world!" --voice marius --output output.wav \
  --warmup-frames 1 --trim-start-ms 40 --fade-in-ms 15
```

**Onset Cleanup Options**

- `--warmup-frames`: number of initial Mimi frames to decode and discard, reducing decoder startup transients.
- `--trim-start-ms`: milliseconds to trim from the start of the output.
- `--fade-in-ms`: duration, in milliseconds, of a linear fade-in applied at the start of the output.

The equivalent `generate_audio()` keyword arguments are `warmup_frames`, `trim_start_ms`, and `fade_in_ms`.
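The two post-processing options are easy to picture on a raw waveform. The sketch below is an illustrative approximation, not the package's implementation: it assumes a mono float array, a 24 kHz output rate, and that the fade is applied after trimming. At that rate, `trim_start_ms=40` removes 960 samples and `fade_in_ms=15` ramps the next 360 samples linearly from 0 toward 1. `warmup_frames` is not modeled, since per the description above it acts inside the Mimi decoder rather than on the finished waveform.

```python
import numpy as np


def clean_onset(
    audio: np.ndarray,
    sample_rate: int = 24_000,  # assumption: Mimi's 24 kHz output rate
    trim_start_ms: float = 0.0,
    fade_in_ms: float = 0.0,
) -> np.ndarray:
    """Approximate trim_start_ms / fade_in_ms on a mono waveform.

    Illustrative only; assumes the fade is applied after the trim.
    """
    out = np.asarray(audio, dtype=np.float32).copy()  # never mutate the input
    # Convert milliseconds to a whole number of samples.
    trim = int(round(sample_rate * trim_start_ms / 1000))
    out = out[trim:]
    fade = min(int(round(sample_rate * fade_in_ms / 1000)), len(out))
    if fade > 0:
        # Linear gain ramp 0, 1/fade, ..., (fade-1)/fade over the first samples.
        out[:fade] *= np.arange(fade, dtype=np.float32) / fade
    return out
```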

**Performance Note**

MLX evaluates lazily, so an unevaluated result would defer its computation until conversion. `generate_audio()` evaluates the generated chunks before returning, so `np.array(audio)` should add near-zero overhead in normal usage.

**Voices**

Predefined voices:

- alba
- marius
- javert
- jean
- fantine
- cosette
- eponine
- azelma
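To audition every predefined voice, the Quickstart calls can be repeated in a loop. `render_voices` and `PREDEFINED_VOICES` are hypothetical names; the helper relies only on the `get_state_for_audio_prompt` and `generate_audio` methods shown in the Quickstart and forwards any extra keyword arguments (such as `max_tokens` or the onset options) to `generate_audio`.

```python
from collections.abc import Iterable
from typing import Any

# The predefined voices listed above.
PREDEFINED_VOICES = (
    "alba", "marius", "javert", "jean",
    "fantine", "cosette", "eponine", "azelma",
)


def render_voices(
    model: Any,
    text: str,
    voices: Iterable[str] = PREDEFINED_VOICES,
    **gen_kwargs: Any,
) -> dict[str, Any]:
    """Generate `text` once per voice and return {voice: audio}.

    Hypothetical helper. `model` is duck-typed: any object with the two
    Quickstart methods works, e.g. the result of TTSModel.load_model().
    """
    results: dict[str, Any] = {}
    for voice in voices:
        state = model.get_state_for_audio_prompt(voice)
        results[voice] = model.generate_audio(state, text, **gen_kwargs)
    return results
```

For example, `clips = render_voices(TTSModel.load_model(), "Hello from MLX!", max_tokens=200)`; each value can then be converted with `np.array()` and written to disk.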

**Requirements**

- Python 3.10+
- Apple Silicon Mac (M1/M2/M3/M4)
- MLX (`mlx>=0.20.0`, installed automatically as a dependency)
- Internet access for initial model downloads

**Notes**

- Voice cloning requires Hugging Face access to `kyutai/pocket-tts`.
- If voice-cloning access is unavailable, the non-voice-cloning weights are used automatically.
