Metadata-Version: 2.4
Name: mlx-audio-plus
Version: 0.1.4
Summary: MLX Audio Plus is a package for inference of text-to-speech (TTS) and speech-to-speech (STS) models locally on your Mac using MLX
Home-page: https://github.com/DePasqualeOrg/mlx-audio-plus
Author: Anthony DePasquale
License: MIT
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: misaki[en]>=0.8.2
Requires-Dist: loguru>=0.7.3
Requires-Dist: num2words>=0.5.14
Requires-Dist: spacy>=3.8.4
Requires-Dist: phonemizer-fork>=3.3.2
Requires-Dist: espeakng-loader>=0.2.4
Requires-Dist: mlx>=0.25.2
Requires-Dist: mlx-vlm>=0.1.27
Requires-Dist: numpy>=1.26.4
Requires-Dist: transformers>=4.49.0
Requires-Dist: sentencepiece>=0.2.0
Requires-Dist: huggingface_hub>=0.27.0
Requires-Dist: sounddevice>=0.5.1
Requires-Dist: soundfile>=0.13.1
Requires-Dist: fastapi>=0.95.0
Requires-Dist: uvicorn>=0.22.0
Requires-Dist: einops>=0.8.1
Requires-Dist: tiktoken>=0.9.0
Requires-Dist: tqdm>=4.67.1
Requires-Dist: pyloudnorm>=0.1.1
Requires-Dist: omegaconf==2.3.0
Requires-Dist: einops==0.8.1
Requires-Dist: einx==0.3.0
Requires-Dist: fastrtc[stt,vad]
Requires-Dist: webrtcvad>=2.0.10
Requires-Dist: dacite>=1.9.2
Requires-Dist: pytest-asyncio>=1.0.0
Requires-Dist: mistral-common[audio]
Requires-Dist: hf_transfer
Provides-Extra: py38
Requires-Dist: importlib_resources; extra == "py38"
Dynamic: author
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# MLX Audio Plus

In addition to the models from [Blaizzy/mlx-audio](https://github.com/Blaizzy/mlx-audio), this package includes the following new models ported to MLX in Python:

- TTS
  - [Chatterbox](https://github.com/resemble-ai/chatterbox)
  - [CosyVoice2](https://huggingface.co/FunAudioLLM/CosyVoice2-0.5B)

## Installation

```bash
pip install mlx-audio-plus
```
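MLX is developed primarily for Apple silicon, and the package summary describes running models locally on a Mac. Before installing, you can run a quick pre-flight check. This is a minimal sketch that uses only the standard library. The helper name `is_apple_silicon_mac` is hypothetical and is not part of `mlx-audio-plus`.

```python
import platform


def is_apple_silicon_mac(system=None, machine=None):
    """Return True when running on macOS with an arm64 (Apple silicon) CPU.

    `system` and `machine` default to the values reported by the
    `platform` module; they can be passed explicitly for testing.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return system == "Darwin" and machine == "arm64"


if __name__ == "__main__":
    if is_apple_silicon_mac():
        print("Apple silicon Mac detected: pip install mlx-audio-plus")
    else:
        print("Not an Apple silicon Mac: MLX-based inference may not be supported here.")
```

A Python interpreter running under Rosetta reports `x86_64` even on an Apple silicon Mac. A `False` result can therefore also mean that you need an arm64 build of Python.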

## Usage

### CLI

```bash
# CosyVoice2: zero-shot mode (reference audio + transcription)
mlx_audio.tts.generate --model mlx-community/CosyVoice2-0.5B-4bit \
    --text "Hello, this is a test of text to speech." \
    --ref_audio reference.wav \
    --ref_text "This is what I said in the reference audio."

# CosyVoice2: cross-lingual mode (no transcription)
mlx_audio.tts.generate --model mlx-community/CosyVoice2-0.5B-4bit \
    --text "Bonjour, comment allez-vous?" \
    --ref_audio reference.wav

# CosyVoice2: instruct mode with style control
mlx_audio.tts.generate --model mlx-community/CosyVoice2-0.5B-4bit \
    --text "I have exciting news!" \
    --ref_audio reference.wav \
    --instruct_text "Speak with excitement and enthusiasm"

# CosyVoice2: voice conversion
mlx_audio.tts.generate --model mlx-community/CosyVoice2-0.5B-4bit \
    --ref_audio target_speaker.wav \
    --source_audio source_speech.wav

# Play audio directly instead of saving
mlx_audio.tts.generate --model mlx-community/CosyVoice2-0.5B-4bit \
    --text "Hello world" \
    --ref_audio reference.wav \
    --play

# Chatterbox: generate speech from reference audio
mlx_audio.tts.generate --model mlx-community/Chatterbox-TTS-4bit \
    --text "The quick brown fox jumped over the lazy dog." \
    --ref_audio reference.wav
```
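The CosyVoice2 modes above differ only in which flags are passed. `--ref_text` gives zero-shot mode, and leaving it out gives cross-lingual mode. `--instruct_text` adds style control, and `--source_audio` switches to voice conversion. When scripting many runs, a small wrapper can assemble the command. This is a sketch: `build_tts_command` and `run_tts_command` are hypothetical helpers that use only the flags shown in the examples above. `mlx_audio.tts.generate --help` should list the full set.

```python
import shlex
import subprocess


def build_tts_command(model, ref_audio, text=None, ref_text=None,
                      instruct_text=None, source_audio=None, play=False):
    """Assemble an `mlx_audio.tts.generate` argument list.

    Only the flags shown in the CLI examples are supported, and only
    flags whose values are set are included.
    """
    if text is None and source_audio is None:
        raise ValueError("pass `text` for TTS or `source_audio` for voice conversion")
    cmd = ["mlx_audio.tts.generate", "--model", model]
    if text is not None:
        cmd += ["--text", text]
    cmd += ["--ref_audio", ref_audio]
    if ref_text is not None:
        cmd += ["--ref_text", ref_text]  # zero-shot mode
    if instruct_text is not None:
        cmd += ["--instruct_text", instruct_text]  # instruct mode (style control)
    if source_audio is not None:
        cmd += ["--source_audio", source_audio]  # voice conversion
    if play:
        cmd.append("--play")
    return cmd


def run_tts_command(cmd):
    """Run the command, raising CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    cmd = build_tts_command(
        model="mlx-community/CosyVoice2-0.5B-4bit",
        text="Bonjour, comment allez-vous?",
        ref_audio="reference.wav",
    )
    # Print a shell-quoted version for inspection; call run_tts_command(cmd) to execute it.
    print(shlex.join(cmd))
```

The Chatterbox example uses the same `--model`, `--text`, and `--ref_audio` flags, so the same builder covers it as well.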

### Python

```python
from mlx_audio.tts.generate import generate_audio

# CosyVoice2: zero-shot mode (reference audio + transcription)
generate_audio(
    text="Hello, this is a test of text to speech.",
    model="mlx-community/CosyVoice2-0.5B-4bit",
    ref_audio="reference.wav",
    ref_text="This is what I said in the reference audio.",  # Omit for cross-lingual mode
    file_prefix="output",  # Optional
    audio_format="wav",  # Optional
)

# CosyVoice2: cross-lingual mode (no transcription needed)
generate_audio(
    text="Bonjour, comment allez-vous aujourd'hui?",
    model="mlx-community/CosyVoice2-0.5B-4bit",
    ref_audio="reference.wav",
)

# CosyVoice2: instruct mode with style control
generate_audio(
    text="I have some exciting news to share with you!",
    model="mlx-community/CosyVoice2-0.5B-4bit",
    ref_audio="reference.wav",
    instruct_text="Speak with excitement and enthusiasm",
)

# CosyVoice2: voice conversion (convert source audio to target speaker)
generate_audio(
    text="",  # Not used in voice conversion mode
    model="mlx-community/CosyVoice2-0.5B-4bit",
    ref_audio="target_speaker.wav",  # Target voice
    source_audio="source_speech.wav",
)

# Chatterbox: generate speech from reference audio
generate_audio(
    text="The quick brown fox jumped over the lazy dog.",
    model="mlx-community/Chatterbox-TTS-4bit",
    ref_audio="reference.wav",
)
```
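To synthesize several lines with the same voice, you can call `generate_audio` in a loop and give each call its own `file_prefix` so the outputs do not overwrite each other. This is a sketch, not part of the package API. `synthesize_lines` is a hypothetical helper, and the exact output file names depend on how `generate_audio` uses `file_prefix`. The `generate` parameter lets you swap in a stub for testing. The sketch also makes no attempt to avoid reloading the model between calls.

```python
def synthesize_lines(lines, model, ref_audio, prefix="line", generate=None, **kwargs):
    """Call `generate_audio` once per line of text, each with a distinct file_prefix.

    `generate` defaults to `mlx_audio.tts.generate.generate_audio`; any
    callable accepting the same keyword arguments can be substituted.
    Extra keyword arguments (e.g. `ref_text`, `instruct_text`) are
    forwarded unchanged. Returns the list of prefixes used.
    """
    if generate is None:
        # Imported lazily so the helper can be exercised without MLX installed.
        from mlx_audio.tts.generate import generate_audio as generate
    prefixes = []
    for i, text in enumerate(lines):
        file_prefix = f"{prefix}_{i:03d}"  # zero-padded, e.g. line_000, line_001
        generate(
            text=text,
            model=model,
            ref_audio=ref_audio,
            file_prefix=file_prefix,
            audio_format="wav",
            **kwargs,
        )
        prefixes.append(file_prefix)
    return prefixes


# Example (requires mlx-audio-plus; the model may be downloaded on first use):
# synthesize_lines(
#     ["First sentence.", "Second sentence."],
#     model="mlx-community/Chatterbox-TTS-4bit",
#     ref_audio="reference.wav",
# )
```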

