Metadata-Version: 2.4
Name: pykokoro
Version: 0.4.0
Summary: A python library for Kokoro TTS
Author-email: Holger Nahrstaedt <nahrstaedt@gmail.com>
License:                                  Apache License
                                   Version 2.0, January 2004
                                http://www.apache.org/licenses/
        
           TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
        
           1. Definitions.
        
              "License" shall mean the terms and conditions for use, reproduction,
              and distribution as defined by Sections 1 through 9 of this document.
        
              "Licensor" shall mean the copyright owner or entity authorized by
              the copyright owner that is granting the License.
        
              "Legal Entity" shall mean the union of the acting entity and all
              other entities that control, are controlled by, or are under common
              control with that entity. For the purposes of this definition,
              "control" means (i) the power, direct or indirect, to cause the
              direction or management of such entity, whether by contract or
              otherwise, or (ii) ownership of fifty percent (50%) or more of the
              outstanding shares, or (iii) beneficial ownership of such entity.
        
              "You" (or "Your") shall mean an individual or Legal Entity
              exercising permissions granted by this License.
        
              "Source" form shall mean the preferred form for making modifications,
              including but not limited to software source code, documentation
              source, and configuration files.
        
              "Object" form shall mean any form resulting from mechanical
              transformation or translation of a Source form, including but
              not limited to compiled object code, generated documentation,
              and conversions to other media types.
        
              "Work" shall mean the work of authorship, whether in Source or
              Object form, made available under the License, as indicated by a
              copyright notice that is included in or attached to the work
              (an example is provided in the Appendix below).
        
              "Derivative Works" shall mean any work, whether in Source or Object
              form, that is based on (or derived from) the Work and for which the
              editorial revisions, annotations, elaborations, or other modifications
              represent, as a whole, an original work of authorship. For the purposes
              of this License, Derivative Works shall not include works that remain
              separable from, or merely link (or bind by name) to the interfaces of,
              the Work and Derivative Works thereof.
        
              "Contribution" shall mean any work of authorship, including
              the original version of the Work and any modifications or additions
              to that Work or Derivative Works thereof, that is intentionally
              submitted to Licensor for inclusion in the Work by the copyright owner
              or by an individual or Legal Entity authorized to submit on behalf of
              the copyright owner. For the purposes of this definition, "submitted"
              means any form of electronic, verbal, or written communication sent
              to the Licensor or its representatives, including but not limited to
              communication on electronic mailing lists, source code control systems,
              and issue tracking systems that are managed by, or on behalf of, the
              Licensor for the purpose of discussing and improving the Work, but
              excluding communication that is conspicuously marked or otherwise
              designated in writing by the copyright owner as "Not a Contribution."
        
              "Contributor" shall mean Licensor and any individual or Legal Entity
              on behalf of whom a Contribution has been received by Licensor and
              subsequently incorporated within the Work.
        
           2. Grant of Copyright License. Subject to the terms and conditions of
              this License, each Contributor hereby grants to You a perpetual,
              worldwide, non-exclusive, no-charge, royalty-free, irrevocable
              copyright license to reproduce, prepare Derivative Works of,
              publicly display, publicly perform, sublicense, and distribute the
              Work and such Derivative Works in Source or Object form.
        
           3. Grant of Patent License. Subject to the terms and conditions of
              this License, each Contributor hereby grants to You a perpetual,
              worldwide, non-exclusive, no-charge, royalty-free, irrevocable
              (except as stated in this section) patent license to make, have made,
              use, offer to sell, sell, import, and otherwise transfer the Work,
              where such license applies only to those patent claims licensable
              by such Contributor that are necessarily infringed by their
              Contribution(s) alone or by combination of their Contribution(s)
              with the Work to which such Contribution(s) was submitted. If You
              institute patent litigation against any entity (including a
              cross-claim or counterclaim in a lawsuit) alleging that the Work
              or a Contribution incorporated within the Work constitutes direct
              or contributory patent infringement, then any patent licenses
              granted to You under this License for that Work shall terminate
              as of the date such litigation is filed.
        
           4. Redistribution. You may reproduce and distribute copies of the
              Work or Derivative Works thereof in any medium, with or without
              modifications, and in Source or Object form, provided that You
              meet the following conditions:
        
              (a) You must give any other recipients of the Work or
                  Derivative Works a copy of this License; and
        
              (b) You must cause any modified files to carry prominent notices
                  stating that You changed the files; and
        
              (c) You must retain, in the Source form of any Derivative Works
                  that You distribute, all copyright, patent, trademark, and
                  attribution notices from the Source form of the Work,
                  excluding those notices that do not pertain to any part of
                  the Derivative Works; and
        
              (d) If the Work includes a "NOTICE" text file as part of its
                  distribution, then any Derivative Works that You distribute must
                  include a readable copy of the attribution notices contained
                  within such NOTICE file, excluding those notices that do not
                  pertain to any part of the Derivative Works, in at least one
                  of the following places: within a NOTICE text file distributed
                  as part of the Derivative Works; within the Source form or
                  documentation, if provided along with the Derivative Works; or,
                  within a display generated by the Derivative Works, if and
                  wherever such third-party notices normally appear. The contents
                  of the NOTICE file are for informational purposes only and
                  do not modify the License. You may add Your own attribution
                  notices within Derivative Works that You distribute, alongside
                  or as an addendum to the NOTICE text from the Work, provided
                  that such additional attribution notices cannot be construed
                  as modifying the License.
        
              You may add Your own copyright statement to Your modifications and
              may provide additional or different license terms and conditions
              for use, reproduction, or distribution of Your modifications, or
              for any such Derivative Works as a whole, provided Your use,
              reproduction, and distribution of the Work otherwise complies with
              the conditions stated in this License.
        
           5. Submission of Contributions. Unless You explicitly state otherwise,
              any Contribution intentionally submitted for inclusion in the Work
              by You to the Licensor shall be under the terms and conditions of
              this License, without any additional terms or conditions.
              Notwithstanding the above, nothing herein shall supersede or modify
              the terms of any separate license agreement you may have executed
              with Licensor regarding such Contributions.
        
           6. Trademarks. This License does not grant permission to use the trade
              names, trademarks, service marks, or product names of the Licensor,
              except as required for reasonable and customary use in describing the
              origin of the Work and reproducing the content of the NOTICE file.
        
           7. Disclaimer of Warranty. Unless required by applicable law or
              agreed to in writing, Licensor provides the Work (and each
              Contributor provides its Contributions) on an "AS IS" BASIS,
              WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
              implied, including, without limitation, any warranties or conditions
              of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
              PARTICULAR PURPOSE. You are solely responsible for determining the
              appropriateness of using or redistributing the Work and assume any
              risks associated with Your exercise of permissions under this License.
        
           8. Limitation of Liability. In no event and under no legal theory,
              whether in tort (including negligence), contract, or otherwise,
              unless required by applicable law (such as deliberate and grossly
              negligent acts) or agreed to in writing, shall any Contributor be
              liable to You for damages, including any direct, indirect, special,
              incidental, or consequential damages of any character arising as a
              result of this License or out of the use or inability to use the
              Work (including but not limited to damages for loss of goodwill,
              work stoppage, computer failure or malfunction, or any and all
              other commercial damages or losses), even if such Contributor
              has been advised of the possibility of such damages.
        
           9. Accepting Warranty or Additional Liability. While redistributing
              the Work or Derivative Works thereof, You may choose to offer,
              and charge a fee for, acceptance of support, warranty, indemnity,
              or other liability obligations and/or rights consistent with this
              License. However, in accepting such obligations, You may act only
              on Your own behalf and on Your sole responsibility, not on behalf
              of any other Contributor, and only if You agree to indemnify,
              defend, and hold each Contributor harmless for any liability
              incurred by, or claims asserted against, such Contributor by reason
              of your accepting any such warranty or additional liability.
        
           END OF TERMS AND CONDITIONS
        
           APPENDIX: How to apply the Apache License to your work.
        
              To apply the Apache License to your work, attach the following
              boilerplate notice, with the fields enclosed by brackets "[]"
              replaced with your own identifying information. (Don't include
              the brackets!)  The text should be enclosed in the appropriate
              comment syntax for the file format. We also recommend that a
              file or class name and description of purpose be included on the
              same "printed page" as the copyright notice for easier
              identification within third-party archives.
        
           Copyright [yyyy] [name of copyright owner]
        
           Licensed under the Apache License, Version 2.0 (the "License");
           you may not use this file except in compliance with the License.
           You may obtain a copy of the License at
        
               http://www.apache.org/licenses/LICENSE-2.0
        
           Unless required by applicable law or agreed to in writing, software
           distributed under the License is distributed on an "AS IS" BASIS,
           WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
           See the License for the specific language governing permissions and
           limitations under the License.
        
Project-URL: Homepage, https://github.com/holgern/pykokoro
Keywords: tts,text-to-speech,kokoro,phonemizer,espeak
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Linguistic
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: kokorog2p>=0.4.0
Requires-Dist: phrasplit
Requires-Dist: numpy
Requires-Dist: onnxruntime>=1.17.0
Requires-Dist: platformdirs
Requires-Dist: chardet
Requires-Dist: charset-normalizer
Requires-Dist: huggingface-hub
Requires-Dist: ssmd>=0.6.0
Requires-Dist: num2words>=0.5.12
Requires-Dist: babel>=2.12.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Provides-Extra: prosody
Requires-Dist: audiomentations>=0.36.0; extra == "prosody"
Requires-Dist: librosa>=0.10.0; extra == "prosody"
Provides-Extra: gpu
Requires-Dist: onnxruntime-gpu>=1.17.0; extra == "gpu"
Provides-Extra: openvino
Requires-Dist: onnxruntime-openvino>=1.17.0; extra == "openvino"
Provides-Extra: directml
Requires-Dist: onnxruntime-directml>=1.17.0; extra == "directml"
Provides-Extra: coreml
Requires-Dist: coremltools>=7.0; extra == "coreml"
Provides-Extra: all
Requires-Dist: onnxruntime-gpu>=1.17.0; extra == "all"
Requires-Dist: onnxruntime-openvino>=1.17.0; extra == "all"
Requires-Dist: onnxruntime-directml>=1.17.0; extra == "all"
Requires-Dist: coremltools>=7.0; extra == "all"
Requires-Dist: audiomentations>=0.36.0; extra == "all"
Requires-Dist: librosa>=0.10.0; extra == "all"
Dynamic: license-file

[![PyPI - Version](https://img.shields.io/pypi/v/pykokoro)](https://pypi.org/project/pykokoro/)
![PyPI - Python Version](https://img.shields.io/pypi/pyversions/pykokoro)
![PyPI - Downloads](https://img.shields.io/pypi/dm/pykokoro)
[![codecov](https://codecov.io/gh/holgern/pykokoro/graph/badge.svg?token=iCHXwbjAXG)](https://codecov.io/gh/holgern/pykokoro)

# PyKokoro

A Python library for Kokoro TTS (Text-to-Speech) using ONNX runtime.

## Features

- **ONNX-based TTS**: Fast, efficient text-to-speech using the Kokoro-82M model
- **Multiple Languages**: Support for English, Spanish, French, German, Italian,
  Portuguese, and more
- **Multiple Voices**: 54+ built-in voices (or 103 voices with v1.1-zh model)
- **Voice Blending**: Create custom voices by blending multiple voices
- **Multiple Model Sources**: Download models from HuggingFace or GitHub (v1.0/v1.1-zh)
- **Model Quality Options**: Choose from fp32, fp16, q8, q4, and uint8 quantization
  levels
- **GPU Acceleration**: Optional CUDA, CoreML, or DirectML support
- **Phoneme Support**: Advanced phoneme-based generation with kokorog2p
- **Hugging Face Integration**: Automatic model downloading from Hugging Face Hub
- **Text Normalization**: Automatic say-as support for numbers, dates, phone numbers,
  and more using SSMD markup

## Installation

### Basic Installation (CPU only)

```bash
pip install pykokoro
```

### GPU and Accelerator Support

PyKokoro supports multiple hardware accelerators for faster inference:

#### NVIDIA CUDA GPU

```bash
pip install pykokoro[gpu]
```

#### Intel OpenVINO

**Note:** OpenVINO is currently incompatible with Kokoro models due to dynamic rank
tensor requirements. The provider will automatically fall back to CPU if OpenVINO fails.

```bash
pip install pykokoro[openvino]
```

#### DirectML (Windows - AMD/Intel/NVIDIA GPUs)

```bash
pip install pykokoro[directml]
```

#### Apple CoreML (macOS)

```bash
pip install pykokoro[coreml]
```

#### All Accelerators

```bash
pip install pykokoro[all]
```

### Performance Comparison

To find the best provider for your system, run the benchmark:

```bash
python examples/gpu_benchmark.py
```

## Quick Start

```python
import pykokoro
import soundfile as sf

# Initialize the TTS engine (auto-selects best provider)
tts = pykokoro.Kokoro(provider="auto")

# Generate speech
text = "Hello, world! This is Kokoro speaking."
audio, sample_rate = tts.create(text, voice="af_sarah", speed=1.0, lang="en-us")

# Save to file
sf.write("output.wav", audio, sample_rate)
```

## Hardware Acceleration

### Automatic Provider Selection (Recommended)

```python
import pykokoro

# Auto-select best available provider (CUDA > CoreML > DirectML > CPU)
# Note: OpenVINO is attempted but will fall back to next priority if incompatible
tts = pykokoro.Kokoro(provider="auto")
```

### Explicit Provider Selection

```python
# Force specific provider
tts = pykokoro.Kokoro(provider="cuda")      # NVIDIA CUDA
tts = pykokoro.Kokoro(provider="openvino")  # Intel OpenVINO (currently incompatible, will raise error)
tts = pykokoro.Kokoro(provider="directml")  # Windows DirectML
tts = pykokoro.Kokoro(provider="coreml")    # Apple CoreML
tts = pykokoro.Kokoro(provider="cpu")       # CPU only
```

### Check Available Providers

```bash
# See all available providers on your system
python examples/provider_info.py

# Benchmark all providers
python examples/gpu_benchmark.py
```

### Environment Variable Override

```bash
# Force a specific provider via environment variable
export ONNX_PROVIDER="OpenVINOExecutionProvider"
python your_script.py
```

## Usage Examples

### Basic Text-to-Speech

```python
import pykokoro

# Create TTS instance with GPU acceleration and fp16 model
tts = pykokoro.Kokoro(provider="cuda", model_quality="fp16")

# Generate audio
audio, sr = tts.create("Hello world", voice="af_nicole", lang="en-us")
```

### Voice Blending

```python
# Blend two voices (50% each)
blend = pykokoro.VoiceBlend.parse("af_nicole:50,am_michael:50")
audio, sr = tts.create("Mixed voice", voice=blend)
```

### Streaming Generation

```python
# Synchronous streaming
for chunk, sr, text_chunk in tts.create_stream_sync("Long text here...", voice="af_sarah"):
    # Process audio chunk in real-time
    play_audio(chunk, sr)

# Async streaming
async for chunk, sr, text_chunk in tts.create_stream("Long text here...", voice="af_sarah"):
    await process_audio(chunk, sr)
```

### Phoneme-Based Generation

```python
from pykokoro import Tokenizer

# Create tokenizer
tokenizer = Tokenizer()

# Convert text to phonemes
phonemes = tokenizer.phonemize("Hello world", lang="en-us")
print(phonemes)  # hə'loʊ wɜːld

# Generate from phonemes
audio, sr = tts.create_from_phonemes(phonemes, voice="af_sarah")
```

### Pause Control

PyKokoro uses SSMD (Speech Synthesis Markdown) syntax for controlling pauses in
generated speech:

#### 1. SSMD Break Markers

Add explicit pauses using SSMD break syntax in your text:

```python
# Use SSMD break markers in your text
text = "Chapter 5 ...p I'm Klaus. ...c Welcome to the show!"

# Breaks are processed automatically
audio, sr = tts.create(
    text,
    voice="am_michael"
)
```

**SSMD Break Markers:**

- `...n` - No pause (0ms)
- `...w` - Weak pause (150ms by default)
- `...c` - Clause/comma pause (300ms by default)
- `...s` - Sentence pause (600ms by default)
- `...p` - Paragraph pause (1000ms by default)
- `...500ms` - Custom pause (500 milliseconds)
- `...2s` - Custom pause (2 seconds)

**Note:** Bare `...` (ellipsis) is NOT treated as a pause and will be phonemized
normally.

**Custom Pause Durations:**

```python
audio, sr = tts.create(
    text,
    voice="am_michael",
    pause_clause=0.2,      # ...c = 200ms
    pause_sentence=0.5,    # ...s = 500ms
    pause_paragraph=1.5    # ...p = 1500ms
)
```

#### 2. Automatic Natural Pauses

For more natural speech, enable automatic pause insertion at linguistic boundaries:

```python
text = """
Artificial intelligence is transforming our world. Machine learning models
are becoming more sophisticated, efficient, and accessible.

Deep learning, a subset of AI, uses neural networks with many layers. These
networks can learn complex patterns from data, enabling breakthroughs in
computer vision, natural language processing, and speech recognition.
"""

# Automatic pauses at clause, sentence, and paragraph boundaries
audio, sr = tts.create(
    text,
    voice="af_sarah",
    split_mode="clause",      # Split on commas and sentences
    trim_silence=True,        # Enable automatic pause insertion
    pause_clause=0.25,        # Pause after clauses (commas)
    pause_sentence=0.5,       # Pause after sentences
    pause_paragraph=1.0,      # Pause after paragraphs
    pause_variance=0.05,      # Add natural variance (default)
    random_seed=42            # For reproducible results (optional)
)
```

**Key Features:**

- **Natural boundaries**: Automatically detects clauses, sentences, and paragraphs
- **Variance**: Gaussian variance prevents robotic timing (±100ms by default)
- **Reproducible**: Use `random_seed` for consistent output
- **Composable**: Works with SSMD break markers

**Split Modes:**

- `None` (default) - Automatic phoneme-based splitting, no automatic pauses
- `"paragraph"` - Split on double newlines
- `"sentence"` - Split on sentence boundaries (requires spaCy)
- `"clause"` - Split on sentences + commas (requires spaCy, recommended)

**Pause Variance Options:**

- `pause_variance=0.0` - No variance (exact pauses)
- `pause_variance=0.05` - Default (±100ms at 95% confidence)
- `pause_variance=0.1` - More variation (±200ms at 95% confidence)

**Note:** For `split_mode="sentence"` or `split_mode="clause"`, install spaCy:

```bash
pip install spacy
python -m spacy download en_core_web_sm
```

**Combining Both Approaches:**

Use SSMD markers for special emphasis and automatic pauses for natural rhythm:

```python
text = "Welcome! ...p Let's discuss AI, machine learning, and deep learning."

audio, sr = tts.create(
    text,
    voice="af_sarah",
    split_mode="clause",     # Automatic pauses at commas
    trim_silence=True,
    pause_variance=0.05
)
```

See `examples/pauses_demo.py`, `examples/pauses_with_splitting.py`, and
`examples/automatic_pauses_demo.py` for complete examples.

### Text Normalization (Say-As)

PyKokoro supports automatic text normalization using SSMD (Speech Synthesis Markdown)
syntax. Convert numbers, dates, phone numbers, and more into speakable text:

```python
# Cardinal numbers
text = "I have [123](as: cardinal) apples"
audio, sr = tts.create(text, voice="af_sarah")
# TTS says: "I have one hundred twenty-three apples"

# Ordinal numbers
text = "I came in [3](as: ordinal) place"
# TTS says: "I came in third place"

# Digits (spell out)
text = "My PIN is [1234](as: digits)"
# TTS says: "My PIN is one two three four"

# Telephone numbers
text = "Call [+1-555-0123](as: telephone)"
# TTS says: "Call plus one five five five oh one two three"

# Dates with custom formatting
text = "Today is [12/31/2024](as: date, format: mdy)"
# TTS says: "Today is December thirty-first, two thousand twenty-four"

# Time (12-hour or 24-hour)
text = "The time is [14:30](as: time)"
# TTS says: "The time is two thirty PM"

# Characters (spell out)
text = "The code is [ABC](as: characters)"
# TTS says: "The code is A B C"

# Fractions
text = "Add [1/2](as: fraction) cup of sugar"
# TTS says: "Add one half cup of sugar"

# Units
text = "The package weighs [5kg](as: unit)"
# TTS says: "The package weighs five kilograms"
```

**Supported Say-As Types:**

- `cardinal` - Numbers as cardinals: "123" → "one hundred twenty-three"
- `ordinal` - Numbers as ordinals: "3" → "third"
- `digits` - Spell out digits: "123" → "one two three"
- `number` - Alias for cardinal
- `fraction` - Fractions: "1/2" → "one half"
- `characters` - Spell out text: "ABC" → "A B C"
- `telephone` - Phone numbers: "+1-555-0123" → "plus one five five five oh one two
  three"
- `date` - Dates with format support (mdy, dmy, ymd, ym, my, md, dm, d, m, y)
- `time` - Time in 12h or 24h format
- `unit` - Units: "5kg" → "five kilograms"
- `expletive` - Censors to "beep"

**Multi-language Support:**

Say-as works with multiple languages (English, French, German, Spanish, and more):

```python
# French cardinal
text = "[123](as: cardinal)"
audio, sr = tts.create(text, voice="ff_siwis", lang="fr-fr")
# TTS says: "cent vingt-trois"

# German ordinal
text = "[3](as: ordinal)"
audio, sr = tts.create(text, voice="gf_maria", lang="de-de")
# TTS says: "dritte"
```

**Combining with Other Features:**

Say-as works seamlessly with all SSMD features:

```python
# With prosody
text = "[100](as: cardinal) +loud+ dollars!"

# With pauses
text = "[First](as: ordinal) ...c [second](as: ordinal) ...c [third](as: ordinal)!"

# With emphasis
text = "The winner is *[1](as: ordinal)*!"
```

See `examples/say_as_demo.py` for comprehensive examples.

#### 4. Automatic Short Sentence Handling

When processing text, very short sentences (like "Why?" or "Go!") can produce poor audio
quality when processed individually (only 3-8 phonemes each). Pykokoro automatically
handles this using a "repeat-and-cut" technique:

**How It Works:**

1. Short segments are detected based on phoneme length (default: <30 phonemes)
2. The sentence is repeated: "Why?" → "Why? Why? Why?"
3. TTS generates audio with more context (better prosody)
4. Audio is trimmed to extract only the first instance

This happens automatically during `kokoro.create()` - no configuration needed!

**Customizing the Behavior:**

You can customize the thresholds using `ShortSentenceConfig`:

```python
from pykokoro import Kokoro
from pykokoro.short_sentence_handler import ShortSentenceConfig

# More aggressive short sentence handling
config = ShortSentenceConfig(
    min_phoneme_length=50,    # Treat segments <50 phonemes as short
    target_phoneme_length=150, # Repeat until ~150 phonemes
    max_repetitions=7,         # Allow up to 7 repetitions
)

tts = Kokoro(short_sentence_config=config)
```

**Default Configuration:**

- `min_phoneme_length=30`: Segments below this use repeat-and-cut
- `target_phoneme_length=100`: Target length for repeated text
- `max_repetitions=5`: Maximum times to repeat

**Disabling Short Sentence Handling:**

```python
config = ShortSentenceConfig(min_phoneme_length=0)  # No segment is "short"
tts = Kokoro(short_sentence_config=config)
```

See `examples/optimal_phoneme_length_demo.py` for a demonstration.

## Available Voices

The library includes voices across different languages and accents. The number of
available voices depends on the model source:

### HuggingFace & GitHub v1.0 (54 voices)

- **American English**: af_alloy, af_bella, af_sarah, am_adam, am_michael, etc.
- **British English**: bf_alice, bf_emma, bm_george, bm_lewis
- **Spanish**: ef_dora, em_alex
- **French**: ff_siwis
- **Japanese**: jf_alpha, jm_kumo
- **Chinese**: zf_xiaobei, zm_yunxi
- And many more...

### GitHub v1.1-zh (103 voices)

Includes all voices from v1.0 plus additional Chinese voices:

- **English voices**: af_maple, af_sol, bf_vale (confirmed working)
- **Chinese voices**: zf_001 through zf_099, zm_009 through zm_100

**Example - Using v1.1-zh with English:**

```python
tts = pykokoro.Kokoro(model_source="github", model_variant="v1.1-zh")
audio, sr = tts.create("Hello world!", voice="af_maple", lang="en-us")
```

List all available voices:

```python
voices = tts.get_voices()
print(voices)
```

## Model Sources

PyKokoro supports downloading models from multiple sources:

### HuggingFace (Default)

The default source with 54 multi-language voices:

```python
tts = pykokoro.Kokoro(
    model_source="huggingface",
    model_quality="fp32"  # fp32, fp16, q8, q8f16, q4, q4f16, uint8, uint8f16
)
```

### GitHub v1.0

54 voices with additional `fp16-gpu` optimized quality:

```python
tts = pykokoro.Kokoro(
    model_source="github",
    model_variant="v1.0",
    model_quality="fp16-gpu"  # fp32, fp16, fp16-gpu, q8
)
```

### GitHub v1.1-zh (English + Chinese)

103 voices including English and Chinese speakers:

```python
tts = pykokoro.Kokoro(
    model_source="github",
    model_variant="v1.1-zh",
    model_quality="fp32"  # Only fp32 available
)

# Use English voices from v1.1-zh
voices = tts.get_voices()  # Returns 103 voices
audio, sr = tts.create("Hello world", voice="af_maple", lang="en-us")
```

**Note:** Chinese text generation requires proper phonemization support (currently in
development).

## Model Quality Options

Available quality options vary by source:

**HuggingFace Models:**

- `fp32`: Full precision (highest quality, largest size)
- `fp16`: Half precision (good quality, smaller size)
- `q8`: 8-bit quantized (fast, small)
- `q8f16`: 8-bit with fp16 (balanced)
- `q4`: 4-bit quantized (fastest, smallest)
- `q4f16`: 4-bit with fp16 (compact)
- `uint8`: Unsigned 8-bit (compatible)
- `uint8f16`: Unsigned 8-bit with fp16

**GitHub v1.0 Models:**

- `fp32`: Full precision
- `fp16`: Half precision
- `fp16-gpu`: GPU-optimized fp16
- `q8`: 8-bit quantized

**GitHub v1.1-zh Models:**

- `fp32`: Full precision only

```python
# HuggingFace with q8
tts = pykokoro.Kokoro(model_source="huggingface", model_quality="q8")

# GitHub v1.0 with GPU-optimized fp16
tts = pykokoro.Kokoro(model_source="github", model_variant="v1.0", model_quality="fp16-gpu")
```

## Configuration

Configuration is stored in a platform-specific directory:

- Linux: `~/.config/pykokoro/config.json`
- macOS: `~/Library/Application Support/pykokoro/config.json`
- Windows: `%APPDATA%\pykokoro\config.json`

```python
import pykokoro

# Load config
config = pykokoro.load_config()

# Modify config
config["model_quality"] = "fp16"
config["use_gpu"] = True

# Save config
pykokoro.save_config(config)
```

## Advanced Features

### Custom Phoneme Dictionary

```python
from pykokoro import Tokenizer, TokenizerConfig

# Create config with custom phoneme dictionary
config = TokenizerConfig(
    phoneme_dictionary_path="my_pronunciations.json"
)

tokenizer = Tokenizer(config=config)
```

### Mixed Language Support

```python
from pykokoro import TokenizerConfig

config = TokenizerConfig(
    use_mixed_language=True,
    mixed_language_primary="en-us",
    mixed_language_allowed=["en-us", "de", "fr"]
)

tokenizer = Tokenizer(config=config)
```

### Backend Configuration

Control which phonemization backend and dictionaries to use:

```python
from pykokoro import TokenizerConfig

# Default: Full dictionaries with espeak fallback (best quality)
config = TokenizerConfig(
    backend="espeak",
    load_gold=True,
    load_silver=True,
    use_espeak_fallback=True
)

# Memory-optimized: Gold dictionary only
config = TokenizerConfig(
    backend="espeak",
    load_gold=True,
    load_silver=False,  # Saves ~22-31 MB
    use_espeak_fallback=True
)

# Fastest initialization: Pure espeak
config = TokenizerConfig(
    backend="espeak",
    load_gold=False,
    load_silver=False,
    use_espeak_fallback=True
)

# Alternative backend (requires pygoruut)
config = TokenizerConfig(
    backend="goruut"
)

tokenizer = Tokenizer(config=config)
```

**Note**: `use_dictionary` parameter is deprecated. Use `load_gold` and `load_silver`
instead for finer control.

**External G2P Libraries**: You can also use external phonemization libraries like
[Misaki](https://github.com/hexgrad/misaki):

```python
from misaki import en, espeak
import pykokoro

# Misaki G2P with espeak-ng fallback
fallback = espeak.EspeakFallback(british=False)
g2p = en.G2P(trf=False, british=False, fallback=fallback)
phonemes, _ = g2p("Hello, world!")

# Generate audio from phonemes
kokoro = pykokoro.Kokoro()
samples, sample_rate = kokoro.create(
    phonemes,
    voice="af_bella",
    is_phonemes=True
)
```

## License

This library is licensed under the Apache License 2.0.

## Credits

- **Kokoro Model**: [hexgrad/Kokoro-82M](https://huggingface.co/hexgrad/Kokoro-82M)
- **ONNX Models**:
  [onnx-community/Kokoro-82M-v1.0-ONNX](https://huggingface.co/onnx-community/Kokoro-82M-v1.0-ONNX)
- **Phonemizer**: [kokorog2p](https://github.com/remyxai/kokorog2p)

## Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

## Links

- **GitHub**: https://github.com/holgern/pykokoro
- **PyPI**: https://pypi.org/project/pykokoro/
- **Documentation**: https://pykokoro.readthedocs.io/
