Metadata-Version: 2.4
Name: gensay
Version: 0.4.2
Summary: Multi-provider TTS tool compatible with macOS say command. Supports Chatterbox TTS for local; ElevenLabs, OpenAI, AWS Polly for cloud APIs
Keywords: 
Author: Anthony Wu
Author-email: Anthony Wu <pls-file-gh-issue@users.noreply.github.com>
License-Expression: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Requires-Dist: boto3>=1.40.0,<2.0
Requires-Dist: botocore[crt]>=1.40.0,<2.0
Requires-Dist: diskcache>=5.6.3,<6.0
Requires-Dist: distro>=1.9.0,<2.0
Requires-Dist: hf-transfer>=0.1.9,<1.0
Requires-Dist: openai>=1.98.0,<2.0
Requires-Dist: platformdirs>=4.3,<5.0
Requires-Dist: psutil>=7.0,<8.0
Requires-Dist: python-dotenv>=1.1.1,<2.0
Requires-Dist: tqdm>=4.67,<5.0
Requires-Dist: gensay[audio-formats,chatterbox,elevenlabs] ; extra == 'all'
Requires-Dist: pydub>=0.25.0 ; extra == 'audio-formats'
Requires-Dist: ffmpeg-python>=0.2.0 ; extra == 'audio-formats'
Requires-Dist: accelerate>=1.9.0 ; extra == 'chatterbox'
Requires-Dist: audioop-lts>=0.2.2,<1.0 ; python_full_version >= '3.13' and extra == 'chatterbox'
Requires-Dist: diffusers>=0.34.0 ; extra == 'chatterbox'
Requires-Dist: numba>=0.60.0 ; extra == 'chatterbox'
Requires-Dist: numpy>=1.26.0 ; extra == 'chatterbox'
Requires-Dist: peft>=0.16.0,<1.0 ; extra == 'chatterbox'
Requires-Dist: pydub>=0.25,<1.0 ; extra == 'chatterbox'
Requires-Dist: torch>=2.6,<3.0 ; extra == 'chatterbox'
Requires-Dist: torchaudio>=2.6,<3.0 ; extra == 'chatterbox'
Requires-Dist: elevenlabs[pyaudio]>=2.0,<3.0 ; extra == 'elevenlabs'
Requires-Python: >=3.11
Project-URL: Documentation, https://github.com/anthonywu/gensay#readme
Project-URL: Issues, https://github.com/anthonywu/gensay/issues
Project-URL: Source, https://github.com/anthonywu/gensay
Provides-Extra: all
Provides-Extra: audio-formats
Provides-Extra: chatterbox
Provides-Extra: elevenlabs
Description-Content-Type: text/markdown

# gensay

[![PyPI - Version](https://img.shields.io/pypi/v/gensay.svg)](https://pypi.org/project/gensay)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/gensay.svg)](https://pypi.org/project/gensay)

A multi-provider text-to-speech (TTS) tool that implements the Apple macOS `/usr/bin/say` command interface while supporting multiple TTS backends including Chatterbox (local AI), OpenAI, ElevenLabs, and Amazon Polly.

## Features

- **macOS `say` Compatible**: Drop-in replacement for the macOS `say` command with an identical command-line interface
- **Multiple TTS Providers**: Extensible provider system with support for:
  - [macOS native `say` command](https://developer.apple.com/library/archive/documentation/LanguagesUtilities/Conceptual/MacAutomationScriptingGuide/SpeakText.html) (default on macOS)
  - [Chatterbox](https://github.com/resemble-ai/chatterbox) (local AI TTS, default on other platforms)
  - [ElevenLabs](https://elevenlabs.io/docs/api-reference/text-to-speech/convert) (cloud API)
  - [OpenAI TTS](https://platform.openai.com/docs/guides/text-to-speech) (cloud API)
  - [Amazon Polly](https://aws.amazon.com/polly/) (cloud API)
  - Mock provider for testing
- **Smart Text Chunking**: Intelligently splits long text for optimal TTS processing
- **Audio Caching**: Automatic caching with LRU eviction to speed up repeated synthesis
- **Progress Tracking**: Built-in progress bars with tqdm and customizable callbacks
- **Multiple Audio Formats**: Support for AIFF, WAV, M4A, MP3, CAF, FLAC, AAC, OGG
- **Background Pre-caching**: Queue and cache audio chunks in the background (Chatterbox only)
- **Interactive REPL Mode**: Start an interactive session with provider initialized once for repeated use
- **Named Pipe Listener**: Listen on a FIFO for text input from other processes

## Table of Contents

- [Installation](#installation)
- [Quick Start](#quick-start)
- [Command Line Usage](#command-line-usage)
- [Python API](#python-api)
- [Provider Configurations](#provider-configurations)
- [Advanced Features](#advanced-features)
- [Development](#development)
- [License](#license)

## Installation

It's 2026, use [uv](https://github.com/astral-sh/uv)

`gensay` is intended to be used as a CLI tool that is a drop-in replacement for the macOS `say` CLI.

### System Dependencies (ElevenLabs provider only)

**PortAudio is required** if you plan to use the ElevenLabs provider. The `pyaudio` dependency needs the PortAudio C library to compile successfully.

Other providers (macOS, OpenAI, Amazon Polly, Chatterbox) do not require PortAudio.

**Homebrew (macOS):**

```bash
brew install portaudio
```

**Nix:**

```bash
nix-env -iA nixpkgs.portaudio
```

### Install gensay

```sh
# Install as a tool
uv tool install gensay

# With extras: ElevenLabs provider (requires PortAudio, see above)
pip install 'gensay[elevenlabs]'

# With extras: Chatterbox provider (local Text-to-Speech model, ~2GB PyTorch dependencies)
uv tool install 'gensay[chatterbox]' \
  --with git+https://github.com/anthonywu/chatterbox.git@allow-dep-updates

# Or add to your project
uv add gensay

# From source (with automatic PortAudio path configuration)
git clone https://github.com/anthonywu/gensay
cd gensay
just setup
```

### Optional Dependencies

```sh
# Audio format conversion (for non-native formats like MP3, OGG, FLAC)
# Requires ffmpeg installed on system
pip install 'gensay[audio-formats]'

# Install all optional dependencies
pip install 'gensay[all]'
```

**Installation Help:**

- [PyAudio documentation](https://pypi.org/project/PyAudio/) - For PortAudio/PyAudio installation issues
- [ElevenLabs Python library docs](https://elevenlabs.io/docs/agents-platform/libraries/python) - Official ElevenLabs Python documentation

For developer/maintainer installation, `just setup` automatically configures PortAudio and FFmpeg paths for both Nix and Homebrew.

### Developer/Maintainer Build Dependencies

#### PortAudio Paths (for ElevenLabs)

**Homebrew:**

```bash
export C_INCLUDE_PATH="$(brew --prefix portaudio)/include:$C_INCLUDE_PATH"
export LIBRARY_PATH="$(brew --prefix portaudio)/lib:$LIBRARY_PATH"
```

**Nix:**

```bash
export C_INCLUDE_PATH="$(nix-build '<nixpkgs>' -A portaudio --no-out-link)/include:$C_INCLUDE_PATH"
export LIBRARY_PATH="$(nix-build '<nixpkgs>' -A portaudio --no-out-link)/lib:$LIBRARY_PATH"
```

Then install into local venv:

```sh
uv sync --all-extras
# Temporary workaround: install a patched chatterbox build so that dependency resolution succeeds
uv pip install git+https://github.com/anthonywu/chatterbox.git@allow-dep-updates
```

#### FFmpeg Library Path (for Chatterbox on macOS)

Chatterbox uses TorchCodec which requires FFmpeg libraries at runtime. On macOS, set `DYLD_LIBRARY_PATH` before running gensay:

**Homebrew:**

```bash
export DYLD_LIBRARY_PATH="$(brew --prefix ffmpeg)/lib:$DYLD_LIBRARY_PATH"
gensay --provider chatterbox "Hello"
```

**Nix:**

```bash
# Find the ffmpeg-lib output in the Nix store
FFMPEG_LIB=$(nix-store -qR "$(which ffmpeg)" | grep 'ffmpeg.*-lib$')
export DYLD_LIBRARY_PATH="$FFMPEG_LIB/lib:$DYLD_LIBRARY_PATH"
gensay --provider chatterbox "Hello"
```

Note: `DYLD_LIBRARY_PATH` must be set before the Python process starts; it cannot be set from within Python.
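
If you drive gensay from a Python script, one way to respect this constraint is to launch gensay as a child process whose environment already contains the variable. The sketch below is only an illustration: `env_with_dyld_path` and `speak_with_chatterbox` are hypothetical helper names, not part of gensay, and macOS may strip `DYLD_*` variables when the child is a system-protected executable, so verify it on your own setup.

```python
import os
import subprocess


def env_with_dyld_path(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir prepended to DYLD_LIBRARY_PATH."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("DYLD_LIBRARY_PATH", "")
    env["DYLD_LIBRARY_PATH"] = f"{lib_dir}:{existing}" if existing else str(lib_dir)
    return env


def speak_with_chatterbox(text, ffmpeg_lib_dir):
    """Start gensay as a child process that sees the library path from its first instruction."""
    cmd = ["gensay", "--provider", "chatterbox", text]
    return subprocess.run(cmd, env=env_with_dyld_path(ffmpeg_lib_dir), check=True)
```

The parent Python process is unaffected; only the child started by `subprocess.run` receives the modified environment.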

## Quick Start

```bash
# Basic usage - speaks the text
gensay "Hello, world!"

# Use specific voice
gensay -v Samantha "Hello from Samantha"

# Save to audio file
gensay -o greeting.m4a "Welcome to gensay"

# List available voices (two ways)
gensay -v '?'
gensay --list-voices
```

## Command Line Usage

### Basic Options

```bash
# Speak text
gensay "Hello, world!"

# Read from file
gensay -f document.txt

# Read from stdin
echo "Hello from pipe" | gensay -f -

# Specify voice
gensay -v Alex "Hello from Alex"

# Adjust speech rate (words per minute)
gensay -r 200 "Speaking faster"

# Save to file
gensay -o output.m4a "Save this speech"

# Specify audio format
gensay -o output.wav --format wav "Different format"
```
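
The same options can be assembled from a Python script. This is a minimal sketch that uses only the flags shown above; `build_gensay_command` and `run_gensay` are hypothetical helpers, not part of the gensay package, and `run_gensay` assumes `gensay` is on your `PATH`.

```python
import shlex
import subprocess


def build_gensay_command(text, voice=None, rate=None, output=None, audio_format=None):
    """Build an argv list from the documented flags: -v, -r, -o and --format."""
    cmd = ["gensay"]
    if voice is not None:
        cmd += ["-v", voice]
    if rate is not None:
        cmd += ["-r", str(rate)]
    if output is not None:
        cmd += ["-o", str(output)]
    if audio_format is not None:
        cmd += ["--format", audio_format]
    cmd.append(text)
    return cmd


def run_gensay(text, **options):
    """Run gensay and raise CalledProcessError if it exits with a non-zero status."""
    return subprocess.run(build_gensay_command(text, **options), check=True)


# Print the command line that would be executed, quoted for a POSIX shell
print(shlex.join(build_gensay_command("Different format", output="output.wav", audio_format="wav")))
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or other special characters.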

### Provider Selection

```bash
# Use macOS native say command
gensay --provider macos "Using system TTS"

# List voices for specific provider
gensay --provider macos --list-voices
gensay --provider mock --list-voices

# Use mock provider for testing
gensay --provider mock "Testing without real TTS"

# Use Chatterbox explicitly
gensay --provider chatterbox "Local AI voice"

# Default provider depends on platform
gensay "Hello"  # Uses 'macos' on macOS, 'chatterbox' on other platforms
```
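
A script that needs to know which provider a bare `gensay "Hello"` will use can mirror the documented rule. This is a sketch of that rule only; `default_provider` is a hypothetical helper, and gensay's own selection logic may consider more than the platform.

```python
import sys


def default_provider(platform: str = sys.platform) -> str:
    """Mirror the documented default: 'macos' on macOS, 'chatterbox' elsewhere."""
    return "macos" if platform == "darwin" else "chatterbox"


print(default_provider())
```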

### Advanced Options

```bash
# Show progress bar
gensay --progress "Long text with progress tracking"

# Pre-cache audio chunks in background
gensay --provider chatterbox --cache-ahead "Pre-process this text"

# Adjust chunk size
gensay --chunk-size 1000 "Process in larger chunks"

# Cache management
gensay --cache-stats     # Show cache statistics
gensay --clear-cache     # Clear all cached audio
gensay --no-cache "Text" # Disable cache for this run
```

### Interactive Modes and Performance Optimization

#### REPL Mode

Start an interactive session where the provider is initialized once and reused for each prompt. This avoids the overhead of re-initializing the provider.

> **Tip:** For Chatterbox and other local AI models, model loading from disk to memory is expensive (several seconds).
> Use `--repl` or `--listen` mode to load the model once and process many prompts without reloading.

```bash
# Start REPL mode (--repl, --interactive, and -i are all equivalent)
gensay --repl
gensay --interactive
gensay -i

# With a specific provider and voice
gensay --provider openai -v nova --repl

# Chatterbox with REPL (recommended - keeps model loaded)
gensay -p chatterbox -i
```

In REPL mode:

- Type text and press Enter to speak it
- Type `exit` or `quit` to exit
- Press Ctrl+C or Ctrl+D to exit

#### Named Pipe (FIFO) Listener

Listen on a named pipe for text input, allowing other processes to send text to be spoken. Useful for integrating TTS into scripts or other applications.

> **Tip:** Like REPL mode, `--listen` keeps the provider loaded between requests—ideal for Chatterbox and other local models where initialization is slow.

```bash
# Start listening on default pipe (/tmp/gensay.pipe)
gensay --listen

# Use a custom pipe path
gensay --listen /tmp/my-tts.pipe

# With a specific provider (Chatterbox benefits most from persistent mode)
gensay --provider chatterbox --listen
gensay --provider polly -v Joanna --listen
```

From another terminal or script, send text to the pipe:

```bash
echo "Hello from another process" > /tmp/gensay.pipe
```

The listener runs until interrupted with Ctrl+C. The named pipe is created automatically if it doesn't exist.
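
Other Python processes can write to the pipe in the same way as the `echo` example. The sketch below uses only the standard library; `send_to_pipe` is a hypothetical helper, and it assumes a listener is already running on the given path.

```python
import os
import stat


def send_to_pipe(text, pipe_path="/tmp/gensay.pipe"):
    """Write one line of text to an existing named pipe."""
    # Refuse regular files so a typo in the path does not silently create output.
    if not stat.S_ISFIFO(os.stat(pipe_path).st_mode):
        raise ValueError(f"{pipe_path} is not a named pipe")
    # Opening a FIFO for writing blocks until a reader (the listener) has it open.
    with open(pipe_path, "w", encoding="utf-8") as pipe:
        pipe.write(text.rstrip("\n") + "\n")
```

Because opening the pipe blocks until a reader is attached, calling `send_to_pipe` without a running listener will wait rather than fail.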

## Python API

### Basic Usage

```python
from gensay import ChatterboxProvider, TTSConfig, AudioFormat

# Create provider
provider = ChatterboxProvider()

# Speak text
provider.speak("Hello from Python")

# Save to file
provider.save_to_file("Save this", "output.m4a")

# List voices
voices = provider.list_voices()
for voice in voices:
    print(f"{voice['id']}: {voice['name']}")
```

### Advanced Configuration

```python
from gensay import ChatterboxProvider, TTSConfig, AudioFormat

# Configure TTS
config = TTSConfig(
    voice="default",
    rate=150,
    format=AudioFormat.M4A,
    cache_enabled=True,
    extra={
        'show_progress': True,
        'chunk_size': 500
    }
)

# Create provider with config
provider = ChatterboxProvider(config)

# Add progress callback
def on_progress(progress: float, message: str):
    print(f"Progress: {progress:.0%} - {message}")

config.progress_callback = on_progress

# Use the configured provider
provider.speak("Text with all options configured")
```
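
A progress callback can render anything you like. As an illustration, the sketch below draws a text bar using the same `(progress, message)` signature as the callback above. It assumes `progress` is a fraction between 0 and 1, as the `{progress:.0%}` format in that callback suggests; `format_progress` is a hypothetical helper.

```python
def format_progress(progress: float, message: str, width: int = 20) -> str:
    """Render a fraction in [0, 1] as a fixed-width text bar."""
    clamped = min(max(progress, 0.0), 1.0)
    filled = round(clamped * width)
    bar = "#" * filled + "-" * (width - filled)
    return f"[{bar}] {clamped:.0%} {message}"


def on_progress(progress: float, message: str) -> None:
    # The carriage return redraws the bar on the same terminal line.
    print("\r" + format_progress(progress, message), end="", flush=True)
```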

### Text Chunking

```python
from gensay import chunk_text_for_tts, TextChunker

# Placeholder input; substitute your own text
long_text = "This is one sentence of a longer document. " * 100

# Simple chunking
chunks = chunk_text_for_tts(long_text, max_chunk_size=500)

# Advanced chunking with custom strategy
chunker = TextChunker(
    max_chunk_size=1000,
    strategy="paragraph",  # or "sentence", "word", "character"
    overlap_size=50
)
chunks = chunker.chunk_text(long_text)
```
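
To give a feel for what sentence-level chunking involves, here is a standalone sketch built on the standard library. It is not gensay's implementation: `chunk_by_sentence` is a hypothetical function, it ignores overlap, and its sentence detection is a simple regular expression.

```python
import re


def chunk_by_sentence(text: str, max_chunk_size: int = 500) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chunk_size characters."""
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        if not sentence:
            continue
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # A single sentence longer than the limit is hard-split by character.
        while len(sentence) > max_chunk_size:
            chunks.append(sentence[:max_chunk_size])
            sentence = sentence[max_chunk_size:]
        current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Keeping sentences intact where possible gives the TTS engine natural pause points at chunk boundaries, which is the motivation for splitting on sentences before falling back to characters.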

## Provider Configurations

### ElevenLabs

1. Install the optional dependency (requires PortAudio):
   ```bash
   pip install 'gensay[elevenlabs]'
   ```
2. Get an API key from [ElevenLabs](https://elevenlabs.io)
3. Set the environment variable:
   ```bash
   export ELEVENLABS_API_KEY="your-api-key"
   ```

```bash
# List ElevenLabs voices
gensay --provider elevenlabs --list-voices

# Use a specific ElevenLabs voice
gensay --provider elevenlabs -v Rachel "Hello from ElevenLabs"

# Save to file with high quality
gensay --provider elevenlabs -o speech.mp3 "High quality AI speech"
```

### OpenAI TTS

1. Get an API key from [OpenAI Platform](https://platform.openai.com/api-keys)
2. Set the environment variable:
   ```bash
   export OPENAI_API_KEY="sk-..."
   ```

```bash
# List OpenAI voices
gensay --provider openai --list-voices

# Use a specific voice (alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer)
gensay --provider openai -v nova "Hello from OpenAI"

# Save to file
gensay --provider openai -o speech.mp3 "OpenAI TTS output"
```

gensay supports two OpenAI models, selected via `config.extra['model']`:

- `tts-1` (default): Faster, lower latency
- `tts-1-hd`: Higher quality audio

### Amazon Polly

**Option A - Environment variables:**

1. Sign in to [AWS Console](https://console.aws.amazon.com/)
2. Go to **IAM** → **Users** → **Create user**
3. Attach the `AmazonPollyReadOnlyAccess` policy
4. Create access keys under **Security credentials** → **Access keys**
5. Export the credentials as environment variables:

```bash
export AWS_ACCESS_KEY_ID="AKIA..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_DEFAULT_REGION="us-west-2"
```

**Option B - AWS CLI v2:**

This lets you easily [sign in through the AWS Command Line Interface](https://docs.aws.amazon.com/signin/latest/userguide/command-line-sign-in.html):

```bash
export AWS_DEFAULT_REGION=us-west-2
# on your desktop with a browser
aws login --region "$AWS_DEFAULT_REGION"
# in an env without a browser
aws login --region "$AWS_DEFAULT_REGION" --remote
```

```bash
# List Polly voices (60+ voices in many languages)
gensay --provider polly --list-voices

# Use a specific voice
gensay --provider polly -v Joanna "Hello from Amazon Polly"

# Save to file
gensay --provider polly -o speech.mp3 "Polly TTS output"
```

Polly supports multiple engines via `config.extra['engine']`:

- `neural` (default): Higher quality, natural-sounding
- `standard`: Lower cost, available for all voices

## Advanced Features

### Caching System

The caching system automatically stores generated audio to speed up repeated synthesis:

```python
from gensay import TTSCache

# Create cache instance
cache = TTSCache(
    enabled=True,
    max_size_mb=10000,
    max_items=1000
)

# Get cache statistics
stats = cache.get_stats()
print(f"Cache size: {stats['size_mb']:.2f} MB")
print(f"Cached items: {stats['items']}")

# Clear cache
cache.clear()
```

**Cache Location**

Cache files are stored in platform-specific user cache directories:

- **macOS**: `~/Library/Caches/gensay`
- **Linux**: `~/.cache/gensay`
- **Windows**: `%LOCALAPPDATA%\gensay\gensay\Cache`
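
To inspect the cache from a script, the locations above can be resolved with the standard library. This is a sketch based only on the paths listed here; `gensay_cache_dir` and `directory_size_mb` are hypothetical helpers, and the real location may differ if your environment overrides it (for example through `XDG_CACHE_HOME` on Linux).

```python
import os
import sys
from pathlib import Path


def gensay_cache_dir(platform: str = sys.platform) -> Path:
    """Return the documented cache location for the given sys.platform value."""
    if platform == "darwin":
        return Path.home() / "Library" / "Caches" / "gensay"
    if platform.startswith("win"):
        return Path(os.environ["LOCALAPPDATA"]) / "gensay" / "gensay" / "Cache"
    return Path.home() / ".cache" / "gensay"


def directory_size_mb(path: Path) -> float:
    """Sum the sizes of all regular files under path, in megabytes."""
    if not path.exists():
        return 0.0
    total = sum(f.stat().st_size for f in path.rglob("*") if f.is_file())
    return total / (1024 * 1024)


cache_dir = gensay_cache_dir()
print(f"{cache_dir}: {directory_size_mb(cache_dir):.2f} MB")
```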

**Managing Cache**

```bash
# Show cache statistics
gensay --cache-stats

# Clear all cached audio
gensay --clear-cache

# Disable caching for a specific command
gensay --no-cache "Text to synthesize without caching"
```

**Manual Deletion**

To manually delete the cache, remove the cache directory:

```bash
# macOS/Linux
rm -rf ~/Library/Caches/gensay  # macOS
rm -rf ~/.cache/gensay          # Linux

# Windows (PowerShell)
Remove-Item -Recurse -Force $env:LOCALAPPDATA\gensay\gensay\Cache
```

### Creating Custom Providers

```python
from gensay.providers import TTSProvider, TTSConfig, AudioFormat
from typing import Optional, Union, Any
from pathlib import Path

class MyCustomProvider(TTSProvider):
    def speak(self, text: str, voice: Optional[str] = None,
              rate: Optional[int] = None) -> None:
        # Your implementation
        self.update_progress(0.5, "Halfway done")
        # ... generate and play audio ...
        self.update_progress(1.0, "Complete")

    def save_to_file(self, text: str, output_path: Union[str, Path],
                     voice: Optional[str] = None, rate: Optional[int] = None,
                     format: Optional[AudioFormat] = None) -> Path:
        # Your implementation
        return Path(output_path)

    def list_voices(self) -> list[dict[str, Any]]:
        return [
            {'id': 'voice1', 'name': 'Voice One', 'language': 'en-US'}
        ]

    def get_supported_formats(self) -> list[AudioFormat]:
        return [AudioFormat.WAV, AudioFormat.MP3]
```

### Async Support

All providers support async operations:

```python
import asyncio
from gensay import ChatterboxProvider

async def main():
    provider = ChatterboxProvider()

    # Async speak
    await provider.speak_async("Async speech")

    # Async save
    await provider.save_to_file_async("Async save", "output.m4a")

asyncio.run(main())
```
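
When many files need to be synthesized, the async methods can be combined with a concurrency limit. The sketch below shows the pattern with a stand-in coroutine: `run_limited` and `fake_save` are hypothetical, and in real use you would pass `provider.save_to_file_async` as the worker. Whether a single provider instance tolerates concurrent calls is not stated here, so treat the limit as something to tune and test.

```python
import asyncio


async def run_limited(jobs, worker, limit=2):
    """Run worker(*job) for every job with at most `limit` running at once."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(job):
        async with semaphore:
            return await worker(*job)

    # gather returns results in the same order as the jobs were given
    return await asyncio.gather(*(guarded(job) for job in jobs))


async def fake_save(text, output_path):
    # Stand-in for provider.save_to_file_async(text, output_path).
    await asyncio.sleep(0.01)
    return output_path


jobs = [("Chapter one", "ch1.m4a"), ("Chapter two", "ch2.m4a"), ("Chapter three", "ch3.m4a")]
print(asyncio.run(run_limited(jobs, fake_save)))
```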

## Development

This project uses [just](https://just.systems) for common development tasks. First, install just:

```bash
# Using Nix
nix-env -iA nixpkgs.just

# Or using Homebrew
brew install just

# Or using cargo
cargo install just
```

### Getting Started

```bash
# Setup development environment
just setup

# Run tests
just test

# Run all quality checks
just check

# See all available commands
just
```

### Common Development Commands

#### Testing

```bash
# Run all tests
just test

# Run tests with coverage
just test-cov

# Run specific test
just test-specific tests/test_providers.py::test_mock_provider_speak

# Quick test (mock provider only)
just quick-test
```

#### Code Quality

```bash
# Run linter
just lint

# Auto-fix linting issues
just lint-fix

# Format code
just format

# Type checking
just typecheck

# Run all checks (lint, format, typecheck)
just check

# Pre-commit checks (format, lint, test)
just pre-commit
```

#### Running the CLI

```bash
# Run with mock provider
just run-mock "Hello, world!"
just run-mock -v '?'

# Run with macOS provider
just run-macos "Hello from macOS"

# Cache management
just cache-stats
just cache-clear
```

#### Development Utilities

```bash
# Run example script
just demo

# Clean build artifacts
just clean

# Build package
just build
```

### Manual Setup (without just)

If you prefer not to use just, here are the equivalent commands:

```bash
# Setup
uv venv
uv pip install -e ".[dev]"

# Testing
uv run pytest -v
uv run pytest --cov=gensay --cov-report=term-missing

# Linting and formatting
uv run ruff check src tests
uv run ruff format src tests

# Type checking
uvx ty check src
```

### Project Structure

```
gensay/
├── src/gensay/
│   ├── __init__.py
│   ├── main.py              # CLI entry point
│   ├── providers/           # TTS provider implementations
│   │   ├── base.py         # Abstract base provider
│   │   ├── chatterbox.py   # Chatterbox provider
│   │   ├── macos_say.py    # macOS say wrapper
│   │   └── ...            # Other providers
│   ├── cache.py            # Caching system
│   └── text_chunker.py     # Text chunking logic
├── tests/                  # Test suite
├── examples/               # Example scripts
├── justfile                # Development commands
└── README.md
```

### Code Style Guide

- Python 3.11+ with type hints
- Follow PEP 8 and the Google Python Style Guide
- Use `ruff` for linting and formatting
- Keep docstrings concise but informative
- Prefer `pathlib.Path` over `os.path`
- Use `pytest` for testing

## License

`gensay` is distributed under the terms of the [MIT](https://spdx.org/licenses/MIT.html) license.
