Metadata-Version: 2.4
Name: voxscriber
Version: 0.2.4
Summary: Local speaker diarization using MLX Whisper (macOS) or faster-whisper (Linux/CUDA) and Pyannote
Project-URL: Repository, https://github.com/dparedesi/voxscriber
Author: Daniel Paredes
License-Expression: MIT
License-File: LICENSE
Keywords: apple-silicon,cuda,faster-whisper,mlx,pyannote,speaker-diarization,transcription,whisper
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Requires-Python: >=3.10
Requires-Dist: faster-whisper>=1.0.0; sys_platform != 'darwin' or platform_machine != 'arm64'
Requires-Dist: mlx-whisper>=0.4.0; sys_platform == 'darwin' and platform_machine == 'arm64'
Requires-Dist: pyannote-audio>=3.1.0
Requires-Dist: pydub>=0.25.1
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: rich>=13.7.0
Requires-Dist: soundfile>=0.12.1
Requires-Dist: tqdm>=4.66.0
Provides-Extra: cuda
Requires-Dist: faster-whisper>=1.0.0; extra == 'cuda'
Provides-Extra: dev
Requires-Dist: pytest-mock>=3.10.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: mlx
Requires-Dist: mlx-whisper>=0.4.0; extra == 'mlx'
Description-Content-Type: text/markdown

# VoxScriber

[![PyPI version](https://img.shields.io/pypi/v/voxscriber.svg)](https://pypi.org/project/voxscriber/)
[![Downloads](https://pepy.tech/badge/voxscriber)](https://pepy.tech/project/voxscriber)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10+](https://img.shields.io/badge/python-3.10+-blue.svg)](https://www.python.org/downloads/)

Professional speaker diarization running 100% locally. Supports [MLX Whisper](https://github.com/ml-explore/mlx-examples/tree/main/whisper) on Apple Silicon and [faster-whisper](https://github.com/SYSTRAN/faster-whisper) on Linux/CUDA, combined with [Pyannote 3.1](https://github.com/pyannote/pyannote-audio).

![VoxScriber Banner](images/banner.png)

## Requirements

**macOS (Apple Silicon):**
- macOS with Apple Silicon (M1/M2/M3/M4)
- Python 3.10+
- FFmpeg 7 (`brew install ffmpeg@7 && brew link ffmpeg@7`)
- [Hugging Face token](https://huggingface.co/settings/tokens) (free, one-time model download)

**Linux:**
- Python 3.10+
- FFmpeg 4-7 (`sudo apt install ffmpeg`)
- [Hugging Face token](https://huggingface.co/settings/tokens) (free, one-time model download)
- For GPU: CUDA 12 + cuDNN 9 (optional, CPU works too)

## Installation

```bash
# macOS (Apple Silicon)
pip install "voxscriber[mlx]"

# Linux with CUDA
pip install "voxscriber[cuda]"

# Linux CPU-only
pip install "voxscriber[cuda]"  # faster-whisper works on CPU too

# Or with pipx (recommended for CLI tools)
pipx install "voxscriber[mlx]"   # macOS
pipx install "voxscriber[cuda]"  # Linux
```
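
The choice of extra follows the platform: MLX Whisper on Apple Silicon, faster-whisper everywhere else. A minimal sketch of that decision, where `pick_extra` is a hypothetical helper for illustration and not something VoxScriber ships:

```python
import platform
import sys


def pick_extra(sys_platform: str = sys.platform, machine: str = platform.machine()) -> str:
    """Return the extra that matches the install instructions above.

    Hypothetical helper; VoxScriber does not provide this function.
    """
    if sys_platform == "darwin" and machine == "arm64":
        return "mlx"   # Apple Silicon: MLX Whisper
    return "cuda"      # everything else: faster-whisper (GPU or CPU)


if __name__ == "__main__":
    # Print the install command for the current machine.
    print(f'pip install "voxscriber[{pick_extra()}]"')
```

The extra is quoted in the printed command because shells such as zsh treat unquoted square brackets as a glob pattern.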

### Setup Hugging Face Token

VoxScriber uses pyannote models, which require a Hugging Face token.

**Option 1: Interactive setup (recommended)**

```bash
voxscriber-doctor
```

This will guide you through accepting the model terms and saving your token securely.

**Option 2: Using huggingface-cli**

```bash
# First, accept terms at https://huggingface.co/pyannote/speaker-diarization-3.1
huggingface-cli login
```

Your token will be saved to `~/.cache/huggingface/token` and used automatically.

**Option 3: Environment variable**

```bash
export HF_TOKEN=your_token_here
```
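
To check from a script which of these sources would supply a token, something like the following may help. It is a sketch under one assumption: that `HF_TOKEN` takes precedence over the saved token file. This README does not state VoxScriber's actual lookup order, and `find_hf_token` is a hypothetical name.

```python
import os
from pathlib import Path


def find_hf_token(env=None, token_file=None):
    """Return a token from HF_TOKEN or the saved token file, else None.

    Assumption: the environment variable wins over the file.
    """
    env = os.environ if env is None else env
    if token_file is None:
        # Default location used by `huggingface-cli login`.
        token_file = Path.home() / ".cache" / "huggingface" / "token"
    token_file = Path(token_file)

    token = env.get("HF_TOKEN", "").strip()
    if token:
        return token
    if token_file.is_file():
        return token_file.read_text().strip() or None
    return None


if __name__ == "__main__":
    print("token found" if find_hf_token() else "no token configured")
```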

## Usage

```bash
# Basic
voxscriber meeting.m4a

# With known speaker count
voxscriber meeting.m4a --speakers 2

# All formats
voxscriber meeting.m4a --formats md,txt,json,srt,vtt

# Sentence-level subtitle segmentation for editing workflows
voxscriber meeting.m4a --formats srt,vtt --srt-mode sentence --srt-max-duration 15

# Print to console
voxscriber meeting.m4a --print
```

### Python API

```python
from voxscriber import DiarizationPipeline, PipelineConfig

config = PipelineConfig(
    num_speakers=2,
    language="en",
)
pipeline = DiarizationPipeline(config)
transcript = pipeline.process("meeting.m4a")

for segment in transcript.segments:
    print(f"{segment.speaker}: {segment.text}")
```
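
Consecutive segments often belong to the same speaker, so it can be handy to merge them into turns before printing. The sketch below relies only on the `speaker` and `text` attributes shown above and assumes both are strings; `merge_turns` is a hypothetical helper, and the demo data is made up so the block runs without any models installed.

```python
from itertools import groupby
from types import SimpleNamespace


def merge_turns(segments):
    """Join consecutive segments from the same speaker into (speaker, text) turns."""
    turns = []
    for speaker, group in groupby(segments, key=lambda seg: seg.speaker):
        text = " ".join(seg.text.strip() for seg in group)
        turns.append((speaker, text))
    return turns


# Stand-in data; with VoxScriber you would pass transcript.segments instead.
demo = [
    SimpleNamespace(speaker="SPEAKER_00", text="Hi there."),
    SimpleNamespace(speaker="SPEAKER_00", text="Shall we start?"),
    SimpleNamespace(speaker="SPEAKER_01", text="Yes."),
]

for speaker, text in merge_turns(demo):
    print(f"{speaker}: {text}")
```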

## Output Formats

| Format | Description |
|--------|-------------|
| `md` | Markdown with bold speaker names |
| `txt` | Timestamped plain text |
| `json` | Structured data with word-level timestamps |
| `srt` | SubRip subtitles |
| `vtt` | WebVTT subtitles |

## Options

```
voxscriber --help

  --speakers, -s      Number of speakers (if known)
  --language, -l      Force language (e.g., 'en', 'es')
  --model, -m         Whisper model (default: large-v3-turbo)
  --formats, -f       Output formats (default: md,txt)
  --output, -o        Output directory
  --device            auto (default), mps, cuda, or cpu
  --srt-mode          Subtitle segmentation mode for srt/vtt: speaker|sentence
  --srt-max-duration  Maximum subtitle duration in seconds for srt/vtt
  --quiet, -q         Suppress progress
  --print             Print transcript to console
```
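
For batch jobs it can be convenient to drive the CLI from Python. The sketch below uses only flags listed above; `build_command` and `transcribe` are hypothetical helpers, and the error handling assumes the CLI exits with a non-zero status on failure, which this README does not confirm.

```python
import subprocess


def build_command(audio, speakers=None, formats=None, output=None):
    """Assemble a voxscriber argument list from optional settings."""
    cmd = ["voxscriber", str(audio)]
    if speakers is not None:
        cmd += ["--speakers", str(speakers)]
    if formats:
        cmd += ["--formats", ",".join(formats)]
    if output is not None:
        cmd += ["--output", str(output)]
    return cmd


def transcribe(audio, **options):
    """Run the CLI; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(build_command(audio, **options), check=True)
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.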

## Performance

Real-time factor (RTF, processing time divided by audio duration) is roughly 0.1-0.15 on Apple Silicon (MLX) and 0.15-0.25 on NVIDIA GPUs (faster-whisper). At those rates, a 20-minute recording processes in about 2-5 minutes, depending on hardware.
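
Taking RTF as processing time divided by audio duration, estimating a run is a single multiplication. A small sketch using the figures quoted above; `processing_minutes` is a hypothetical helper, and actual times will vary with model and hardware.

```python
def processing_minutes(audio_minutes: float, rtf: float) -> float:
    """Estimate processing time from audio length and real-time factor."""
    return audio_minutes * rtf


# RTF ranges quoted in this README.
ranges = {
    "Apple Silicon (MLX)": (0.10, 0.15),
    "NVIDIA GPU (faster-whisper)": (0.15, 0.25),
}

for label, (low, high) in ranges.items():
    fastest = processing_minutes(20, low)
    slowest = processing_minutes(20, high)
    print(f"{label}: {fastest:.1f} to {slowest:.1f} minutes for 20 minutes of audio")
```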

## Troubleshooting

Run the diagnostic tool to check your setup:

```bash
voxscriber-doctor
```

This will check FFmpeg, torchcodec, and HF_TOKEN, and offer to fix common issues automatically.

### FFmpeg & torchcodec Issues

VoxScriber uses pyannote-audio, which requires torchcodec; torchcodec in turn requires FFmpeg 4-7.
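
To see which FFmpeg major version is on your `PATH`, you can parse the first line of `ffmpeg -version`. This is a rough sketch and not how `voxscriber-doctor` performs its check; `ffmpeg_major` and `is_supported` are hypothetical names, and development builds without a numeric version yield `None`.

```python
import re
import shutil
import subprocess


def ffmpeg_major(version_output: str):
    """Extract the major version from `ffmpeg -version` output, or None."""
    match = re.match(r"ffmpeg version n?(\d+)\.", version_output)
    return int(match.group(1)) if match else None


def is_supported(major) -> bool:
    """True when the major version falls in the 4-7 range named above."""
    return major is not None and 4 <= major <= 7


if __name__ == "__main__":
    exe = shutil.which("ffmpeg")
    if exe is None:
        print("ffmpeg not found on PATH")
    else:
        out = subprocess.run([exe, "-version"], capture_output=True, text=True).stdout
        major = ffmpeg_major(out)
        print(f"FFmpeg major version: {major}, supported: {is_supported(major)}")
```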

**"FFmpeg 8 detected" or "torchcodec fails"**

FFmpeg 8 is not yet supported. Install FFmpeg 7:

```bash
brew uninstall ffmpeg
brew install ffmpeg@7 && brew link ffmpeg@7
```

**"Library not loaded: @rpath/libavutil" or "no LC_RPATH's found"**

This happens because `ffmpeg@7` is "keg-only": Homebrew doesn't symlink it automatically, so its libraries are not on the default search path. Add this line to your `~/.zshrc`:

```bash
export DYLD_LIBRARY_PATH="/opt/homebrew/opt/ffmpeg@7/lib:$DYLD_LIBRARY_PATH"
```

Then restart your terminal or run `source ~/.zshrc`.

### Other Issues

| Issue | Solution |
|-------|----------|
| `requires Python >= 3.10` | Use Python 3.10+: `python3.10 -m venv .venv` |
| Installed wrong package | It's `voxscriber` (with 'r'), not `voxscribe` |
| `HF_TOKEN required` | Run `voxscriber-doctor` to set up authentication |

## Support

If you find VoxScriber useful, consider supporting its development:

[![Buy Me A Coffee](https://img.shields.io/badge/Buy%20Me%20A%20Coffee-donate-yellow.svg)](https://buymeacoffee.com/dparedesi)
[![GitHub Sponsors](https://img.shields.io/badge/GitHub%20Sponsors-sponsor-pink.svg)](https://github.com/sponsors/dparedesi)

## License

MIT
