Metadata-Version: 2.4
Name: surivoice
Version: 0.1.0
Summary: Local-first CLI for speech transcription with speaker diarization
Author: Surivoice Contributors
License: MIT
License-File: LICENSE
Keywords: cli,diarization,speech-to-text,transcription,whisper
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: pydantic>=2.0
Requires-Dist: rich>=13.0
Requires-Dist: typer>=0.9
Provides-Extra: dev
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Provides-Extra: ml
Requires-Dist: faster-whisper>=1.0; extra == 'ml'
Requires-Dist: pyannote-audio>=3.1; extra == 'ml'
Description-Content-Type: text/markdown

# Surivoice

**Local-first speech transcription with speaker diarization.**

Surivoice is an open-source CLI tool that takes a video or audio file, transcribes speech to text using [Whisper](https://github.com/openai/whisper), identifies speakers using [pyannote.audio](https://github.com/pyannote/pyannote-audio), and outputs a structured Markdown transcript — all running locally on your machine.

## Features

- 🎙️ Speech-to-text transcription (via [faster-whisper](https://github.com/SYSTRAN/faster-whisper))
- 👥 Speaker diarization (via [pyannote.audio](https://github.com/pyannote/pyannote-audio))
- 📝 Structured Markdown output with timestamps
- 🔒 Fully local — no cloud APIs, your data stays on your machine
- ⚡ GPU-accelerated with CPU fallback

## Requirements

- Python 3.10+
- [FFmpeg](https://ffmpeg.org/) installed and on `PATH`
- A [Hugging Face](https://huggingface.co/) access token (for pyannote.audio models)

## Installation

```bash
pip install surivoice
```

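The transcription and diarization backends (faster-whisper and pyannote.audio) are declared under the optional `ml` extra in the package metadata, so install `surivoice[ml]` if the plain install does not pull them in. For GPU support, make sure a CUDA-enabled build of PyTorch is installed.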

### Development

```bash
git clone https://github.com/your-org/surivoice.git
cd surivoice
pip install -e ".[dev,ml]"
```

## Usage

### Authentication

To use pyannote.audio for speaker diarization, you need to provide a Hugging Face access token:

```bash
surivoice save-token YOUR_HF_TOKEN
```

Alternatively, pass the token with the `--hf-token` flag when transcribing, or export the `HF_TOKEN` environment variable.
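
With three possible token sources, a script wrapping the CLI has to decide which one to use. The README does not say which source wins when several are set, so the sketch below assumes an order of flag, then environment variable, then saved token. Both that precedence and the name `resolve_hf_token` are assumptions for illustration, not Surivoice's documented behavior.

```python
import os


def resolve_hf_token(cli_token=None, env=None, saved_token=None):
    """Pick the first non-empty token from the three sources.

    ASSUMED precedence: --hf-token value, then HF_TOKEN, then a previously saved token.
    How and where `surivoice save-token` stores its value is not covered here;
    the caller passes the saved value in explicitly.
    """
    env = os.environ if env is None else env
    for candidate in (cli_token, env.get("HF_TOKEN"), saved_token):
        if candidate and candidate.strip():
            return candidate.strip()
    return None
```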

### Transcription

```bash
surivoice transcribe -i meeting.mp4 -o transcript.md
```

### Options

| Flag | Description | Default |
|------|-------------|---------|
| `-i, --input` | Input video/audio file | *required* |
| `-o, --output` | Output Markdown file | *required* |
| `-m, --model` | Whisper model size | `medium` |
| `-d, --device` | Compute device (`auto`, `cpu`, `cuda`) | `auto` |
| `--compute-type` | Compute (quantization) type, e.g. `int8`, `float16`, `float32` | `int8` |
| `-l, --language` | Language code (auto-detect if omitted) | `None` |
| `--hf-token` | Hugging Face access token (or set `HF_TOKEN` env var) | `None` |
| `-s, --speakers` | Exact number of speakers, if known | `None` |
| `-v, --version` | Show version and exit | |
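
To transcribe several files with the same options, the CLI can be driven from a short script. This is a sketch under stated assumptions: it uses only the flags listed in the table above, while the helper names `build_transcribe_command` and `transcribe_folder` and the choice to write each transcript next to its source file are illustrative.

```python
import subprocess
from pathlib import Path


def build_transcribe_command(input_path, output_path, model="medium",
                             device="auto", language=None, speakers=None):
    """Assemble the argument list for `surivoice transcribe` from documented flags."""
    cmd = [
        "surivoice", "transcribe",
        "-i", str(input_path),
        "-o", str(output_path),
        "--model", model,
        "--device", device,
    ]
    # Optional flags are added only when a value is supplied.
    if language is not None:
        cmd += ["--language", language]
    if speakers is not None:
        cmd += ["--speakers", str(speakers)]
    return cmd


def transcribe_folder(folder, pattern="*.mp4", **options):
    """Run one transcription per matching file, writing `<name>.md` beside it."""
    for source in sorted(Path(folder).glob(pattern)):
        target = source.with_suffix(".md")
        # check=True raises CalledProcessError if the CLI exits with a non-zero status.
        subprocess.run(build_transcribe_command(source, target, **options), check=True)
```

For example, `transcribe_folder("recordings", language="en", speakers=2)` would process every `.mp4` file in a `recordings` directory, one after another.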

## Supported Formats

- **Video:** `.mp4`, `.mkv`, `.avi`, `.mov`, `.webm`
- **Audio:** `.mp3`, `.wav`, `.flac`, `.ogg`, `.m4a`, `.aac`, `.wma`
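
A wrapper script can filter inputs against this list before calling the CLI. The sketch below copies the extensions from the list above; the function name `media_kind` and the case-insensitive matching are assumptions, since the README does not say how Surivoice itself compares extensions.

```python
from pathlib import Path

# Extensions copied from the Supported Formats list.
VIDEO_EXTENSIONS = {".mp4", ".mkv", ".avi", ".mov", ".webm"}
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".ogg", ".m4a", ".aac", ".wma"}


def media_kind(path):
    """Return "video", "audio", or None based on the file extension alone."""
    suffix = Path(path).suffix.lower()  # assumption: treat .MP4 the same as .mp4
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    return None
```

The check looks only at the file name, so a mislabeled or corrupt file would still pass it.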

## License

[MIT](LICENSE)
