Metadata-Version: 2.4
Name: transcribe-all
Version: 0.3.0
Summary: CLI audio transcription via Groq Whisper with optional speaker diarization
License-Expression: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.31
Requires-Dist: rich>=13.7
Provides-Extra: diarize
Requires-Dist: pyannote.audio>=3.1; extra == "diarize"
Requires-Dist: torch>=2.2; extra == "diarize"
Requires-Dist: torchaudio>=2.2; extra == "diarize"
Provides-Extra: dev
Requires-Dist: pytest>=7.4; extra == "dev"
Dynamic: license-file

# transcribe-all

<p align="center">
  <img src="assets/hero-banner.svg" alt="transcribe-all banner" width="100%" />
</p>

<p align="center">
  <img alt="Python" src="https://img.shields.io/badge/Python-3.9+-1e3a8a?style=for-the-badge&logo=python&logoColor=ffd43b">
  <img alt="Groq" src="https://img.shields.io/badge/Groq-Whisper%20v3-111827?style=for-the-badge">
  <img alt="pyannote" src="https://img.shields.io/badge/pyannote-Diarization-0f766e?style=for-the-badge">
  <img alt="License" src="https://img.shields.io/badge/License-MIT-166534?style=for-the-badge">
</p>

Cloud-first transcription CLI with optional speaker diarization.
Built for fast hackathon delivery: simple install, practical output, and clear timestamps.

## Why this project

- Fast transcription via Groq Whisper models
- Speaker segmentation via pyannote (optional)
- Clean sentence blocks with timestamp formatting
- Handles large files by splitting and merging automatically
- Works from the terminal with a single command
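
The split-and-merge step can be pictured with a small sketch. This README does not describe how `transcribe_groq.py` actually splits files, so the function names, the fixed chunk length, and the segment dictionaries below are assumptions made for illustration only. The idea is to cut the recording into consecutive windows, transcribe each one, and shift every segment's timestamps by the start of its window before concatenating.

```python
from typing import Dict, List, Sequence, Tuple


def plan_chunks(duration_s: float, max_chunk_s: float) -> List[Tuple[float, float]]:
    """Return consecutive (start, end) windows covering the whole recording.

    Hypothetical helper: the real chunk size limit is not documented here.
    """
    if max_chunk_s <= 0:
        raise ValueError("max_chunk_s must be positive")
    chunks: List[Tuple[float, float]] = []
    start = 0.0
    while start < duration_s:
        end = min(start + max_chunk_s, duration_s)
        chunks.append((start, end))
        start = end
    return chunks


def merge_segments(
    chunk_starts: Sequence[float],
    per_chunk_segments: Sequence[List[Dict]],
) -> List[Dict]:
    """Shift chunk-relative timestamps to absolute time and concatenate.

    Each segment is assumed to be a dict with "start", "end", and "text" keys.
    """
    merged: List[Dict] = []
    for offset, segments in zip(chunk_starts, per_chunk_segments):
        for seg in segments:
            merged.append(
                {
                    "start": seg["start"] + offset,
                    "end": seg["end"] + offset,
                    "text": seg["text"],
                }
            )
    return merged
```

For a 25-minute file and a 10-minute window, `plan_chunks(1500.0, 600.0)` yields three windows, the last one shorter than the others.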

<p align="center">
  <img src="assets/pipeline-diagram.svg" alt="Pipeline diagram" width="100%" />
</p>

## Quick start

```bash
git clone https://github.com/syrex1013/Transcribe.git
cd Transcribe
chmod +x install.sh transcribe
./install.sh
```

Then transcribe a recording, passing the language code as the second argument:

```bash
transcribe recording.mp3 en
```

## Package manager installs

### Homebrew

```bash
brew tap syrex1013/transcribe-all https://github.com/syrex1013/Transcribe
brew install transcribe-all
```

### APT (`apt-get`)

```bash
echo "deb [trusted=yes] https://hacklabjournal.me/Transcribe/apt ./" | sudo tee /etc/apt/sources.list.d/transcribe-all.list
sudo apt-get update
sudo apt-get install -y transcribe-all
```

### pip

```bash
pip install transcribe-all
```

Optional diarization extras:

```bash
pip install "transcribe-all[diarize]"
```

Full install details and troubleshooting: [INSTALLATION.md](INSTALLATION.md)

## Usage

```bash
# basic
transcribe input.mp3 en

# expected speaker count
transcribe interview.mp3 en --speakers 2

# disable diarization
transcribe lecture.mp3 en --no-diarize

# local whisper.cpp mode
transcribe input.mp3 en --local
```

## Configuration

The tool reads its settings from environment variables and from this config file:

```text
~/.config/transcribe/config
```

Required:

- `GROQ_API_KEY`

Optional:

- `HF_TOKEN` for pyannote speaker diarization
- `WHISPER_MODEL_PATH` for `--local` mode (path to `ggml-large-v3.bin`)

Use `.env.example` as a reference.
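
As a rough illustration of how the two sources could be combined, here is a minimal sketch. It assumes the config file holds `KEY=VALUE` lines (optionally prefixed with `export`) and that environment variables take precedence over the file; neither detail is stated in this README, and `parse_config_text` and `load_settings` are hypothetical names rather than functions from `transcribe_groq.py`.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional

CONFIG_PATH = Path.home() / ".config" / "transcribe" / "config"
KNOWN_KEYS = ("GROQ_API_KEY", "HF_TOKEN", "WHISPER_MODEL_PATH")


def parse_config_text(text: str) -> Dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comments, and malformed lines."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        if line.startswith("export "):
            line = line[len("export "):]
        key, _, value = line.partition("=")
        # Drop surrounding quotes so KEY="value" and KEY=value behave the same.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_settings(
    env: Optional[Mapping[str, str]] = None,
    config_path: Path = CONFIG_PATH,
) -> Dict[str, str]:
    """Merge settings; the environment wins over the file (an assumption)."""
    env = os.environ if env is None else env
    file_values = (
        parse_config_text(config_path.read_text()) if config_path.is_file() else {}
    )
    settings: Dict[str, str] = {}
    for key in KNOWN_KEYS:
        value = env.get(key) or file_values.get(key)
        if value:
            settings[key] = value
    return settings
```

A caller could then check `"GROQ_API_KEY" in load_settings()` before starting a cloud transcription and print a helpful message if the key is missing.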

## Example output

```text
-- Speaker 1 ----------------------------------------
[00:00]  Welcome to the demo recording.
[00:04]  Today we will test HTTP interception in Burp.

-- Speaker 2 ----------------------------------------
[01:32]  Open the Proxy tab and enable intercept.
[01:38]  Now inspect headers and session cookies.
```
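
The layout above can be reproduced with a few lines of Python. This is only a sketch of the formatting, not the project's code: `format_timestamp`, `render`, and `RULE_WIDTH` are hypothetical names, the dash count is an arbitrary choice, and the sketch assumes minutes simply keep counting past 59 since the example shows no recording longer than an hour.

```python
from typing import Iterable, List, Tuple

RULE_WIDTH = 40  # number of dashes after the speaker label (arbitrary here)


def format_timestamp(seconds: float) -> str:
    """Format an offset in seconds as [MM:SS], truncating fractions."""
    total = max(int(seconds), 0)
    minutes, secs = divmod(total, 60)
    return f"[{minutes:02d}:{secs:02d}]"


def render(blocks: Iterable[Tuple[str, List[Tuple[float, str]]]]) -> str:
    """Render (speaker, [(start_seconds, sentence), ...]) blocks as text."""
    lines: List[str] = []
    for speaker, sentences in blocks:
        lines.append(f"-- {speaker} " + "-" * RULE_WIDTH)
        for start, text in sentences:
            lines.append(f"{format_timestamp(start)}  {text}")
        lines.append("")  # blank line between speaker blocks
    return "\n".join(lines).rstrip("\n")
```

Calling `render([("Speaker 1", [(0.0, "Welcome to the demo recording.")])])` produces a header line followed by `[00:00]  Welcome to the demo recording.`.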

## Project layout

```text
.
|- transcribe              # CLI entrypoint
|- transcribe_groq.py      # Core transcription + diarization pipeline
|- install.sh              # Installer for dependencies and shell setup
|- INSTALLATION.md         # Detailed install and PATH guide
|- .env.example            # Environment variable template
|- CHANGELOG.md
|- RELEASE_CHECKLIST.md
`- assets/
   |- hero-banner.svg
   `- pipeline-diagram.svg
```

## Release notes

The following artifacts are prepared for the initial release:

- `.gitignore` for Python, secrets, generated transcripts, and media
- `LICENSE` (MIT)
- `CHANGELOG.md`
- `RELEASE_CHECKLIST.md`

## Installation guide

The installer is designed to:

- Install required dependencies: `ffmpeg`, `ffprobe`, and the Python packages from `requirements.txt`
- Install `transcribe` globally in `/usr/local/bin` when possible
- Fall back to a user-local install in `~/.local/bin` when a system install is unavailable
- Update the shell profile so `transcribe` is on `PATH` in new shell sessions
- Persist token config in `~/.config/transcribe/config`
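
The global-versus-user-local decision described above can be sketched as follows. The real installer is a shell script (`install.sh`) whose exact checks are not shown in this README, so this Python version only mirrors the stated behavior. `choose_install_dir` and `path_export_line` are hypothetical names, and treating "possible" as "the directory exists and is writable" is an assumption.

```python
import os
from pathlib import Path
from typing import Callable, Optional


def choose_install_dir(
    system_dir: Path = Path("/usr/local/bin"),
    user_dir: Optional[Path] = None,
    is_writable: Optional[Callable[[Path], bool]] = None,
) -> Path:
    """Prefer the system directory; fall back to the user-local one."""
    if user_dir is None:
        user_dir = Path.home() / ".local" / "bin"
    if is_writable is None:
        # Assumed check: the directory exists and the current user can write to it.
        def is_writable(path: Path) -> bool:
            return path.is_dir() and os.access(path, os.W_OK)
    return system_dir if is_writable(system_dir) else user_dir


def path_export_line(install_dir: Path) -> str:
    """Build the line a shell profile would need to expose install_dir."""
    return f'export PATH="{install_dir.as_posix()}:$PATH"'
```

When the fallback directory is chosen, appending the result of `path_export_line` to the shell profile is what makes `transcribe` resolvable in new sessions.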

## License

MIT. See [LICENSE](LICENSE).
