Metadata-Version: 2.4
Name: trainscribe
Version: 0.1.2
Summary: A command-line tool for transcribing audio files in a folder to a metadata.csv file, using OpenAI's Whisper.
Keywords: transcribe,transcription,audio-transcription,openai-whisper,whisper,ljspeech,ljspeech-format,audio-formatter,text-to-speech,tts,tts-train,tts-finetune,cli,command-line
Author: VERA LVX
Author-email: VERA LVX <veralvx@veralvx.com>
License-Expression: MIT
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Multimedia :: Sound/Audio :: Speech
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing
Classifier: Topic :: Text Processing :: General
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Utilities
Classifier: Typing :: Typed
Requires-Dist: devicer>=0.1.1
Requires-Dist: openai-whisper>=20250625
Requires-Dist: torch>=2.5.1
Requires-Dist: intel-extension-for-pytorch>=2.8,<3.0 ; python_full_version >= '3.10' and python_full_version < '3.14' and extra == 'xpu'
Maintainer: VERA LVX
Maintainer-email: VERA LVX <veralvx@veralvx.com>
Requires-Python: >=3.10, <3.14
Project-URL: Documentation, https://github.com/veralvx/trainscribe
Project-URL: Homepage, https://github.com/veralvx/trainscribe
Project-URL: Issues, https://github.com/veralvx/trainscribe/issues
Project-URL: Repository, https://github.com/veralvx/trainscribe
Provides-Extra: xpu
Description-Content-Type: text/markdown

# Trainscribe

Trainscribe is a command-line tool that transcribes the audio files in a specified folder using [OpenAI's Whisper](https://github.com/openai/whisper) and writes the results to a `metadata.csv` file. The metadata file is intended for training or fine-tuning text-to-speech (TTS) models and uses one of the following formats:
- `file_id|transcribed_text`, or 
- `file_id|transcribed_text|speaker`, if a speaker label is provided. 

This is similar to the LJ Speech format, but lacks its third field, which holds a normalized version of the transcription (for example, with numbers spelled out as words). In particular, the `file_id|transcribed_text` format can be used with projects like [piper-train](https://github.com/veralvx/piper-train), and `file_id|transcribed_text|speaker` with [xtts-finetune](https://github.com/veralvx/xtts-finetune).
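Downstream training scripts need to read both line formats. The following is a minimal sketch (not part of trainscribe) of a parser that accepts either form. The names `MetadataEntry` and `parse_metadata_line` are hypothetical, and it assumes the transcribed text never contains a `|` character.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MetadataEntry:
    file_id: str
    text: str
    speaker: Optional[str] = None  # only present in the 3-field format


def parse_metadata_line(line: str) -> MetadataEntry:
    """Parse one line in either `file_id|text` or `file_id|text|speaker` form."""
    fields = line.rstrip("\n").split("|")
    if len(fields) == 2:
        return MetadataEntry(fields[0], fields[1])
    if len(fields) == 3:
        return MetadataEntry(fields[0], fields[1], fields[2])
    raise ValueError(f"expected 2 or 3 '|'-separated fields, got {len(fields)}: {line!r}")
```

For example, `parse_metadata_line("audio_001|Hello there.")` yields an entry with `speaker=None`, while a line ending in `|alice` sets `speaker="alice"`.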

## Requirements

- Python >=3.10, <3.14
- [`uv`](https://docs.astral.sh/uv/)
- `ffmpeg` (e.g. `sudo apt install ffmpeg` on Debian/Ubuntu)
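Whisper decodes audio by running the `ffmpeg` executable, so it must be reachable on `PATH`. A minimal sketch for checking this before a long transcription run (the helper name `ffmpeg_available` is hypothetical, not part of trainscribe):

```python
import shutil
import subprocess


def ffmpeg_available() -> bool:
    """Return True if an `ffmpeg` executable is found on PATH and runs."""
    path = shutil.which("ffmpeg")
    if path is None:
        return False
    # `ffmpeg -version` prints build information and exits with status 0.
    result = subprocess.run([path, "-version"], capture_output=True)
    return result.returncode == 0


if __name__ == "__main__":
    print("ffmpeg found" if ffmpeg_available() else "ffmpeg not found on PATH")
```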


## Usage

Run the tool with:

```console
uvx trainscribe --folder /path/to/audio/folder [options]
```

```console
Transcribe a folder of audio files to metadata.csv using Whisper.

options:
  -h, --help            show this help message and exit
  --folder, -f FOLDER   Folder with audio files
  --lang, -l LANG       Language code for transcription (e.g. 'en')
  --model, -m MODEL     Whisper model name (tiny, base, small, medium, large, turbo)
  --speaker, -s SPEAKER
                        Speaker label to add to metadata lines
  --device, -d DEVICE   Device for whisper model (cuda/cpu)
  --output, -o OUTPUT
```
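To illustrate how these options fit together, here is a rough sketch of the kind of loop such a tool performs. It is not trainscribe's actual implementation. It uses the real `whisper.load_model` and `model.transcribe` calls from openai-whisper, but assumes that `file_id` is the filename without its extension, that only the listed extensions count as audio, and that `|` is stripped from transcripts; `whisper_transcriber` and `build_metadata` are hypothetical names. The transcriber is injected as a function so the file-writing logic can be exercised without loading a model.

```python
from pathlib import Path
from typing import Callable, Optional

# Assumed set of audio extensions; trainscribe's actual filter may differ.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def whisper_transcriber(model_name: str = "medium", device: Optional[str] = None,
                        language: Optional[str] = None) -> Callable[[Path], str]:
    """Build a transcribe function backed by openai-whisper (imported lazily)."""
    import whisper  # requires the openai-whisper package and ffmpeg

    model = whisper.load_model(model_name, device=device)

    def transcribe(path: Path) -> str:
        # language=None lets Whisper detect the language itself.
        result = model.transcribe(str(path), language=language)
        return result["text"].strip()

    return transcribe


def build_metadata(folder: Path, transcribe: Callable[[Path], str],
                   speaker: Optional[str] = None,
                   output: Optional[Path] = None) -> Path:
    """Write one `file_id|text[|speaker]` line per audio file in `folder`."""
    output = output or folder / "metadata.csv"
    audio_files = sorted(p for p in folder.iterdir()
                         if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS)
    with output.open("w", encoding="utf-8") as f:
        for path in audio_files:
            # '|' is the field separator, so remove it from the transcript.
            text = transcribe(path).replace("|", " ")
            fields = [path.stem, text] + ([speaker] if speaker else [])
            f.write("|".join(fields) + "\n")
    return output
```

With a real model this would be called as `build_metadata(Path("dataset/wavs"), whisper_transcriber("medium", language="en"))`.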

### Example
Transcribe English audio in `dataset/wavs` using the `medium` model:

```console
uvx trainscribe --folder dataset/wavs --lang en --model medium 
```

This generates `dataset/wavs/metadata.csv`.
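Before training, it can help to confirm that every audio file ended up in the metadata. A minimal sketch of such a check, not part of trainscribe: `untranscribed_files` is a hypothetical helper, the extension list is an assumption, and it compares filename stems so it works whether or not `file_id` keeps the extension.

```python
from pathlib import Path

# Assumed audio extensions; adjust to match your dataset.
AUDIO_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def untranscribed_files(folder: Path, metadata_name: str = "metadata.csv") -> list[str]:
    """Return names of audio files in `folder` that have no line in the metadata file."""
    listed = set()
    for line in (folder / metadata_name).read_text(encoding="utf-8").splitlines():
        if line.strip():
            file_id = line.split("|", 1)[0]
            # Compare by stem so "clip" and "clip.wav" both match clip.wav.
            listed.add(Path(file_id).stem)
    return sorted(p.name for p in folder.iterdir()
                  if p.suffix.lower() in AUDIO_EXTENSIONS and p.stem not in listed)
```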
