Metadata-Version: 2.4
Name: whisper-tools
Version: 0.1.2
Summary: Real-time speech recognition with Whisper
Author: Kirill Yuzhakov
Author-email: Kirill Yuzhakov <luxlapari@gmail.com>
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: torch>=2.0.0
Requires-Dist: transformers>=4.30.0
Requires-Dist: sounddevice>=0.4.6
Requires-Dist: soundfile>=0.12.1
Requires-Dist: openai>=1.0.0
Requires-Dist: numpy>=1.21.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Dynamic: author
Dynamic: requires-python

# Whisper-tools
High-level Python library for streaming and file-based transcription with Whisper.

## Getting started

### Installation 
```bash
pip install whisper-tools
```

### Usage
- To transcribe with a local Whisper model (downloaded automatically), create a transcriber, an instance of the `WhisperLocal` class:
```python
from whisper_tools import WhisperLocal

transcriber = WhisperLocal()

# WhisperLocal(model_name="openai/whisper-tiny", language='en', device='auto')
```

- To transcribe with a remote model via API, create a transcriber, an instance of the `WhisperAPI` class:
```python
from whisper_tools import WhisperAPI

transcriber_api = WhisperAPI(api_key="your_key", base_url="your_url")

# WhisperAPI(api_key="your_key", base_url="your_url", model="your_model")
```
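
Hard-coded credentials are easy to leak, so one option is to read them from the environment. The sketch below is an illustration only: the variable names `WHISPER_API_KEY` and `WHISPER_BASE_URL` and the helper `load_api_settings` are hypothetical and not part of whisper-tools.

```python
import os


def load_api_settings(env=None):
    """Return keyword arguments for WhisperAPI read from environment variables.

    The variable names are assumptions made for this example.
    """
    env = os.environ if env is None else env
    api_key = env.get("WHISPER_API_KEY")
    base_url = env.get("WHISPER_BASE_URL")
    if not api_key or not base_url:
        raise RuntimeError("Set WHISPER_API_KEY and WHISPER_BASE_URL first")
    return {"api_key": api_key, "base_url": base_url}


# Usage (requires whisper-tools to be installed):
# from whisper_tools import WhisperAPI
# transcriber_api = WhisperAPI(**load_api_settings())
```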

#### Transcribing an audio file

- For local transcription: `text = transcriber.transcribe_file('path-to-file.wav')`
- For transcription via API: `text = transcriber_api.transcribe_file_api('path-to-file.wav')`

#### Real-time transcription

> [!IMPORTANT]  
> The original Whisper model expects a complete audio segment as input, so true streaming transcription would require changes to its architecture. Instead, audio is sent to the model in chunks.

- For local transcription: create a `StreamRecorder` object and pass it the transcriber: `recorder = StreamRecorder(WhisperLocal())`
- For transcription via API: create a `StreamRecorderAPI` object and pass it the transcriber: `recorder = StreamRecorderAPI(WhisperAPI(api_key="your_key", base_url="your_url"))`

```python
try:
    # start recording
    recorder.start_recording()
    print("Recording... Press Ctrl+C to stop")
    while True:
        # get a chunk (block of transcribed speech)
        text = recorder.process_chunk()
        if text:
            print(f"Recognized: {text}")
except KeyboardInterrupt:
    print("\nStopping...")
finally:
    # stop recording
    recorder.stop_recording()
```
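
Because audio is processed in chunks, it can help to see how a sample buffer divides into fixed-length blocks. This is a minimal sketch, not the library's implementation: `split_into_chunks` is a hypothetical helper, and the 16000 Hz rate and 5-second chunk length are taken from the `StreamRecorder` parameters shown in the Examples section.

```python
def split_into_chunks(samples, sample_rate=16000, chunk_duration=5):
    """Split a flat sequence of samples into blocks of chunk_duration seconds.

    The last block may be shorter than the others.
    """
    chunk_size = int(sample_rate * chunk_duration)
    if chunk_size <= 0:
        raise ValueError("sample_rate * chunk_duration must be positive")
    return [samples[i:i + chunk_size] for i in range(0, len(samples), chunk_size)]


# 12 seconds of silence at 16 kHz: two full 5-second chunks and a 2-second remainder
audio = [0.0] * (16000 * 12)
chunks = split_into_chunks(audio)
print([len(c) for c in chunks])  # [80000, 80000, 32000]
```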

## Examples

Transcribing an audio file locally:
```python
from whisper_tools import WhisperLocal

text = WhisperLocal().transcribe_file("/voice example/example.wav")

print(text)
```
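
To transcribe several files, the same `transcribe_file` call can be applied in a loop. The sketch below assumes only that the transcriber exposes `transcribe_file(path)` as shown above; `transcribe_folder` and the `EchoTranscriber` stand-in are hypothetical and exist only so the example runs without downloading a model.

```python
from pathlib import Path


def transcribe_folder(transcriber, folder, pattern="*.wav"):
    """Map each matching file name in folder to its transcription."""
    results = {}
    for path in sorted(Path(folder).glob(pattern)):
        results[path.name] = transcriber.transcribe_file(str(path))
    return results


class EchoTranscriber:
    """Stand-in for WhisperLocal that returns the file name instead of real text."""

    def transcribe_file(self, path):
        return f"transcript of {Path(path).name}"


# With whisper-tools installed, pass WhisperLocal() instead of EchoTranscriber().
```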

Transcribing an audio file via API:
```python
from whisper_tools import WhisperAPI

whisper_api = WhisperAPI(api_key="your_key", base_url="your_url")
text = whisper_api.transcribe_file_api("/voice example/example.wav")

print(text)
```

Streaming transcription locally:
```python
from whisper_tools import WhisperLocal, StreamRecorder

recorder = StreamRecorder(WhisperLocal())
# StreamRecorder(WhisperLocal(), sample_rate=16000, chunk_duration=5, min_interval=2.0)

try:
    recorder.start_recording()
    print("Recording... Press Ctrl+C to stop")
    while True:
        text = recorder.process_chunk()
        if text:
            print(f"Recognized: {text}")
except KeyboardInterrupt:
    print("\nStopping...")
finally:
    recorder.stop_recording()
```
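
The loop above runs until interrupted. For scripts or tests, a bounded variant can be convenient. This sketch relies only on the `start_recording`, `process_chunk`, and `stop_recording` methods used above; `collect_transcript` and `FakeRecorder` are hypothetical names for illustration, and `FakeRecorder` replaces the microphone with canned text so the example runs without audio hardware.

```python
def collect_transcript(recorder, max_chunks):
    """Poll the recorder max_chunks times and return the non-empty results."""
    pieces = []
    recorder.start_recording()
    try:
        for _ in range(max_chunks):
            text = recorder.process_chunk()
            if text:
                pieces.append(text)
    finally:
        # Always stop the recorder, even if process_chunk raises
        recorder.stop_recording()
    return pieces


class FakeRecorder:
    """Stand-in for StreamRecorder that yields canned text instead of live audio."""

    def __init__(self, texts):
        self._texts = list(texts)
        self.running = False

    def start_recording(self):
        self.running = True

    def process_chunk(self):
        return self._texts.pop(0) if self._texts else ""

    def stop_recording(self):
        self.running = False


recorder = FakeRecorder(["hello", "", "world"])
print(collect_transcript(recorder, max_chunks=5))  # ['hello', 'world']
```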

Streaming transcription via API:
```python
from whisper_tools import WhisperAPI, StreamRecorderAPI

recorder = StreamRecorderAPI(WhisperAPI(api_key="your_key", base_url="your_url"))
# StreamRecorderAPI(WhisperAPI(), sample_rate=16000, chunk_duration=5, min_interval=2.0)

try:
    recorder.start_recording()
    print("Recording... Press Ctrl+C to stop")
    while True:
        text = recorder.process_chunk()
        if text:
            print(f"Recognized: {text}")
except KeyboardInterrupt:
    print("\nStopping...")
finally:
    recorder.stop_recording()
```
