Metadata-Version: 2.4
Name: voxium
Version: 0.2.0
Summary: A client library for the Voxium real-time transcription service
Home-page: https://github.com/nathanmfrench/voxium-client
Author: Nathan French
Author-email: nathanmfrench17@gmail.com
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Operating System :: OS Independent
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: websockets>=10.0
Requires-Dist: numpy
Requires-Dist: sounddevice
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# Voxium Real-Time Transcription Client

This project provides a Python client library for interacting with the Voxium real-time speech-to-text (ASR) WebSocket service. It captures audio from the microphone, streams it to the Voxium server, and processes the received transcriptions via callbacks.

**Key Features:**

* **Real-time Audio Streaming:** Captures microphone audio using `sounddevice`.
* **WebSocket Communication:** Connects to and communicates with the Voxium ASR WebSocket endpoint using `websockets`.
* **Asynchronous Operation:** Built using `asyncio` for efficient non-blocking I/O.
* **Thread-Safe Audio Handling:** Safely transfers audio data from the `sounddevice` callback thread to the main `asyncio` event loop using `asyncio.Queue`.
* **Configurable Parameters:** Allows setting language, VAD thresholds, API keys, and other parameters for the Voxium service.
* **Callback-Based API:** Provides asynchronous callbacks for handling transcription results, errors, connection events (open/close).
* **Simplified Usage:** Offers a high-level `LiveTranscriber` class with a blocking `start_transcription` method for easy integration.

## Table of Contents

* [Prerequisites](#prerequisites)
* [Installation](#installation)
* [Configuration](#configuration)
* [Usage](#usage)
* [Core Components](#core-components)
    * [VoxiumClient (`client.py`)](#voxiumclient-clientpy)
    * [LiveTranscriber (`live_transcribe.py`)](#livetranscriber-live_transcribepy)
* [Callbacks](#callbacks)
* [Logging](#logging)
* [How it Works](#how-it-works)
* [Dependencies](#dependencies)
* [Troubleshooting](#troubleshooting)

## Prerequisites

* **Python:** Version 3.7+ (due to `asyncio` usage)
* **Microphone:** A working microphone connected to your system and recognized by `sounddevice`.
* **PortAudio:** The `sounddevice` library depends on the PortAudio library. Installation varies by OS:
    * **macOS:** `brew install portaudio`
    * **Debian/Ubuntu:** `sudo apt-get install libportaudio2 libportaudiocpp0 portaudio19-dev`
    * **Windows:** Often included with Python distributions or audio drivers. Check the [sounddevice documentation](https://python-sounddevice.readthedocs.io/en/latest/installation.html) for details.
* **Voxium API Key:** You need an API key from Voxium to authenticate with the service.

## Installation

1.  **From Source** 

```bash
git clone https://github.com/nathanmfrench/voxium-client
```

```bash
pip install -r requirements.txt
```
2.  **From Package Manager:**

```bash
pip install voxium
```
(or you can use your package manager of choice)


## Configuration

Configuration is primarily done within your Python script (like `example_usage.py`):

* **`VOXIUM_API_KEY`**: **Required.** Replace the placeholder `"YOUR_API_KEY_HERE"` with your actual Voxium API key. This is sent as a query parameter (`apiKey`) for authentication.
* **`VOXIUM_SERVER_URL`**: The WebSocket endpoint URL for the Voxium ASR service. Defaults to `"wss://voxium.tech/asr/ws"`.
* **`VOXIUM_LANGUAGE`**: The language code for transcription (e.g., `"en"`, `"es"`, `"fr"`). Defaults to `"en"`.
* **Other Parameters:** You can customize other parameters when initializing `LiveTranscriber` or `VoxiumClient`:
    * `vad_threshold` (float): Voice Activity Detection threshold (client-side hint for server).
    * `silence_threshold` (float): Server-side silence duration parameter, controls length of silence before sending an audio chunk.
    * `sample_rate` (int): Audio sample rate (hardcoded to 16000 Hz in `live_transcribe.py`).
    * `input_format` (str): Expected audio format on the server *after* base64 decoding (hardcoded to `"base64"` in `live_transcribe.py` as the client sends base64).
    * `beam_size` (int): Beam size controls number of search candidates at each step (set to 1 for greedy decoding).
    * `language` (str): 2 digit ISO-code for the desired language. 'None' to auto-detect (language and probability passed to client as language, language_probability respectively)
    
50 Supported languages: Arabic (ar), Armenian (hy), Azerbaijani (az), Belarusian (be), Bosnian (bs), Bulgarian (bg), Catalan (ca), Chinese (zh), Croatian (hr), Czech (cs), Danish (da), Dutch (nl), English (en), Estonian (et), Finnish (fi), French (fr), German (de), Greek (el), Hebrew (he), Hindi (hi), Hungarian (hu), Icelandic (is), Indonesian (id), Italian (it), Japanese (ja), Kannada (kn), Korean (ko), Latvian (lv), Lithuanian (lt), Macedonian (mk), Malay (ms), Nepali (ne), Norwegian (no), Persian (fa), Polish (pl), Portuguese (pt), Romanian (ro), Russian (ru), Serbian (sr), Slovak (sk), Slovenian (sl), Spanish (es), Swedish (sv), Tagalog (tl), Tamil (ta), Thai (th), Turkish (tr), Ukrainian (uk), Urdu (ur), Vietnamese (vi)

## Usage

The `example_usage.py` script demonstrates how to use the `LiveTranscriber`.

1.  **Import:** Import the necessary classes and modules.
2.  **Configure Logging:** Set up Python's `logging` module (the example provides a basic console logger).
3.  **Define Transcription Handler:** Create an `async` function that will receive transcription results (dictionaries). This is where you integrate the text into your application logic.
4.  **Set Parameters:** Define your API key, server URL, and language.
5.  **Initialize `LiveTranscriber`:** Create an instance, passing the configuration parameters.
6.  **Start Transcription:** Call the `start_transcription` method, providing your handler function. This method is **blocking** and will run until interrupted (e.g., Ctrl+C) or a critical error occurs.

```python
import logging
from voxium_client import LiveTranscriber 

# --- 1. Configure Logging  ---
# This basic setup logs WARNING level messages and above to the console.
# It's sufficient for most examples (you can implement more complex logging if needed).
# Set level=logging.DEBUG for debugging logs, and level=logging.INFO for normal logs.

logging.basicConfig(
    level=logging.WARNING,
    format='%(asctime)s [%(levelname)s] %(name)s: %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger("MyVoiceAgentExample")


# --- 2. Define Your Transcription Handler (Required) ---
# This asynchronous function will receive transcription results.
async def handle_transcription(result: dict):
    """
    Callback function to process transcription results from Voxium.
    'result' is a dictionary.
    """
    try:
        text = result.get('transcription', '')
        print("Transcription: ", text)

        # Now, you can ntegrate the transcription text into your voice agent's pipeline. For example:
            # await send_to_my_agent_logic(text)

    except Exception as e:
        logger.error(f"Error within handle_transcription callback: {e}", exc_info=True)


# --- 3. Main Execution ---
if __name__ == "__main__":
    logger.info("Starting Voxium LiveTranscriber Integration Example...")

    # --- Configuration Parameters ---
    VOXIUM_API_KEY = "YOUR_API_KEY_HERE"

    if VOXIUM_API_KEY == "YOUR_API_KEY_HERE":
        logger.warning("Using placeholder API Key. Please set the VOXIUM_API_KEY environment variable or replace the placeholder.")

    VOXIUM_SERVER_URL = "wss://voxium.tech/asr/ws"
    VOXIUM_LANGUAGE = "en"

    # --- Initialize the Transcriber ---
    logger.info(f"Initializing LiveTranscriber for {VOXIUM_LANGUAGE}...")
    transcriber = LiveTranscriber(
        server_url=VOXIUM_SERVER_URL,
        language=VOXIUM_LANGUAGE,
        api_key=VOXIUM_API_KEY
        # Add/override other parameters if needed:
        # vad_threshold=0.5,
        # silence_threshold=0.5,
    )

    # --- Start the Transcription Process ---
    # This is a blocking call that runs the transcriber until stopped (Ctrl+C or error).
    # It requires your 'on_transcription' callback function.
    logger.info("Starting transcription. Press Ctrl+C to stop.")
    try:
        transcriber.start_transcription(
            on_transcription=handle_transcription
            # --- Optional Callbacks ---
            # You can provide your own async functions for other events:
            # on_error=my_async_error_handler,
            # on_open=my_async_open_handler,
            # on_close=my_async_close_handler
        )
    except Exception as e:
         logger.critical(f"Failed to run transcription: {e}", exc_info=True)

    logger.info("Transcription process finished.")
```
Run script from your terminal with

```
python3 example_usage.py
```

## Core Components

### VoxiumClient (`client.py`)

* Manages the low-level WebSocket connection lifecycle (`connect`, `close`).
* Handles URL construction with query parameters (including the API key).
* Formats outgoing audio messages (encodes audio bytes to Base64 JSON).
* Runs a persistent `_receiver` task to listen for incoming messages (transcriptions, status, errors) from the server.
* Parses incoming JSON messages and routes them to the appropriate asynchronous callbacks.
* Provides setter methods (`set_transcription_callback`, `set_error_callback`, etc.) for registering custom handlers.
* Implements `async` context manager protocols (`__aenter__`, `__aexit__`) for automatic connection setup and teardown.
* Handles WebSocket connection errors and state changes.

### LiveTranscriber (`live_transcribe.py`)

* Acts as the primary interface for users.
* Initializes and manages the `VoxiumClient`.
* Uses `sounddevice.InputStream` to capture audio from the default microphone in a separate thread.
* The `_audio_callback` function (running in the `sounddevice` thread) converts audio chunks (`numpy.ndarray`) to bytes.
* Uses `loop.call_soon_threadsafe` to safely put the audio bytes onto an `asyncio.Queue` from the `sounddevice` thread.
* An `_audio_loop` `async` task runs in the main event loop, consuming audio bytes from the queue.
* Sends audio chunks to the server via the `VoxiumClient.send_audio_chunk` method.
* Manages the starting (`start`) and stopping (`stop`, `cleanup_audio`) of the audio stream and processing loop.
* Provides the simplified, blocking `start_transcription` method which:
    * Sets up default callbacks if user doesn't provide them.
    * Assigns user-provided callbacks to the underlying `VoxiumClient`.
    * Checks basic `sounddevice` settings before starting.
    * Uses `asyncio.run()` to manage the event loop and run the main `start` coroutine.
    * Handles `KeyboardInterrupt` for graceful shutdown.

## Callbacks

The `LiveTranscriber.start_transcription` method accepts several optional `async` callback functions:

* **`on_transcription(result: dict)`**: *(Required)* Called whenever a transcription message (partial or final) is received from the server. The `result` dictionary typically contains keys like `"transcription"` (the text) and `"is_final"` (boolean).
* **`on_error(error: Union[Exception, str])`**: Called when the server sends an error status message or when certain client-side processing errors occur (like in the audio loop or message handling).
* **`on_open(info: dict)`**: Called once after the WebSocket connection is successfully established and the initial server handshake is complete. The `info` dictionary contains details sent by the server (e.g., model info).
* **`on_close(code: int, reason: str)`**: Called when the WebSocket connection is closed, either cleanly or due to an error detected by the underlying `websockets` library *within the receiver task*. Provides the close code and reason.

If you don't provide `on_error`, `on_open`, or `on_close`, default handlers that log the event will be used.

*Note:* The underlying `VoxiumClient` also has a `connection_error_callback` specifically for errors during the connection phase or WebSocket-level close errors not caught by the receiver loop's `ConnectionClosed...` exceptions. This isn't directly exposed via `start_transcription` but is used internally and logs errors.

## Logging

The code uses Python's standard `logging` module.

* `example_usage.py` sets up a basic configuration that logs `INFO` level messages and above to the console.
* `client.py` and `live_transcribe.py` obtain their own loggers (`logging.getLogger(__name__)`).
* You can customize the logging level and format in `example_usage.py` (e.g., set `level=logging.DEBUG` for verbose output, or add file handlers).

## How it Works

1.  `LiveTranscriber.start_transcription` is called.
2.  It configures callbacks on the `VoxiumClient` instance.
3.  It runs `asyncio.run(_run_internal)`, which calls `LiveTranscriber.start`.
4.  `LiveTranscriber.start` gets the current event loop.
5.  It enters the `VoxiumClient` async context (`__aenter__`), which calls `client.connect`.
6.  `client.connect` establishes the WebSocket connection, handles authentication, receives initial info, and starts the `client._receiver` task in the background.
7.  `LiveTranscriber.start` calls `setup_audio`, which creates and starts `sounddevice.InputStream`. The `_audio_callback` begins running in a separate thread.
8.  `_audio_callback` captures audio chunks, converts them to bytes, and uses `loop.call_soon_threadsafe` to put them on the `audio_queue`.
9.  `LiveTranscriber.start` starts the `_audio_loop` task.
10. `_audio_loop` waits for audio bytes from the `audio_queue`.
11. When audio arrives, `_audio_loop` calls `client.send_audio_chunk`.
12. `client.send_audio_chunk` base64 encodes the audio and sends it as a JSON message over the WebSocket.
13. Concurrently, `client._receiver` listens for messages from the server.
14. When a transcription message arrives, `_receiver` parses it and calls the registered `on_transcription` callback (your `handle_transcription` function).
15. This continues until `start_transcription` is interrupted (`Ctrl+C`) or a fatal error occurs.
16. On exit (interrupt or completion/error of `start`), `asyncio.run` handles task cancellation. The `finally` blocks in `start` and the client's `__aexit__` method (`close`) ensure the audio stream and WebSocket connection are cleaned up.

## Dependencies

* **`websockets`**: For WebSocket client implementation.
* **`numpy`**: For handling audio data arrays from `sounddevice`.
* **`sounddevice`**: For accessing the microphone and capturing audio streams.
* **`typing`**: For type checking

## Troubleshooting

* **`PortAudioError` / No Sound / "Invalid input device":**
    * Ensure a microphone is plugged in and enabled in your system settings.
    * Verify PortAudio is installed correctly (`brew install portaudio`, `apt-get install ...`).
    * Check if another application is exclusively using the microphone.
    * Try specifying a device ID in `sd.InputStream(device=...)` if the default is wrong. Use `python -m sounddevice` to list devices.
    * Ensure the microphone supports the required settings (16000 Hz, Mono, 16-bit Integer - `RATE`, `CHANNELS`, `SD_DTYPE`).
* **Connection Refused / Invalid Status Code / Timeout:**
    * Double-check the `VOXIUM_SERVER_URL`.
    * Verify your `VOXIUM_API_KEY` is correct and valid.
    * Check your internet connection and any firewall/proxy settings that might block WebSocket connections (port 443 for `wss://`).
* **Authentication Errors:**
    * Ensure the `apiKey` query parameter is being sent correctly (check logs if `DEBUG` level enabled) and matches what Voxium expects.
* **No Transcriptions Received:**
    * Check if the microphone is picking up sound (enable `DEBUG` logging to see audio chunk scheduling/sending).
    * Verify the correct `VOXIUM_LANGUAGE` is set.
    * Look for error messages in the logs from the client or server.
* **Import Errors:**
    * Ensure the Python files (`client.py`, `live_transcribe.py`) are located where Python can find them (e.g., same directory as your main script, or installed as part of a package). Adjust import statements (`from .client ...` vs `from voxium_client ...`) as needed based on your project structure.
