Metadata-Version: 2.4
Name: ai-sub
Version: 2.1.0
Summary: AI-Powered Subtitle Generation with Translation
Author: FlippFuzz
Project-URL: Homepage, https://github.com/FlippFuzz/ai-sub
Project-URL: Bug Tracker, https://github.com/FlippFuzz/ai-sub/issues
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pysubs2
Requires-Dist: static-ffmpeg
Requires-Dist: pydantic
Requires-Dist: pydantic-ai-slim[google,logfire]
Requires-Dist: pydantic-settings
Requires-Dist: json_repair
Requires-Dist: pyrate-limiter
Dynamic: license-file

# AI Sub: AI-Powered Subtitle Generation with Translation

[![PyPI version](https://img.shields.io/pypi/v/ai-sub)](https://pypi.org/project/ai-sub)
[![Downloads](https://img.shields.io/pypi/dw/ai-sub)](https://pypistats.org/packages/ai-sub)

---

## Overview

**AI Sub** is a command-line tool that leverages Google's **Gemini** models to generate high-quality, audio-synchronized subtitles. It is designed to produce precise English and Japanese subtitles by analyzing both audio and visual cues.

**Key Features:**

- **Multimodal Understanding:** Utilizes video frames for context (e.g., identifying speakers, reading on-screen text) and audio for precise timing.
- **Dual-Language Support:** Generates verbatim transcriptions and translations for English and Japanese.
- **Automatic Segmentation:** Splits long videos into smaller segments for efficient processing.

---

## Showcase

Here's an example of subtitles generated by AI Sub:

[![Video Screenshot](https://github.com/FlippFuzz/ai-sub/raw/main/showcase/old/42h4ydJS3zk.png)](https://raw.githubusercontent.com/FlippFuzz/ai-sub/refs/heads/main/showcase/old/42h4ydJS3zk.v007.srt)

For more examples, please visit [ai-sub-showcase](https://github.com/FlippFuzz/ai-sub-showcase).

---

## How It Works

1.  **Preprocessing:** The input video is segmented into smaller chunks to fit within API context windows and file size limits.
2.  **AI Processing:** Each segment is sent to Google Gemini. The AI analyzes the audio for speech and the video for context, following strict prompting rules to generate subtitles.
3.  **Compilation:** Generated subtitles from all segments are merged into a final, chronologically sorted SRT file.
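
The split-and-merge flow above can be sketched in a few lines. This is a minimal illustration, not AI Sub's actual implementation: it assumes each segment's subtitles use timestamps relative to the start of that segment, and the names `Cue`, `chunk_bounds`, `merge_cues`, and `srt_timestamp` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds, relative to the start of its segment
    end: float    # seconds, relative to the start of its segment
    text: str


def chunk_bounds(duration: float, max_seconds: float = 300.0) -> list[tuple[float, float]]:
    """Split [0, duration) into consecutive windows of at most max_seconds."""
    bounds = []
    start = 0.0
    while start < duration:
        end = min(start + max_seconds, duration)
        bounds.append((start, end))
        start = end
    return bounds


def merge_cues(per_chunk: list[list[Cue]], bounds: list[tuple[float, float]]) -> list[Cue]:
    """Shift each segment's cues by the segment start, then sort chronologically."""
    merged = []
    for cues, (offset, _end) in zip(per_chunk, bounds):
        for cue in cues:
            merged.append(Cue(cue.start + offset, cue.end + offset, cue.text))
    return sorted(merged, key=lambda c: c.start)


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"
```

With the default 300-second limit, a 700-second video would yield three windows under this sketch, and a cue at 0.5 s in the second segment would land at 300.5 s in the merged file.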

---

## Installation

**Prerequisites:** Python 3.10 or higher.

1.  **Set up a Python virtual environment:**

    ```bash
    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate.bat`
    ```

2.  **Install AI Sub:**

    ```bash
    pip install --upgrade ai-sub
    ```

---

## Usage

You can use AI Sub with either a Google AI Studio API Key or the Gemini CLI.

### Option 1: Using Google AI Studio API Key

1.  **Obtain your API Key:**
    - Sign in to [Google AI Studio](https://aistudio.google.com/app/apikey).
    - Click "Create API Key".
    - Copy and securely store your key. **Never disclose your API key publicly.**

2.  **Run the application:**

    ```bash
    ai-sub --ai.google.key YOUR_API_KEY --ai.model=google-gla:gemini-3-flash-preview "path/to/your/video.mp4"
    ```

    _Note: Replace `YOUR_API_KEY` with your actual key and `"path/to/your/video.mp4"` with the video file path._

### Option 2: Using Gemini CLI

1.  **Install and Authenticate Gemini CLI:**
    - Install: `npm install -g @google/gemini-cli`
    - Authenticate: Follow instructions at [gemini-cli](https://github.com/google-gemini/gemini-cli?tab=readme-ov-file#-authentication-options).

2.  **Run the application:**

    ```bash
    ai-sub --ai.model=gemini-cli:gemini-3-flash-preview --split.re-encode.enabled=True "path/to/your/video.mp4"
    ```

    **Important Notes for CLI Mode:**
    - No API key is required; the tool uses your authenticated Gemini CLI instance.
    - `--split.re-encode.enabled=True` is required because the Gemini CLI has a 20 MB upload limit per chunk, so larger chunks must be re-encoded to fit. The default re-encoding settings are aggressive and should work for most inputs.
    - **Re-encoding is resource-intensive and will increase processing time.**

---

## Configuration

All settings can be configured via command-line arguments (e.g., `--ai.rpm 10`) or environment variables with the `AISUB_` prefix (e.g., `AISUB_AI_RPM=10`).

### AI Settings (`--ai.*`)

| Argument                   | Description                                                                                                                                                                     | Default                             |
| -------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------- |
| `--ai.model <model>`       | A shorthand to set both `pass1_model` and `pass2_model` to the same value.                                                                                                      | `None`                              |
| `--ai.pass1-model <model>` | The AI model for the first pass of subtitle generation. Use 'google-gla:\<model\>' for Google models, 'gemini-cli:\<model\>' for the Gemini CLI, 'openai:\<model\>' for OpenAI, or 'custom:\<url\>' for a custom endpoint. | `google-gla:gemini-3-flash-preview` |
| `--ai.pass2-model <model>` | The AI model for the second pass of subtitle generation (QA & Refinement).                                                                                                      | `google-gla:gemini-3-flash-preview` |
| `--ai.rpm <int>`           | Maximum requests per minute for the AI model.                                                                                                                                   | `4`                                 |
| `--ai.tpm <int>`           | Maximum tokens per minute for the AI model.                                                                                                                                     | `250000`                            |
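
To get a feel for how `--ai.rpm` bounds total run time, here is a rough lower-bound estimate. It is a sketch under stated assumptions, not a description of the tool's scheduler: it assumes one request per segment per pass, two passes, and a single shared request budget. The function name `min_minutes` is hypothetical.

```python
import math


def min_minutes(video_seconds: float, max_seconds: int = 300, rpm: int = 4, passes: int = 2) -> float:
    """Lower bound on run time in minutes imposed by the requests-per-minute limit."""
    chunks = math.ceil(video_seconds / max_seconds)  # segments after splitting
    requests = chunks * passes                       # assumed: one request per segment per pass
    return requests / rpm
```

Under these assumptions, a two-hour video at the defaults needs 24 segments and 48 requests, so the rate limit alone accounts for at least 12 minutes. Retries and token limits would add to that.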

#### Google AI Settings (`--ai.google.*`)

| Argument                               | Description                                                       | Default                                                  |
| -------------------------------------- | ----------------------------------------------------------------- | -------------------------------------------------------- |
| `--ai.google.key <key>`                | The API key for Google's generative language models.              | `None` (loads from `GOOGLE_API_KEY` or `GEMINI_API_KEY`) |
| `--ai.google.file-cache-ttl <seconds>` | The time-to-live (TTL) in seconds for the Gemini file list cache. | `10`                                                     |
| `--ai.google.use-files-api <bool>`     | Whether to use the Gemini Files API.                              | `True`                                                   |
| `--ai.google.base-url <url>`           | The base URL for the Google AI API.                               | `None`                                                   |
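
Because the key falls back to `GOOGLE_API_KEY` when `--ai.google.key` is omitted, a wrapper script can keep the key out of the command line. The sketch below only assembles the arguments and environment; the helper names `build_run` and `run` are hypothetical, and it assumes `ai-sub` is on your `PATH`.

```python
import os
import subprocess


def build_run(video_path: str, api_key: str,
              model: str = "google-gla:gemini-3-flash-preview") -> tuple[list[str], dict[str, str]]:
    """Return (argv, env) for an ai-sub run that passes the key via the environment."""
    env = dict(os.environ)
    env["GOOGLE_API_KEY"] = api_key  # picked up when --ai.google.key is not given
    argv = ["ai-sub", f"--ai.model={model}", video_path]
    return argv, env


def run(video_path: str, api_key: str) -> None:
    """Launch ai-sub; raises CalledProcessError on a non-zero exit code."""
    argv, env = build_run(video_path, api_key)
    subprocess.run(argv, env=env, check=True)
```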

#### Gemini CLI Settings (`--ai.gemini-cli.*`)

| Argument                                         | Description                                                      | Default |
| ------------------------------------------------ | ---------------------------------------------------------------- | ------- |
| `--ai.gemini-cli.timeout <seconds>`              | The timeout in seconds for Gemini CLI operations.                | `600`   |
| `--ai.gemini-cli.overwrite-system-prompt <bool>` | Whether to overwrite the system prompt using `GEMINI_SYSTEM_MD`. | `False` |

### Splitting Settings (`--split.*`)

| Argument                             | Description                                                    | Default |
| ------------------------------------ | -------------------------------------------------------------- | ------- |
| `--split.max-seconds <seconds>`      | The maximum duration in seconds for each video chunk.          | `300`   |
| `--split.start-offset-min <minutes>` | The number of minutes to skip from the beginning of the video. | `0`     |

#### Re-Encode Settings (`--split.re-encode.*`)

| Argument                               | Description                                                                                                            | Default                |
| -------------------------------------- | ---------------------------------------------------------------------------------------------------------------------- | ---------------------- |
| `--split.re-encode.enabled <bool>`     | Re-encode the video chunks to save bandwidth.                                                                          | `False`                |
| `--split.re-encode.fps <int>`          | The framerate to re-encode the video to.                                                                               | `1`                    |
| `--split.re-encode.height <int>`       | The height (resolution) to re-encode the video to.                                                                     | `360`                  |
| `--split.re-encode.bitrate-kb <int>`   | The bitrate in KB/s to re-encode the video to.                                                                         | `35`                   |
| `--split.re-encode.threshold-mb <int>` | The threshold in MB for re-encoding. Files smaller than this will not be re-encoded. Set to 0 to re-encode everything. | `20`                   |
| `--split.re-encode.encoder <encoder>`  | The specific encoder to use (e.g., 'h264_nvenc').                                                                      | `None` (auto-detected) |
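
A quick size estimate shows why the defaults suit a 20 MB limit. This sketch assumes that `KB/s` means kilobytes per second, that 1 MB is 1024 KB, and that the bitrate covers the whole re-encoded file; none of these is confirmed here, and real encoders rarely hit the target bitrate exactly. The name `estimated_chunk_mb` is hypothetical.

```python
def estimated_chunk_mb(seconds: float = 300, bitrate_kb_per_s: float = 35) -> float:
    """Approximate size in MB of one re-encoded chunk at a constant bitrate."""
    return seconds * bitrate_kb_per_s / 1024
```

At the defaults (300-second chunks at 35 KB/s) this works out to roughly 10.3 MB per chunk, about half of the 20 MB threshold. If the unit were kilobits instead, the files would be about eight times smaller.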

### Directory Settings (`--dir.*`)

| Argument           | Description                                    | Default                          |
| ------------------ | ---------------------------------------------- | -------------------------------- |
| `--dir.tmp <path>` | Temporary directory for intermediate files.    | `tmp_<video_name>` in output dir |
| `--dir.out <path>` | Output directory for the final subtitle files. | Same directory as input video    |

### Concurrency Settings (`--thread.*`)

| Argument                    | Description                                                                                                      | Default |
| --------------------------- | ---------------------------------------------------------------------------------------------------------------- | ------- |
| `--thread.uploads <int>`    | The number of concurrent threads for uploading video segments. This is only used for Gemini (google-gla) models. | `4`     |
| `--thread.re-encode <int>`  | The number of concurrent threads for re-encoding video chunks.                                                   | `2`     |
| `--thread.subtitles1 <int>` | The number of concurrent threads to use for Pass 1 (Transcription).                                              | `4`     |
| `--thread.subtitles2 <int>` | The number of concurrent threads to use for Pass 2 (QA).                                                         | `4`     |

### Retry Settings (`--retry.*`)

| Argument                  | Description                                                                   | Default |
| ------------------------- | ----------------------------------------------------------------------------- | ------- |
| `--retry.run <int>`       | The maximum number of times to retry a failed job in this run of the program. | `3`     |
| `--retry.max <int>`       | The absolute maximum number of times a job can be retried in total.           | `9`     |
| `--retry.delay <seconds>` | The number of seconds to wait between retries.                                | `30`    |
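
One way to read the interaction between `--retry.run` and `--retry.max` is sketched below. This is an interpretation, not confirmed behaviour: it assumes the total retry count for a job persists between runs and that each run is capped by whichever limit is reached first. The name `retries_allowed` is hypothetical.

```python
def retries_allowed(prior_retries: int, run_limit: int = 3, total_limit: int = 9) -> int:
    """Retries still available to a job in the current run, under the assumed semantics."""
    remaining_total = total_limit - prior_retries
    return max(0, min(run_limit, remaining_total))
```

Under this reading, a job that has already been retried 7 times in earlier runs gets 2 more retries, and a job that has reached 9 gets none.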

### Logging Settings (`--log.*`)

| Argument                  | Description                                          | Default |
| ------------------------- | ---------------------------------------------------- | ------- |
| `--log.level <level>`     | The minimum log level to display.                    | `info`  |
| `--log.timestamps <bool>` | Whether to include timestamps in the console output. | `False` |
| `--log.scrub <bool>`      | Whether to scrub sensitive data from logs.           | `True`  |

---

## Known Limitations

1.  **Timestamp Accuracy:** Subtitle timestamps may occasionally be inaccurate. This is a limitation of the Gemini model itself. Shorter video segments generally yield better accuracy; lower `--split.max-seconds` to shorten them.
2.  **AI Hallucinations:** Like all LLMs, Gemini may occasionally produce "hallucinations" or inaccurate information.

If you encounter issues, consider re-processing specific video segments as detailed below.

---

## Advanced: Re-processing Segments

Intermediate files are stored in a temporary directory (default: `tmp_<video_name>` inside the output directory). You can customize this location using the `--dir.tmp` flag.

To re-process a specific segment:

1.  Navigate to the temporary directory.
2.  Locate and delete the corresponding `part_XXX.model_name.json` file.
3.  Re-run `ai-sub`. It detects the missing files and re-processes only those segments.
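
If you re-process segments often, the deletion step can be scripted. The helper below is a minimal sketch: `delete_segment_results` is a hypothetical name, and it assumes the intermediate files follow the `part_XXX.model_name.json` pattern described above. You supply the `part_XXX` prefix exactly as it appears in your temporary directory.

```python
from pathlib import Path


def delete_segment_results(tmp_dir: str, part_label: str) -> list[Path]:
    """Delete every `<part_label>.<model_name>.json` file in tmp_dir and return the removed paths."""
    removed = []
    for path in sorted(Path(tmp_dir).glob(f"{part_label}.*.json")):
        path.unlink()  # only JSON results are removed; video chunks stay in place
        removed.append(path)
    return removed
```

The function returns the paths it removed, so you can confirm which segments will be regenerated before re-running the tool.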
