Metadata-Version: 2.4
Name: ai-sub
Version: 1.2.1
Summary: AI-Powered Subtitle Generation with Translation
Author: FlippFuzz
Project-URL: Homepage, https://github.com/FlippFuzz/ai-sub
Project-URL: Bug Tracker, https://github.com/FlippFuzz/ai-sub/issues
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pysubs2
Requires-Dist: static-ffmpeg
Requires-Dist: pymediainfo
Requires-Dist: pydantic
Requires-Dist: pydantic-ai-slim[google,logfire]
Requires-Dist: pydantic-settings
Requires-Dist: json_repair
Requires-Dist: pyrate-limiter
Dynamic: license-file

# AI Sub: AI-Powered Subtitle Generation with Translation

[![PyPI version](https://img.shields.io/pypi/v/ai-sub)](https://pypi.org/project/ai-sub)
[![Downloads](https://img.shields.io/pypi/dw/ai-sub)](https://pypistats.org/packages/ai-sub)

---

## Overview

**AI Sub** is a command-line tool that leverages Google's **Gemini** models to generate high-quality, audio-synchronized subtitles. It is designed to produce precise English and Japanese subtitles by analyzing both audio and visual cues.

**Key Features:**

- **Multimodal Understanding:** Utilizes video frames for context (e.g., identifying speakers, reading on-screen text) and audio for precise timing.
- **Dual-Language Support:** Generates verbatim transcriptions and translations for English and Japanese.
- **Automatic Segmentation:** Splits long videos into smaller segments for efficient processing.

---

## Showcase

Here's an example of subtitles generated by AI Sub:

[![Video Screenshot](https://github.com/FlippFuzz/ai-sub/raw/main/showcase/old/42h4ydJS3zk.png)](https://raw.githubusercontent.com/FlippFuzz/ai-sub/refs/heads/main/showcase/old/42h4ydJS3zk.v007.srt)

For more examples, please visit the [showcase directory](https://github.com/FlippFuzz/ai-sub/blob/main/showcase/old/README.md).

---

## How It Works

1.  **Preprocessing:** The input video is segmented into smaller chunks to fit within API context windows and file size limits.
2.  **AI Processing:** Each segment is sent to Google Gemini. The AI analyzes the audio for speech and the video for context, following strict prompting rules to generate subtitles.
3.  **Compilation:** Generated subtitles from all segments are merged into a final, chronologically sorted SRT file.
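
The compilation step can be pictured with a small standard-library sketch. It is not AI Sub's actual implementation: the `Cue` type, the helper names, and the assumption that each segment's timestamps are relative to the start of that segment are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One subtitle line; times are in milliseconds (hypothetical type)."""
    start_ms: int
    end_ms: int
    text: str


def segment_bounds(duration_ms: int, segment_ms: int) -> list[tuple[int, int]]:
    """Split [0, duration_ms) into consecutive windows of at most segment_ms."""
    if segment_ms <= 0:
        raise ValueError("segment_ms must be positive")
    return [
        (start, min(start + segment_ms, duration_ms))
        for start in range(0, duration_ms, segment_ms)
    ]


def merge_segments(segments: list[tuple[int, list[Cue]]]) -> list[Cue]:
    """Shift each segment's cues by the segment's start offset, then sort by time.

    Assumes cue times are relative to the start of their segment.
    """
    merged = [
        Cue(cue.start_ms + offset, cue.end_ms + offset, cue.text)
        for offset, cues in segments
        for cue in cues
    ]
    return sorted(merged, key=lambda cue: (cue.start_ms, cue.end_ms))


def format_timestamp(ms: int) -> str:
    """Render milliseconds in the SRT form HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02}:{minutes:02}:{seconds:02},{millis:03}"


def to_srt(cues: list[Cue]) -> str:
    """Number the cues from 1 and join them into SRT text."""
    blocks = [
        f"{i}\n{format_timestamp(c.start_ms)} --> {format_timestamp(c.end_ms)}\n{c.text}\n"
        for i, c in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)
```

Sorting after shifting means the segments can be handed to `merge_segments` in any order, which is convenient if they finish processing out of sequence.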

---

## Installation

**Prerequisites:** Python 3.10 or higher.

1.  **Set up a Python virtual environment:**

    ```bash
    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate.bat`
    ```

2.  **Install AI Sub:**

    ```bash
    pip install --upgrade ai-sub
    ```

---

## Usage

You can use AI Sub with either a Google AI Studio API Key or the Gemini CLI.

### Option 1: Using Google AI Studio API Key

1.  **Obtain your API Key:**

    - Sign in to [Google AI Studio](https://aistudio.google.com/app/apikey).
    - Click "Create API Key".
    - Copy and securely store your key. **Never disclose your API key publicly.**

2.  **Run the application:**

    ```bash
    ai-sub --ai.google.key YOUR_API_KEY --ai.model=google-gla:gemini-3-flash-preview "path/to/your/video.mp4"
    ```

    _Note: Replace `YOUR_API_KEY` with your actual key and `"path/to/your/video.mp4"` with the video file path._
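
If you want to run the same command over several videos from a script, the sketch below builds the argument list shown above. The function names and the `AI_SUB_EXAMPLE_KEY` environment variable are made up for this example; they are not part of AI Sub. Only the `--ai.google.key` and `--ai.model` flags come from the command above.

```python
import os

# Model name taken from the example command in this section.
DEFAULT_MODEL = "google-gla:gemini-3-flash-preview"


def api_key_from_env(var_name: str = "AI_SUB_EXAMPLE_KEY") -> str:
    """Read the key from an environment variable (hypothetical variable name)."""
    key = os.environ.get(var_name, "")
    if not key:
        raise RuntimeError(f"Set {var_name} before running")
    return key


def build_command(video: str, api_key: str, model: str = DEFAULT_MODEL) -> list[str]:
    """Return the ai-sub invocation as an argument list.

    A list avoids shell quoting problems with spaces in the video path.
    """
    return ["ai-sub", "--ai.google.key", api_key, f"--ai.model={model}", video]
```

Each list can then be passed to `subprocess.run(build_command(video, api_key_from_env()), check=True)`. Reading the key from the environment keeps it out of the script itself, so the script can be shared without exposing the key.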

### Option 2: Using Gemini CLI

1.  **Install and Authenticate Gemini CLI:**

    - Install: `npm install -g @google/gemini-cli`
    - Authenticate: Follow the authentication instructions in the [gemini-cli](https://github.com/google-gemini/gemini-cli?tab=readme-ov-file#-authentication-options) README.

2.  **Run the application:**

    ```bash
    ai-sub --ai.model=gemini-cli:gemini-3-pro-preview --split.re-encode.enabled=True --thread.subtitles=1 "path/to/your/video.mp4"
    ```

    **Important Notes for CLI Mode:**

    - No API key is required; the tool uses your authenticated Gemini CLI instance.
    - The additional arguments in the command above are required: the video must be split and re-encoded because the Gemini CLI has a 20 MB upload limit per chunk.
    - **Re-encoding is resource-intensive and will increase processing time.**

---

## Known Limitations

1.  **Timestamp Accuracy:** Subtitle timestamps may occasionally be inaccurate. This is an inherent characteristic of the Gemini AI model. Shorter video segments generally yield better accuracy.
2.  **AI Hallucinations:** Like all LLMs, Gemini may occasionally produce "hallucinations" or inaccurate information.

If you encounter issues, consider re-processing specific video segments as detailed below.

---

## Advanced: Re-processing Segments

Intermediate files are stored in a temporary directory (default: `tmp_<input_file_name>`). You can customize this location using the `--dir.tmp` flag.

To re-process a specific segment:

1.  Navigate to the temporary directory.
2.  Locate and delete the corresponding `part_XXX.json` file.
3.  Re-run `ai-sub`. It detects the missing files and re-processes only those segments.
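
The deletion step can also be scripted. This is a minimal sketch using only the standard library; the helper names are hypothetical, and the only detail taken from this README is that segment results are stored as `part_XXX.json` files in the temporary directory. File names are passed in exactly as they appear on disk, so nothing is assumed about how the segment number is formatted.

```python
from fnmatch import fnmatch
from pathlib import Path

# Glob pattern for the per-segment result files described above.
SEGMENT_PATTERN = "part_*.json"


def list_segment_files(tmp_dir: Path) -> list[Path]:
    """Return the existing segment result files, sorted by name."""
    return sorted(tmp_dir.glob(SEGMENT_PATTERN))


def discard_segments(tmp_dir: Path, names: list[str]) -> list[Path]:
    """Delete the named segment files and return the paths actually removed.

    Names that do not match the segment pattern or do not exist are skipped.
    """
    removed = []
    for name in names:
        path = tmp_dir / name
        # Compare against path.name so a name containing directories is rejected.
        if name == path.name and fnmatch(name, SEGMENT_PATTERN) and path.is_file():
            path.unlink()
            removed.append(path)
    return removed
```

Call `list_segment_files` first to see which files exist, pass the names of the segments you want redone to `discard_segments`, then run `ai-sub` again.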
