Metadata-Version: 2.4
Name: locsum
Version: 0.5.1
Summary: Terminal tool for offline transcription and summarization of audio/video files.
Project-URL: Repository, https://github.com/monsieurlinux/locsum
Author-email: Monsieur Linux <info@mlinux.ca>
License-Expression: MIT
License-File: LICENSE
Keywords: ai,cli,gpu,llm,privacy,python,self-hosted,youtube
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.11
Requires-Dist: markdown-it-py<5.0.0,>=3.0.0
Requires-Dist: ollama<1.0.0,>=0.6.1
Requires-Dist: openai-whisper>=20250625
Requires-Dist: pymupdf<2.0.0,>=1.26.7
Requires-Dist: weasyprint>=68.0
Description-Content-Type: text/markdown

![Locsum: Batch Offline Transcription and Summarization of Videos](https://github.com/monsieurlinux/locsum/raw/main/img/locsum-batch-offline-transcription-summarization.png "Locsum: Batch Offline Transcription and Summarization of Videos")

# Locsum

[![PyPI][pypi-badge]][pypi-link]
[![License][license-badge]][license-link]

Terminal tool for batch offline transcription and summarization of audio/video files.

## Hardware Requirements

Transcription can run on a CPU without a GPU, but high-quality summarization requires significant GPU resources. I initially used an [NVIDIA Jetson Orin Nano Super Developer Kit][jetson-link]. While capable, its 8GB unified memory limited me to ~8B parameter models, which produced subpar summaries.

I recently upgraded to an [ASUS Ascent GX10][gx10-link], a lower-cost alternative to the [NVIDIA DGX Spark][spark-link]. With 128GB of unified memory, I can now run much larger models. I am currently running a 30B parameter model (quantized) with excellent results. Theoretically, the hardware supports models up to 200B parameters.

## Dependencies

Locsum requires the following external libraries:

- **[markdown-it-py][markdown-link]:** Used for Markdown to HTML conversion
- **[ollama][ollama-github-link]:** Used for text summarization
- **[PyMuPDF][pymupdf-link]:** Used for PDF analysis
- **[weasyprint][weasyprint-link]:** Used for HTML to PDF conversion
- **[whisper][whisper-link]:** Used for audio transcription

These libraries and their sub-dependencies will be installed automatically when you install Locsum.
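
To confirm that the dependencies were actually installed, a minimal sketch along these lines can be run with the Python interpreter of the environment where Locsum lives. The distribution names are the ones the package declares on PyPI; the helper name `installed_versions` is only illustrative.

```python
from importlib import metadata

# Distribution names as declared in the package requirements.
DISTRIBUTIONS = ("markdown-it-py", "ollama", "openai-whisper", "pymupdf", "weasyprint")


def installed_versions(names=DISTRIBUTIONS):
    """Return {distribution name: installed version, or None if missing}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```

If Locsum was installed with `pipx`, running `pipx runpip locsum list` is a simpler way to get the same information.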

## Installation

### Prerequisites

- Ensure `ffmpeg` is installed on your system
- Install [Ollama][ollama-download-link] and pull a [model][ollama-search-link] to use for summarization (e.g. `ollama pull gemma3:4b`)
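
Both prerequisites can be checked before installing Locsum. The sketch below only looks for the `ffmpeg` and `ollama` executables on the `PATH`; it does not verify that a model has been pulled. The function name `find_tools` is an assumption made for this example.

```python
import shutil


def find_tools(names=("ffmpeg", "ollama")):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for name, path in find_tools().items():
    print(f"{name}: {path or 'NOT FOUND on PATH'}")
```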

### Installation with `pipx`

It is recommended to install Locsum within a [virtual environment][venv-link] to avoid conflicts with system packages. Some Linux distributions enforce this. You can use `pipx` to handle the virtual environment automatically, or create one manually and use `pip`.

`pipx` installs Locsum in an isolated environment and makes it available globally.

**1. Install `pipx`**

- **Linux (Debian / Ubuntu / Mint)**
  
  ```bash
  sudo apt install pipx
  pipx ensurepath
  ```
- **Linux (Other) / macOS**
  
  ```bash
  python3 -m pip install --user pipx
  python3 -m pipx ensurepath
  ```
- **Windows**
  
  ```bash
  python -m pip install --user pipx
  python -m pipx ensurepath
  ```

You may need to reopen your terminal for the PATH changes to take effect. If you encounter a problem, please refer to the official [pipx documentation][pipx-link].

**2. Install Locsum**

```bash
pipx install locsum
```

### Installation with `pip`

If you prefer to manage the virtual environment manually, you can create and activate it by following this [tutorial][venv-link]. Then install Locsum:

```bash
pip install locsum
```

### NVIDIA GPU Support

When you install Locsum, the [PyTorch][pytorch-link] library is installed as a sub-dependency of the [whisper][whisper-link] library. However, the version installed by default doesn't include GPU support. For transcription to benefit from GPU acceleration, you need to either upgrade PyTorch or install [whisper.cpp][whispercpp-github-link] as a replacement for the original whisper library. Locsum supports both options. Whisper.cpp is faster, but the speed gain depends on your hardware. The first option is simpler, whereas the second requires compiling from source.

#### Option 1: Upgrade PyTorch

**1. Get the CUDA version**

Run `nvidia-smi` and note the CUDA version reported in its header (13.0 in my case).

**2. Upgrade PyTorch**

Uninstall PyTorch and reinstall the build that matches your CUDA version (cu130 in my case).

- **If Locsum is installed with `pipx`**

  ```sh
  pipx runpip locsum uninstall torch
  pipx inject locsum torch --index-url https://download.pytorch.org/whl/cu130
  ```

- **If Locsum is installed with `pip`** (with the virtual environment activated)

  ```sh
  pip uninstall torch
  pip install torch --index-url https://download.pytorch.org/whl/cu130
  ```
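
The mapping from the CUDA version to the wheel tag used in the index URL (13.0 becomes `cu130`) can be sketched as follows. This assumes the `CUDA Version: X.Y` field that `nvidia-smi` prints in its header; the sample line and its driver number are purely illustrative, and the helper names are hypothetical. PyTorch does not publish wheels for every CUDA release, so check the [PyTorch][pytorch-link] install page before using the resulting URL.

```python
import re


def cuda_wheel_tag(nvidia_smi_output):
    """Return a tag such as 'cu130' from the 'CUDA Version: 13.0' field, or None."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        return None
    major, minor = match.groups()
    return f"cu{major}{minor}"


def pytorch_index_url(tag):
    """Build the index URL passed to pip via --index-url."""
    return f"https://download.pytorch.org/whl/{tag}"


# Illustrative header line; the driver number is a placeholder.
sample = "| NVIDIA-SMI 000.00.00    Driver Version: 000.00.00    CUDA Version: 13.0 |"
print(pytorch_index_url(cuda_wheel_tag(sample)))
```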

**3. Verify installation**

Run `locsum -c` to check that CUDA is available.

  ```
  PyTorch 2.10.0+cu130
  CUDA 13.0 is available
  ```
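
A similar check can be done directly against PyTorch. This is a rough sketch and not necessarily what `locsum -c` does internally; it has to be run with the Python interpreter of the environment where Locsum is installed.

```python
def cuda_report():
    """Describe PyTorch's view of CUDA as a short string."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        # torch.version.cuda is the CUDA version the wheel was built against.
        return f"PyTorch {torch.__version__}: CUDA {torch.version.cuda} is available"
    return f"PyTorch {torch.__version__}: CUDA is not available"


print(cuda_report())
```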

#### Option 2: Install Whisper.cpp

Whisper.cpp doesn't need PyTorch, but it still requires [CUDA][cuda-link] and [cuBLAS][cublas-link] to be installed on your system. Refer to the official [whisper.cpp documentation][whispercpp-github-link] for more information.

**1. Install libraries for ffmpeg integration**

If you want to transcribe files such as .aac without first converting them to .wav, you need to compile whisper.cpp with ffmpeg support. However, this option appears to be available only on [Linux][whispercpp-ffmpeg-link].

```sh
sudo apt install libavcodec-dev libavformat-dev libavutil-dev
```

**2. Clone whisper.cpp repository**

```sh
cd ~  # Or wherever you wish to install whisper.cpp
git clone https://github.com/ggml-org/whisper.cpp.git
cd whisper.cpp
```

**3. Build whisper.cpp**

To enable [CUDA][whispercpp-nvidia-link] and [ffmpeg][whispercpp-ffmpeg-link] support, you need to use the `-DGGML_CUDA=1` and `-DWHISPER_FFMPEG=yes` arguments.

```sh
cmake -B build -DGGML_CUDA=1 -DWHISPER_FFMPEG=yes
cmake --build build -j
```

**4. Verify installation**

```sh
sh ./models/download-ggml-model.sh base.en  # Download the base.en model in ggml format
ffmpeg -i samples/jfk.wav samples/jfk.aac   # Convert the audio file to .aac format
./build/bin/whisper-cli -f samples/jfk.aac  # Transcribe the audio file
```

**5. Configure Locsum**

To enable whisper.cpp in Locsum, ensure the CLI binary and model directory are correctly specified in the [configuration file](#configuration) (`cli_path` and `models_path` settings). If you installed whisper.cpp in your home directory, the default paths should work out of the box.

For optimal speed, experiment to find the fastest combination of `threads` and `processors` on your hardware. On my device (ASUS GX10, 20-core ARM CPU), setting `threads=1` and `processors=18` reduced inference time by ~3× compared to defaults (`threads=4` and `processors=1`).

Use the [bench.py script][whispercpp_bench-link] included with whisper.cpp to benchmark your setup. For example:

```sh
python3 scripts/bench.py -f samples/jfk.wav -t 1,2,4,8 -p 1,2,4,8,16
```
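
As an alternative to `bench.py`, the search over `threads` and `processors` can be sketched in a few lines of Python that time `whisper-cli` directly with its `-t` and `-p` options. The rule of skipping pairs whose product exceeds the core count is an assumption made here to keep the search small, not a whisper.cpp requirement, and all function names are hypothetical.

```python
import itertools
import subprocess
import time


def candidate_settings(cores, thread_options, processor_options):
    """Keep (threads, processors) pairs whose product fits within the core count."""
    return [
        (t, p)
        for t, p in itertools.product(thread_options, processor_options)
        if t * p <= cores
    ]


def build_command(cli_path, audio_path, threads, processors):
    """Build a whisper-cli invocation using its -f, -t and -p options."""
    return [cli_path, "-f", audio_path, "-t", str(threads), "-p", str(processors)]


def time_setting(cli_path, audio_path, threads, processors):
    """Run one transcription and return the wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(
        build_command(cli_path, audio_path, threads, processors),
        check=True,
        capture_output=True,
    )
    return time.perf_counter() - start
```

From the whisper.cpp directory, looping over `candidate_settings(20, [1, 2, 4, 8], [1, 2, 4, 8, 16])` and calling `time_setting("./build/bin/whisper-cli", "samples/jfk.wav", t, p)` for each pair gives a rough ranking. A short sample like `jfk.wav` may not reflect behavior on long recordings, so it is worth repeating the test on a typical file.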

## Deployments

View all releases on:

- **[PyPI Releases][pypi-releases]**
- **[GitHub Releases][github-releases]**

## Usage

### Basic Usage

```bash
locsum [arguments] FILE [FILE ...]
```

### Arguments

| Argument            | Short Flag | Description                                     |
| ------------------- | ---------- | ----------------------------------------------- |
| `--help`            | `-h`       | Show help message                               |
| `--check-cuda`      | `-c`       | Check if CUDA is available                      |
| `--language`        | `-l`       | Set the language of the audio                   |
| `--no-colors`       | `-n`       | Disable color output                            |
| `--no-compact`      | `-N`       | Disable PDF compaction                          |
| `--ollama-model`    | `-o`       | Set the Ollama model for summarization          |
| `--openai-whisper`  | `-O`       | Use OpenAI's Whisper even if Whisper.cpp is available |
| `--reset-config`    | `-r`       | Reset configuration file to default             |
| `--transcribe-only` | `-t`       | Transcribe only, don't generate a summary       |
| `--tiny`            | `-T`       | Use tiny Whisper and Ollama models for testing  |
| `--version`         | `-v`       | Show program's version number and exit          |
| `--whisper-model`   | `-w`       | Set the Whisper model for transcription         |
| `--filter-warnings` | `-W`       | Suppress warnings from PyTorch                  |
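
For scripted batch runs, the command line can be assembled from Python. The sketch below uses only flags from the table above; the list of file extensions, the helper names, and the `en` language value used in the usage note are assumptions for illustration.

```python
from pathlib import Path

# Assumed set of extensions to pick up; adjust to your own files.
MEDIA_SUFFIXES = {".mp3", ".mp4", ".m4a", ".wav", ".mkv", ".webm"}


def media_files(directory):
    """Return media files found directly inside `directory`, sorted by name."""
    return sorted(
        p for p in Path(directory).iterdir() if p.suffix.lower() in MEDIA_SUFFIXES
    )


def locsum_command(files, language=None, transcribe_only=False):
    """Build the argument list for one batch run of the locsum CLI."""
    command = ["locsum"]
    if language:
        command += ["--language", language]
    if transcribe_only:
        command.append("--transcribe-only")
    return command + [str(f) for f in files]
```

The result can be passed to `subprocess.run(..., check=True)`, for example `subprocess.run(locsum_command(media_files("recordings"), language="en"), check=True)`, where `recordings` is a hypothetical directory name.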

## Configuration

When you run Locsum for the first time, a `config.toml` file is automatically created. Its location depends on your operating system (typical paths are listed below):

- **Linux:** `~/.config/locsum`
- **macOS:** `~/Library/Preferences/locsum`
- **Windows:** `C:\Users\YourUsername\AppData\Roaming\locsum`

You can edit this file to customize various settings. Common customizations include the Whisper and Ollama models to use.

## VPN Setup

Since the goal is to process files locally, we might as well download them as privately as possible. Here is how I installed and configured WireGuard VPN on my GX10.

First, update your system with `sudo apt update && sudo apt upgrade`. If the kernel is updated during this step, reboot before continuing.

- Install WireGuard: `sudo apt install wireguard`
- Download WireGuard configuration from my [Proton VPN][proton-link] account
- Copy the configuration file to `/etc/wireguard/protonvpn.conf` and set its owner to root (`sudo chown root:root /etc/wireguard/protonvpn.conf`)
- Test connection manually
  - Connect: `sudo wg-quick up protonvpn`
  - Check connection: `sudo wg`
  - Check IP address: `curl -4 ip.me`
  - Disconnect: `sudo wg-quick down protonvpn`
- Connect at boot: `sudo systemctl enable --now wg-quick@protonvpn.service`
- Reboot and check VPN connection / IP address

## Radio Deactivation

For a truly air-gapped system and to eliminate radiofrequency radiation, use the following methods to disable the wireless radios:

- **Disable Bluetooth**

  ```sh
  sudo systemctl disable --now bluetooth
  sudo systemctl mask bluetooth
  sudo rfkill block bluetooth
  ```

- **Disable Wi-Fi**

  ```sh
  sudo systemctl disable --now wpa_supplicant
  sudo systemctl mask wpa_supplicant
  sudo rfkill block wifi
  nmcli radio wifi off  # Redundant, but just in case
  ```

- **Reboot and check**

  ```sh
  sudo systemctl status bluetooth
  sudo systemctl status wpa_supplicant
  sudo rfkill list
  ```
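
The `rfkill list` check can also be automated. The sketch below parses the usual `rfkill list` layout (an index line per device followed by `Soft blocked` and `Hard blocked` lines); the sample text is illustrative, and the function name is hypothetical.

```python
import re


def parse_rfkill(output):
    """Return {device type: True if soft blocked} parsed from `rfkill list` output."""
    states = {}
    current = None
    for line in output.splitlines():
        header = re.match(r"^\d+:\s*[^:]+:\s*(.+)$", line)
        if header:
            current = header.group(1).strip()
        elif current and line.strip().startswith("Soft blocked:"):
            states[current] = line.split(":", 1)[1].strip() == "yes"
    return states


# Illustrative output; device names vary between systems.
sample = """0: hci0: Bluetooth
\tSoft blocked: yes
\tHard blocked: no
1: phy0: Wireless LAN
\tSoft blocked: yes
\tHard blocked: no"""
print(parse_rfkill(sample))
```

If two devices share the same type, the later one overwrites the earlier one in this simplified version.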

### Kernel-Level Deactivation

Even after disabling services, the firmware might still attempt background scans, emitting bursts of radiofrequency energy. To completely silence the device, you must prevent the kernel module from loading:

- **Identify the module**

  ```sh
  lspci -k  # Look for the wireless controller and find the module name (e.g. mt7925e)
  ```

- **Blacklist the module**

  ```sh
  # Replace [WIRELESS_MODULE] with the name found above (e.g. mt7925e)
  echo "blacklist [WIRELESS_MODULE]" | sudo tee /etc/modprobe.d/blacklist-wifi.conf
  sudo update-initramfs -u
  ```

- **Reboot and check**

  ```sh
  lsmod | grep [WIRELESS_MODULE]  # Should return nothing
  ```
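
The same check can be scripted by reading `/proc/modules`, the file that `lsmod` formats. This minimal sketch compares whole module names, so a short name does not accidentally match a longer one the way a plain `grep` would. The sample text is illustrative, and the function names are hypothetical.

```python
def module_loaded(name, modules_text):
    """True if `name` is the first field of any line in /proc/modules-style text."""
    return any(
        line.split()[0] == name
        for line in modules_text.splitlines()
        if line.strip()
    )


def read_loaded_modules(path="/proc/modules"):
    """Return the raw list of loaded kernel modules (Linux only)."""
    with open(path) as handle:
        return handle.read()


# Illustrative content; sizes and addresses are placeholders.
sample = "mt7925e 131072 0 - Live 0x0000000000000000\ncfg80211 1000000 1 mt7925e, Live 0x0000000000000000\n"
print(module_loaded("mt7925e", sample))
```

On the real system, `module_loaded("mt7925e", read_loaded_modules())` should return `False` after the reboot. Note that the kernel reports module names with underscores, even when the module file name uses hyphens.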

## License

Copyright (c) 2026 Monsieur Linux

This project is licensed under the MIT License. See the LICENSE file for details.

## Acknowledgements

Thanks to the creators and contributors of all the powerful libraries used in this project for making it possible.

[cublas-link]: https://developer.nvidia.com/cublas
[cuda-link]: https://developer.nvidia.com/cuda-downloads
[github-releases]: https://github.com/monsieurlinux/locsum/releases
[gx10-link]: https://www.asus.com/networking-iot-servers/desktop-ai-supercomputer/ultra-small-ai-supercomputers/asus-ascent-gx10/
[jetson-link]: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-orin/nano-super-developer-kit/
[license-badge]: https://img.shields.io/pypi/l/locsum.svg
[license-link]: https://github.com/monsieurlinux/locsum/blob/main/LICENSE
[markdown-link]: https://github.com/executablebooks/markdown-it-py
[ollama-download-link]: https://ollama.com/download
[ollama-github-link]: https://github.com/ollama/ollama-python
[ollama-search-link]: https://ollama.com/search
[pipx-link]: https://github.com/pypa/pipx
[proton-link]: https://protonvpn.com/
[pymupdf-link]: https://github.com/pymupdf/PyMuPDF
[pypi-releases]: https://pypi.org/project/locsum/#history
[pypi-badge]: https://img.shields.io/pypi/v/locsum.svg
[pypi-link]: https://pypi.org/project/locsum/
[pytorch-link]: https://pytorch.org/get-started/locally/
[spark-link]: https://www.nvidia.com/en-us/products/workstations/dgx-spark/
[venv-link]: https://docs.python.org/3/tutorial/venv.html
[weasyprint-link]: https://github.com/Kozea/WeasyPrint
[whisper-link]: https://github.com/openai/whisper
[whispercpp_bench-link]: https://github.com/ggml-org/whisper.cpp#benchmarks
[whispercpp-ffmpeg-link]: https://github.com/ggml-org/whisper.cpp#ffmpeg-support-linux-only
[whispercpp-github-link]: https://github.com/ggml-org/whisper.cpp
[whispercpp-nvidia-link]: https://github.com/ggml-org/whisper.cpp#nvidia-gpu-support
