Metadata-Version: 2.4
Name: owocr
Version: 1.21
Summary: Japanese OCR
Author-email: AuroraWright <fallingluma@gmail.com>
License-Expression: GPL-3.0-only
Project-URL: Homepage, https://github.com/AuroraWright/owocr
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: jaconv
Requires-Dist: loguru
Requires-Dist: numpy
Requires-Dist: Pillow>=10.0.0
Requires-Dist: pyperclipfix
Requires-Dist: pynput<=1.7.8
Requires-Dist: websockets>=14.0
Requires-Dist: desktop-notifier>=6.1.0
Requires-Dist: mss>=10.1.0
Requires-Dist: psutil
Requires-Dist: curl_cffi
Requires-Dist: pywin32; platform_system == "Windows"
Requires-Dist: pyobjc; platform_system == "Darwin"
Provides-Extra: faster-png
Requires-Dist: fpng-py; extra == "faster-png"
Provides-Extra: easyocr
Requires-Dist: easyocr; extra == "easyocr"
Provides-Extra: rapidocr
Requires-Dist: rapidocr>=3.4.0; extra == "rapidocr"
Requires-Dist: onnxruntime; extra == "rapidocr"
Provides-Extra: meikiocr
Requires-Dist: meikiocr; extra == "meikiocr"
Provides-Extra: mangaocr
Requires-Dist: manga-ocr; extra == "mangaocr"
Requires-Dist: setuptools<80; extra == "mangaocr"
Requires-Dist: scipy; extra == "mangaocr"
Requires-Dist: pyclipper; extra == "mangaocr"
Requires-Dist: torchvision; extra == "mangaocr"
Requires-Dist: torch-summary; extra == "mangaocr"
Requires-Dist: opencv-python; extra == "mangaocr"
Requires-Dist: shapely; extra == "mangaocr"
Provides-Extra: winocr
Requires-Dist: winocr; extra == "winocr"
Provides-Extra: oneocr
Requires-Dist: oneocr>=1.0.9; extra == "oneocr"
Provides-Extra: lens
Requires-Dist: protobuf>=6.33.2; extra == "lens"
Provides-Extra: gvision
Requires-Dist: google-cloud-vision; extra == "gvision"
Provides-Extra: azure
Requires-Dist: azure-ai-vision-imageanalysis; extra == "azure"
Dynamic: license-file

# OwOCR

OwOCR is a command-line text recognition tool that continuously scans for images and performs OCR (Optical Character Recognition) on them. Its main focus is Japanese, but it works for many other languages.

## Demo

### Visual novel:
https://github.com/user-attachments/assets/f2196b23-f2e7-4521-820a-88c8bedb9d8e

### Manga:
https://github.com/user-attachments/assets/f061854d-d20f-43e8-8c96-af5a0ea26f43

## Installation

OwOCR has been tested on Python 3.11, 3.12 and 3.13. It can be installed with `pip install owocr` after you install Python. You also need to have one or more OCR engines, check the list below for instructions. I recommend installing at least Google Lens on any operating system, and OneOCR if you are on Windows. Bing is pre-installed, Apple Vision and Live Text come pre-installed on macOS.

## Basic usage

```
owocr
```

This default behavior monitors the clipboard for images and outputs recognized text back to the clipboard.

## Main features

- Multiple input sources: clipboard, folders, websockets, unix domain socket, and screen capture
- Multiple output destinations: clipboard, text files, and websockets
- Pause/unpause with `p` or terminate with `t`/`q` in the terminal, switch between engines with `s` or the engine-specific keys (from the engine list below)
- Capture from specific screen areas, windows, of areas within windows (window capture is only supported on Windows/macOS). This also tries to capture entire sentences and filter all repetitions. If you use an online engine like Lens I recommend setting a secondary local engine with the `-es` option: `-es=oneocr` on Windows and `-es=alivetext` on macOS. With this "two pass" system only the changed areas are sent to the online service, allowing for both speed and accuracy
- Multiple configurable keyboard combinations to control owocr from anywhere, including pausing, switching engines, taking a screenshot of the selected screen/window and running the automatic tool to re-select an area of the screen/window via drag and drop
- Read from a unix domain socket `/tmp/owocr.sock` on macOS/Linux
- Furigana filter, works by default with Japanese text (both vertical and horizontal)

## Common option examples

- Write text to a file: `owocr -w=<txt file path>`
- Read images from a folder: `owocr -r=<folder path>`
- Write text to a websocket: `owocr -w=websocket` (default websocket port is 7331)
- Read from the screen or one/more portion(s) of the screen (opens the automatic drag and drop selector, hold CTRL/cmd to select multiple areas and click to delete them or Backspace to reset): `owocr -r=screencapture`
- Read from a window having "Notepad" in the title: `owocr -r=screencapture -sa=Notepad`
- Read from one/more portion(s) of a window having "Notepad" in the title (opens the automatic drag and drop selector, hold CTRL/cmd to select multiple areas and click to delete them or Backspace to reset): `owocr -r=screencapture -sa=Notepad -swa`

## Configuration

There are many more options and customization features. For complete documentation of all available settings:

- View all command-line options and their descriptions: `owocr -h`
- Check the automatically generated config file at `~/.config/owocr_config.ini` on Linux/macOS, or `C:\Users\yourusername\.config\owocr_config.ini` on Windows
- See a sample config file: [owocr_config.ini](https://raw.githubusercontent.com/AuroraWright/owocr/master/owocr_config.ini)

The command-line options/config file allow you to configure OCR providers, hotkeys, screen capture settings, notifications, and much more.

# Supported engines

## Local
- [Manga OCR](https://github.com/kha-white/manga-ocr) (with optional [comic-text-detector](https://github.com/dmMaze/comic-text-detector) as segmenter) - install: `pip install "owocr[mangaocr]"` → keys: `m` (regular, ideal for small text areas), `n` (segmented, ideal for manga panels/larger images with multiple text areas)
- [EasyOCR](https://github.com/JaidedAI/EasyOCR) - install: `pip install "owocr[easyocr]"` → key: `e`
- [RapidOCR](https://github.com/RapidAI/RapidOCR) - install: `pip install "owocr[rapidocr]"` → key: `r`
- Apple Vision framework - Probably the best local engine to date. **macOS only - Recommended (pre-installed)** → key: `a`
- Apple Live Text (VisionKit framework) - It should be the same as Vision except that in Sonoma Apple added vertical text reading. **macOS only - Recommended (pre-installed)** → key: `d`
- WinRT OCR: install: `pip install "owocr[winocr]"`. It can also be used by installing winocr on a Windows virtual machine and running the server there (`winocr_serve`) and specifying the IP address of the Windows VM/machine in the config file. **Windows 10/11 only** → key: `w`
- OneOCR - install: `pip install "owocr[oneocr]"`. Close second local best to the Apple one. You need to copy 3 system files from Windows 11 to use it, refer to the readme [here](https://github.com/AuroraWright/oneocr). It can also be used by installing oneocr on a Windows virtual machine and running the server there (`oneocr_serve`) and specifying the IP address of the Windows VM/machine in the config file. **Windows 10/11 only - Recommended** → key: `z`
- [meikiocr](https://github.com/rtr46/meikiocr) - install: `pip install "owocr[meikiocr]"`. Comparable to OneOCR in accuracy and CPU latency. Can be run on Nvidia GPUs via `pip uninstall onnxruntime && pip install onnxruntime-gpu` making it the fastest OCR available. Probably best option for Linux users. Can't process vertical text and is limited to 64 text lines and 48 characters per line.  → key: `k`

## Cloud
- Google Lens - install: `pip install "owocr[lens]"`. Arguably the best OCR engine to date. **Recommended** → key: `l`
- Bing - Close second best. **Recommended (pre-installed)** → key: `b`
- Google Vision: install: `pip install "owocr[gvision]"`, you also need a service account .json file named google_vision.json in `user directory/.config/` → key: `g`
- Azure Image Analysis: install: `pip install "owocr[azure]"`, you also need to specify an api key and an endpoint in the config file → key: `v`
- OCRSpace: you need to specify an api key in the config file → key: `o`

# Acknowledgments

This uses code from/references these people/projects:
- Viola for working on the Google Lens implementation (twice!) and helping with the pyobjc VisionKit code!
- @rtr46 for contributing a big overhaul allowing for coordinate support and JSON output
- @bpwhelan for contributing code for other language support and for his ideas (like two pass processing) originally implemented in the Game Sentence Miner fork of owocr
- @bropines for the Bing code ([Github issue](https://github.com/AuroraWright/owocr/issues/10))
- @ronaldoussoren for helping with the pyobjc VisionKit code
- [Manga OCR](https://github.com/kha-white/manga-ocr) for inspiring and being the project owocr was originally derived from
- [Mokuro](https://github.com/kha-white/mokuro) for the comic text detector integration code
- [ocrmac](https://github.com/straussmaximilian/ocrmac) for the Apple Vision framework API
- [ccylin2000_lipboard_monitor](https://github.com/vaimalaviya1233/ccylin2000_lipboard_monitor) for the Windows clipboard polling code
- vicky for the demo videos in this readme!
