Metadata-Version: 2.1
Name: pytesstrain
Version: 0.1.0
Summary: Collection of utilities for Tesseract OCR training
Home-page: https://github.com/wincentbalin/pytesstrain
Author: Wincent Balin
Author-email: wincent.balin+pytesstrain@gmail.com
License: Apache License (2.0)
Keywords: Tesseract,OCR,training
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Topic :: Text Processing
Classifier: Environment :: Console
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Description-Content-Type: text/markdown
Requires-Dist: pytesseract
Requires-Dist: editdistance

# Python utilities for Tesseract OCR training

This module is a collection of different training utilities for [Tesseract OCR](https://github.com/tesseract-ocr/tesseract).
These utilities are also implemented as console scripts, so they can be run from the command line as well.

## Requirements

This module requires the following modules to work:

* pytesseract (Running Tesseract OCR)
* editdistance (Calculation of error rates)
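To show what "calculation of error rates" typically means for OCR output, here is a minimal sketch of character and word error rates built on Levenshtein edit distance. It is not the code of `pytesstrain.metrics`. The function names `levenshtein`, `cer` and `wer` are hypothetical. The sketch uses a pure-Python distance so that it stays self-contained; with the `editdistance` package installed, `editdistance.eval(a, b)` computes the same distance for strings or lists.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings, or lists of words)."""
    # Dynamic programming, keeping only the previous row of the table.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,             # deletion
                            curr[j - 1] + 1,         # insertion
                            prev[j - 1] + (x != y))) # substitution (free if equal)
        prev = curr
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: character edits per reference character."""
    # max(..., 1) avoids division by zero for an empty reference (an assumption).
    return levenshtein(reference, hypothesis) / max(len(reference), 1)


def wer(reference, hypothesis):
    """Word error rate: word edits per reference word (whitespace-split)."""
    ref_words = reference.split()
    return levenshtein(ref_words, hypothesis.split()) / max(len(ref_words), 1)
```

For example, `cer("hello world", "helo world")` is one deletion over 11 reference characters, while `wer` on the same pair is 0.5 because one of the two words differs.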

## Packages

The module is split into several packages. The package `pytesstrain.train` contains the workhorse function
`run_text()`. The package `pytesstrain.cli` contains the tools you might run at the command line. The package
`pytesstrain.ambigs` contains functions for working with the `unicharambigs` file. The package `pytesstrain.text2image`
contains the interface to the `text2image` command from Tesseract OCR; the interface relies on the `pytesseract` module
and is modelled after it as well. The package `pytesstrain.metrics` contains error rate calculations, as well as
the interface class `Metrics`. The package `pytesstrain.utils` has auxiliary functions.
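A wrapper around the `text2image` command mostly comes down to assembling its arguments and running the binary. The following is a minimal sketch of that idea, not the actual `pytesstrain.text2image` API. The helper names `text2image_command` and `run_text2image` are hypothetical, and only a few common `text2image` flags (`--text`, `--outputbase`, `--font`, `--fonts_dir`) are covered. The `binary` parameter loosely mirrors how `pytesseract` lets you override the Tesseract executable path.

```python
import subprocess


def text2image_command(text_file, outputbase, font, fonts_dir=None, binary="text2image"):
    """Build an argument list for text2image (hypothetical helper)."""
    # String concatenation instead of f-strings keeps this usable on Python 3.5.
    cmd = [binary,
           "--text=" + text_file,
           "--outputbase=" + outputbase,
           "--font=" + font]
    if fonts_dir is not None:
        cmd.append("--fonts_dir=" + fonts_dir)
    return cmd


def run_text2image(*args, **kwargs):
    """Run text2image; raises subprocess.CalledProcessError on a non-zero exit."""
    subprocess.run(text2image_command(*args, **kwargs), check=True)
```

Passing a list rather than a shell string means font names containing spaces, such as `"DejaVu Sans"`, need no extra quoting.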


