Metadata-Version: 2.1
Name: cer
Version: 1.0.0
Summary: Translation Edit Rate on the character level
Home-page: https://github.com/BramVanroy/CharacTER
Author: Bram Vanroy
Author-email: bramvanroy@hotmail.com
License: GPLv3
Project-URL: Issue tracker, https://github.com/BramVanroy/CharacTER/issues
Project-URL: Source, https://github.com/BramVanroy/CharacTER
Keywords: machine-translation machine-translation-evaluation evaluation mt
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Text Processing
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Operating System :: OS Independent
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Requires-Python: >=3.7
Description-Content-Type: text/markdown
Provides-Extra: dev
License-File: LICENSE

# CharacTER

CharacTER: Translation Edit Rate on Character Level

CharacTER is a novel character-level metric inspired by the widely used Translation Edit Rate (TER). It is defined as the minimum number of character edits required to turn a hypothesis into an exact match of the reference, normalized by the length of the hypothesis sentence. CharacTER computes the edit distance on the character level while performing shift edits on the word level. Unlike the strict matching criterion in TER, a hypothesis word is considered to match a reference word, and can therefore be shifted, if the edit distance between them is below a threshold value. The Levenshtein distance between the reference and the shifted hypothesis sequence is then computed on the character level. Normalizing by the length of the hypothesis rather than the reference counters the issue that shorter translations normally achieve lower TER.
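The normalization described above can be illustrated with a minimal sketch. It is not the code in CharacTER.py: it leaves out the word-level shift step entirely, treats each sentence as a raw string (spaces included), and uses a pure-Python Levenshtein function instead of the "python-Levenshtein" package. The names `levenshtein`, `normalized_char_edit_rate`, `can_shift`, and the `max_distance` parameter are hypothetical and exist only for this illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def normalized_char_edit_rate(hypothesis: str, reference: str) -> float:
    """Character edit distance divided by the hypothesis length (no shifts)."""
    if not hypothesis:
        # Assumption for this sketch: an empty hypothesis scores 1.0
        # unless the reference is empty as well.
        return 0.0 if not reference else 1.0
    return levenshtein(hypothesis, reference) / len(hypothesis)


def can_shift(hyp_word: str, ref_word: str, max_distance: int) -> bool:
    """Relaxed word match: edit distance strictly below a threshold.

    `max_distance` is a hypothetical parameter; the threshold used by the
    actual implementation is not stated here.
    """
    return levenshtein(hyp_word, ref_word) < max_distance
```

For instance, `normalized_char_edit_rate("abc", "abd")` divides one substitution by a hypothesis length of three. Because the denominator is the hypothesis length, a shorter hypothesis with the same number of edits receives a higher (worse) score.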

The paper can be found under ./WMT2016_CharacTer.pdf

The implementation is in CharacTER.py

You may have to install the Python package "python-Levenshtein" first.

usage: CharacTER.py [-h] -r REF -o HYP [-v]

CharacTER: Character Level Translation Edit Rate

optional arguments:  
  -h, --help         show this help message and exit  
  -r REF, --ref REF  Reference file  
  -o HYP, --hyp HYP  Hypothesis file  
  -v, --verbose      Print score of each sentence

If a UnicodeEncodeError occurs, set the 'PYTHONIOENCODING' environment
variable (for example, to utf-8).
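One way to set 'PYTHONIOENCODING' is to pass it to the child process when calling the script from Python. The following is a sketch under a few assumptions: CharacTER.py sits in the current working directory, utf-8 is the desired encoding, and the helper names `build_command`, `build_env`, and `run_character` are hypothetical. Only the flags `-r`, `-o`, and `-v` come from the usage text above.

```python
import os
import subprocess
import sys


def build_command(ref_path: str, hyp_path: str, verbose: bool = False) -> list:
    """Assemble the CharacTER.py command line from the documented flags."""
    command = [sys.executable, "CharacTER.py", "-r", ref_path, "-o", hyp_path]
    if verbose:
        command.append("-v")  # print the score of each sentence
    return command


def build_env(encoding: str = "utf-8") -> dict:
    """Copy the current environment and set the I/O encoding for the child."""
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = encoding
    return env


def run_character(ref_path: str, hyp_path: str, verbose: bool = False) -> str:
    """Run the script and return what it wrote to standard output."""
    result = subprocess.run(
        build_command(ref_path, hyp_path, verbose),
        env=build_env(),
        capture_output=True,
        encoding="utf-8",  # decode the child's output as UTF-8
        check=True,        # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

A call such as `run_character("ref.txt", "hyp.txt", verbose=True)` would then run the script on two files; both file names are placeholders.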

# Modifications by Bram Vanroy

Bram Vanroy packaged this library to make it compatible with PyPI. This required some packaging changes,
but the implementation itself is unchanged. The PDF file of the paper was removed in favor of a CITATION file.

The original license applies, i.e., GPL v3.

