Metadata-Version: 2.1
Name: es-translator
Version: 1.7.2
Summary: A lazy yet bulletproof machine translation tool for Elasticsearch.
License: GNU AFFERO GENERAL PUBLIC LICENSE
Author: ICIJ
Author-email: engineering@icij.org
Requires-Python: >=3.8.2,<3.11
Classifier: License :: Other/Proprietary License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Requires-Dist: argostranslate (>=1.7,<2.0)
Requires-Dist: celery[redis] (>=5.3.1,<6.0.0)
Requires-Dist: click (>=8,<9)
Requires-Dist: coloredlogs
Requires-Dist: deb-pkg-tools (>=8.4,<9.0)
Requires-Dist: elasticsearch (>=7.10,<7.11)
Requires-Dist: elasticsearch-dsl (>=7.4.0,<8.0.0)
Requires-Dist: pycountry (>=22.3,<23.0)
Requires-Dist: rich (>=12,<13)
Requires-Dist: sh (>=1,<2)
Requires-Dist: torch (>=1.13,<2.0)
Requires-Dist: urllib3 (>=1.26,<2.0)
Description-Content-Type: text/markdown

# ES Translator [![Build status](https://img.shields.io/github/actions/workflow/status/icij/es-translator/main.yml)](https://github.com/ICIJ/es-translator/actions) [![Python versions](https://img.shields.io/pypi/pyversions/es-translator)](https://pypi.org/project/es-translator/)


A lazy yet bulletproof machine translation tool for Elasticsearch.

```
Usage: es-translator [OPTIONS]

Options:
  -u, --url TEXT                  Elasticsearch URL
  -i, --index TEXT                Elasticsearch Index  [required]
  -r, --interpreter TEXT          Interpreter to use to perform the
                                  translation
  -s, --source-language TEXT      Source language to translate from
                                  [required]
  -t, --target-language TEXT      Target language to translate to  [required]
  --intermediary-language TEXT    An intermediary language to use when no
                                  translation is available between the source
                                  and the target. If none is provided this
                                  will be calculated automatically.
  --source-field TEXT             Document field to translate
  --target-field TEXT             Document field where the translations are
                                  stored
  -q, --query-string TEXT         Search query string to filter result
  -d, --data-dir PATH             Path to the directory where the language
                                  model will be downloaded
  --scan-scroll TEXT              Scroll duration (set to higher value if
                                  you're processing a lot of documents)
  --dry-run                       Don't save anything in Elasticsearch
  --pool-size INTEGER             Number of parallel processes to start
  --pool-timeout INTEGER          Timeout to add a translation
  --throttle INTEGER              Throttle between each translation (in ms)
  --syslog-address TEXT           Syslog address
  --syslog-port INTEGER           Syslog port
  --syslog-facility TEXT          Syslog facility
  --stdout-loglevel TEXT          Change the default log level for stdout
                                  error handler
  --progressbar / --no-progressbar
                                  Display a progressbar
  --help                          Show this message and exit.
```

## Installation (Ubuntu)

Install Apertium:

```bash
wget https://apertium.projectjj.com/apt/install-nightly.sh -O - | sudo bash
sudo apt install apertium-all-dev
```

Create a virtualenv and install the Python dependencies with Poetry:

```bash
make install
```

On Ubuntu 22.04, some additional packages might be needed if you install Apertium from Ubuntu's own repository:

```bash
sudo apt install cg3 apertium-get apertium-lex-tools
```


## Installation (Docker)

Nothing to do as long as you have Docker on your system:

```bash
docker run -it icij/es-translator poetry run es-translator --help
```

## Examples

Translates documents from French to Spanish on a local Elasticsearch instance. The field to translate is `content` (the default):

```bash
poetry run es-translator --url "http://localhost:9200" --index my-index --source-language fr --target-language es
```

Translates documents from French to English on a local Elasticsearch using Apertium:

```bash
poetry run es-translator --url "http://localhost:9200" --index my-index --source-language fr --target-language en --interpreter apertium
```

Translates the `title` field instead of the default `content`:

```bash
poetry run es-translator --url "http://localhost:9200" --index my-index --source-language fr --target-language es --source-field title
```

Translates documents from English to Spanish on a local Elasticsearch using 4 parallel processes:

```bash
poetry run es-translator --url "http://localhost:9200" --index my-index --source-language en --target-language es --pool-size 4
```
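
When the tool is driven from a script, the same options can be assembled into an argument list. The sketch below is only an illustration: `build_command` is a hypothetical helper, not part of es-translator, and the `title:rapport` filter assumes that `--query-string` accepts Elasticsearch query string syntax. It uses only flags listed in the usage text, and prints the command rather than running it.

```python
import shlex
from typing import List, Optional


def build_command(index: str, source: str, target: str,
                  url: Optional[str] = None,
                  query_string: Optional[str] = None,
                  dry_run: bool = False) -> List[str]:
    """Assemble an es-translator argument list (hypothetical helper)."""
    cmd = ["es-translator", "--index", index,
           "--source-language", source, "--target-language", target]
    if url is not None:
        cmd += ["--url", url]
    if query_string is not None:
        # Assumption: the filter uses Elasticsearch query string syntax.
        cmd += ["--query-string", query_string]
    if dry_run:
        # --dry-run: nothing is saved in Elasticsearch.
        cmd.append("--dry-run")
    return cmd


if __name__ == "__main__":
    argv = build_command("my-index", "fr", "en",
                         url="http://localhost:9200",
                         query_string="title:rapport", dry_run=True)
    # Print the shell-quoted command instead of executing it.
    print(shlex.join(argv))
```

The resulting list can be passed to `subprocess.run` once the printed command looks right; keeping `--dry-run` for a first pass avoids writing to the index.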

Translates documents from Portuguese to English, using an intermediary language (Apertium doesn't offer this translation pair):

```bash
poetry run es-translator --url "http://localhost:9200" --index my-index --source-language pt --intermediary-language es --target-language en --interpreter apertium
```
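
When `--intermediary-language` is omitted, the usage text says the intermediary is calculated automatically. The sketch below shows one way such a lookup could work. It is not taken from es-translator's source: `find_intermediary` and the `PAIRS` set are hypothetical, and the real tool may pick a pivot differently.

```python
from typing import Iterable, Optional, Set, Tuple

# Hypothetical set of available translation pairs as (source, target).
PAIRS: Set[Tuple[str, str]] = {("pt", "es"), ("es", "en"), ("fr", "es")}


def find_intermediary(source: str, target: str,
                      pairs: Iterable[Tuple[str, str]]) -> Optional[str]:
    """Return a language X such that source->X and X->target both exist."""
    pairs = set(pairs)
    # Languages reachable from the source in one step.
    from_source = {dst for src, dst in pairs if src == source}
    # Languages from which the target is reachable in one step.
    to_target = {src for src, dst in pairs if dst == target}
    # Sort so the choice is deterministic when several pivots qualify.
    candidates = sorted(from_source & to_target)
    return candidates[0] if candidates else None


if __name__ == "__main__":
    # With the hypothetical pairs above, Spanish bridges Portuguese to English.
    print(find_intermediary("pt", "en", PAIRS))
```

The function returns `None` when no single pivot connects the two languages, which is the case where an explicit `--intermediary-language` would not help either.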

