Metadata-Version: 2.1
Name: es-translator
Version: 1.0.2
Summary: A lazy yet bulletproof machine translation tool for Elasticsearch.
Home-page: https://github.com/icij/es-translator
License: UNKNOWN
Description: # ES Translator
        
        [![CircleCI](https://circleci.com/gh/ICIJ/es-translator.svg?style=svg)](https://circleci.com/gh/ICIJ/es-translator)
        
        A lazy yet bulletproof machine translation tool for Elasticsearch.
        
        ```
        Usage: es-translator [OPTIONS]
        
        Options:
          --url TEXT                    Elasticsearch URL  [required]
          --index TEXT                  Elasticsearch Index  [required]
          --source-language TEXT        Source language to translate from  [required]
          --target-language TEXT        Target language to translate to  [required]
          --intermediary-language TEXT  An intermediary language to use when no
                                        translation is available between the source
                                        and the target. If none is provided this will
                                        be calculated automatically.
          --source-field TEXT           Document field to translate
          --target-field TEXT           Document field where the translations are
                                        stored
          --query-string TEXT           Search query string to filter results
          --data-dir PATH               Path to the directory where the language model
                                        will be downloaded
          --scan-scroll TEXT            Scroll duration (set to higher value if you're
                                        processing a lot of documents)
          --dry-run                     Don't save anything in Elasticsearch
          --pool-size INTEGER           Number of parallel processes to start
          --pool-timeout INTEGER        Timeout to add a translation
          --syslog-address TEXT         Syslog address
          --syslog-port INTEGER         Syslog port
          --syslog-facility TEXT        Syslog facility
          --stdout-loglevel TEXT        Change the default log level for stdout error
                                        handler
          --help                        Show this message and exit.
        ```
        
        ## Installation (Ubuntu)
        
        Install Apertium:
        
        ```
        wget https://apertium.projectjj.com/apt/install-release.sh -O - | sudo bash
        sudo apt update
        sudo apt install apertium-all-dev
        ```
        
        Create a Virtualenv and install Pip packages with Pipenv:
        
        ```
        sudo apt install pipenv
        make install
        ```
        
        ## Installation (Docker)
        
        Nothing to do as long as you have Docker on your system:
        
        ```
        docker run -it icij/es-translator python es_translator.py --help
        ```
        
        ## Examples
        
        Translate documents from French to Spanish on a local Elasticsearch instance. By default, the `content` field is translated.
        
        ```bash
        python es_translator.py --url "http://localhost:9200" --index my-index --source-language fr --target-language es
        ```
        
        To translate the `title` field instead:
        
        ```bash
        python es_translator.py --url "http://localhost:9200" --index my-index --source-language fr --target-language es --source-field title
        ```
        
        Translate documents from English to Spanish on a local Elasticsearch instance using 4 parallel processes:
        
        ```bash
        python es_translator.py --url "http://localhost:9200" --index my-index --source-language en --target-language es --pool-size 4
        ```
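        
        For larger indices, the usage text suggests raising `--scan-scroll`. The sketch below is only an illustration: the `large_index_cmd` helper is hypothetical, and the `30m` value assumes the option accepts an Elasticsearch-style time value, which is not confirmed here. The helper prints the command instead of running it, so it can be reviewed first.
        
        ```bash
        # Print a command for a long-running job with 4 parallel processes.
        # $1 is the scroll duration; `30m` is an assumed format, not a documented one.
        large_index_cmd() {
          echo python es_translator.py --url "http://localhost:9200" --index my-index \
            --source-language en --target-language es --pool-size 4 --scan-scroll "$1"
        }
        large_index_cmd 30m
        ```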
        
        Translate documents from Portuguese to English using an intermediary language (Apertium doesn't offer this translation pair):
        
        ```bash
        python es_translator.py --url "http://localhost:9200" --index my-index --source-language pt --intermediary-language es --target-language en
        ```
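        
        Before translating a whole index, the `--dry-run` and `--query-string` options from the usage text can be combined to try a run on a subset of documents without saving anything in Elasticsearch. A minimal sketch: the `preview_cmd` helper is hypothetical, and the `language:pt` query assumes the index has a `language` field. The helper only prints the command so it can be checked before being run.
        
        ```bash
        # Print a dry-run command restricted to documents matching a query string.
        # $1 is the query; `language:pt` below assumes a hypothetical `language` field.
        preview_cmd() {
          echo python es_translator.py --url "http://localhost:9200" --index my-index \
            --source-language pt --intermediary-language es --target-language en \
            --query-string "'$1'" --dry-run
        }
        preview_cmd "language:pt"
        ```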
        
Keywords: datashare,api,text-mining,elasticsearch,apertium,translation
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Affero General Public License v3
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.6
Description-Content-Type: text/markdown
