Metadata-Version: 2.1
Name: TUPA
Version: 1.4.2
Summary: Transition-based UCCA Parser
Home-page: https://github.com/huji-nlp/tupa
Author: Daniel Hershcovich
Author-email: danielh@cs.huji.ac.il
License: UNKNOWN
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Text Processing :: Linguistic
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: numpy (>=1.15.0)
Requires-Dist: cython (>=0.29)
Requires-Dist: tqdm (>=4.32.2)
Requires-Dist: configargparse (>=0.14.0)
Requires-Dist: ucca (<1.3,>=1.2.3)
Requires-Dist: semstr[amr] (<1.3,>=1.2.2)
Requires-Dist: dynet (==2.1)
Requires-Dist: logbook (>=1.5.2)
Provides-Extra: bert
Requires-Dist: torch (==1.3.1) ; extra == 'bert'
Requires-Dist: pytorch-pretrained-bert (==0.6.2) ; extra == 'bert'
Provides-Extra: server
Requires-Dist: Flask (>=0.12.2) ; extra == 'server'
Requires-Dist: Flask-Assets (>=0.12) ; extra == 'server'
Requires-Dist: Flask-Compress (>=1.4.0) ; extra == 'server'
Requires-Dist: Jinja2 (>=2.9.6) ; extra == 'server'
Requires-Dist: matplotlib (>=2.0.2) ; extra == 'server'
Requires-Dist: networkx (>=1.11) ; extra == 'server'
Requires-Dist: webassets (>=0.12.1) ; extra == 'server'
Provides-Extra: viz
Requires-Dist: scipy ; extra == 'viz'
Requires-Dist: pillow ; extra == 'viz'
Requires-Dist: matplotlib ; extra == 'viz'

Transition-based UCCA Parser
============================

TUPA is a transition-based parser for `Universal Conceptual Cognitive
Annotation (UCCA) <http://github.com/huji-nlp/ucca>`__.

Requirements
~~~~~~~~~~~~

-  Python 3.6

Install
~~~~~~~

Create a Python virtual environment. For example, on Linux:

::

    virtualenv --python=/usr/bin/python3 venv
    . venv/bin/activate              # on bash
    source venv/bin/activate.csh     # on csh

Install the latest release:

::

    pip install tupa

Alternatively, install the latest code from GitHub (may be unstable):

::

    git clone https://github.com/danielhers/tupa
    cd tupa
    pip install .
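
To verify the installation, print the command-line help (this should
work once the package and its dependencies have been installed):

::

    python -m tupa --help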

Train the parser
----------------

Given a directory of UCCA passage files (for example, the `English Wiki
corpus <https://github.com/UniversalConceptualCognitiveAnnotation/UCCA_English-Wiki>`__),
run:

::

    python -m tupa -t <train_dir> -d <dev_dir> -c <model_type> -m <model_filename>

The possible model types are ``sparse``, ``mlp``, and ``bilstm``.
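
For example, assuming the English Wiki corpus was cloned and split into
``train`` and ``dev`` subdirectories (hypothetical paths), a BiLSTM
model could be trained with:

::

    python -m tupa -t UCCA_English-Wiki/train -d UCCA_English-Wiki/dev -c bilstm -m ucca-bilstm-model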

Parse a text file
~~~~~~~~~~~~~~~~~

Run the parser on a text file (here named ``example.txt``) using a
trained model:

::

    python -m tupa example.txt -m <model_filename>

An ``xml`` file will be created per passage (passages are separated by
blank lines in the text file).
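
For example, a minimal ``example.txt`` with two passages (the passage
boundary is simply a blank line) could look like:

::

    This is the first passage.

    This is the second passage.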

Pre-trained models
~~~~~~~~~~~~~~~~~~

To download and extract `a model pre-trained on the Wiki
corpus <https://github.com/huji-nlp/tupa/releases/download/v1.3.10/ucca-bilstm-1.3.10.tar.gz>`__,
run:

::

    curl -LO https://github.com/huji-nlp/tupa/releases/download/v1.3.10/ucca-bilstm-1.3.10.tar.gz
    tar xvzf ucca-bilstm-1.3.10.tar.gz

Run the parser using the model:

::

    python -m tupa example.txt -m models/ucca-bilstm

Other languages
~~~~~~~~~~~~~~~

To download and extract `a
model <https://github.com/huji-nlp/tupa/releases/download/v1.3.10/ucca-bilstm-1.3.10-fr.tar.gz>`__
pre-trained on the `French *20K Leagues*
corpus <https://github.com/UniversalConceptualCognitiveAnnotation/UCCA_French-20K>`__
or `a
model <https://github.com/huji-nlp/tupa/releases/download/v1.3.10/ucca-bilstm-1.3.10-de.tar.gz>`__
pre-trained on the `German *20K Leagues*
corpus <https://github.com/UniversalConceptualCognitiveAnnotation/UCCA_German-20K>`__,
run:

::

    curl -LO https://github.com/huji-nlp/tupa/releases/download/v1.3.10/ucca-bilstm-1.3.10-fr.tar.gz
    tar xvzf ucca-bilstm-1.3.10-fr.tar.gz
    curl -LO https://github.com/huji-nlp/tupa/releases/download/v1.3.10/ucca-bilstm-1.3.10-de.tar.gz
    tar xvzf ucca-bilstm-1.3.10-de.tar.gz

Run the parser on a French/German text file (separate passages by blank
lines):

::

    python -m tupa exemple.txt -m models/ucca-bilstm-fr --lang fr
    python -m tupa beispiel.txt -m models/ucca-bilstm-de --lang de

Using BERT
----------

BERT can be used instead of standard word embeddings. First, install the
required dependencies:

::

    pip install -r requirements.bert.txt

Then pass the ``--use-bert`` argument to the training command.

See the possible configuration options in ``config.py`` (relevant
options have the prefix ``bert``).
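
For example, a training command with BERT enabled might look like this
(directory and model names are placeholders):

::

    python -m tupa -t <train_dir> -d <dev_dir> -c bilstm -m ucca-bert-model --use-bert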

BERT Multilingual Training
~~~~~~~~~~~~~~~~~~~~~~~~~~

A multilingual model can be trained to leverage cross-lingual transfer
and improve results on low-resource languages (a combined example
command follows this list):

1. Make sure the input passage files have the ``lang`` attribute. See
   the script `set_lang
   <https://github.com/huji-nlp/semstr/blob/master/semstr/scripts/set_lang.py>`__
   in the package ``semstr``.
2. Enable BERT by passing the ``--use-bert`` argument.
3. Use the multilingual model by passing
   ``--bert-model=bert-base-multilingual-cased``.
4. Pass the ``--bert-multilingual=0`` argument to enable multilingual
   training.
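
Putting these steps together, a multilingual training command might look
like this (directory and model names are placeholders):

::

    python -m tupa -t <train_dir> -d <dev_dir> -c bilstm -m multilingual-model \
        --use-bert --bert-model=bert-base-multilingual-cased --bert-multilingual=0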

BERT Performance
~~~~~~~~~~~~~~~~

Below are the average results over 3 BERT multilingual models trained on
the `German *20K Leagues*
corpus <https://github.com/UniversalConceptualCognitiveAnnotation/UCCA_German-20K>`__,
the `English Wiki
corpus <https://github.com/UniversalConceptualCognitiveAnnotation/UCCA_English-Wiki>`__,
and only 15 sentences from the `French *20K Leagues*
corpus <https://github.com/UniversalConceptualCognitiveAnnotation/UCCA_French-20K>`__,
with the following settings:

::

    bert-model=bert-base-multilingual-cased
    bert-layers=-1 -2 -3 -4
    bert-layers-pooling=weighted
    bert-token-align-by=sum

The results:

+-----------------------+-------------------+------------------+----------------+
| description           | test primary F1   | test remote F1   | test average   |
+=======================+===================+==================+================+
| German 20K Leagues    | 0.828             | 0.6723           | 0.824          |
+-----------------------+-------------------+------------------+----------------+
| English 20K Leagues   | 0.763             | 0.359            | 0.755          |
+-----------------------+-------------------+------------------+----------------+
| French 20K Leagues    | 0.739             | 0.46             | 0.732          |
+-----------------------+-------------------+------------------+----------------+
| English Wiki          | 0.789             | 0.581            | 0.784          |
+-----------------------+-------------------+------------------+----------------+

\* The `English *20K Leagues*
corpus <https://github.com/UniversalConceptualCognitiveAnnotation/UCCA_English-20K>`__
is used as an out-of-domain test set.

Pre-trained Models with BERT
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To download and extract `a multilingual
model <https://github.com/huji-nlp/tupa/releases/download/v1.4.0/bert_multilingual_layers_4_layers_pooling_weighted_align_sum.tar.gz>`__
trained with the settings above, run:

::

    curl -LO https://github.com/huji-nlp/tupa/releases/download/v1.4.0/bert_multilingual_layers_4_layers_pooling_weighted_align_sum.tar.gz
    tar xvzf bert_multilingual_layers_4_layers_pooling_weighted_align_sum.tar.gz

To run the parser using the model, use the following command, replacing
``[lang]`` with the appropriate language code (``fr``, ``en``, or
``de``):

::

    python -m tupa example.txt --lang [lang] -m bert_multilingual_layers_4_layers_pooling_weighted_align_sum

Author
------

-  Daniel Hershcovich: daniel.hershcovich@gmail.com

Contributors
------------

-  Ofir Arviv: ofir.arviv@mail.huji.ac.il

Citation
--------

If you make use of this software, please cite `the following
paper <http://aclweb.org/anthology/P17-1104>`__:

::

    @InProceedings{hershcovich2017a,
      author    = {Hershcovich, Daniel  and  Abend, Omri  and  Rappoport, Ari},
      title     = {A Transition-Based Directed Acyclic Graph Parser for {UCCA}},
      booktitle = {Proc. of ACL},
      year      = {2017},
      pages     = {1127--1138},
      url       = {http://aclweb.org/anthology/P17-1104}
    }

The version of the parser used in the paper is
`v1.0 <https://github.com/huji-nlp/tupa/releases/tag/v1.0>`__. To
reproduce the experiments, run:

::

    curl -L https://raw.githubusercontent.com/huji-nlp/tupa/master/experiments/acl2017.sh | bash

If you use the French, German or multitask models, please cite `the
following paper <http://aclweb.org/anthology/P18-1035>`__:

::

    @InProceedings{hershcovich2018multitask,
      author    = {Hershcovich, Daniel  and  Abend, Omri  and  Rappoport, Ari},
      title     = {Multitask Parsing Across Semantic Representations},
      booktitle = {Proc. of ACL},
      year      = {2018},
      pages     = {373--385},
      url       = {http://aclweb.org/anthology/P18-1035}
    }

The version of the parser used in the paper is
`v1.3.3 <https://github.com/huji-nlp/tupa/releases/tag/v1.3.3>`__. To
reproduce the experiments, run:

::

    curl -L https://raw.githubusercontent.com/huji-nlp/tupa/master/experiments/acl2018.sh | bash

License
-------

This package is licensed under the GPLv3 or later license (see
`LICENSE.txt <LICENSE.txt>`__).

|Build Status (Travis CI)| |Build Status (AppVeyor)| |Build Status
(Docs)| |PyPI version|

.. |Build Status (Travis CI)| image:: https://travis-ci.org/danielhers/tupa.svg?branch=master
   :target: https://travis-ci.org/danielhers/tupa
.. |Build Status (AppVeyor)| image:: https://ci.appveyor.com/api/projects/status/github/danielhers/tupa?svg=true
   :target: https://ci.appveyor.com/project/danielh/tupa
.. |Build Status (Docs)| image:: https://readthedocs.org/projects/tupa/badge/?version=latest
   :target: http://tupa.readthedocs.io/en/latest/
.. |PyPI version| image:: https://badge.fury.io/py/TUPA.svg
   :target: https://badge.fury.io/py/TUPA


