Metadata-Version: 2.0
Name: lemmy
Version: 0.1.0
Summary: Lemmatizer for Danish
Home-page: https://github.com/sorenlind/lemmy/
Author: Soren Lind Kristiansen
Author-email: sorenlind@mac.com
License: UNKNOWN
Description-Content-Type: UNKNOWN
Keywords: nlp lemma lemmatizer lemmatiser danish spacy
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: Danish
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3.6
Classifier: Topic :: Text Processing :: Linguistic
Provides-Extra: dev
Requires-Dist: pandas; extra == 'dev'
Requires-Dist: jupyter; extra == 'dev'
Requires-Dist: unicodecsv; extra == 'dev'
Requires-Dist: bs4; extra == 'dev'
Requires-Dist: tqdm; extra == 'dev'
Requires-Dist: regex; extra == 'dev'
Requires-Dist: spacy; extra == 'dev'
Requires-Dist: pylint; extra == 'dev'
Requires-Dist: pycodestyle; extra == 'dev'
Requires-Dist: pydocstyle; extra == 'dev'
Requires-Dist: yapf; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: tox; extra == 'dev'
Provides-Extra: notebooks
Requires-Dist: pandas; extra == 'notebooks'
Requires-Dist: jupyter; extra == 'notebooks'
Requires-Dist: unicodecsv; extra == 'notebooks'
Requires-Dist: bs4; extra == 'notebooks'
Requires-Dist: tqdm; extra == 'notebooks'
Requires-Dist: regex; extra == 'notebooks'
Requires-Dist: spacy; extra == 'notebooks'
Provides-Extra: test
Requires-Dist: pytest; extra == 'test'
Requires-Dist: tox; extra == 'test'

🤘 Lemmy
========

Lemmy is a lemmatizer for Danish 🇩🇰. It comes pre-trained on Dansk
Sprognævn's (DSN) word list (‘fuldformliste’) and the Danish Universal
Dependencies and is ready for use. Lemmy also supports training on your
own dataset.

The model currently included in Lemmy was evaluated on the Danish
Universal Dependencies dev dataset and achieved an accuracy above 99%.

You can use Lemmy as a spaCy extension, more specifically a spaCy
pipeline component. This is highly recommended and makes the lemmas
easily accessible from the spaCy tokens. Lemmy makes use of POS tags to
predict the lemmas. When wired up to the spaCy pipeline, Lemmy has the
benefit of using spaCy’s built-in POS tagger.

Lemmy can also be used without spaCy, as a standalone lemmatizer. In
that case, you will have to provide the POS tags yourself. Alternatively,
you can train a Lemmy model which does not depend on POS tags, though the
accuracy will most likely suffer.
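
To see why POS tags matter, consider the Danish word "løber", which is
either a noun ("runner", lemma "løber") or a verb ("runs", lemma "løbe").
The following is a toy sketch of POS-conditioned suffix rules. It is not
Lemmy's model or API: ``TOY_RULES`` and ``toy_lemmatize`` are hypothetical
names, and the handful of rules is made up for illustration only.

.. code:: python

    # Toy illustration only: hypothetical rules, not Lemmy's model or API.
    # Each key is (POS tag, word suffix); the value replaces that suffix.
    TOY_RULES = {
        ("NOUN", "erne"): "",  # "bilerne" -> "bil"
        ("NOUN", "en"): "",    # "bilen" -> "bil"
        ("VERB", "ede"): "e",  # "hoppede" -> "hoppe"
        ("VERB", "er"): "e",   # "løber" -> "løbe"
    }


    def toy_lemmatize(pos, word):
        """Apply the longest suffix rule that matches the given POS tag."""
        for length in range(len(word), 0, -1):
            suffix = word[-length:]
            replacement = TOY_RULES.get((pos, suffix))
            if replacement is not None:
                return word[:-length] + replacement
        # No rule matched: assume the word is already its own lemma.
        return word


    print(toy_lemmatize("VERB", "løber"))  # the verb reading
    print(toy_lemmatize("NOUN", "løber"))  # the noun reading

Without the POS tag, a single rule for the suffix "er" would have to pick
one of the two readings and get the other wrong, which is one reason a
model trained without POS tags is likely to be less accurate.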

Lemmy is heavily inspired by the `CST Lemmatizer for
Danish <https://cst.dk/online/lemmatiser/>`__.

Install
-------

.. code:: bash

    pip install lemmy

Usage
-----

.. code:: python

    import da_custom_model as da # name of your spaCy model
    import lemmy.pipe
    nlp = da.load()

    # create an instance of Lemmy's pipeline component for spaCy
    pipe = lemmy.pipe.load()

    # add the component to the spaCy pipeline, after the POS tagger
    nlp.add_pipe(pipe, after='tagger')

    # lemmas can now be accessed using the `._.lemma` attribute on the tokens
    nlp("akvariernes")[0]._.lemma

Training
--------

The ``notebooks`` folder contains examples showing how to train your own
model using Lemmy.


