Metadata-Version: 2.1
Name: fugashi
Version: 0.1.12rc5
Summary: A Cython wrapper for MeCab
Home-page: https://github.com/polm/fugashi
Author: Paul O'Leary McCann
Author-email: polm@dampfkraft.com
License: MIT
Platform: UNKNOWN
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: Japanese
Requires-Python: >=3.5
Description-Content-Type: text/markdown
Requires-Dist: Cython
Provides-Extra: unidic
Requires-Dist: unidic ; extra == 'unidic'
Provides-Extra: unidic-lite
Requires-Dist: unidic-lite ; extra == 'unidic-lite'

[![Current PyPI packages](https://badge.fury.io/py/fugashi.svg)](https://pypi.org/project/fugashi/)

# fugashi

<img src="https://github.com/polm/fugashi/raw/master/fugashi.png" width=125 height=125 alt="Fugashi by Irasutoya" />

Fugashi is a Cython wrapper for [MeCab](https://taku910.github.io/mecab/).

See the [blog post](https://www.dampfkraft.com/nlp/fugashi.html) for background
on why Fugashi exists and some of the design decisions.

Any reasonable version of MeCab should work, but installing it
[from source](https://github.com/taku910/mecab) is recommended.

## Usage

    from fugashi import Tagger

    tagger = Tagger('-Owakati')
    text = "麩菓子（ふがし）は、麩を主材料とした日本の菓子。"
    tagger.parse(text)
    # => '麩 菓子 （ ふ が し ） は 、 麩 を 主材 料 と し た 日本 の 菓子 。'
    for word in tagger.parseToNodeList(text):
        print(word, word.feature.lemma, word.pos, sep='\t')
        # "feature" is the Unidic feature data as a named tuple
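
With `-Owakati`, `parse` returns the tokens as one space-separated string, so a token list is just a `str.split()` away. The minimal sketch below works on the sample output from the example above, so it runs without MeCab installed. `wakati_tokens` is a hypothetical helper, not part of fugashi, and `collections.Counter` only illustrates a typical next step (token frequencies).

```python
from collections import Counter

# Sample output copied from the -Owakati example above.
SAMPLE = '麩 菓子 （ ふ が し ） は 、 麩 を 主材 料 と し た 日本 の 菓子 。'


def wakati_tokens(wakati):
    """Split -Owakati output into tokens (hypothetical helper, not fugashi API).

    str.split() with no argument also drops a trailing newline, if present.
    """
    return wakati.split()


tokens = wakati_tokens(SAMPLE)
freq = Counter(tokens)
print(len(tokens))
print(freq.most_common(2))
```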

## Dictionary Use

Fugashi is written with the assumption that you'll use Unidic to process
Japanese, but it supports arbitrary dictionaries.

If you're using a dictionary other than Unidic, use `GenericTagger` like this:

    from fugashi import GenericTagger

    # with no arguments, MeCab's default dictionary is used
    tagger = GenericTagger()
    text = "麩菓子（ふがし）は、麩を主材料とした日本の菓子。"

    # parse can be used as normal
    tagger.parse(text)
    # features from the dictionary can be accessed by field numbers
    for word in tagger.parseToNodeList(text):
        print(word.surface, word.feature[0])
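
Indexing with `word.feature[0]` presumably mirrors how MeCab itself reports features: in its default output format, each token is a line with the surface form, a tab, and then the comma-separated feature fields, and each sentence ends with `EOS`. The pure-Python sketch below parses that format and indexes fields by number, so the idea can be tried without MeCab. `parse_mecab_output` is a hypothetical helper, the feature values in `sample` are made up, and it assumes the default output format (a custom `-F` format or `-Owakati` output will not parse this way).

```python
import csv


def parse_mecab_output(output):
    """Parse MeCab's default output into (surface, features) pairs.

    Hypothetical helper, not part of fugashi. Assumes the default format:
    one token per line as "surface<TAB>feature,feature,...", with "EOS"
    marking the end of each sentence.
    """
    tokens = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue  # skip blank lines and sentence terminators
        surface, _, feature_str = line.partition("\t")
        # csv.reader copes with quoted fields that themselves contain commas
        features = next(csv.reader([feature_str]))
        tokens.append((surface, features))
    return tokens


# Made-up sample output; real feature values depend on the dictionary.
sample = "ふがし\tnoun,common,*,フガシ\nは\tparticle,binding,*,ハ\nEOS\n"

for surface, features in parse_mecab_output(sample):
    # features are accessed by field number, like word.feature[0]
    print(surface, features[0])
```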

You can also create a feature wrapper to get feature information as a named tuple.

    from fugashi import GenericTagger, create_feature_wrapper

    # field names are illustrative; name them after your dictionary's feature columns
    CustomFeatures = create_feature_wrapper('CustomFeatures', 'alpha beta gamma')
    tagger = GenericTagger(wrapper=CustomFeatures)
    text = "麩菓子（ふがし）は、麩を主材料とした日本の菓子。"
    for word in tagger.parseToNodeList(text):
        print(word.surface, word.feature.alpha)
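
A feature wrapper turns raw feature fields into a named tuple. The sketch below shows the general idea with `collections.namedtuple`: feature strings are mapped onto named fields, and index access still works alongside attribute access. `make_feature_wrapper` and `wrap_features` are hypothetical helpers, not fugashi's implementation; in particular, padding short feature lists with `None` is an assumption about how a wrapper might handle dictionaries that can emit fewer fields for some tokens (such as unknown words).

```python
from collections import namedtuple


def make_feature_wrapper(name, fields):
    """Hypothetical stand-in for create_feature_wrapper: a plain namedtuple."""
    return namedtuple(name, fields)


def wrap_features(wrapper, raw):
    """Map a list of feature strings onto the wrapper's named fields.

    Missing trailing fields become None and extra fields are dropped.
    This padding rule is an assumption for illustration only.
    """
    n = len(wrapper._fields)
    values = list(raw[:n]) + [None] * (n - len(raw))
    return wrapper(*values)


CustomFeatures = make_feature_wrapper('CustomFeatures', 'alpha beta gamma')

feat = wrap_features(CustomFeatures, ['x', 'y', 'z'])
print(feat.alpha, feat.gamma)  # named access
print(feat[0])                 # index access still works

short = wrap_features(CustomFeatures, ['x'])
print(short.beta)              # missing field padded with None
```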

## Alternatives

If you have a problem with Fugashi, feel free to open an issue. However, there
are some cases where a different library might be a better fit.

- If you want to use MeCab but don't have a C compiler, use [natto-py](https://github.com/buruzaemon/natto-py).
- If you don't want to deal with installing MeCab at all, try [SudachiPy](https://github.com/WorksApplications/SudachiPy).

Note that both are slower than Fugashi, according to a [benchmark I
wrote](https://github.com/polm/ja-tokenizer-benchmark).


