Metadata-Version: 2.1
Name: pyvi
Version: 0.0.9.6
Summary: Python Vietnamese Toolkit
Home-page: https://github.com/trungtv/pyvi
Author: Viet-Trung Tran
Author-email: trungtv@soict.hust.edu.vn
License: MIT
Description-Content-Type: UNKNOWN
Keywords: Vietnamese natural language processing
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: Vietnamese
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Requires-Dist: scikit-learn
Requires-Dist: sklearn-crfsuite
Provides-Extra: dev
Requires-Dist: check-manifest; extra == 'dev'
Provides-Extra: test
Requires-Dist: coverage; extra == 'test'

Python Vietnamese Toolkit
=========================

Functionality
=============

- Tokenize

- POS tag

- Remove accents


Algorithm: Conditional Random Field (CRF)

Vietnamese tokenizer F1 score: 0.978637686

Vietnamese POS tagging F1 score: 0.92520656
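
The source does not state how the scores above were measured (for example, per token or per word boundary). As a reminder of what the metric means, here is a minimal sketch of F1 as the harmonic mean of precision and recall. The function name and the counts are hypothetical and are not part of pyvi.

.. code-block:: python

    def f1_score(true_positives, false_positives, false_negatives):
        """Return F1, the harmonic mean of precision and recall."""
        predicted = true_positives + false_positives
        actual = true_positives + false_negatives
        if true_positives == 0 or predicted == 0 or actual == 0:
            return 0.0
        # float() keeps the division exact on Python 2 as well.
        precision = true_positives / float(predicted)
        recall = true_positives / float(actual)
        return 2 * precision * recall / (precision + recall)

    # Made-up counts for illustration only, not pyvi evaluation data.
    example = f1_score(true_positives=90, false_positives=10, false_negatives=10)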

POS tags:

- A - Adjective
- C - Coordinating conjunction
- E - Preposition
- I - Interjection
- L - Determiner
- M - Numeral
- N - Common noun
- Nc - Noun classifier
- Ny - Noun abbreviation
- Np - Proper noun
- Nu - Unit noun
- P - Pronoun
- R - Adverb
- S - Subordinating conjunction
- T - Auxiliary, modal words
- V - Verb
- X - Unknown
- F - Filtered out (punctuation)
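
The tag list above can be kept as a lookup table so that tagger output is easier to read. The following is a minimal sketch, assuming the tagger result can be unpacked into two parallel lists of tokens and tags. ``POS_TAGS`` and ``describe_tags`` are hypothetical names, not part of pyvi, and the sample input is hand-written rather than real model output.

.. code-block:: python

    # Tag inventory copied from the list above.
    POS_TAGS = {
        "A": "Adjective",
        "C": "Coordinating conjunction",
        "E": "Preposition",
        "I": "Interjection",
        "L": "Determiner",
        "M": "Numeral",
        "N": "Common noun",
        "Nc": "Noun classifier",
        "Ny": "Noun abbreviation",
        "Np": "Proper noun",
        "Nu": "Unit noun",
        "P": "Pronoun",
        "R": "Adverb",
        "S": "Subordinating conjunction",
        "T": "Auxiliary, modal words",
        "V": "Verb",
        "X": "Unknown",
        "F": "Filtered out (punctuation)",
    }


    def describe_tags(tokens, tags):
        """Pair each token with its tag and a readable tag name."""
        return [
            (token, tag, POS_TAGS.get(tag, "Unrecognized tag"))
            for token, tag in zip(tokens, tags)
        ]


    # Hand-written illustrative input, not actual tagger output.
    tokens = [u"Trường", u"đại_học", u"Bách_Khoa", u"Hà_Nội"]
    tags = ["N", "N", "Np", "Np"]
    described = describe_tags(tokens, tags)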

============
Installation
============

Install from the command line with pip:

.. code-block:: shell

    $ pip install pyvi

**Uninstall**

.. code-block:: shell

    $ pip uninstall pyvi

=====
Usage
=====

.. code-block:: python

    from pyvi import ViTokenizer, ViPosTagger

    # Segment a sentence into words
    ViTokenizer.tokenize(u"Trường đại học bách khoa hà nội")

    # POS-tag the tokenized sentence
    ViPosTagger.postagging(ViTokenizer.tokenize(u"Trường đại học Bách Khoa Hà Nội"))

    # Strip diacritics from Vietnamese text
    from pyvi import ViUtils
    ViUtils.remove_accents(u"Trường đại học bách khoa hà nội")
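
Downstream code often needs the segmented words as a list. The following is a minimal sketch, assuming the tokenizer returns one string in which words are separated by spaces and the syllables of a multi-syllable word are joined with underscores. ``split_words`` is a hypothetical helper, not part of pyvi, and the sample string is hand-written; the real segmentation depends on the model.

.. code-block:: python

    def split_words(tokenized):
        """Split tokenizer-style output into words, restoring inner spaces."""
        return [word.replace("_", " ") for word in tokenized.split()]


    # Hand-written sample in the assumed output format.
    sample = u"Trường đại_học bách_khoa hà_nội"
    words = split_words(sample)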




