Metadata-Version: 2.1
Name: easy-tokenizer
Version: 0.0.9
Summary: tokenizer tool
Home-page: https://github.com/tilaboy/easy-tokenizer
Author: Chao Li
Author-email: chaoli.job@gmail.com
License: MIT license
Keywords: tokenizer
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7

Easy-Tokenizer
==================

Description
-----------

Most tokenizers are eithor too cumbersom (Neural Network based), or too simple.
This simple rule based tokenizer is type, small, and sufficient good. Specially,
it handles long strings very often parsed wrong by some simple tokenizers, deal
url, email, long digits rather well.


Try with the following script:

``easy_tokenizer -s input_text``

or

``easy_tokenizer -f input_file``


CI Status
------------

.. image:: https://travis-ci.org/tilaboy/easy-tokenizer.svg?branch=master
    :target: https://travis-ci.org/tilaboy/easy-tokenizer

.. image:: https://readthedocs.org/projects/easy-tokenizer/badge/?version=latest
    :target: https://easy-tokenizer.readthedocs.io/en/latest/?badge=latest
    :alt: Documentation Status


.. image:: https://pyup.io/repos/github/tilaboy/easy-tokenizer/shield.svg
    :target: https://pyup.io/repos/github/tilaboy/easy-tokenizer/
    :alt: Updates

Requirements
------------

Python 3.6+

Installation
------------

::

    pip install easy-tokenizer


Usage
-----

-  easy-tokenizer:

   input:

      - string: input string to tokenize

      - filename: input text file to tokenize

      - output: output filename, optional. print out to STDOUT when not set

   output:

   - a sequence of space separated tokens

examples:
^^^^^^^^^

::

    # string input
    easy-tokenizer -s "this is   a simple test."

    easy-tokenizer -f foo.txt
    easy-tokenizer -f foo.txt -o bar.txt

output will be "this is a simple test ."

Development
-----------

To install package and its dependencies, run the following from project
root directory:

::

    python setup.py install

To work the code and develop the package, run the following from project
root directory:

::

    python setup.py develop

To run unit tests, execute the following from the project root
directory:

::

    python setup.py test


0.0.9 (2020-01-16)
==================

- [BUG] fixed infinite loop for long url


0.0.8 (2019-11-14)
==================

- [New] added a function module for char/string normalization

0.0.7 (2019-11-14)
==================

- [Bugfix] update the url patter to fix the regexp loop for long url string

0.0.5 (2019-10-23)
==================

- [Bugfix] encryption and doc generation

0.0.3 (2019-10-23)
==================

- Test the CI/CD and auto documentation generation


0.0.2 (2019-10-23)
==================

- support script to output result to a file, add documentation



0.0.1 (2019-10-22)
==================

- Add the first version of the tokenizer


