Metadata-Version: 1.2
Name: uttut
Version: 1.0.0
Summary: Yoctol Utterance processing utilities
Home-page: https://github.com/Yoctol/uttut
Author: cph
License: MIT
Description: UTTUT
        =====
        
        |travis| |codecov| |pypi| |release|
        
        UTTerance UTilities for dialogue systems. This package provides
        general-purpose utilities for processing chatbot utterance data.
        
        Installation
        ============
        
        ::
        
            $ pip install uttut
        
        Usage
        =====
        
        Let's create a Pipe to preprocess a Datum containing an English utterance.
        
        .. code:: python
        
            >>> from uttut.pipeline.pipe import Pipe
        
            >>> p = Pipe()
            >>> p.add('IntTokenWithSpace')
            >>> p.add('FloatTokenWithSpace')
            >>> p.add('MergeWhiteSpaceCharacters')
            >>> p.add('StripWhiteSpaceCharacters')
            >>> p.add('EngTokenizer')  # word-level (ref: BERT)
            >>> p.add('AddSosEos')
            >>> p.add('Pad')
            >>> p.add(
            ...     'Token2Index',
            ...     {
            ...         '<sos>': 0, '<eos>': 1,  # for AddSosEos
            ...         '<unk>': 2, '<pad>': 3,  # for Pad
            ...         '_int_': 4,  # for IntTokenWithSpace
            ...         '_float_': 5,  # for FloatTokenWithSpace
            ...         'I': 6,
            ...         'apples': 7,
            ...     },
            ... )
        
            >>> from uttut.elements import Datum, Entity, Intent
            >>> datum = Datum(
            ...     utterance='I like apples.',
            ...     intents=[Intent(label=1), Intent(label=2)],
            ...     entities=[Entity(start=7, end=12, value='apples', label=7)],
            ... )
            >>> output_indices, intent_labels, entity_labels, realigner = p.transform(datum)
            >>> output_indices
            [0, 6, 2, 7, 1, 3, 3]
            >>> intent_labels
            [1, 2]
            >>> entity_labels
            [0, 0, 0, 7, 0, 0, 0]
        
            >>> realigner(entity_labels)
            [0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7, 0]
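        
        The realigner maps token-level labels back onto the characters of the
        original utterance. The sketch below illustrates that idea in plain
        Python. It is not uttut's implementation: ``realign_labels`` and
        ``token_spans`` are hypothetical names introduced only for this example,
        and the character spans are written out by hand.
        
        .. code:: python
        
            def realign_labels(token_labels, token_spans, length, default=0):
                """Spread one label per token over the characters that token covers.
        
                Hypothetical helper for illustration; not part of uttut.
                """
                char_labels = [default] * length
                for label, (start, end) in zip(token_labels, token_spans):
                    # `end` is exclusive, as in Python slicing.
                    for i in range(start, end):
                        char_labels[i] = label
                return char_labels
        
            utterance = 'I like apples.'
            # Assumed (start, end) character spans of 'I', 'like' and 'apples'.
            token_spans = [(0, 1), (2, 6), (7, 13)]
            token_labels = [0, 0, 7]
        
            char_labels = realign_labels(token_labels, token_spans, len(utterance))
            print(char_labels)
        
        For this utterance the sketch yields one label per character, with the
        label ``7`` covering exactly the characters of ``apples``.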
        
        Serialization
        =============
        
        Serialize
        ---------
        
        .. code:: python
        
            >>> serialized_str = p.serialize()
        
        Deserialize
        -----------
        
        .. code:: python
        
            >>> from uttut.pipeline.pipe import Pipe
            >>> p = Pipe.deserialize(serialized_str)
        
        .. |travis| image:: https://img.shields.io/travis/Yoctol/uttut.svg?style=flat
           :target: https://travis-ci.org/Yoctol/uttut
        .. |codecov| image:: https://codecov.io/gh/Yoctol/uttut/branch/master/graph/badge.svg
           :target: https://codecov.io/gh/Yoctol/uttut
        .. |pypi| image:: https://img.shields.io/pypi/v/uttut.svg?style=flat
           :target: https://pypi.python.org/pypi/uttut
        .. |release| image:: https://img.shields.io/github/release/Yoctol/uttut.svg
        
Platform: UNKNOWN
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Development Status :: 3 - Alpha
Requires-Python: >=3.5
