Metadata-Version: 2.1
Name: dostoevsky
Version: 0.4.0
Summary: Sentiment analysis library for the Russian language
Home-page: https://github.com/bureaucratic-labs/dostoevsky
Author: Bureaucratic Labs
Author-email: hello@b-labs.pro
License: MIT
Description: # Dostoevsky [![Build Status](https://travis-ci.org/bureaucratic-labs/dostoevsky.svg?branch=master)](https://travis-ci.org/bureaucratic-labs/dostoevsky) [![FOSSA Status](https://app.fossa.io/api/projects/git%2Bgithub.com%2Fbureaucratic-labs%2Fdostoevsky.svg?type=shield)](https://app.fossa.io/projects/git%2Bgithub.com%2Fbureaucratic-labs%2Fdostoevsky?ref=badge_shield)
        
        <img align="right" src="https://i.imgur.com/uLMWPuL.png">
        
        Sentiment analysis library for the Russian language.
        
        ## Install
        
        Note that `Dostoevsky` supports Python 3.6+ only.
        
        ```bash
        $ pip install dostoevsky
        ```
        
        ## Social network model [FastText]
        
        This model was trained on the [RuSentiment dataset](https://github.com/text-machine-lab/rusentiment) and achieves an F1 score of up to about 0.71.  
        Hyperparameters used for training:
        ```
        epoch = 10
        lr = 0.21909
        dim = 64
        minCount = 1
        wordNgrams = 3
        minn = 2
        maxn = 5
        bucket = 259929
        dsub = 2
        loss = one-vs-all
        ```
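        
        As a rough illustration of how these settings could be passed to a trainer, the sketch below turns the listing into a `fasttext supervised` argument list. It assumes the keys map one-to-one onto fastText command-line flags; `build_fasttext_command`, `train.txt` and `model` are hypothetical names, and this is not necessarily how the published model was trained.
        
        ```python
        import shlex

        # Hyperparameters copied from the listing above.
        HYPERPARAMETERS = {
            "epoch": 10,
            "lr": 0.21909,
            "dim": 64,
            "minCount": 1,
            "wordNgrams": 3,
            "minn": 2,
            "maxn": 5,
            "bucket": 259929,
            "dsub": 2,
            "loss": "one-vs-all",
        }


        def build_fasttext_command(train_path, model_prefix, params):
            """Return an argument list for `fasttext supervised` (hypothetical helper)."""
            command = ["fasttext", "supervised", "-input", train_path, "-output", model_prefix]
            for name, value in params.items():
                # Assumption: each key is also the name of a command-line flag.
                command += ["-" + name, str(value)]
            return command


        # train.txt and model are placeholder file names, not files shipped with the package.
        command = build_fasttext_command("train.txt", "model", HYPERPARAMETERS)
        print(" ".join(shlex.quote(part) for part in command))
        ```
        
        The resulting list can be handed to `subprocess.run` if the `fasttext` binary is installed. Building a list rather than a single string avoids shell quoting issues.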
        
        ### Usage
        
        First, download the binary model:
        
        ```bash
        $ dostoevsky download fasttext-social-network-model
        ```
        
        Then you can use the sentiment analyzer:
        
        ```python
        from dostoevsky.tokenization import RegexTokenizer
        from dostoevsky.models import FastTextSocialNetworkModel
        
        tokenizer = RegexTokenizer()
        tokens = tokenizer.split('всё очень плохо')  # [('всё', None), ('очень', None), ('плохо', None)]
        
        model = FastTextSocialNetworkModel(tokenizer=tokenizer)
        
        messages = [
            'привет',  # "hello"
            'я люблю тебя!!',  # "I love you!!"
            'малолетние дебилы'  # "underage morons"
        ]
        
        results = model.predict(messages, k=2)
        
        for message, sentiment in zip(messages, results):
            # Expected output:
            # привет -> {'speech': 1.0000100135803223, 'skip': 0.0020607432816177607}
            # я люблю тебя!! -> {'positive': 0.9886782765388489, 'skip': 0.005394937004894018}
            # малолетние дебилы -> {'negative': 0.9525841474533081, 'neutral': 0.13661839067935944}
            print(message, '->', sentiment)
        ```
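        
        Each prediction is a dictionary that maps labels to scores, so a single label per message has to be picked by the caller. The sketch below shows one possible way to do that, using the sample scores from the example above instead of a live model. The `top_label` helper and the `0.5` threshold are hypothetical and not part of the library. Since the model was trained with `loss = one-vs-all`, the scores are presumably independent per label and need not sum to 1, which is why a threshold is used rather than a comparison between labels alone.
        
        ```python
        # Sample scores copied from the example output above.
        sample_results = {
            'привет': {'speech': 1.0000100135803223, 'skip': 0.0020607432816177607},
            'я люблю тебя!!': {'positive': 0.9886782765388489, 'skip': 0.005394937004894018},
            'малолетние дебилы': {'negative': 0.9525841474533081, 'neutral': 0.13661839067935944},
        }


        def top_label(sentiment, threshold=0.5):
            """Return the highest-scoring label, or None if its score is below threshold."""
            label = max(sentiment, key=sentiment.get)
            return label if sentiment[label] >= threshold else None


        for message, sentiment in sample_results.items():
            print(message, '->', top_label(sentiment))
        ```
        
        Returning `None` for low-confidence predictions lets the caller decide how to treat ambiguous messages instead of forcing a label.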
        
Keywords: natural language processing,sentiment analysis
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Classifier: Topic :: Text Processing :: Linguistic
Description-Content-Type: text/markdown
