Metadata-Version: 2.1
Name: nlp-text-search
Version: 0.6.14
Summary: Fulltext-like search using NLP concept
Home-page: https://github.com/basalovyurij/nlp-text-search
Author: Yurij Basalov
Author-email: basalov_yurij@mail.ru
License: UNKNOWN
Description: # Fulltext-like search using NLP concept
        
        ![Python 3.6, 3.7](https://img.shields.io/badge/python-3.6%20%7C%203.7-green.svg)
        
        Library for full-text-like search based on NLP. It uses [deeppavlov](https://deeppavlov.ai)
        for paraphrase identification and a vantage-point tree (based on [jvptree](https://github.com/jchambers/jvptree/)) for fast search.
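        
        To illustrate the idea behind a vantage-point tree, here is a minimal pure-Python sketch.
        It is not this library's implementation and not a port of jvptree: `VPNode`, `build_vp_tree`
        and `search` are hypothetical names, and the toy metric (absolute difference between numbers)
        stands in for a paraphrase-based distance. The pruning assumes the metric satisfies the triangle inequality.
        
        ```python
        import heapq
        import itertools
        import random
        
        
        class VPNode:
            """One node: a vantage point, a radius, and two subtrees (hypothetical helper)."""
        
            def __init__(self, point, radius, inside, outside):
                self.point = point
                self.radius = radius
                self.inside = inside    # points with dist(point, p) <= radius
                self.outside = outside  # points with dist(point, p) > radius
        
        
        def build_vp_tree(points, dist, rng=None):
            """Recursively build a vantage-point tree over `points`."""
            if rng is None:
                rng = random.Random(0)
            points = list(points)
            if not points:
                return None
            # Pick a random vantage point and split the rest by the median distance to it.
            vantage = points.pop(rng.randrange(len(points)))
            if not points:
                return VPNode(vantage, 0.0, None, None)
            ds = [dist(vantage, p) for p in points]
            radius = sorted(ds)[len(ds) // 2]
            inside = [p for p, d in zip(points, ds) if d <= radius]
            outside = [p for p, d in zip(points, ds) if d > radius]
            return VPNode(vantage, radius,
                          build_vp_tree(inside, dist, rng),
                          build_vp_tree(outside, dist, rng))
        
        
        def search(root, query, k, dist):
            """Return up to k (point, distance) pairs, nearest first."""
            if k <= 0:
                return []
            best = []                     # max-heap of (-distance, tiebreak, point)
            tiebreak = itertools.count()  # avoids comparing points on equal distances
        
            def tau():
                # Distance of the current k-th best hit, or infinity while the heap is not full.
                return -best[0][0] if len(best) == k else float("inf")
        
            def visit(node):
                if node is None:
                    return
                d = dist(query, node.point)
                if d < tau():
                    item = (-d, next(tiebreak), node.point)
                    if len(best) < k:
                        heapq.heappush(best, item)
                    else:
                        heapq.heapreplace(best, item)
                # Search the more promising side first; visit the other side only
                # if the triangle inequality cannot rule it out.
                if d <= node.radius:
                    visit(node.inside)
                    if d + tau() > node.radius:
                        visit(node.outside)
                else:
                    visit(node.outside)
                    if d - tau() <= node.radius:
                        visit(node.inside)
        
            visit(root)
            return sorted(((p, -nd) for nd, _, p in best), key=lambda pair: pair[1])
        
        
        points = [1, 4, 9, 16, 25, 36, 49, 64]
        dist = lambda a, b: abs(a - b)
        tree = build_vp_tree(points, dist)
        print(search(tree, 10, 1, dist))  # [(9, 1)]
        ```
        
        Because whole subtrees are skipped when they cannot contain a closer point, a query usually
        needs far fewer distance evaluations than a linear scan, which matters when each evaluation is a model call.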
        
        ## Installation
        
        Install and update using pip:
        ```
        pip install -U nlp-text-search
        ```
        
        ## Usage
        
        First, prepare the data, create the [deeppavlov](https://deeppavlov.ai) settings, and
        train a Doc2Vec model for embedding.
        ```
        import deeppavlov
        from gensim.models.doc2vec import Doc2Vec, TaggedDocument
        from gensim.utils import simple_preprocess
        from nlp_text_search import create_settings, LinearizedDist, DefaultSearchEngine
        
        paraphrases = [
            # Russian sample phrases: красная/синяя/зеленая = red/blue/green, ручка = pen, машина = car.
            # Label 1 marks a pair as paraphrases, 0 as non-paraphrases.
            (('красная ручка', 'синяя ручка'), 1),
            (('красная ручка', 'зеленая ручка'), 1),
            (('красная машина', 'синяя машина'), 1),
            (('красная машина', 'зеленая машина'), 1),
            (('синяя ручка', 'красная ручка'), 1),
            (('синяя ручка', 'зеленая ручка'), 1),
            (('синяя машина', 'красная машина'), 1),
            (('синяя машина', 'зеленая машина'), 1),
            (('красная ручка', 'красная машина'), 0),
            (('красная ручка', 'синяя машина'), 0),
            (('красная ручка', 'зеленая машина'), 0),
            (('синяя ручка', 'красная машина'), 0),
            (('синяя ручка', 'синяя машина'), 0),
            (('синяя ручка', 'зеленая машина'), 0)
        ]
        all_texts = list(set([t[0][0] for t in paraphrases] + [t[0][1] for t in paraphrases]))
        
        settings = create_settings(paraphrases, 'test')
        deeppavlov.train_model(settings)
        doc2vec = Doc2Vec([TaggedDocument(simple_preprocess(t), [i]) for i, t in enumerate(all_texts)],
                          min_count=1, workers=1, negative=0, dm=0, hs=1)
        ```
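        
        Note that `list(set(...))` does not guarantee a stable order for strings across Python runs,
        so the integer tags given to `TaggedDocument` may map to different texts each time. If
        reproducible tags matter, a sketch like the following collects the unique texts in first-seen
        order. `unique_texts` is a hypothetical helper, not part of the library, and the English
        sample pairs are illustrative stand-ins for the data above.
        
        ```python
        def unique_texts(pairs):
            """Return the unique texts from ((text_a, text_b), label) items in first-seen order."""
            seen = set()
            ordered = []
            for (text_a, text_b), _label in pairs:
                for text in (text_a, text_b):
                    if text not in seen:
                        seen.add(text)
                        ordered.append(text)
            return ordered
        
        
        sample = [
            (('red pen', 'blue pen'), 1),
            (('red pen', 'red car'), 0),
            (('blue pen', 'red car'), 0),
        ]
        print(unique_texts(sample))  # ['red pen', 'blue pen', 'red car']
        ```
        
        With this helper, `all_texts = unique_texts(paraphrases)` would be a drop-in replacement for the set-based line.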
        
        Then create the search engine and search for the nearest neighbors:
        ```
        se = DefaultSearchEngine(settings, doc2vec, LinearizedDist, points=all_texts)
        print(se.search('красная ручка', 4))
        ```
        This returns:
        ```
        [('красная ручка', 0), ('зеленая ручка', 0.05778998136520386), ('синяя ручка', 0.06721997261047363), ('синяя машина', 0.48162001371383667)]
        ```
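        
        Judging by this sample output, `search` yields `(text, distance)` pairs ordered by increasing
        distance, with 0 for an exact match. Assuming that shape, a small helper can keep only the hits
        below a cutoff. `close_matches` is a hypothetical name, not part of the library, and the 0.1
        cutoff is an arbitrary value chosen for illustration.
        
        ```python
        def close_matches(results, max_distance):
            """Keep the texts whose distance does not exceed max_distance."""
            return [text for text, distance in results if distance <= max_distance]
        
        
        # Values mirror the sample output (red pen, green pen, blue pen, blue car).
        results = [
            ('красная ручка', 0),
            ('зеленая ручка', 0.05778998136520386),
            ('синяя ручка', 0.06721997261047363),
            ('синяя машина', 0.48162001371383667),
        ]
        print(close_matches(results, 0.1))
        ```
        
        Here the three "pen" phrases pass the cutoff and the "car" phrase is dropped.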
        
        You can also save and load the search engine:
        ```
        se.save('se')
        se = DefaultSearchEngine.load('se')
        ```
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.6
Description-Content-Type: text/markdown
