Metadata-Version: 1.1
Name: russiannames
Version: 1.0
Summary: Russian names parser, gender identification and processing tools
Home-page: https://github.com/datacoon/russiannames
Author: Ivan Begtin
Author-email: ivan@begtin.tech
License: BSD
Description: 
        # Russian Names
        
        
        `russiannames` is a Python 3 library dedicated to parse Russian names, surnames and midnames, identify person gender by fullname and how name is written. It uses MongoDB as backend to speed-up name parsing.
        
        
        
        ## Documentation
        
        Documentation is built automatically and can be found on
        https://russiannames.readthedocs.org/en/latest/
        
        ## Installation
        
        To install Python library use `pip install russiannames` via pip or `python setup.py install` 
        
        To use database you need MongoDB instance. 
        Unpack db_data_bson.zip file from https://github.com/datacoon/russiannames/blob/master/data/bson/db_dump_bson.zip
        
        and use `mongorestore` command to restore `names` database with 3 collections: names, surnames and midnames
        
        ## Features
        
        Database of names used for identification
        
        * 375449 surnames - collection: surnames
        * 32134 first names - collection: names
        * 48274 midnames - collection: midnames
        
        Detailed database statistics by gender and collection
        
        | collection| total | males|females|universal or unidentified |
        | --- | --- | --- | --- | --- |
        | names | 32134 | 19297 | 8278 | 1196 |
        | midnames | 48274 | 30114 | 16143 | 0 |
        | surnames | 375274 | 124662 | 111534 | 38827 |
        
        
        Supports 12 formats of Russian full names writing style
        
        | Format | Example        | Description  |
        | ------ | -------------- | ------------ |
        | f | Ольга | only first name |
        | s | Петров | only surname |
        | Fs | О. Сидорова | first letter of first name and full surname |
        | sF | Николаев С. | full surname and first letter of surname |
        | sf | Абрамов Семен | full surname and full first name |
        | fs | Соня Камиуллина | full first name and full surname |
        | fm | Иван Петрович | full first name and full middlename |
        | SFM | М.Д.М. | first letters of surname, first name, middlename |
        | FMs | А.Н. Егорова | first letters of first and middle name and full furname |
        | sFM | Николаенко С.П. | full surname and first letters of first and middle names |
        | sfM | Петракова Зинаида М. | full surname, first name and first letter of middle name |
        | sfm | Казаков Ринат Артурович | full name as surname, first name and middle name |
        | fms | Светлана Архиповна Волкова | full name as first name, middle name and surname |
        
        
        Supports names with following ethnics identification
        
        9 ethnic types in names, surnames and middle names supported
        
        | key  | name (en) | name (rus)
        | ---- | --------- | ----------
        | arab | Arabic     | Арабское
        | arm  | Armenian     | Армянское
        | geor | Georgian     | Грузинское
        | germ | German     | Немецкие
        | greek | Greek    | Греческие
        | jew  | Jew      | Еврейские
        | polsk | Polish    | Польские
        | slav | Slavic (Russian) | Славянские
        | tur  | Turkic | Тюркские (тюркоязычные)
        
        
        ## Limitations
        
        * very rare names, surnames or middlenames could be not parsed 
        * ethnic identification is still on early stage
        
        
        ## Speed optimization
        
        * preconfigured and preindexed MongoDb collections used
        
        
        ## Usage and Examples
        
        ### Parse name and identify gender
        
        Parses names and returns: format, surname, first name, middle name, parsed (True/False) and gender 
        
            >>> from russiannames.parser import NamesParser
            >>> parser = NamesParser()
            >>> parser.parse('Нигматуллин Ринат Ахметович')
            {'format': 'sfm', 'sn': 'Нигматуллин', 'fn': 'Ринат', 'mn': 'Ахметович', 'gender': 'm', 'text': 'Нигматуллин Ринат Ахметович', 'parsed': True}
            >>> parser.parse('Петрова C.Я.')
            {'format': 'sFM', 'sn': 'Петрова', 'fn_s': 'C', 'mn_s': 'Я', 'gender': 'f', 'text': 'Петрова C.Я.', 'parsed': True}
        
        Gender field could have one of following values:
        
        * m: Male
        * f: Female
        * u: Unknown / unidentified
        * -: Impossible to identify
            
        ### Ethnic identification (experimental)
        Parses surname, first name and middle name and tries to identify person ethic affilation of the person
        
            >>> from russiannames.parser import NamesParser
            >>> parser = NamesParser()
            >>> parser.classify('Нигматуллин', 'Ринат', 'Ахметович')
            {'ethnics': ['tur'], 'gender': 'm'}
            >>> parser.classify('Алексеева', 'Ольга', 'Ивановна')
            {'ethnics': ['slav'], 'gender': 'f'}
        
        
        
        
        ## Supported languages
        * Russian
        
        
        
        ## Requirements
        * pymongo
        * click
        
        
        ## Acknowledgements
        
        
        
Keywords: russian names surnames gender
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Programming Language :: Python :: Implementation :: PyPy
