Metadata-Version: 2.0
Name: namesex-light
Version: 0.2.1
Summary: A lightweight gender classifier for Chinese given names
Home-page: https://github.com/hsinmin/namesex_light
Author: Hsin-Min Lu, Yu-Lun Li, Chi-Yu Lin
Author-email: luim@ntu.edu.tw
License: UNKNOWN
Description-Content-Type: /text/rst
Keywords: classify_sex Chinese_given_name
Platform: UNKNOWN
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Requires-Dist: numpy

namesex_light
-------------

Namesex_light is a lighweight package that predicts the gender tendency of Chinese given names. This module comes with a L2 regularized logistic regression trained on 10,730 Chinese given names (in traditional Chinese) with reliable gender lables collected from public data. The predict() function takes a list of names and output predicted gender tendency (1 for male and 0 for female) or probability of being a male name. Namesex_light has a sister project, namesex, that performs similar tasks with higher accuracy.

Additional information about namesex and namesex_light can be found `in another document (in Chinese) <https://github.com/hsinmin/namesex/blob/master/vignettee_namesex_exp1.ipynb>`_.

The prediction performance evaluated by ten-fold cross validation is:

========= =========== =====================
Metric    Performance Performance Std. Dev.
--------- ----------- ---------------------
Accuracy  0.8957      0.007327
F1        0.8920      0.007873
Precision 0.8852      0.012238
Recall    0.8991      0.008936
Logloss   114.35      6.413972
========= =========== =====================


Use pip/pip3 to install namesex_light.::

    pip install namesex_light

To use namesex_light, pass in an array or list of given names to predict(). For each element in the input list, predict() returns 1 or 0 for male or female prediction. Set "predprob = True" to return probability of being a male name. The following is a simple sample code.::


    >>> import namesex_light
    >>> nsl = namesex_light.namesex_light()
    >>> nsl.predict(['民豪', '愛麗', '志明'])
    array([1, 0, 1])
    >>> nsl.predict(['民豪', '愛麗', '志明'], predprob=True)
    array([0.99968932, 0.00530066, 0.9938986 ])

Note that namesex_light was trained using Chinese given names only. However, it may be used to classifier translated names as well::

    >>> nsl.predict(['阿波羅', '阿波羅', '雷', '艾美', '布蘭妮', '阿曼達'])
    array([1, 1, 1, 0, 0, 1])

This module is intended for a quick plug-and-play. The original training dataset is not included.

Testing Dataset
---------------

This package comes with a small testing dataset that was not used for model training. The following sample code illustrate a simple usage.::

    >>> testdata = namesex_light.testdata()
    >>> nsl = namesex_light.namesex_light()
    >>> pred = nsl.predict(testdata.gname)
    >>> print("The first 5 given names are: {}".format(testdata.gname[0:5]))
    The first 5 given names are: ['翊如', '妤庭', '諆璋', '大閎', '和維']
    >>> print("    and their sex:          {}".format(testdata.sex[0:5]))
        and their sex:          [0, 0, 1, 1, 1]
    >>> print("    and their predicted sex:{}".format(pred[0:5]))
        and their predicted sex:[0 0 1 1 1]
    >>> accuracy = np.sum(pred == testdata.sex) / len(pred)
    >>> print(" Prediction accuracy = {}".format(accuracy))
     Prediction accuracy = 0.8627450980392157

Note that the accuracy is slightly lower compared to the accuracy of ten-fold cross valudation. I guess this is normal since this testset is collected from a source that is different from the training dataset.

