Metadata-Version: 2.0
Name: data-science-utilities
Version: 0.2.0
Summary: Data Science utilities in python.
Home-page: https://github.com/truocphamkhac/data-science-utilities
Author: Truoc Pham
Author-email: truoc.phamkhac@asnet.com.vn
License: MIT license
Keywords: data_science_utilities
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5
Classifier: Programming Language :: Python :: 3.6
Requires-Dist: Click (>=6.0)

======================
Data Science Utilities
======================


.. image:: https://img.shields.io/pypi/v/data_science_utilities.svg
        :target: https://pypi.python.org/pypi/data_science_utilities

.. image:: https://img.shields.io/travis/truocphamkhac/data_science_utilities.svg
        :target: https://travis-ci.org/truocphamkhac/data_science_utilities

.. image:: https://readthedocs.org/projects/data-science-utilities/badge/?version=latest
        :target: https://data-science-utilities.readthedocs.io/en/latest/?badge=latest
        :alt: Documentation Status




Data science utilities in Python.


* Free software: MIT license
* Documentation: https://data-science-utilities.readthedocs.io.


Features
--------

* Missing data statistics

>>> from data_science_utilities import data_science_utilities
>>> # compute the missing-value statistics
>>> missing_data = data_science_utilities.missing_data_stats(df)
>>> # display the statistics
>>> missing_data
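
``missing_data_stats`` presumably reports, for each column, how many values are missing and what percentage of the rows that is; the README does not document its exact output. The following dependency-free sketch shows that computation under stated assumptions: the data is a plain ``dict`` mapping column names to lists (instead of a pandas DataFrame), and both ``None`` and float ``NaN`` count as missing. The helper ``summarize_missing`` is hypothetical and is not the package's implementation.

.. code-block:: python

    import math


    def summarize_missing(columns):
        """Return (column, missing count, missing percent) rows, most missing first.

        Sketch only: ``columns`` is assumed to map column names to lists of values.
        """
        def is_missing(value):
            # Treat None and float NaN as missing values.
            return value is None or (isinstance(value, float) and math.isnan(value))

        stats = []
        for name, values in columns.items():
            missing = sum(1 for value in values if is_missing(value))
            percent = 100.0 * missing / len(values) if values else 0.0
            stats.append((name, missing, percent))
        # Columns with the most missing values come first.
        stats.sort(key=lambda row: row[1], reverse=True)
        return stats


    columns = {
        "age": [22, None, 35, float("nan")],
        "fare": [7.25, 71.3, None, 8.05],
        "sex": ["m", "f", "f", "m"],
    }
    print(summarize_missing(columns))
    # [('age', 2, 50.0), ('fare', 1, 25.0), ('sex', 0, 0.0)]
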


* Reading the train and test CSV files

>>> from data_science_utilities import data_science_utilities
>>> train_path = '../data/raw/train.csv'
>>> test_path = '../data/raw/test.csv'
>>> X_train, X_test = data_science_utilities.read_csv_files(train_path, test_path)
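
``read_csv_files`` takes the paths of a training and a test CSV file and returns one dataset for each (``X_train``, ``X_test``). The same pattern can be sketched with only the standard library ``csv`` module; the package itself presumably returns pandas DataFrames rather than the lists of row dictionaries produced here, and the helper ``read_train_test`` is hypothetical.

.. code-block:: python

    import csv


    def read_train_test(train_path, test_path):
        """Read two CSV files into lists of row dictionaries (hypothetical helper)."""
        def read(path):
            # newline="" lets the csv module handle quoted line breaks itself.
            with open(path, newline="") as handle:
                return list(csv.DictReader(handle))

        return read(train_path), read(test_path)

The ``csv`` module returns every value as a string, so converting numeric columns is left to the caller in this sketch.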


* Plotting a distribution against the normal distribution

>>> from data_science_utilities import data_science_utilities
>>> data_science_utilities.plot_dist_norm(dist, 'distribution normal')
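
The README only shows the call to ``plot_dist_norm``. Assuming it draws the data together with a fitted normal curve (an assumption, not documented here), the curve itself can be computed with ``statistics.NormalDist`` from the standard library (Python 3.8+). The helper ``normal_curve`` is hypothetical.

.. code-block:: python

    from statistics import NormalDist


    def normal_curve(samples, points=50):
        """Fit a normal distribution to ``samples`` and evaluate its density.

        Returns the fitted distribution and ``points`` (x, density) pairs spanning
        the range of the data. Needs at least two non-constant samples and
        ``points >= 2``.
        """
        # from_samples estimates mu and sigma with the sample mean and stdev.
        fitted = NormalDist.from_samples(samples)
        low, high = min(samples), max(samples)
        step = (high - low) / (points - 1)
        xs = [low + i * step for i in range(points)]
        return fitted, [(x, fitted.pdf(x)) for x in xs]


    fitted, curve = normal_curve([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
    print(fitted.mean)  # 5.0
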


* Plotting a correlation matrix

>>> from data_science_utilities import data_science_utilities
>>> data_science_utilities.plot_corelation_matrix(data)
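
``plot_corelation_matrix`` (spelled as in the package) presumably draws the pairwise correlations of the columns in ``data`` as a heatmap. The values behind such a heatmap are usually Pearson correlation coefficients, which is what pandas' ``DataFrame.corr()`` computes by default. A standard-library sketch of the matrix, assuming numeric columns with no missing values and non-zero variance; the helper names are hypothetical:

.. code-block:: python

    import math


    def pearson(xs, ys):
        """Pearson correlation of two equally long numeric sequences."""
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var_x = sum((x - mean_x) ** 2 for x in xs)
        var_y = sum((y - mean_y) ** 2 for y in ys)
        return cov / math.sqrt(var_x * var_y)


    def correlation_matrix(columns):
        """Return the column names and the square matrix of pairwise correlations."""
        names = list(columns)
        matrix = [[pearson(columns[a], columns[b]) for b in names] for a in names]
        return names, matrix


    names, matrix = correlation_matrix({"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]})
    print(matrix[0])  # [1.0, 1.0, -1.0]
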


* Plotting the correlation matrix of the top ``k`` attributes

>>> from data_science_utilities import data_science_utilities
>>> data_science_utilities.plot_top_corelation_matrix(data, target, k=10, cmap='YlGnBu')
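
``plot_top_corelation_matrix`` narrows the matrix to ``k`` attributes, and ``'YlGnBu'`` is the name of a matplotlib colormap. One plausible way to pick the attributes, assumed here rather than taken from the package, is to rank every column by its correlation with ``target`` and keep the ``k`` highest; the target itself ranks first with a correlation of 1.0. This sketch ranks by signed correlation, so strongly negative attributes are dropped; ranking by absolute value is the alternative when those matter. The helper ``top_correlated`` is hypothetical.

.. code-block:: python

    import math


    def pearson(xs, ys):
        """Pearson correlation of two equally long numeric sequences."""
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        var_x = sum((x - mean_x) ** 2 for x in xs)
        var_y = sum((y - mean_y) ** 2 for y in ys)
        return cov / math.sqrt(var_x * var_y)


    def top_correlated(columns, target, k=10):
        """Names of the ``k`` columns most positively correlated with ``target``."""
        target_values = columns[target]
        scores = {name: pearson(values, target_values) for name, values in columns.items()}
        return sorted(scores, key=scores.get, reverse=True)[:k]


    columns = {
        "price": [10, 20, 30, 40],
        "size": [1, 2, 3, 5],
        "age": [4, 3, 2, 1],
        "rooms": [1, 2, 2, 3],
    }
    print(top_correlated(columns, "price", k=3))  # ['price', 'size', 'rooms']
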


* Plotting an attribute against the target as a scatter chart

>>> from data_science_utilities import data_science_utilities
>>> data_science_utilities.plot_scatter(data, column_name, target)


* Plotting attributes as box plots

>>> from data_science_utilities import data_science_utilities
>>> data_science_utilities.plot_box(data, column_name, target)
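
Assuming ``plot_box`` draws one box of ``target`` values for each distinct value of ``column_name`` (the README shows only the call), each box summarizes a group by its minimum, first quartile, median, third quartile and maximum. A standard-library sketch of those numbers with the hypothetical helper ``box_stats``; note that plotting libraries such as matplotlib place the whiskers at 1.5 times the interquartile range by default, rather than at the minimum and maximum used here.

.. code-block:: python

    from collections import defaultdict
    from statistics import quantiles


    def box_stats(rows, column_name, target):
        """Five-number summary of ``target`` for each value of ``column_name``.

        ``rows`` is assumed to be a list of dicts; each group needs at least two
        values for ``statistics.quantiles`` (Python 3.8+).
        """
        groups = defaultdict(list)
        for row in rows:
            groups[row[column_name]].append(row[target])
        stats = {}
        for category, values in groups.items():
            q1, median, q3 = quantiles(values, n=4, method="inclusive")
            stats[category] = (min(values), q1, median, q3, max(values))
        return stats


    rows = [{"zone": "A", "price": p} for p in (1, 2, 3, 4, 5)]
    rows += [{"zone": "B", "price": p} for p in (10, 20)]
    print(box_stats(rows, "zone", "price"))
    # {'A': (1, 2.0, 3.0, 4.0, 5), 'B': (10, 12.5, 15.0, 17.5, 20)}
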


* Plotting categorical columns as bar charts

>>> from data_science_utilities import data_science_utilities
>>> data_science_utilities.plot_category_columns(data, limit_bars=10)
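
The ``limit_bars`` argument suggests that ``plot_category_columns`` draws a bar chart of value counts for each categorical column and keeps only the most frequent categories. That reading is an assumption; the counting step can be sketched with ``collections.Counter``, treating a column as categorical when all of its non-missing values are strings. The helper ``category_counts`` is hypothetical.

.. code-block:: python

    from collections import Counter


    def category_counts(columns, limit_bars=10):
        """Top ``limit_bars`` (value, count) pairs for each categorical column."""
        counts = {}
        for name, values in columns.items():
            present = [value for value in values if value is not None]
            # Treat a column as categorical when every non-missing value is a string.
            if present and all(isinstance(value, str) for value in present):
                counts[name] = Counter(present).most_common(limit_bars)
        return counts


    columns = {
        "sex": ["m", "f", "m", None],
        "zone": ["A", "B", "A", "C", "A", "B"],
        "age": [22, 35, 41],
    }
    print(category_counts(columns, limit_bars=2))
    # {'sex': [('m', 2), ('f', 1)], 'zone': [('A', 3), ('B', 2)]}
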


* Plotting the training and test learning curves

>>> import numpy as np
>>> from data_science_utilities import data_science_utilities
>>> data_science_utilities.plot_learning_curve(estimator, title, X, y, ylim=None,
...                                            cv=None, train_sizes=np.linspace(.1, 1.0, 5))
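
The signature of ``plot_learning_curve`` resembles the helper of the same name in scikit-learn's learning-curve example. A learning curve refits the estimator on growing fractions of the training data and compares the error on the data it was fitted on with the error on held-out data. The sketch below substitutes a one-feature least-squares line for ``estimator`` and a single fixed validation set for ``cv``; both are assumptions chosen to keep it dependency-free, and all helper names are hypothetical.

.. code-block:: python

    def fit_line(xs, ys):
        """Ordinary least-squares fit of y = intercept + slope * x."""
        n = len(xs)
        mean_x = sum(xs) / n
        mean_y = sum(ys) / n
        var_x = sum((x - mean_x) ** 2 for x in xs)
        cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        slope = cov_xy / var_x if var_x else 0.0
        return mean_y - slope * mean_x, slope


    def mean_squared_error(params, xs, ys):
        intercept, slope = params
        return sum((intercept + slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


    def learning_curve(x_train, y_train, x_val, y_val,
                       train_sizes=(0.1, 0.325, 0.55, 0.775, 1.0)):
        """Return (n_samples, training error, validation error) per training size.

        ``train_sizes`` are fractions of the training set, the same values as
        ``np.linspace(.1, 1.0, 5)``. The training data is assumed to be shuffled
        and to hold at least two samples.
        """
        curve = []
        for fraction in train_sizes:
            n = max(2, int(fraction * len(x_train)))
            params = fit_line(x_train[:n], y_train[:n])
            curve.append((n,
                          mean_squared_error(params, x_train[:n], y_train[:n]),
                          mean_squared_error(params, x_val, y_val)))
        return curve

Plotting the second and third values of each tuple against the first gives the two learning curves; a gap that stays large as ``n_samples`` grows usually points to overfitting.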


Credits
-------

This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.

.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage


=======
History
=======

0.2.0 (2018-05-14)
------------------

* Adds visualization utilities.


0.1.0 (2018-05-11)
------------------

* First release on PyPI.


