Metadata-Version: 2.1
Name: specification-curve
Version: 0.2.3
Summary: Specification Curve is a Python package that performs specification curve analysis.
Home-page: https://github.com/aeturrell/specification_curve
Author: Arthur Turrell
Author-email: a.turrell09@imperial.ac.uk
License: MIT license
Keywords: specification_curve
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: Programming Language :: Python :: 3.8
Requires-Python: >=3.6
Requires-Dist: statsmodels
Requires-Dist: matplotlib
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: scikit-learn

===================
Specification Curve
===================


.. image:: https://img.shields.io/pypi/v/specification_curve.svg
        :target: https://pypi.python.org/pypi/specification_curve

.. image:: https://img.shields.io/travis/aeturrell/specification_curve.svg
        :target: https://travis-ci.com/aeturrell/specification_curve

.. image:: https://readthedocs.org/projects/specification-curve/badge/?version=latest
        :target: https://specification-curve.readthedocs.io/en/latest/?badge=latest
        :alt: Documentation Status

.. image:: https://static.pepy.tech/badge/specification-curve
        :target: https://pepy.tech/project/Specification_curve
        :alt: Downloads

.. image:: https://img.shields.io/pypi/pyversions/specification_curve.svg
        :target: https://pypi.python.org/pypi/specification_curve/
        :alt: Supported Python versions


Specification Curve is a Python (3.6+) package that performs specification curve analysis.


* Free software: MIT license
* Documentation: https://specification-curve.readthedocs.io.

Quickstart
----------

Running

.. code-block:: python

   from specification_curve import specification_curve as specy
   from specification_curve import example as scdata
   df = scdata.load_example_data1()
   y_endog = 'y1'
   x_exog = 'x1'
   controls = ['c1', 'c2', 'group1', 'group2']
   sc = specy.SpecificationCurve(df, y_endog, x_exog, controls,
                                 cat_expand=['group2'])
   sc.fit()
   sc.plot()

produces

.. image:: https://raw.githubusercontent.com/aeturrell/specification_curve/master/docs/images/example.png
   :width: 600

Grey squares (black lines when there are many specifications) show whether
a variable is included in a specification. Blue markers and error bars
show whether the coefficient is significant at the 0.05 level.
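
The inclusion grid can be read as an enumeration of subsets of the controls.
The following is a minimal sketch of that idea using only the standard
library. It assumes that every subset of the controls forms one specification
and ignores ``cat_expand``; the helper name ``enumerate_specifications`` is
hypothetical and is not part of this package's API.

.. code-block:: python

   from itertools import combinations

   def enumerate_specifications(controls):
       """Return every subset of ``controls`` as a tuple (hypothetical helper)."""
       specs = []
       for size in range(len(controls) + 1):
           specs.extend(combinations(controls, size))
       return specs

   controls = ['c1', 'c2', 'group1', 'group2']
   specs = enumerate_specifications(controls)
   # One row of the grid per control, one column per specification
   inclusion = {c: [c in spec for spec in specs] for c in controls}
   print(len(specs))

Under this assumption, four controls give ``2**4`` candidate specifications,
and each control appears in half of them.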

Here's another example:

.. code-block:: python

   from specification_curve import specification_curve as specy
   import numpy as np
   import pandas as pd
   n_samples = 300
   np.random.seed(1332)
   x_1 = np.random.random(size=n_samples)
   x_2 = np.random.random(size=n_samples)
   x_3 = np.random.random(size=n_samples)
   x_4 = np.random.randint(2, size=n_samples)
   y = (0.8*x_1 + 0.1*x_2 + 0.5*x_3 + x_4*0.6
        + 2*np.random.randn(n_samples))
   df = pd.DataFrame([x_1, x_2, x_3, x_4, y],
                     ['x_1', 'x_2', 'x_3', 'x_4', 'y']).T
   # Set x_4 as a categorical variable
   df['x_4'] = df['x_4'].astype('category')
   sc = specy.SpecificationCurve(df, 'y', 'x_1', ['x_2', 'x_3', 'x_4'],
                                 cat_expand=['x_4'])
   sc.fit()
   sc.plot()


Features
--------

These examples use the first set of **example data**:

.. code-block:: python

    from specification_curve import specification_curve as specy
    from specification_curve import example as scdata
    df = scdata.load_example_data1()

* Expand fixed effects into mutually exclusive groups using ``cat_expand``

.. code-block:: python

    y_endog = 'y1'
    x_exog = 'x1'
    controls = ['c1', 'c2', 'group1', 'group2']
    sc = specy.SpecificationCurve(df, y_endog, x_exog, controls,
                                  cat_expand=['group1', 'group2'])
    sc.fit()
    sc.plot()

* Mutually exclude two variables using ``exclu_grps``

.. code-block:: python

    y_endog = 'y1'
    x_exog = 'x1'
    controls = ['c1', 'c2', 'group1', 'group2']
    sc = specy.SpecificationCurve(df, y_endog, x_exog, controls,
                                  exclu_grps=[['c1', 'c2']])
    sc.fit()
    sc.plot()
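
As a rough illustration of what mutual exclusion implies for the set of
specifications, the sketch below filters out any combination of controls that
contains more than one variable from an excluded group. This is an assumption
about the effect of ``exclu_grps``, not a copy of the package's internals, and
the helper ``allowed`` is hypothetical.

.. code-block:: python

    from itertools import combinations

    def allowed(spec, exclu_grps):
        """True if ``spec`` holds at most one variable from each excluded group."""
        return all(len(set(spec) & set(grp)) <= 1 for grp in exclu_grps)

    controls = ['c1', 'c2', 'group1', 'group2']
    exclu_grps = [['c1', 'c2']]
    # Every subset of the controls, from the empty set to all four
    all_specs = [spec
                 for size in range(len(controls) + 1)
                 for spec in combinations(controls, size)]
    kept = [spec for spec in all_specs if allowed(spec, exclu_grps)]
    print(len(all_specs), len(kept))

Of the 16 subsets, the 4 that contain both ``c1`` and ``c2`` are dropped,
leaving 12.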

* Use multiple independent or dependent variables

.. code-block:: python

    x_exog = ['x1', 'x2']
    y_endog = 'y1'
    controls = ['c1', 'c2', 'group1', 'group2']
    sc = specy.SpecificationCurve(df, y_endog, x_exog, controls)
    sc.fit()
    sc.plot()
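
With several variables of interest, the number of specifications multiplies.
The sketch below counts them under the assumption that each dependent
variable is paired with each independent variable and each subset of the
controls; the package may combine them differently, so treat the count as
illustrative only.

.. code-block:: python

    from itertools import combinations, product

    y_endog = ['y1']
    x_exog = ['x1', 'x2']
    controls = ['c1', 'c2', 'group1', 'group2']
    # Every subset of the controls
    control_sets = [subset
                    for size in range(len(controls) + 1)
                    for subset in combinations(controls, size)]
    # One specification per (y, x, control subset) triple
    specs = list(product(y_endog, x_exog, control_sets))
    print(len(specs))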

* Save plots to file (format is inferred from file extension)

.. code-block:: python

    sc = specy.SpecificationCurve(df, y_endog, x_exog, controls,
                                  cat_expand=['group1'])
    sc.fit()
    sc.plot(save_path='test_fig.pdf')

* Specification results are stored in the output DataFrame ``df_r``

.. code-block:: python

    sc = specy.SpecificationCurve(df, y_endog, x_exog, controls)
    sc.fit()
    print(sc.df_r)

* Other ``statsmodels`` estimators (OLS is the default) can be used

.. code-block:: python

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    n_samples = 1000
    x_2 = np.random.randint(2, size=n_samples)
    x_1 = np.random.random(size=n_samples)
    x_3 = np.random.randint(3, size=n_samples)
    x_4 = np.random.random(size=n_samples)
    x_5 = x_1 + 0.05*np.random.randn(n_samples)
    x_beta = -1 + 3.5*x_1 + 0.2*x_2 + 0.3*x_3
    prob = 1/(1 + np.exp(-x_beta))
    y = np.random.binomial(n=1, p=prob, size=n_samples)
    y2 = np.random.binomial(n=1, p=prob*0.98, size=n_samples)
    df = pd.DataFrame([x_1, x_2, x_3, x_4, x_5, y, y2],
                      ['x_1', 'x_2', 'x_3', 'x_4', 'x_5', 'y', 'y2']).T
    y_endog = ['y', 'y2']
    x_exog = ['x_1', 'x_5']
    controls = ['x_3', 'x_2', 'x_4']
    sc = specy.SpecificationCurve(df, y_endog, x_exog, controls,
                                  cat_expand='x_3')
    sc.fit(estimator=sm.Logit)  # sm.Probit also works
    sc.plot()

* The plot style adapts when the number of specifications is very large

.. code-block:: python

    import numpy as np
    import pandas as pd
    n_samples = 400
    # Number of dimensions of continuous
    # random variables
    n_dim = 8
    c_rnd_vars = np.random.random(size=(n_dim, n_samples))
    c_rnd_vars_names = [f'c_{i}' for i in range(np.shape(c_rnd_vars)[0])]
    y_1 = (0.3*c_rnd_vars[0, :] +
           0.5*c_rnd_vars[1, :])
    y_2 = y_1 + 0.05*np.random.randn(n_samples)
    df = pd.DataFrame([y_1, y_2], ['y1', 'y2']).T
    for i, col_name in enumerate(c_rnd_vars_names):
        df[col_name] = c_rnd_vars[i, :]
    controls = c_rnd_vars_names[1:]
    sc = specy.SpecificationCurve(df, ['y1', 'y2'], c_rnd_vars_names[0],
                                  controls)
    sc.fit()
    sc.plot()

Similar Packages
----------------

In R, there are specr_ (which inspired many design choices in this package) and spec_chart_. Some of the example data in this package is the same as in specr_.

.. _specr: https://github.com/masurp/specr
.. _spec_chart: https://github.com/ArielOrtizBobea/spec_chart

Credits
-------

This package was created with Cookiecutter_ and the `audreyr/cookiecutter-pypackage`_ project template.

.. _Cookiecutter: https://github.com/audreyr/cookiecutter
.. _`audreyr/cookiecutter-pypackage`: https://github.com/audreyr/cookiecutter-pypackage


=======
History
=======

0.2.3 (2020-14-02)
------------------

* Fixed examples
* Bug fix for including CSV data in the PyPI distribution

0.2.2 (2020-09-02)
------------------

* More badges in README
* Dropped support for Python 3.5

0.2.1 (2020-09-02)
------------------

* Switched to object-oriented design
* Now supports a range of statsmodels estimators
* Example showing how to save plots to file in docs
* Example showing where estimation results are stored in docs
* Docs example of very large number of specifications

0.1.1 (2020-08-01)
------------------

* Multiple independent, dependent, and control variables implemented as lists
* Mutually exclusive control variables implemented
* Expansion of categorical variables into mutually exclusive fixed effects implemented

0.1.0 (2020-07-27)
------------------

* First release on PyPI.


