Metadata-Version: 2.1
Name: scikit-lego
Version: 0.3.2
Summary: a collection of lego bricks for scikit-learn pipelines
Home-page: https://scikit-lego.readthedocs.io/en/latest/
Author: Vincent D. Warmerdam & Matthijs Brouns
License: UNKNOWN
Platform: UNKNOWN
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.6
Classifier: Programming Language :: Python :: 3.7
Classifier: License :: OSI Approved :: MIT License
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Description-Content-Type: text/markdown
Requires-Dist: numpy (>=1.15.4)
Requires-Dist: scipy (>=1.2.0)
Requires-Dist: scikit-learn (>=0.20.2)
Requires-Dist: pandas (>=0.23.4)
Requires-Dist: patsy (>=0.5.1)
Requires-Dist: autograd (>=1.2)
Requires-Dist: cvxpy (>=1.0.24)
Requires-Dist: Deprecated (>=1.2.6)
Provides-Extra: dev
Requires-Dist: sphinx (>=1.8.5) ; extra == 'dev'
Requires-Dist: sphinx-rtd-theme (>=0.4.3) ; extra == 'dev'
Requires-Dist: nbsphinx (==0.4.2) ; extra == 'dev'
Requires-Dist: flake8 (>=3.6.0) ; extra == 'dev'
Requires-Dist: matplotlib (>=3.0.2) ; extra == 'dev'
Requires-Dist: pytest (>=4.0.2) ; extra == 'dev'
Requires-Dist: nbval (>=0.9.1) ; extra == 'dev'
Requires-Dist: plotnine (>=0.5.1) ; extra == 'dev'
Requires-Dist: jupyter (>=1.0.0) ; extra == 'dev'
Requires-Dist: jupyterlab (>=0.35.4) ; extra == 'dev'
Requires-Dist: pytest-cov (>=2.6.1) ; extra == 'dev'
Requires-Dist: pytest-mock (>=1.6.3) ; extra == 'dev'
Provides-Extra: docs
Requires-Dist: sphinx (>=1.8.5) ; extra == 'docs'
Requires-Dist: sphinx-rtd-theme (>=0.4.3) ; extra == 'docs'
Requires-Dist: nbsphinx (==0.4.2) ; extra == 'docs'

[![Build status](https://github.com/koaning/scikit-lego/workflows/Unit%20Tests/badge.svg)](https://github.com/koaning/scikit-lego/workflows/Unit%20Tests/badge.svg)
[![Documentation Status](https://readthedocs.org/projects/scikit-lego/badge/?version=latest)](https://scikit-lego.readthedocs.io/en/latest/?badge=latest)
[![Downloads](https://pepy.tech/badge/scikit-lego/month)](https://pepy.tech/project/scikit-lego/month)

# scikit-lego

![](images/logo.png)

We love scikit-learn, but very often we find ourselves writing
custom transformers, metrics and models. The goal of this project
is to consolidate these into a single package that is held to a
consistent standard of code quality and testing. This project is a
collaboration between multiple companies in the Netherlands.

Note that we're not formally affiliated with the scikit-learn project at all. The same holds for LEGO.

## Installation

Install `scikit-lego` via pip with

```bash
pip install scikit-lego
```

Alternatively, to edit and contribute you can fork/clone and run:

```bash
pip install -e ".[dev]"
python setup.py develop
```

## Documentation

The documentation can be found [here](https://scikit-lego.readthedocs.io/).

## Usage

```python
# the scikit-learn stuff we love
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# the scikit-lego stuff we add
from sklego.preprocessing import RandomAdder
from sklego.mixture import GMMClassifier

...

mod = Pipeline([
    ("scale", StandardScaler()),
    ("random_noise", RandomAdder()),
    ("model", GMMClassifier())
])

...
```

## Features

Here's a list of features that this library currently offers:

- `sklego.preprocessing.PatsyTransformer` applies a [patsy](https://patsy.readthedocs.io/en/latest/formulas.html) formula
- `sklego.preprocessing.RandomAdder` adds randomness in training
- `sklego.preprocessing.PandasTypeSelector` selects columns based on pandas type
- `sklego.preprocessing.ColumnSelector` selects columns based on column name
- `sklego.preprocessing.ColumnCapper` limits extreme values of the model features
- `sklego.preprocessing.OrthogonalTransformer` makes all features linearly independent
- `sklego.dummy.RandomRegressor` benchmark that predicts random values
- `sklego.naive_bayes.GaussianMixtureNB` classifies by training a 1D GMM per column per class
- `sklego.mixture.GMMClassifier` classifies by training a GMM per class
- `sklego.mixture.GMMOutlierDetector` detects outliers based on a trained GMM
- `sklego.pipeline.DebugPipeline` adds debug information to make debugging easier
- `sklego.meta.DecayEstimator` adds decay to the sample_weight that the model accepts
- `sklego.meta.GroupedEstimator` can split the data into groups and run a model on each
- `sklego.meta.EstimatorTransformer` adds a model output as a feature
- `sklego.metrics.correlation_score` calculates correlation between model output and feature
- `sklego.metrics.p_percent_score` proxy for model fairness with regard to a sensitive attribute
- `sklego.datasets.load_chicken` loads in the joyful ChickWeight dataset
- `sklego.datasets.make_simpleseries` makes a simulated time series
- `sklego.pandas_utils.log_step` a simple logger-decorator for pandas pipeline steps
- `sklego.pandas_utils.add_lags` adds lag values of certain columns in pandas
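
To give a feel for what a logger-decorator for pipeline steps can look like, here is a minimal sketch that uses only the standard library. It is not the implementation of `sklego.pandas_utils.log_step`; the names `log_step_sketch` and `drop_missing` are made up for this example, and the logged fields (size and elapsed time) are an assumption about what is useful to report.

```python
import functools
import time


def log_step_sketch(func):
    """Hypothetical logging decorator; NOT sklego's actual log_step."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # Report the shape when the result exposes one (a pandas DataFrame does),
        # otherwise fall back to the length if the result has one.
        size = getattr(result, "shape", None)
        if size is None and hasattr(result, "__len__"):
            size = (len(result),)
        print(f"[{func.__name__}] size={size} time={elapsed:.4f}s")
        return result

    return wrapper


@log_step_sketch
def drop_missing(rows):
    """Example pipeline step: keep only rows without a None value."""
    return [row for row in rows if None not in row]


# Calling the step prints one log line and returns the step's result unchanged.
cleaned = drop_missing([(1, 2), (3, None), (5, 6)])
```

Because the wrapper returns the result untouched, a decorated step should drop into a chain of pandas `.pipe(...)` calls without changing what the chain computes.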

## New Features

We want to be rather open in what we accept, but we do require three
things before a feature is added to the project:

1. any new feature contributes towards a demonstrable real-world use case
2. any new feature passes standard unit tests (we have a few for transformers and predictors)
3. the feature has been discussed in the issue list beforehand


