Metadata-Version: 2.1
Name: EthicML
Version: 0.1.0a3
Summary: A toolkit for understanding and researching algorithmic bias
Home-page: https://github.com/predictive-analytics-lab/EthicML
Author: Predictive Analytics Lab - University of Sussex
Author-email: olliethomas86@gmail.com
License: UNKNOWN
Platform: UNKNOWN
Requires-Python: >=3.6
Description-Content-Type: text/markdown
Requires-Dist: imageio (>=2.4.1)
Requires-Dist: matplotlib (>=3.0.2)
Requires-Dist: numpy (>=1.14.2)
Requires-Dist: pandas (>=0.24.0)
Requires-Dist: scikit-learn (>=0.20.1)
Requires-Dist: seaborn (>=0.9.0)
Requires-Dist: torch (<=1.1.0.post2,>=1.1.0)
Requires-Dist: pyarrow (>=0.11)
Requires-Dist: numba
Requires-Dist: fairlearn (>=0.2.0)
Requires-Dist: GitPython (>=2.1.11)
Requires-Dist: tqdm (>=4.31.1)
Requires-Dist: pipenv (>=2018.11.26)
Requires-Dist: tornado (==4.5.3)
Requires-Dist: dataclasses ; python_version < "3.7"
Provides-Extra: dev
Requires-Dist: pylint (>=2.0) ; extra == 'dev'
Requires-Dist: pytest (>=3.3.2) ; extra == 'dev'
Requires-Dist: pytest-cov (>=2.6.0) ; extra == 'dev'
Requires-Dist: mypy (>=0.720) ; extra == 'dev'
Requires-Dist: black ; extra == 'dev'

# EthicML

EthicML exists to combat the problems we've found with off-the-shelf fairness comparison packages.

These other packages are useful, but because we primarily do research,
much of the work we do doesn't fit neatly into a predefined box.
For example, we might want to use a 'fair' pre-processing method on the data before training a classifier on it.
We may still be experimenting and only want part of the framework to execute,
or we may want to do hyper-parameter optimization.
Whilst other frameworks can be modified to do these tasks,
you end up with hacked-together approaches that don't lend themselves to being built on in the future.
Because of this,
rather than keep patching the frameworks we've used, we're drawing a line in the sand and building our own.

## Why not use XXX?

There is an increasing number of other options:
IBM's AI Fairness 360, Aequitas, EthicalML/XAI, Fairness-Comparison and others.
They're all great at what they do; they're just not right for us.
We will, however, be influenced by them.

## Design Principles

### The Triplet

Because we're considering fairness, the basis of the toolbox is the triplet {x, s, y}:

- X - Features
- S - Sensitive Label
- Y - Class Label

All methods must assume that S and Y may be multi-class, not just binary.

We use a named tuple, `DataTuple`, to hold the triplet:

```python
triplet = DataTuple(x=features, s=sensitive, y=labels)  # each field is a pandas.DataFrame
```
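A minimal stand-in for `DataTuple`, assuming only what is stated above: a named tuple with fields `x`, `s` and `y`, each holding a pandas DataFrame. EthicML's real class may carry extra fields or live in a module not shown here, so treat this as a hypothetical sketch. Because it is a `NamedTuple`, its fields cannot be reassigned after construction, and `_replace` derives a new tuple instead of mutating the old one.

```python
from typing import NamedTuple

import pandas as pd


class DataTuple(NamedTuple):
    """Hypothetical stand-in for EthicML's DataTuple: features, sensitive attribute, class label."""

    x: pd.DataFrame  # features
    s: pd.DataFrame  # sensitive label(s)
    y: pd.DataFrame  # class label(s)


# Toy data with a three-valued sensitive attribute and a three-valued class label.
triplet = DataTuple(
    x=pd.DataFrame({"age": [25, 40, 31, 58], "hours": [40, 35, 50, 20]}),
    s=pd.DataFrame({"race": [0, 1, 2, 1]}),
    y=pd.DataFrame({"label": [1, 0, 2, 1]}),
)

# Fields are read-only: reassigning one raises AttributeError.
try:
    triplet.x = pd.DataFrame()
except AttributeError:
    print("DataTuple fields are immutable")

# Derive a new tuple with rescaled features; `triplet` itself is left untouched.
scaled = triplet._replace(x=triplet.x / triplet.x.max())
```

Note that the immutability is shallow: the tuple's fields cannot be rebound, but the DataFrames inside can still be modified in place, so functions should return new frames rather than edit the ones they receive.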

Using pandas DataFrames may be a little inefficient,
but given how much slicing on conditions we do, it feels worth it.
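The kind of condition-based slicing meant here might look like the following sketch, which computes the positive-prediction rate for every value of the sensitive attribute. It loops over whatever values appear in `s` rather than assuming a binary attribute, in line with the multi-class rule. The helper `positive_rate_by_group` and the column names are hypothetical, not part of EthicML, and the sketch assumes `s` is integer-coded.

```python
from typing import Dict

import pandas as pd


def positive_rate_by_group(preds: pd.DataFrame, s: pd.DataFrame, positive: int = 1) -> Dict[int, float]:
    """Hypothetical helper: P(pred == positive | s == group) for every group in s.

    Assumes `preds` has a single column named 'preds', `s` has a single
    integer-coded column, and both share the same row index.
    """
    s_col = s.columns[0]
    rates: Dict[int, float] = {}
    for group in sorted(s[s_col].unique()):
        mask = s[s_col] == group  # boolean Series selecting this group's rows
        group_preds = preds.loc[mask, "preds"]
        rates[int(group)] = float((group_preds == positive).mean())
    return rates


s = pd.DataFrame({"race": [0, 1, 2, 1, 0, 2]})
preds = pd.DataFrame({"preds": [1, 0, 1, 1, 1, 0]})
print(positive_rate_by_group(preds, s))  # {0: 1.0, 1: 0.5, 2: 0.5}
```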

### Separation of Methods

We purposefully keep pre-, in- and post-algorithm methods separate because they have different return types:

```python
pre_algorithm.run(train: DataTuple, test: DataTuple)  # -> Tuple[pandas.DataFrame, pandas.DataFrame]
in_algorithm.run(train: DataTuple, test: DataTuple)  # -> pandas.DataFrame
post_algorithm.run(preds: pandas.DataFrame, test: DataTuple)  # -> pandas.DataFrame
```
where `preds` is a one-column DataFrame whose column is named `preds`.
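One way to encode these three return types is with abstract base classes, so that a type checker such as mypy can flag a post-processing method being used where a pre-processing one is expected. The class names `PreAlgorithm`, `InAlgorithm` and `PostAlgorithm`, the stand-in `DataTuple`, and the toy `MajorityClass` model are illustrative assumptions. They mirror the signatures above but are not necessarily EthicML's actual classes.

```python
from abc import ABC, abstractmethod
from typing import NamedTuple, Tuple

import pandas as pd


class DataTuple(NamedTuple):
    """Minimal stand-in for EthicML's DataTuple (assumed fields x, s, y)."""

    x: pd.DataFrame
    s: pd.DataFrame
    y: pd.DataFrame


class PreAlgorithm(ABC):
    """Transforms the data; returns new train and test feature frames."""

    @abstractmethod
    def run(self, train: DataTuple, test: DataTuple) -> Tuple[pd.DataFrame, pd.DataFrame]:
        ...


class InAlgorithm(ABC):
    """Trains on `train` and returns a one-column 'preds' frame for `test`."""

    @abstractmethod
    def run(self, train: DataTuple, test: DataTuple) -> pd.DataFrame:
        ...


class PostAlgorithm(ABC):
    """Adjusts existing predictions; returns a new one-column 'preds' frame."""

    @abstractmethod
    def run(self, preds: pd.DataFrame, test: DataTuple) -> pd.DataFrame:
        ...


class MajorityClass(InAlgorithm):
    """Toy in-algorithm: predicts the most common training label for every test row."""

    def run(self, train: DataTuple, test: DataTuple) -> pd.DataFrame:
        majority = train.y.iloc[:, 0].mode().iloc[0]  # most frequent label in the first y column
        return pd.DataFrame({"preds": [majority] * len(test.x)}, index=test.x.index)


# Usage: a tiny train/test split with a three-valued sensitive attribute.
train = DataTuple(
    x=pd.DataFrame({"f": [1, 2, 3]}),
    s=pd.DataFrame({"s": [0, 1, 2]}),
    y=pd.DataFrame({"y": [1, 1, 0]}),
)
test = DataTuple(
    x=pd.DataFrame({"f": [4, 5]}),
    s=pd.DataFrame({"s": [1, 0]}),
    y=pd.DataFrame({"y": [0, 1]}),
)
preds = MajorityClass().run(train, test)
print(preds)
```

Because each base class is abstract, a subclass that forgets to implement `run` cannot be instantiated, which catches incomplete methods early.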

### General Rules of Thumb

- Mutable data structures are bad.
- At the very least, functions should have type annotations.
- Readability > Efficiency
- Don't get around warnings by just turning them off.

## Future Plans

We hope EthicML will become an easy way to examine the biases in different datasets
and to compare different models.


