Metadata-Version: 2.1
Name: scikit-psl
Version: 0.2.0
Summary: Probabilistic Scoring List classifier
Home-page: https://github.com/stheid/scikit-psl
License: MIT
Author: Stefan Heid
Author-email: stefan.heid@upb.de
Requires-Python: >=3.9,<3.13
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Provides-Extra: doc
Provides-Extra: test
Requires-Dist: joblib (>=1.3.2,<2.0.0)
Requires-Dist: numpy (>=1.25.2,<2.0.0)
Requires-Dist: pandas (>=2.0.3,<3.0.0)
Requires-Dist: pytest (>=7.4,<8.0) ; extra == "test"
Requires-Dist: scikit-learn (>=1.3.0,<2.0.0)
Requires-Dist: scipy (>=1.11.1,<2.0.0)
Requires-Dist: sphinx (>=7.1,<8.0) ; extra == "doc"
Requires-Dist: sphinx_rtd_theme (>=1.2,<2.0) ; extra == "doc"
Project-URL: Repository, https://github.com/stheid/scikit-psl
Description-Content-Type: text/markdown

[![License](https://img.shields.io/github/license/stheid/scikit-psl)](https://github.com/stheid/scikit-psl/blob/master/LICENSE)
[![Pip](https://img.shields.io/pypi/v/scikit-psl)](https://pypi.org/project/scikit-psl)


# Probabilistic Scoring Lists

Probabilistic scoring lists are incremental models that evaluate one feature of the dataset at a time.
PSLs can be seen as an extension of *scoring systems* in two ways:
- they can be evaluated at any stage, allowing a trade-off between model complexity and prediction speed.
- they provide a probability distribution over scores instead of hard thresholds.
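The following minimal sketch shows how a PSL prediction can be read off by hand. It rests on two assumptions. First, the total score at stage *k* is the sum of the scores of those of the first *k* selected features that are present (equal to 1). Second, each stage maps that total to a probability of the positive class. The scores and probabilities are copied from stages 0 to 2 of the `inspect(5)` output in the Usage section. `STAGES` and `predict_proba_at_stage` are hypothetical helpers for illustration and are not part of the skpsl API.

```python
# Hypothetical hand-written PSL (not the skpsl API): one (feature score,
# {total score: P(y=1)}) pair per stage, copied from stages 0-2 of the
# inspect(5) output. Stage 0 uses no feature, so its score is None.
STAGES = [
    (None, {0: 0.54}),
    (2, {0: 0.18, 2: 0.97}),
    (-1, {-1: 0.00, 0: 0.28, 1: 0.91, 2: 1.00}),
]


def predict_proba_at_stage(x, stage):
    """Return P(y=1) using only the first `stage` selected features.

    Assumption: x[i] is the binary value of the i-th feature in the order
    the PSL selected them.
    """
    # Sum the scores of the features evaluated so far that are present (== 1)
    total = sum(score for (score, _), xi in zip(STAGES[1:stage + 1], x) if xi)
    # Look up the probability that this stage assigns to the total score
    return STAGES[stage][1][total]


print(predict_proba_at_stage([], 0))      # base rate, no features evaluated
print(predict_proba_at_stage([1], 1))     # total score 2 at stage 1
print(predict_proba_at_stage([1, 1], 2))  # total score 2 + (-1) = 1 at stage 2
```

Stopping at an earlier stage means fewer features have to be evaluated. As the tables show, it also leaves fewer distinct total scores, and therefore fewer distinct probabilities.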

Scoring systems are used as decision support for human experts in domains such as medicine and law.

The implementation adheres to the [scikit-learn estimator API](https://scikit-learn.org/stable/glossary.html#glossary-estimator-types).

# Install
```bash
pip install scikit-psl
```

# Usage

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import ShuffleSplit

from skpsl import ProbabilisticScoringList

# Generate synthetic data with continuous features and a binary target variable
X, y = make_classification(random_state=42)
# Binarize the features by thresholding them at 0.5
X = (X > .5).astype(int)

psl = ProbabilisticScoringList([-1, 1, 2])

for train, test in ShuffleSplit(1, test_size=.2, random_state=42).split(X):
    psl.fit(X[train], y[train])
    print(f"Brier score: {psl.score(X[test], y[test]):.4f}")
    #>  Brier score: 0.1924  (lower is better)

    df = psl.inspect(5)
    print(df.to_string(index=False, na_rep="-", justify="center", float_format=lambda x: f"{x:.2f}"))
    #>  Stage  Score  T = -3  T = -2  T = -1  T = 0  T = 1  T = 2  T = 3
    #>   0        -       -       -       -   0.54      -      -      - 
    #>   1     2.00       -       -       -   0.18      -   0.97      - 
    #>   2    -1.00       -       -    0.00   0.28   0.91   1.00      - 
    #>   3    -1.00       -    0.00    0.07   0.86   0.91   1.00      - 
    #>   4     1.00       -    0.00    0.00   0.29   0.92   1.00   1.00 
    #>   5    -1.00    0.00    0.00    0.00   0.40   1.00   1.00   1.00
```
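The example reports the Brier score, which is the mean squared difference between the predicted probability of the positive class and the 0/1 label. The following minimal sketch implements that formula and cross-checks it against `sklearn.metrics.brier_score_loss`. The labels and probabilities are made up for illustration, and the `brier_score` helper is hypothetical.

```python
import numpy as np
from sklearn.metrics import brier_score_loss


def brier_score(y_true, p_pos):
    """Mean squared difference between P(y=1) and the 0/1 label (lower is better)."""
    y_true = np.asarray(y_true, dtype=float)
    p_pos = np.asarray(p_pos, dtype=float)
    return float(np.mean((p_pos - y_true) ** 2))


# Made-up labels and positive-class probabilities, for illustration only
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.0]

print(brier_score(labels, probs))       # about 0.0525
print(brier_score_loss(labels, probs))  # scikit-learn's implementation agrees
```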

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## 0.2.0 - 2023-08-10
### Added
- PSL classifier
  - introduced parallelization
  - implemented l-step lookahead
  - simple `inspect(·)` method that creates a tabular representation of the model
    

## 0.1.0 - 2023-08-08
### Added
- Initial implementation of the PSL algorithm
