Metadata-Version: 2.4
Name: unquad
Version: 0.1.9
Summary: Conformal Anomaly Detection
Project-URL: Homepage, https://github.com/OliverHennhoefer/unquad
Project-URL: Bugs, https://github.com/OliverHennhoefer/unquad/issues
Author-email: Oliver Hennhoefer <oliver.hennhoefer@mail.de>
License: BSD 3-Clause License
        
        Copyright (c) 2024, Oliver Hennhöfer
        
        Redistribution and use in source and binary forms, with or without
        modification, are permitted provided that the following conditions are met:
        
        1. Redistributions of source code must retain the above copyright notice, this
           list of conditions and the following disclaimer.
        
        2. Redistributions in binary form must reproduce the above copyright notice,
           this list of conditions and the following disclaimer in the documentation
           and/or other materials provided with the distribution.
        
        3. Neither the name of the copyright holder nor the names of its
           contributors may be used to endorse or promote products derived from
           this software without specific prior written permission.
        
        THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
        AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
        IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
        DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
        FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
        DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
        SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
        CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
        OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
        OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
License-File: LICENSE
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.12
Requires-Dist: numpy~=1.26.0
Requires-Dist: pandas>=2.2.1
Requires-Dist: pyarrow>=16.1.0
Requires-Dist: pyod~=2.0.3
Requires-Dist: scikit-learn>=1.6.1
Requires-Dist: scipy>=1.13.0
Requires-Dist: tqdm>=4.66.2
Provides-Extra: all
Requires-Dist: black; extra == 'all'
Requires-Dist: tensorflow>=2.16.1; extra == 'all'
Requires-Dist: torch>=2.2.2; extra == 'all'
Provides-Extra: dev
Requires-Dist: black; extra == 'dev'
Provides-Extra: dl
Requires-Dist: tensorflow>=2.16.1; extra == 'dl'
Requires-Dist: torch>=2.2.2; extra == 'dl'
Description-Content-Type: text/markdown

# *unquad*: Uncertainty-Quantified Anomaly Detection

[![License](https://img.shields.io/badge/License-BSD_3--Clause-blue.svg)](https://opensource.org/licenses/BSD-3-Clause) ![PyPI - Python Version](https://img.shields.io/pypi/pyversions/unquad)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

**unquad** is a wrapper for most [*PyOD*](https://pyod.readthedocs.io/en/latest/) detectors (see [Supported Estimators](#supported-estimators)) that enables
**uncertainty-quantified anomaly detection** based on one-class classification and the principles of **conformal inference**.

```sh
pip install unquad
```

Mind the **optional dependencies** needed for, e.g., deep learning models: the `dl` extra (`pip install "unquad[dl]"`) adds TensorFlow and PyTorch (see [pyproject.toml](https://github.com/OliverHennhoefer/unquad/blob/main/pyproject.toml)).

## What is *Conformal Anomaly Detection*?

[![start with why](https://img.shields.io/badge/start%20with-why%3F-brightgreen.svg?style=flat)](https://www.diva-portal.org/smash/get/diva2:690997/FULLTEXT02.pdf)

[*Conformal Anomaly Detection*](https://www.diva-portal.org/smash/get/diva2:690997/FULLTEXT02.pdf) applies the principles of conformal inference ([*conformal prediction*](https://en.wikipedia.org/wiki/Conformal_prediction#:~:text=Conformal%20prediction%20(CP)%20is%20a,assuming%20exchangeability%20of%20the%20data.)) to anomaly detection.
*Conformal Anomaly Detection* focuses on controlling error metrics like the [*false discovery rate*](https://en.wikipedia.org/wiki/False_discovery_rate), while maintaining [*statistical power*](https://en.wikipedia.org/wiki/Power_of_a_test).

Conformal anomaly detection (CAD) converts anomaly scores to _p_-values by comparing the scores of test instances against the scores of a calibration set, which is held out from the (*normal*) training data.
The _p_-value of each test score is its normalized rank among the calibration scores.
These **statistically valid** _p_-values enable error control through methods like *Benjamini-Hochberg*, replacing traditional anomaly estimates that lack statistical guarantees.
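
The two steps described above can be sketched in plain NumPy. This is a minimal illustration and not the package's internal implementation: the helper names `conformal_p_values` and `benjamini_hochberg` are hypothetical, and the sketch assumes the PyOD convention that higher scores mean "more anomalous".

```python
import numpy as np


def conformal_p_values(calib_scores, test_scores):
    """p-value = (1 + #{calibration scores >= test score}) / (n_calib + 1)."""
    calib = np.sort(np.asarray(calib_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    # side="left" counts the calibration scores strictly below each test score
    n_below = np.searchsorted(calib, test, side="left")
    n_at_least = calib.size - n_below
    return (1.0 + n_at_least) / (calib.size + 1.0)


def benjamini_hochberg(p_values, alpha):
    """Boolean mask of rejections (flagged anomalies) at nominal FDR level alpha."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passed = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()  # largest rank that meets its threshold
        reject[order[: k + 1]] = True    # reject everything up to that rank
    return reject


# Toy data (assumption): 99 evenly spaced calibration scores of normal instances
calib_scores = np.arange(1, 100) / 100
test_scores = np.array([0.505, 0.205, 2.0, 3.0])  # the last two are extreme

p_values = conformal_p_values(calib_scores, test_scores)
decisions = benjamini_hochberg(p_values, alpha=0.2)
print(p_values)   # small p-values for the extreme scores
print(decisions)  # True marks a flagged anomaly
```

The `+ 1` in numerator and denominator keeps every _p_-value strictly positive, which is what makes the rank-based _p_-value valid under exchangeability.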

### Usage: Split-Conformal (Inductive Approach)

The following example uses `ConformalDetector()` with the default `DetectorConfig()`.

```python
from pyod.models.gmm import GMM

from unquad.strategy.split import Split
from unquad.estimation.conformal import ConformalDetector

from unquad.data.load import load_shuttle
from unquad.utils.metrics import false_discovery_rate, statistical_power

x_train, x_test, y_test = load_shuttle(setup=True)

ce = ConformalDetector(
    detector=GMM(),
    strategy=Split(calib_size=1_000)
)

ce.fit(x_train)
estimates = ce.predict(x_test)

print(f"Empirical FDR: {false_discovery_rate(y=y_test, y_hat=estimates)}")
print(f"Empirical Power: {statistical_power(y=y_test, y_hat=estimates)}")
```

Output:
```text
Empirical FDR: 0.108
Empirical Power: 0.892
```
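
For reference, the two reported metrics are commonly defined as the share of false alarms among all flagged instances (FDR) and the share of true anomalies that were flagged (power). The sketch below uses these textbook definitions with hypothetical helper names; the package's own `false_discovery_rate` and `statistical_power` may be implemented differently.

```python
import numpy as np


def empirical_fdr(y, y_hat):
    """False positives divided by all flagged instances (0.0 if nothing is flagged)."""
    y = np.asarray(y, dtype=bool)
    y_hat = np.asarray(y_hat, dtype=bool)
    n_flagged = y_hat.sum()
    if n_flagged == 0:
        return 0.0
    return float(np.sum(y_hat & ~y) / n_flagged)


def empirical_power(y, y_hat):
    """True positives divided by all true anomalies (0.0 if there are none)."""
    y = np.asarray(y, dtype=bool)
    y_hat = np.asarray(y_hat, dtype=bool)
    n_anomalies = y.sum()
    if n_anomalies == 0:
        return 0.0
    return float(np.sum(y_hat & y) / n_anomalies)


# Toy labels (assumption): 1 = anomaly, 0 = normal
y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 1, 0, 1, 1, 1, 0, 0]

fdr = empirical_fdr(y_true, y_pred)      # 1 false alarm among 4 flagged
power = empirical_power(y_true, y_pred)  # 3 of 5 anomalies found
print(f"FDR: {fdr}, Power: {power}")
```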

The behavior can be customized via `DetectorConfig()`, whose fields and defaults are:

```python
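from dataclasses import dataclass

from unquad.utils.enums import Aggregation, Adjustment


@dataclass
class DetectorConfig:
    alpha: float = 0.2  # Nominal FDR value
    adjustment: Adjustment = Adjustment.BH  # Multiple testing procedure
    aggregation: Aggregation = Aggregation.MEDIAN  # Score aggregation (if applicable)
    seed: int = 1  # Random seed for reproducibility
    silent: bool = True  # Suppress progress output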
```

### Usage: Jackknife+-after-Bootstrap (JaB+)

The `Bootstrap()` strategy lets you set two of the three parameters `resampling_ratio`, `n_bootstraps` and `n_calib`.
For any such pair, the remaining parameter is derived automatically. This allows exact control of the
calibration procedure when using a bootstrap strategy.
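
To make the "two out of three" idea concrete, the sketch below solves for the missing parameter under an assumed relation, `n_calib = n_bootstraps * n_train * (1 - resampling_ratio)`, i.e. every bootstrap round contributes its held-out share of the training data to the calibration set. This relation and the helper `complete_bootstrap_params` are assumptions made for illustration; they are not taken from the package's source, which may derive the value differently.

```python
import math


def complete_bootstrap_params(n_train, resampling_ratio=None, n_bootstraps=None, n_calib=None):
    """Fill in the one missing parameter under the ASSUMED relation
    n_calib = n_bootstraps * n_train * (1 - resampling_ratio)."""
    given = [resampling_ratio, n_bootstraps, n_calib]
    if sum(value is None for value in given) != 1:
        raise ValueError("Specify exactly two of the three parameters.")

    if n_calib is None:
        n_calib = round(n_bootstraps * n_train * (1.0 - resampling_ratio))
    elif n_bootstraps is None:
        held_out_per_round = n_train * (1.0 - resampling_ratio)
        n_bootstraps = max(1, math.ceil(n_calib / held_out_per_round))
    else:
        resampling_ratio = 1.0 - n_calib / (n_bootstraps * n_train)

    return resampling_ratio, n_bootstraps, n_calib


# 10,000 training instances, 1% held out in each of 20 rounds
params = complete_bootstrap_params(10_000, resampling_ratio=0.99, n_bootstraps=20)
print(params)
```

Under this assumption, a high `resampling_ratio` keeps almost all data for fitting each model, while `n_bootstraps` compensates by accumulating calibration scores over many rounds.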

```python
from pyod.models.iforest import IForest

from unquad.estimation.properties.configuration import DetectorConfig
from unquad.estimation.conformal import ConformalDetector
from unquad.strategy.bootstrap import Bootstrap
from unquad.utils.enums import Aggregation, Adjustment

from unquad.data.load import load_shuttle
from unquad.utils.metrics import false_discovery_rate, statistical_power

x_train, x_test, y_test = load_shuttle(setup=True)

ce = ConformalDetector(
    detector=IForest(behaviour="new"),
    strategy=Bootstrap(resampling_ratio=0.99, n_bootstraps=20, plus=True),
    config=DetectorConfig(alpha=0.1, adjustment=Adjustment.BH, aggregation=Aggregation.MEAN),
)

ce.fit(x_train)
estimates = ce.predict(x_test)

print(f"Empirical FDR: {false_discovery_rate(y=y_test, y_hat=estimates)}")
print(f"Empirical Power: {statistical_power(y=y_test, y_hat=estimates)}")
```

Output:
```text
Empirical FDR: 0.067
Empirical Power: 0.933
```

### Supported Estimators

The package only supports anomaly estimators that are suitable for unsupervised one-class classification. Because the
detectors are fitted exclusively on *normal* (or *non-anomalous*) data, parameters like *threshold* are internally
set to the smallest possible values.

Models that are **currently supported** include:

* Angle-Based Outlier Detection (**ABOD**)
* Autoencoder (**AE**)
* Cook's Distance (**CD**)
* Copula-based Outlier Detector (**COPOD**)
* Deep Isolation Forest (**DIF**)
* Empirical-Cumulative-distribution-based Outlier Detection (**ECOD**)
* Gaussian Mixture Model (**GMM**)
* Histogram-based Outlier Detection (**HBOS**)
* Isolation-based Anomaly Detection using Nearest-Neighbor Ensembles (**INNE**)
* Isolation Forest (**IForest**)
* Kernel Density Estimation (**KDE**)
* *k*-Nearest Neighbor (***k*NN**)
* Kernel Principal Component Analysis (**KPCA**)
* Linear Model Deviation-based Outlier Detection (**LMDD**)
* Local Outlier Factor (**LOF**)
* Local Correlation Integral (**LOCI**)
* Lightweight Online Detector of Anomalies (**LODA**)
* Locally Selective Combination of Parallel Outlier Ensembles (**LSCP**)
* GNN-based Anomaly Detection Method (**LUNAR**)
* Median Absolute Deviation (**MAD**)
* Minimum Covariance Determinant (**MCD**)
* One-Class SVM (**OCSVM**)
* Principal Component Analysis (**PCA**)
* Quasi-Monte Carlo Discrepancy Outlier Detection (**QMCD**)
* Rotation-based Outlier Detection (**ROD**)
* Subspace Outlier Detection (**SOD**)
* Scalable Unsupervised Outlier Detection (**SUOD**)

## Contact
**Bug reporting:** [https://github.com/OliverHennhoefer/unquad/issues](https://github.com/OliverHennhoefer/unquad/issues)
