Metadata-Version: 2.1
Name: superior-scoring-rules
Version: 1.0.3
Summary: PBS and PLL are superior evaluation metrics for probabilistic classifiers, fixing flaws in Brier Score (MSE) and Log Loss (Cross-Entropy). Strictly proper, consistent, and better for model selection, early stopping, and checkpointing.
Home-page: https://github.com/Ruhallah93/superior-scoring-rules
Author: Rouhollah Ahmadian
Author-email: ruhallah.ahmadian@gmail.com
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.0.0
Description-Content-Type: text/markdown
Requires-Dist: tensorflow>=2.0.0

# Superior Scoring Rules: Better Metrics for Probabilistic Evaluation
[GitHub](https://github.com/Ruhallah93/superior-scoring-rules), [arXiv Preprint](https://arxiv.org/pdf/2407.17697)

> PBS and PLL are superior evaluation metrics for probabilistic classifiers, fixing flaws in Brier Score (MSE) and Log Loss (Cross-Entropy). Strictly proper, consistent, and better for model selection, early stopping, and checkpointing.

## Problem with Traditional Metrics
Accuracy-based metrics (Accuracy, F1) treat all correct predictions equally, ignoring confidence. In high-stakes domains, confidence calibration is critical:

- Cancer Diagnosis: 51% vs. 99% confidence in malignancy should not be treated identically.

- ICU Triage & Mortality: Overconfident mispredictions risk patient safety.

- Autonomous Vehicles: Decisions depend on uncertainty about obstacles.

- Financial Risk Modeling: Pricing and investment hinge on calibrated probabilities.

- Security Threat Detection: High-confidence false negatives undermine defenses.

Accuracy and F1 Score alone are therefore insufficient for evaluating probabilistic classifiers.

## Limitations of MSE & Cross-Entropy

Mean Squared Error (Brier Score) and Cross-Entropy (Log Loss) are strictly proper scoring rules, so they reward calibration. However, both can still assign a better (lower) score to an incorrect prediction than to a correct one, as in this three-class example:

| Prediction | True Label (Y) | Predicted Probabilities (P) | Brier Score | Log Loss (log10) | Outcome   |
|------------|----------------|-----------------------------|-------------|------------------|-----------|
| **`A`**    | `[0, 1, 0]`    | `[0.33, 0.34, 0.33]`        | 0.6534      | 0.4685           | Correct   |
| **`B`**    | `[0, 1, 0]`    | `[0.51, 0.49, 0.00]`        | 0.5202      | 0.3098           | Incorrect |

Both the Brier Score and Log Loss give B a lower (better) score than A, even though B's most probable class is wrong. This contradicts the principle of rewarding correct predictions.
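
The values in the table can be reproduced with a few lines of plain Python. This is a minimal sketch, not part of the package: the helper names `brier_score` and `log_loss` are hypothetical, the Brier Score is taken as the sum of squared differences over classes, and the logarithm is base 10, which is what matches the tabulated values.

```python
import math

def brier_score(y, p):
    """Sum of squared differences between one-hot label y and probabilities p."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p))

def log_loss(y, p):
    """Negative base-10 log of the probability assigned to the true class."""
    return -sum(yi * math.log10(pi) for yi, pi in zip(y, p) if yi > 0)

y = [0, 1, 0]
a = [0.33, 0.34, 0.33]  # argmax is class 1: correct
b = [0.51, 0.49, 0.00]  # argmax is class 0: incorrect

for name, p in (("A", a), ("B", b)):
    print(name, round(brier_score(y, p), 4), round(log_loss(y, p), 4))
# A 0.6534 0.4685
# B 0.5202 0.3098
```

Lower is better for both scores, so the incorrect prediction B wins under either metric.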

## Our Solution: PBS & PLL  
To ensure correct predictions always receive better scores, we introduce a penalty term for misclassifications:

- **Penalized Brier Score (PBS)**

- **Penalized Logarithmic Loss (PLL)**

Both metrics remain strictly proper and are also superior: they never score an incorrect prediction better than a correct one.
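
As a rough illustration of how a misclassification penalty restores the ordering, the sketch below adds a fixed penalty whenever the true class is not the strict argmax. The penalty sizes used here, `(c - 1) / c` for the Brier Score and `log(c)` for Log Loss with `c` classes, are assumptions intended to push every incorrect prediction above the worst score a correct one can reach; see the paper for the exact definitions. The function names are hypothetical, and this is not the package's `pbs`/`pll` implementation.

```python
import math

def is_correct(y, p):
    """True when the true class has strictly the highest probability."""
    t = y.index(1)
    return all(p[t] > pi for i, pi in enumerate(p) if i != t)

def penalized_brier(y, p):
    # Assumed penalty: (c - 1) / c, added only for incorrect predictions.
    c = len(y)
    brier = sum((yi - pi) ** 2 for yi, pi in zip(y, p))
    return brier if is_correct(y, p) else brier + (c - 1) / c

def penalized_log_loss(y, p, eps=1e-12):
    # Assumed penalty: log(c), added only for incorrect predictions.
    # Natural log is used here; eps guards against log(0).
    c = len(y)
    loss = -math.log(max(p[y.index(1)], eps))
    return loss if is_correct(y, p) else loss + math.log(c)

y = [0, 1, 0]
a = [0.33, 0.34, 0.33]  # correct
b = [0.51, 0.49, 0.00]  # incorrect

print(penalized_brier(y, a) < penalized_brier(y, b))        # True
print(penalized_log_loss(y, a) < penalized_log_loss(y, b))  # True
```

Under these assumed penalties, a barely correct prediction and a confidently wrong one are ranked in the expected order by both scores.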


## Quick Start

### Installation from PyPI
```bash
pip install superior-scoring-rules
```

### Install from Source (Development)
Clone the repository and install it in editable mode:
```bash
git clone https://github.com/Ruhallah93/superior-scoring-rules.git
cd superior-scoring-rules
pip install -e .
```


### Basic Usage
```python
import tensorflow as tf
from superior_scoring_rules import pbs, pll

# Sample data (batch_size=3, num_classes=4); y_true is one-hot encoded
y_true = tf.constant([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1]], dtype=tf.float32)
y_pred = tf.constant([[0.9, 0.05, 0.05, 0.0],
                      [0.1, 0.8, 0.05, 0.05],
                      [0.1, 0.1, 0.1, 0.7]])

print("PBS:", pbs(y_true, y_pred).numpy())
print("PLL:", pll(y_true, y_pred).numpy())
```

### Early Stopping & Checkpointing
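Monitor `val_pbs` or `val_pll` instead of `val_loss`. TensorFlow 2.x callbacks do not populate `self.validation_data`, so the validation set is passed to the callback explicitly:
```python
import tensorflow as tf
from superior_scoring_rules import pbs, pll

class PBSCallback(tf.keras.callbacks.Callback):
    # x_val / y_val are your held-out inputs and one-hot labels.
    def __init__(self, x_val, y_val):
        super().__init__()
        self.x_val, self.y_val = x_val, y_val

    def on_epoch_end(self, epoch, logs=None):
        if logs is None:
            return
        y_prob = self.model.predict(self.x_val, verbose=0)  # predict once, reuse for both metrics
        logs['val_pbs'] = pbs(self.y_val, y_prob)
        logs['val_pll'] = pll(self.y_val, y_prob)

# PBSCallback must come first so val_pbs is in logs before the other callbacks read it.
model.fit(..., callbacks=[
    PBSCallback(x_val, y_val),
    tf.keras.callbacks.EarlyStopping(monitor='val_pbs', patience=5, mode='min'),
    tf.keras.callbacks.ModelCheckpoint('best.h5', monitor='val_pbs', save_best_only=True, mode='min'),
])
```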

## Paper & Citation

- [Superior scoring rules for probabilistic evaluation of single-label multi-class classification tasks](https://www.sciencedirect.com/science/article/abs/pii/S0888613X25000623)

- arXiv: [2407.17697](https://arxiv.org/pdf/2407.17697)

```bibtex
@article{ahmadian2025superior,
  title={Superior scoring rules for probabilistic evaluation of single-label multi-class classification tasks},
  author={Ahmadian, Rouhollah and Ghatee, Mehdi and Wahlstr{\"o}m, Johan},
  journal={International Journal of Approximate Reasoning},
  pages={109421},
  year={2025},
  publisher={Elsevier}
}
```

