Metadata-Version: 2.1
Name: catie
Version: 2.2.0
Summary: Tools to compute cohort-attention tied entropy, including CATSensitivity, CATSpecificity, and their combination - CATmean
Home-page: https://github.com/jianfeizhang/catie
Author: Jianfei Zhang
Author-email: jianfei.zhang@live.ca
License: MIT
Platform: UNKNOWN
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.7
Description-Content-Type: text/markdown
License-File: LICENSE

This package computes cohort-attention tied entropy (CATE) in three forms: CATSensitivity, CATSpecificity, and their combination, CATmean, given the cohorts of interest. It also reports AUC, specificity, sensitivity, and the detailed measures for each cohort. Sensitivity and specificity are in effect special cases of CATSensitivity and CATSpecificity.


# Cohort-attention Tied Entropy

## CATSensitivity is given by

<img src="https://latex.codecogs.com/png.latex?\text{CATSensitivity}=\left(1-\frac{1}{1+e^{0.5-\alpha}}\right)\frac{1}{|K^+_\star|}\sum_{\forall k\in K^+_\star}A^+_k + \left(\frac{1}{1+e^{0.5-\alpha}}\right)\frac{1}{|K|-|K^+_\star|}\sum_{\forall k \in K/K^+_\star}A^-_k" />

## and CATSpecificity by
<img src="https://latex.codecogs.com/png.latex?\text{CATSpecificity}=\frac{\alpha}{|K^+_\star|}\sum_{\forall k\in K^+_\star}A^+_k + \frac{(1-\alpha)}{|K|-|K^+_\star|}\sum_{\forall k \in K/K^+_\star}A^-_k" /> 

* Excellent: <b>CATSensitivity = CATSpecificity = 1</b> when all samples are correctly identified
* Bad: <b>CATSensitivity = CATSpecificity = 0</b> when all are incorrectly identified
* No fit: <b>CATSensitivity = CATSpecificity = 0.5</b> when 50% of samples are correctly predicted and all cohorts are treated identically (i.e., alpha = 0.5)
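
The three reference points above can be checked numerically. The sketch below is a literal transcription of the two formulas, taking the per-cohort accuracies A+ (cohorts of interest) and A- (remaining cohorts) as given inputs. The function names `cat_sensitivity` and `cat_specificity` are hypothetical and are not part of the catie API.

```python
import math


def _mean(values):
    """Arithmetic mean of a non-empty sequence."""
    if not values:
        raise ValueError("each cohort group must contain at least one value")
    return sum(values) / len(values)


def cat_sensitivity(a_pos, a_neg, alpha=0.5):
    """Transcription of the CATSensitivity formula.

    a_pos: accuracies A+_k for the cohorts of interest
    a_neg: accuracies A-_k for the remaining cohorts
    """
    # sigmoid-shaped weight placed on the remaining cohorts
    s = 1.0 / (1.0 + math.exp(0.5 - alpha))
    return (1.0 - s) * _mean(a_pos) + s * _mean(a_neg)


def cat_specificity(a_pos, a_neg, alpha=0.5):
    """Transcription of the CATSpecificity formula."""
    return alpha * _mean(a_pos) + (1.0 - alpha) * _mean(a_neg)


# The three reference cases, with two cohorts of interest and one other cohort
perfect = (cat_sensitivity([1.0, 1.0], [1.0]), cat_specificity([1.0, 1.0], [1.0]))
worst = (cat_sensitivity([0.0, 0.0], [0.0]), cat_specificity([0.0, 0.0], [0.0]))
no_fit = (cat_sensitivity([0.5, 0.5], [0.5]), cat_specificity([0.5, 0.5], [0.5]))
print(perfect, worst, no_fit)
```

Both formulas divide by the number of remaining cohorts, so the sketch rejects an empty group rather than guessing how that case should be handled.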

The TPR and TNR with respect to a cohort can be computed as entropy-weighted accuracies:

<img src="https://latex.codecogs.com/png.latex?A^\ast_k=\frac{\sum_{\forall i\in C^+_k}(-p^\ast_i\log p^\ast_i)\cdot {Acc}^\ast_i}{\sum_{\forall i\in C^+_k}(-p^\ast_i\log p^\ast_i)},\ \ \forall\ast\in\{+,-\}" />

* <img src="https://latex.codecogs.com/png.latex?\alpha\in[0,1]" /> is a user-defined weight for the cohorts of interest (<img src="https://latex.codecogs.com/png.latex?K_\star" /> here); the default is 0.5
* <img src="https://latex.codecogs.com/png.latex?Acc^\ast_i" /> is the accuracy with respect to individual <img src="https://latex.codecogs.com/png.latex?i" />, computed on that individual's samples only
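
The entropy-weighted accuracy can be sketched as a weighted mean in which each individual contributes with weight -p log p. This is only an illustration under stated assumptions: the natural logarithm is used, each p is taken to be a probability in [0, 1] (this section does not define it further), and `entropy_weighted_accuracy` is a hypothetical name rather than a catie function.

```python
import math


def entropy_weighted_accuracy(probs, accs):
    """Weighted mean of accs, using -p * log(p) as the weight of each individual."""
    if len(probs) != len(accs):
        raise ValueError("probs and accs must have the same length")
    # -p * log(p) tends to 0 at both p = 0 and p = 1, so those ends get weight 0
    weights = [-p * math.log(p) if 0.0 < p < 1.0 else 0.0 for p in probs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("all entropy weights are zero")
    return sum(w * a for w, a in zip(weights, accs)) / total


# Three individuals in one cohort: per-individual accuracies of 1, 0, and 1
a_k = entropy_weighted_accuracy([0.8, 0.2, 0.5], [1.0, 0.0, 1.0])
print(round(a_k, 3))
```

With equal probabilities the weights cancel and the result reduces to the plain mean accuracy.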

## CATmean

<img src="https://latex.codecogs.com/png.latex?\text{CATmean}_\beta = \sqrt{\frac{(1+\beta^2)\cdot\text{CATSensitivity}\cdot\text{CATSpecificity}}{\beta^2\cdot\text{CATSensitivity}+\text{CATSpecificity}}}" />

<img src="https://latex.codecogs.com/png.latex?\beta=1" /> by default
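
As a worked instance of the CATmean formula, the sketch below evaluates it for given CATSensitivity and CATSpecificity values. The name `cat_mean` is hypothetical, and returning 0 when the denominator vanishes is an assumption made here, not something the formula specifies.

```python
import math


def cat_mean(cat_sen, cat_spe, beta=1.0):
    """Evaluate CATmean_beta from CATSensitivity and CATSpecificity."""
    denom = beta ** 2 * cat_sen + cat_spe
    if denom == 0.0:
        # assumption: score 0 when both inputs are 0 (the formula is undefined there)
        return 0.0
    return math.sqrt((1.0 + beta ** 2) * cat_sen * cat_spe / denom)


print(cat_mean(1.0, 1.0))            # both components perfect
print(cat_mean(0.8, 0.5))            # beta = 1 by default
print(cat_mean(0.8, 0.5, beta=2.0))  # a different beta shifts the balance
```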


## Install
```
pip3 install catie
```
## Upgrade to the latest version
```
pip3 install catie -U
```

# Usage

Example: the results file `results.csv` looks like this:

```
ID, tied_ID, cohort, true_label, pred_label, pred_proba
1, A, c1, 1, 1, 0.8
2, A, c1, 1, 0, 0.2
3, B, c1, 0, 0, 0.3
4, C, c2, 1, 1, 0.9
5, C, c2, 1, 1, 0.7
6, C, c2, 1, 1, 0.6
7, D, c2, 0, 1, 0.8
8, D, c2, 0, 0, 0.3
9, E, c3, 1, 1, 0.6
```
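
As a sanity check on the example data, plain sensitivity and specificity can be computed from `true_label` and `pred_label` alone. The sketch below uses only the Python standard library and does not call catie. The table is embedded as a string for self-containment, and `skipinitialspace=True` is needed because each comma in the file is followed by a space.

```python
import csv
import io

RESULTS_CSV = """\
ID, tied_ID, cohort, true_label, pred_label, pred_proba
1, A, c1, 1, 1, 0.8
2, A, c1, 1, 0, 0.2
3, B, c1, 0, 0, 0.3
4, C, c2, 1, 1, 0.9
5, C, c2, 1, 1, 0.7
6, C, c2, 1, 1, 0.6
7, D, c2, 0, 1, 0.8
8, D, c2, 0, 0, 0.3
9, E, c3, 1, 1, 0.6
"""

# skipinitialspace=True drops the blank that follows each comma
rows = list(csv.DictReader(io.StringIO(RESULTS_CSV), skipinitialspace=True))

# confusion-matrix counts over all samples
tp = sum(r["true_label"] == "1" and r["pred_label"] == "1" for r in rows)
fn = sum(r["true_label"] == "1" and r["pred_label"] == "0" for r in rows)
tn = sum(r["true_label"] == "0" and r["pred_label"] == "0" for r in rows)
fp = sum(r["true_label"] == "0" and r["pred_label"] == "1" for r in rows)

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
print(f"sensitivity={sensitivity:.3f} specificity={specificity:.3f}")
# sensitivity=0.833 specificity=0.667
```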

## Code

```
import pandas as pd
from metrics import CAT

# load results; skipinitialspace drops the blank after each comma,
# so the column names match the ones listed below
df = pd.read_csv('results.csv', skipinitialspace=True)


cols = ['ID', 'tied_ID', 'cohort', 'true_label', 'pred_label', 'pred_proba']
col_ID, col_tied_ID, col_cohort, col_true_label, col_pred_label, col_pred_proba = cols

# the cohorts of interest, e.g., c1 and c2, two values of the cohort column
sig = ['c1','c2']

# user-defined weight for sig (0.5 by default); 0.7 is used here
alpha = 0.7

# cutoff on the predicted probability (positive if proba > 0.5, negative otherwise)
cut_proba = 0.5

cat = CAT(
	col_ID = col_ID, 
	col_tied_ID = col_tied_ID, 
	col_cohort = col_cohort, 
	col_true_label = col_true_label, 
	col_pred_label = col_pred_label, 
	col_pred_proba = col_pred_proba
	)


# pred_proba: predicted probabilities
pred_proba = df['pred_proba']

# initialize dataframe
cat.init_data(df)

# set predicted probability
cat.set_proba(pred_proba)

# compute AUC
cat.get_auc()

# identify positive versus negative based on the cutoff
cat.dichotomize(cut_proba)

# compute Sensitivity and Specificity
cat.get_sen_spe()

# beta parameter of CATmean (1 by default)
beta = 1

# compute the CATE
cat.score(sig, alpha, beta)
```