Metadata-Version: 2.4
Name: truthanchor
Version: 0.1.0
Summary: Calibrate raw LLM uncertainty scores into truth-aligned scores.
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: datasets<5,>=2.20
Requires-Dist: evaluate<0.5,>=0.4.0
Requires-Dist: matplotlib<4,>=3.8
Requires-Dist: nltk<4,>=3.8
Requires-Dist: numpy<3,>=1.26
Requires-Dist: pandas<3,>=2.2
Requires-Dist: scikit-learn<2,>=1.4
Requires-Dist: scipy<2,>=1.11
Requires-Dist: torch>=2.2
Requires-Dist: tqdm<5,>=4.66
Requires-Dist: transformers<5,>=4.45

# <img src="./ASSETS/anchor.png" width="3%" align="center"> TruthAnchor

TruthAnChor (TAC) uses a lightweight MLP to map raw uncertainty scores of LLM responses to truth-aligned scores.

<div align=center>
<img src="./ASSETS/reliability.png" width="100%" align="center">
</div>
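This README does not specify the mapper's architecture or training objective. As a rough illustration of the general idea only, the sketch below fits a one-hidden-layer MLP in plain NumPy that maps a raw uncertainty score to an estimated probability that the response is correct. It trains with binary cross-entropy against `1 = correct` / `0 = incorrect` labels. The function `train_mlp_mapper`, its hyperparameters, and the choice of loss are hypothetical assumptions, not TruthAnchor's actual mapper.

```python
import numpy as np


def train_mlp_mapper(scores, labels, hidden=16, lr=0.1, epochs=2000, seed=0):
    """Fit a tiny 1 -> hidden -> 1 MLP that maps a raw score to P(correct).

    Hypothetical sketch: the architecture, binary cross-entropy loss, and
    hyperparameters are illustrative assumptions, not TruthAnchor's mapper.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(scores, dtype=float).reshape(-1, 1)
    y = np.asarray(labels, dtype=float).reshape(-1, 1)  # 1 = correct, 0 = incorrect
    mu, sigma = x.mean(), x.std() + 1e-8  # standardize inputs for stable training
    x = (x - mu) / sigma
    n = x.shape[0]

    w1 = rng.normal(0.0, 1.0, size=(1, hidden))
    b1 = np.zeros((1, hidden))
    w2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, 1))
    b2 = np.zeros((1, 1))

    for _ in range(epochs):
        # Forward pass: tanh hidden layer, sigmoid output.
        h = np.tanh(x @ w1 + b1)
        p = 1.0 / (1.0 + np.exp(-(h @ w2 + b2)))
        # Backward pass: d(mean BCE)/d(logit) = (p - y) / n.
        g = (p - y) / n
        grad_w2 = h.T @ g
        grad_b2 = g.sum(axis=0, keepdims=True)
        g_hidden = (g @ w2.T) * (1.0 - h**2)  # tanh'(a) = 1 - tanh(a)^2
        grad_w1 = x.T @ g_hidden
        grad_b1 = g_hidden.sum(axis=0, keepdims=True)
        # Plain full-batch gradient descent step.
        w1 -= lr * grad_w1
        b1 -= lr * grad_b1
        w2 -= lr * grad_w2
        b2 -= lr * grad_b2

    def predict(new_scores):
        """Map raw scores to estimated probabilities of correctness."""
        z = (np.asarray(new_scores, dtype=float).reshape(-1, 1) - mu) / sigma
        hidden_out = np.tanh(z @ w1 + b1)
        return (1.0 / (1.0 + np.exp(-(hidden_out @ w2 + b2)))).ravel()

    return predict
```

Fit on one split with `predict = train_mlp_mapper(train_scores, train_labels)` and evaluate `predict(test_scores)` on held-out data. Because the MLP learns the direction of the relationship, this sketch needs no sign flip for scores where higher is worse.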

## What It Does

TruthAnchor supports:

- response generation for benchmark datasets
- raw uncertainty score computation
- truth-anchored score mapping with a lightweight MLP
- optional CUE comparison
- metric reporting with AUROC (area under the ROC curve), ECE (expected calibration error), and PRR (prediction rejection ratio)
- plotting for calibration and score comparison
- custom score-label CSV input
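For reference, here is a minimal NumPy sketch of two of the reported metrics. The helper names `auroc` and `ece`, the pairwise AUROC formulation, and the 10 equal-width bins are illustrative assumptions. TruthAnchor's own implementations (binning, tie handling) may differ, and PRR is not covered. Both helpers take labels with `1 = correct` and a confidence where higher means more likely correct. A raw score where higher is worse would therefore be negated before computing AUROC. ECE also expects confidences in [0, 1], which raw uncertainty scores generally are not.

```python
import numpy as np


def auroc(correct, confidence):
    """Probability that a random correct answer outranks a random incorrect one.

    Pairwise formulation (ties count half); memory is O(n_correct * n_incorrect),
    so this is meant for illustration rather than very large datasets.
    """
    correct = np.asarray(correct).astype(bool)
    confidence = np.asarray(confidence, dtype=float)
    pos, neg = confidence[correct], confidence[~correct]
    greater = (pos[:, None] > neg[None, :]).mean()
    ties = (pos[:, None] == neg[None, :]).mean()
    return float(greater + 0.5 * ties)


def ece(correct, confidence, n_bins=10):
    """Expected calibration error with equal-width bins over [0, 1].

    ECE = sum over bins of (bin size / n) * |accuracy(bin) - mean confidence(bin)|.
    """
    correct = np.asarray(correct, dtype=float)
    confidence = np.asarray(confidence, dtype=float)
    interior_edges = np.linspace(0.0, 1.0, n_bins + 1)[1:-1]
    # right=True: the first bin is [0, 1/n_bins], later bins are (lower, upper].
    bin_idx = np.digitize(confidence, interior_edges, right=True)
    total = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            gap = abs(correct[mask].mean() - confidence[mask].mean())
            total += mask.mean() * gap  # weight by the fraction of samples in the bin
    return float(total)
```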

## Installation

```bash
pip install truthanchor
```

For local development from this repository:

```bash
pip install -e .
```

## Quickstart

Run the end-to-end example pipeline:

```bash
python3 examples/tac_eval.py
```

This runs:

1. generation
2. uncertainty scoring
3. mapper training
4. held-out evaluation

## Custom Score-Label CSV Input

You can also run the pipeline on a custom CSV containing:

- first column: uncertainty scores
- second column: binary labels

Example:

```bash
python3 examples/tac_eval.py \
  --datasets my_dataset \
  --models custom \
  --custom_scores_csv data/my_dataset.csv \
  --custom_method_name my_score \
  --higher_worse true
```

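Note: custom CSV labels are assumed to use `1 = correct` and `0 = incorrect`. As its name suggests, `--higher_worse true` indicates that larger scores mean greater uncertainty, so a higher score marks a response as less likely to be correct.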
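A minimal sketch for producing such a CSV with the standard library follows. The helper `write_score_label_csv` and the `score,label` header names are hypothetical. This README does not say whether the loader expects a header row, so writing one is an assumption. Check how `examples/tac_eval.py` reads the file and set `include_header=False` if needed.

```python
import csv
from pathlib import Path


def write_score_label_csv(path, scores, labels, include_header=True):
    """Write rows with column 1 = uncertainty score and column 2 = binary label.

    Labels must be 1 (correct) or 0 (incorrect). Whether a header row is
    expected is an assumption; toggle include_header to match the loader.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        if include_header:
            writer.writerow(["score", "label"])  # hypothetical column names
        for score, label in zip(scores, labels):
            if label not in (0, 1):  # reject anything other than 0/1 (bools allowed)
                raise ValueError(f"label must be 0 or 1, got {label!r}")
            writer.writerow([float(score), int(label)])
    return path
```

For example, `write_score_label_csv("data/my_dataset.csv", [0.12, 0.87], [1, 0])` writes a file that the command above can point to with `--custom_scores_csv`.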

## Outputs

The example pipeline writes results under:

```text
outputs/<dataset>/<sanitized-model>/
```

These outputs include:

- `uncertainty_scores.npz`
- `mapper_eval/results.csv`
- `mapper_eval/scores/*.npz`
- optional comparison and calibration plots
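This README does not document the array names stored inside these `.npz` files, so the sketch below assumes none. It lists every array in each file with its shape and dtype, using only the file layout given above. The helpers `describe_npz` and `describe_run` are hypothetical. `allow_pickle` defaults to `False`; enable it only for files you trust, since object arrays require it.

```python
from pathlib import Path

import numpy as np


def describe_npz(path, allow_pickle=False):
    """Return {array_name: (shape, dtype_string)} for every array in an .npz file."""
    summary = {}
    with np.load(path, allow_pickle=allow_pickle) as data:
        for name in data.files:
            arr = data[name]
            summary[name] = (arr.shape, str(arr.dtype))
    return summary


def describe_run(run_dir, allow_pickle=False):
    """Summarize uncertainty_scores.npz and mapper_eval/scores/*.npz under one run directory."""
    run_dir = Path(run_dir)
    candidates = [run_dir / "uncertainty_scores.npz"]
    candidates += sorted(run_dir.glob("mapper_eval/scores/*.npz"))
    summary = {}
    for npz_path in candidates:
        if npz_path.is_file():  # skip files this run did not produce
            key = npz_path.relative_to(run_dir).as_posix()
            summary[key] = describe_npz(npz_path, allow_pickle=allow_pickle)
    return summary
```

Call it on a run directory such as `outputs/<dataset>/<sanitized-model>/` to see what each file contains before loading specific arrays.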

## Notebook Walkthrough

A step-by-step notebook example is available at:

```text
examples/tac_eval_walkthrough.ipynb
```
