Metadata-Version: 2.4
Name: llm-refresh-wheel
Version: 0.1.0
Summary: Keep your local LLM fresh without forgetting what it already knows — replay buffers + forgetting metrics + auto-rollback for continual fine-tuning
Project-URL: Homepage, https://github.com/cleonard2341/llm-refresh-wheel
Project-URL: Repository, https://github.com/cleonard2341/llm-refresh-wheel
Project-URL: Issues, https://github.com/cleonard2341/llm-refresh-wheel/issues
Author: brody4321
License: MIT
License-File: LICENSE
Keywords: catastrophic-forgetting,continual-learning,fine-tuning,llm,lora,nlp,peft,replay-buffer,transformers
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: pydantic-settings>=2.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: rich>=13.0.0
Requires-Dist: toml>=0.10.0
Requires-Dist: typer[all]>=0.9.0
Provides-Extra: dev
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest>=7.4.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: peft
Requires-Dist: peft>=0.7.0; extra == 'peft'
Requires-Dist: torch>=2.0.0; extra == 'peft'
Requires-Dist: transformers>=4.35.0; extra == 'peft'
Provides-Extra: schedule
Requires-Dist: schedule>=1.2.0; extra == 'schedule'
Provides-Extra: torch
Requires-Dist: torch>=2.0.0; extra == 'torch'
Provides-Extra: train
Requires-Dist: datasets>=2.14.0; extra == 'train'
Requires-Dist: peft>=0.7.0; extra == 'train'
Requires-Dist: torch>=2.0.0; extra == 'train'
Requires-Dist: transformers>=4.35.0; extra == 'train'
Requires-Dist: trl>=0.7.0; extra == 'train'
Provides-Extra: transformers
Requires-Dist: torch>=2.0.0; extra == 'transformers'
Requires-Dist: transformers>=4.35.0; extra == 'transformers'
Description-Content-Type: text/markdown

# llm-refresh-wheel

**Keep your local LLM fresh without forgetting what it already knows.**

Fine-tune your LLM on new data continuously while guarding against catastrophic forgetting. `llm-refresh-wheel` wraps Hugging Face PEFT/TRL with a replay buffer system, concrete forgetting metrics, and automatic rollback when forgetting is detected.

---

## The Problem

When you fine-tune an LLM on new data, it tends to lose performance on what it learned before. This is called **catastrophic forgetting**. Training libraries such as PEFT and TRL handle the training itself, but offer no safety net against it.

`llm-refresh-wheel` gives you:
- **Replay buffers**: mix stored older examples back in during each training cycle
- **Forgetting metrics**: quantify forgetting as perplexity changes across cycles (BWT, FWT, KRS)
- **Auto-rollback**: automatically revert a cycle's training if forgetting exceeds your threshold

```
New Data ──┐
           ├──► Build Training Set ──► Train (LoRA) ──► Eval on Anchor
Replay ────┘                                                  │
Buffer ◄────────────────────────────────────────── Add New ◄──┤
                                                              │
                                                    BWT < threshold?
                                                    └──► Rollback ✓
```

---

## Quick Start

```bash
# Install (no ML deps — CLI only)
pip install llm-refresh-wheel

# Install with training support
pip install "llm-refresh-wheel[train]"

# Initialize config
refresh-wheel init

# Add your anchor evaluation set (JSONL: one {"text": "..."} per line)
refresh-wheel anchor anchor.jsonl

# Add new training data
refresh-wheel add new_data.jsonl

# Run a refresh cycle
refresh-wheel refresh --model microsoft/phi-2

# Check your model's health
refresh-wheel status
```

---

## Forgetting Metrics Explained

| Metric | What it measures | Good value |
|--------|-----------------|------------|
| **BWT** (Backward Transfer) | Did perplexity on old data increase after training? | ≥ 0.0 (no forgetting) |
| **FWT** (Forward Transfer) | Did prior training help on new data? | > 0.0 (positive transfer) |
| **KRS** (Knowledge Retention Score) | Overall knowledge retention [0–100] | ≥ 80 |

### Math

```
BWT_t  = PPL_anchor_before - PPL_anchor_after
         (negative = perplexity went up = forgetting happened)

FWT_t  = PPL_pre_(t-1) - PPL_pre_t
         (positive = prior cycles helped on new tasks)

KRS    = 100 × exp(−λ × Σ|BWT_t| for BWT_t < 0)
         (100 = perfect retention, decays exponentially with cumulative forgetting)
```

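The formulas translate directly into code. The functions below are an illustrative sketch, not the package's API; `fwt` implements the formula exactly as written, since what counts as `PPL_pre` is defined by the library. As a worked example with λ = 0.01 (the default `krs_lambda`), a BWT history of [0.5, -3.0, -1.2] accumulates 3.0 + 1.2 = 4.2 points of forgetting, so KRS = 100 × e^(-0.042) ≈ 95.9; the positive cycle does not offset the losses. Staying at or above the KRS ≥ 80 target from the table allows at most ln(100/80) / 0.01 ≈ 22.3 cumulative perplexity points of forgetting.

```python
import math


def bwt(ppl_anchor_before: float, ppl_anchor_after: float) -> float:
    """Backward transfer: negative when anchor perplexity rose (forgetting)."""
    return ppl_anchor_before - ppl_anchor_after


def fwt(ppl_pre_prev: float, ppl_pre_curr: float) -> float:
    """Forward transfer as defined above: PPL_pre_(t-1) - PPL_pre_t."""
    return ppl_pre_prev - ppl_pre_curr


def krs(bwt_history: list[float], lam: float = 0.01) -> float:
    """Knowledge Retention Score: only negative BWT values (forgetting) are summed."""
    cumulative_forgetting = sum(-b for b in bwt_history if b < 0)
    return 100.0 * math.exp(-lam * cumulative_forgetting)


def forgetting_budget(min_krs: float = 80.0, lam: float = 0.01) -> float:
    """Largest cumulative |BWT| that keeps KRS >= min_krs: ln(100 / min_krs) / lam."""
    return math.log(100.0 / min_krs) / lam
```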
---

## Buffer Strategies

| Strategy | How it works | Best for |
|----------|-------------|----------|
| **reservoir** (default) | Vitter's Algorithm R — uniform random sample over all seen data | General use, unknown data distribution |
| **prioritized** | Keeps highest-loss (hardest) examples; evicts easy ones | When hard examples matter most |
| **diverse** | Hash-based bucketing into `n_buckets` slots (default 64), so no single topic dominates the buffer | When data has many distinct topics |

---

## CLI Reference

| Command | Description |
|---------|-------------|
| `refresh-wheel init` | Write default config.toml |
| `refresh-wheel add <file.jsonl>` | Add data to replay buffer |
| `refresh-wheel anchor <file.jsonl>` | Set anchor evaluation dataset |
| `refresh-wheel refresh [--model NAME] [--epochs N] [--dry-run]` | Run one refresh cycle |
| `refresh-wheel status` | Buffer stats + KRS + last refresh time |
| `refresh-wheel metrics` | Full forgetting history as a table |
| `refresh-wheel eval` | Compute perplexity on anchor set |
| `refresh-wheel schedule --every 24` | Start a daemon that runs a refresh cycle every N hours (needs the `schedule` extra) |
| `refresh-wheel config show` | Pretty-print current config |
| `refresh-wheel config set KEY VALUE` | Dot-notation config update |

### Config Examples

```bash
# Change buffer strategy
refresh-wheel config set buffer.strategy prioritized

# Adjust forgetting threshold ("--" ends option parsing so -0.2 is not read as a flag)
refresh-wheel config set -- eval.forgetting_threshold -0.2

# Disable auto-rollback
refresh-wheel config set eval.auto_rollback false

# Use a different model
refresh-wheel config set model.name meta-llama/Llama-3.2-1B
```

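One plausible way `config set` could map a dotted key and a string value onto the nested TOML tables is sketched below. It is illustrative only: `set_dotted` and `parse_value` are hypothetical helpers, and both the coercion order (booleans, then integers, then floats, else string) and rejecting unknown keys are assumptions rather than documented behavior.

```python
def parse_value(raw: str):
    """Coerce a CLI string into bool, int, or float, else leave it as a string."""
    lowered = raw.lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def set_dotted(config: dict, key: str, raw: str) -> None:
    """Apply e.g. set_dotted(cfg, "eval.auto_rollback", "false") to a nested dict."""
    *parents, leaf = key.split(".")
    node = config
    for part in parents:
        if part not in node or not isinstance(node[part], dict):
            raise KeyError(f"unknown config section: {part!r}")
        node = node[part]
    if leaf not in node:
        raise KeyError(f"unknown config key: {key!r}")
    node[leaf] = parse_value(raw)
```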
---

## Python API

```python
import json

from llm_refresh import RefreshWheel, BufferStrategy

# Initialize
rw = RefreshWheel(
    model_name="microsoft/phi-2",
    buffer_strategy=BufferStrategy.RESERVOIR,
    state_path="~/.local/share/llm_refresh/myproject",
)

# Set the anchor evaluation set (held fixed across cycles; used to measure forgetting)
with open("anchor.jsonl") as f:
    anchor = [json.loads(line) for line in f if line.strip()]
rw.set_anchor(anchor)

# Add new training data
rw.add_data([
    {"text": "New fact: The Eiffel Tower was completed in 1889."},
    {"text": "New fact: Python was created by Guido van Rossum."},
])

# Run a refresh cycle
result = rw.refresh(epochs=1)
print(f"BWT: {result.bwt:.4f}")   # negative = forgetting
print(f"KRS: {result.krs:.1f}")   # 0-100
print(f"Rolled back: {result.rolled_back}")

# Check overall health
status = rw.status()
print(status["metrics"])

# Save state (buffer + history, not model weights)
rw.save("~/.local/share/llm_refresh/myproject")

# Restore later
rw2 = RefreshWheel(model_name="microsoft/phi-2")
rw2.load("~/.local/share/llm_refresh/myproject")
```

### Using Just the Buffer (No GPU Required)

```python
from llm_refresh import create_buffer, BufferStrategy

buf = create_buffer(BufferStrategy.DIVERSE, max_size=10_000, n_buckets=64)
buf.add([{"text": "example one"}, {"text": "example two"}])
samples = buf.sample(100)
print(buf.stats())
buf.save("buffer.json")
```

### Using Just the Metrics Tracker

```python
from llm_refresh import ForgettingTracker
from llm_refresh.models import EvalResult, RefreshResult

tracker = ForgettingTracker(krs_lambda=0.01)

# Record a refresh cycle's results
tracker.record_result(RefreshResult.new(
    examples_trained=500,
    replay_examples=150,
    new_examples=350,
    pre_eval=EvalResult.now(perplexity=12.3, loss=2.5, dataset_size=200),
    post_eval=EvalResult.now(perplexity=11.8, loss=2.4, dataset_size=200),
    bwt=0.5,    # perplexity improved
    fwt=0.2,
    krs=100.0,
    rolled_back=False,
    rollback_reason="",
))

print(tracker.summary())
# {'cycles': 1, 'bwt': 0.5, 'fwt': 0.0, 'krs': 100.0, ...}
```

---

## Installation Options

```bash
# Core CLI only (no ML deps)
pip install llm-refresh-wheel

# With PyTorch
pip install "llm-refresh-wheel[torch]"

# With Transformers (also installs torch)
pip install "llm-refresh-wheel[transformers]"

# With PEFT for LoRA (also installs torch and transformers)
pip install "llm-refresh-wheel[peft]"

# Full training stack (torch + transformers + peft + trl + datasets)
pip install "llm-refresh-wheel[train]"

# With scheduler daemon support
pip install "llm-refresh-wheel[schedule]"
```

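To see which of these extras are present in the current environment, a stdlib-only probe like the following is enough. The module lists come from the package metadata; the check only confirms that each module can be found, not that its version satisfies the pins (for example `torch>=2.0.0`). `EXTRA_MODULES` and `available_extras` are illustrative names.

```python
import importlib.util

# Import names pulled in by each optional extra, per the package metadata.
EXTRA_MODULES = {
    "torch": ["torch"],
    "transformers": ["torch", "transformers"],
    "peft": ["torch", "transformers", "peft"],
    "train": ["torch", "transformers", "peft", "trl", "datasets"],
    "schedule": ["schedule"],
}


def available_extras() -> dict[str, bool]:
    """Report which extras look installed by checking that their modules are findable."""
    return {
        extra: all(importlib.util.find_spec(mod) is not None for mod in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


if __name__ == "__main__":
    for extra, ok in available_extras().items():
        print(f"{extra:<13} {'installed' if ok else 'missing'}")
```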
---

## Configuration

Config file lives at `~/.config/llm_refresh/config.toml`. Run `refresh-wheel init` to create it.

```toml
[model]
name = "microsoft/phi-2"
rank = 16
lora_alpha = 32
lora_dropout = 0.05
target_modules = ["q_proj", "v_proj"]

[buffer]
strategy = "reservoir"
max_size = 10000
min_replay_ratio = 0.3
n_buckets = 64

[training]
batch_size = 4
gradient_accumulation_steps = 4
learning_rate = 0.0002
warmup_ratio = 0.03
max_seq_length = 512

[eval]
anchor_size = 200
forgetting_threshold = -0.1
auto_rollback = true
batch_size = 8
krs_lambda = 0.01

[schedule]
interval_hours = 24.0
```

Override any setting with an environment variable named `LLM_REFRESH__<SECTION>__<KEY>`, using double underscores between levels:

```bash
export LLM_REFRESH__MODEL__NAME="meta-llama/Llama-3.2-1B"
export LLM_REFRESH__EVAL__AUTO_ROLLBACK=false
```

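The `[buffer]` setting `min_replay_ratio` is read here as the minimum share of each cycle's training set drawn from the replay buffer. That interpretation is an assumption, though it matches the tracker example above, where 150 replay and 350 new examples give 150 / 500 = 0.3. Solving replay / (replay + new) ≥ r for the replay count gives replay = ceil(r × new / (1 - r)). The hypothetical helper below does this with exact fractions, so that 0.3 × 350 / 0.7 lands on exactly 150 rather than a float that rounds up to 151, and caps the result at what the buffer actually holds.

```python
from fractions import Fraction
from math import ceil


def replay_split(n_new: int, min_replay_ratio: float, buffer_size: int) -> tuple[int, int]:
    """Return (replay_examples, total_examples) so replay is at least min_replay_ratio
    of the total, unless the buffer holds fewer examples than that."""
    if not 0 <= min_replay_ratio < 1:
        raise ValueError("min_replay_ratio must be in [0, 1)")
    r = Fraction(str(min_replay_ratio))        # exact: Fraction("0.3") == 3/10
    needed = ceil(r * n_new / (1 - r))         # smallest replay with replay/(replay+new) >= r
    replay = min(needed, buffer_size)          # a small buffer cannot supply more
    return replay, replay + n_new
```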
---

## License

MIT

---

## Support

If this tool saves you from a catastrophic forgetting disaster, consider buying me a coffee:

- [Ko-fi](https://ko-fi.com/brody4321)
- [Buy Me A Coffee](https://buymeacoffee.com/brody4321)
