Metadata-Version: 2.4
Name: retrain
Version: 0.2.1
Summary: RLVR training framework for LLMs
Requires-Python: >=3.11
Requires-Dist: datasets>=2.20.0
Requires-Dist: peft>=0.12.0
Requires-Dist: requests>=2.31.0
Requires-Dist: safetensors>=0.4.0
Requires-Dist: transformers>=4.44.0
Provides-Extra: all
Requires-Dist: mlx-lm>=0.20.0; (platform_system == 'Darwin') and extra == 'all'
Requires-Dist: sentence-transformers>=5.2.3; extra == 'all'
Requires-Dist: tinker>=0.13.1; extra == 'all'
Requires-Dist: torch>=2.4.0; extra == 'all'
Requires-Dist: verifiers>=0.1.0; extra == 'all'
Requires-Dist: wandb>=0.17.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: mlx-lm>=0.20.0; (platform_system == 'Darwin') and extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.9.0; extra == 'dev'
Requires-Dist: sentence-transformers>=5.2.3; extra == 'dev'
Requires-Dist: tinker>=0.13.1; extra == 'dev'
Requires-Dist: torch>=2.4.0; extra == 'dev'
Requires-Dist: ty>=0.0.14; extra == 'dev'
Requires-Dist: verifiers>=0.1.0; extra == 'dev'
Requires-Dist: wandb>=0.17.0; extra == 'dev'
Provides-Extra: local
Requires-Dist: torch>=2.4.0; extra == 'local'
Provides-Extra: mlx
Requires-Dist: mlx-lm>=0.20.0; (platform_system == 'Darwin') and extra == 'mlx'
Provides-Extra: semantic
Requires-Dist: sentence-transformers>=5.2.3; extra == 'semantic'
Provides-Extra: tinker
Requires-Dist: tinker>=0.13.1; extra == 'tinker'
Provides-Extra: verifiers
Requires-Dist: verifiers>=0.1.0; extra == 'verifiers'
Provides-Extra: wandb
Requires-Dist: wandb>=0.17.0; extra == 'wandb'
Description-Content-Type: text/markdown

# retrain

`retrain` is a TOML-first RLVR (Reinforcement Learning with Verifiable Rewards) trainer for LLMs.

If you are new, work through the sections in order: install, explore the CLI, then run a tiny config.

## Install

Requires Python 3.11+.

```bash
# CLI + docs exploration
uv tool install retrain

# Local GPU training (adds torch)
uv tool install "retrain[local]"

# Remote Tinker backend
uv tool install "retrain[tinker]"
```

If you are developing this repo directly:

```bash
pip install -e ".[dev]"
```

## Explore the CLI

Use these first to understand what exists before you train:

```bash
retrain --help
retrain man
retrain man --topic quickstart
retrain man --list-topics
retrain backends
retrain doctor
```

Useful inspection commands while iterating:

```bash
retrain explain retrain.toml   # dry-run: what this config would do
retrain status logs            # summarize runs/campaigns under logs/
retrain plugins                # list built-ins + discovered plugins
retrain init-plugin --kind transform --name my_transform --with-test
retrain man --json --topic quickstart
retrain man --path             # editable bundled manual source
```

## Tiny TOML Demo

Create `mini.toml`:

```toml
[model]
model = "Qwen/Qwen3-4B-Instruct-2507"

[algorithm]
advantage_mode = "grpo"
transform_mode = "none"

[training]
max_steps = 20
batch_size = 2
group_size = 8
max_tokens = 1024
lr = 4e-5

[backend]
backend = "local"
adapter_path = "adapters/mini"

[logging]
log_dir = "logs/mini"
```

Run it:

```bash
retrain mini.toml
```

Override fields from the CLI without editing the TOML file:

```bash
retrain mini.toml --seed 42 --max-steps 40 --wandb-project my-project
```

## Quick Start from Template

```bash
retrain init --template quickstart
retrain retrain.toml
```

Other templates:

```bash
retrain init --list
retrain init --template experiment
retrain init --template campaign
retrain init --interactive
```

## Why retrain

- Composable advantage pipeline: GRPO/MaxRL + GTPO/HICRA/SEPA
- Pluggable backends and inference engines
- Pluggable rewards (match, math, judge, custom)
- Campaign sweeps from one TOML
- LoRA-Squeeze rank analysis/compression
- Checkpoint resume and run status tooling

## Common Config Patterns

Use verifiers environments from TOML:

```toml
[environment]
provider = "verifiers"
id = "primeintellect/gsm8k"
args = { split = "train" }
auto_install = true
max_turns = 8
```

Use custom advantage + transform plugins from TOML:

```toml
[algorithm]
advantage_mode = "my_advantages.hipa_like_advantages"
transform_mode = "my_transforms.make_transform_spec"
```
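
For orientation, a custom advantage function typically maps the rewards of one sampled group to one advantage per completion. The sketch below shows a GRPO-style group-relative normalization, not the `hipa_like_advantages` function named above. The function name `group_relative_advantages` and its signature are assumptions for illustration; this README does not document the signature `retrain` expects from a plugin.

```python
from statistics import fmean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: center each reward on the group mean and
    scale by the group's population standard deviation.

    Hypothetical signature; the interface retrain requires may differ.
    """
    mean = fmean(rewards)
    std = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mean) / (std + eps) for r in rewards]


# Four completions for one prompt: two correct, two incorrect.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Correct completions receive positive advantages and incorrect ones negative. A group with identical rewards yields all zeros, so it contributes no learning signal.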

Use a full algorithm plugin (this overrides the composable advantage + transform path):

```toml
[algorithm]
algorithm_mode = "my_algorithms.my_algorithm"
```
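
The plugin values above are dotted paths of the form `module.attribute`. As a rough illustration of how such a path can be turned into a callable, here is a minimal sketch built on the standard `importlib` module. The `resolve_dotted` helper is hypothetical; `retrain`'s own loader may resolve paths differently, for example by also searching a plugin directory.

```python
import importlib
from typing import Any


def resolve_dotted(path: str) -> Any:
    """Import `module.attribute` and return the attribute."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.attribute', got {path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Demonstrated with a standard-library target so the sketch runs anywhere.
sqrt = resolve_dotted("math.sqrt")
print(sqrt(9.0))
```

With this approach, a path such as `my_algorithms.my_algorithm` resolves only if `my_algorithms` is importable, for instance because it sits in the working directory or is installed in the same environment.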

## Documentation

Full docs: [retrain.readthedocs.io](https://retrain.readthedocs.io)

- [Getting Started](https://retrain.readthedocs.io/getting-started/)
- [Configuration Reference](https://retrain.readthedocs.io/configuration/)
- [Advantage Functions](https://retrain.readthedocs.io/advantages/)
- [SEPA Scheduling](https://retrain.readthedocs.io/sepa/)
- [Campaigns](https://retrain.readthedocs.io/campaigns/)
- [LoRA-Squeeze](https://retrain.readthedocs.io/squeeze/)
- [Reward Functions](https://retrain.readthedocs.io/rewards/)
- [Inference Engines](https://retrain.readthedocs.io/inference-engines/)

Contributor note: run `retrain man --check` in CI to detect stale auto-generated manual blocks, and `retrain man --sync` locally to refresh them.
