Metadata-Version: 2.4
Name: transfer-llm
Version: 0.1.0
Summary: A modular PyTorch framework for fine-tuning Hugging Face models.
Author-email: Dedy <dedy.ariansyah@gmail.com>
Project-URL: Homepage, https://github.com/deduu/transfer
Project-URL: Issues, https://github.com/deduu/transfer/issues
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21.6
Requires-Dist: accelerate>=1.0.0
Requires-Dist: bitsandbytes>=0.44.0
Requires-Dist: transformers
Requires-Dist: datasets
Requires-Dist: tqdm
Requires-Dist: pyyaml
Requires-Dist: peft
Requires-Dist: sentence-transformers
Requires-Dist: scikit-learn
Requires-Dist: scipy
Requires-Dist: hf-xet>=1.2.0
Requires-Dist: torch>=2.3.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Dynamic: license-file

# Transfer

A modular PyTorch framework for fine-tuning Hugging Face language models.

Transfer applies Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO) to Hugging Face causal language models through a single `Trainer` interface.

## Features

- **SFT and DPO** — two fine-tuning strategies out of the box
- **Multi-turn conversation support** — point to a `messages_column` of chat-style message lists
- **Completion-only loss masking** — train on assistant tokens only with `train_on_completions_only`
- **LoRA / QLoRA** — 4-bit quantization (nf4 / fp4) via bitsandbytes, configurable LoRA rank and target modules
- **Training loop controls** — gradient accumulation, LR scheduling (cosine, linear, constant, …), gradient clipping
- **Checkpointing and logging** — intermediate checkpoints every N steps, CSV training logs, optional W&B integration
- **Built-in evaluation** — perplexity and semantic entropy metrics
- **CLI** — `transfer train` and `transfer infer` commands for training and inference without writing Python
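
The perplexity metric listed above is conventionally defined as the exponential of the mean per-token negative log-likelihood. The sketch below illustrates that definition only. The function name `perplexity` and its list-of-log-probabilities input are assumptions made for illustration and are not taken from Transfer's API.

```python
import math

def perplexity(token_logprobs):
    """Perplexity = exp(mean negative log-likelihood) over the scored tokens.

    token_logprobs: natural-log probabilities the model assigned to each
    target token (hypothetical input format, for illustration only).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)

# A model that assigns probability 0.25 to every target token
# has a perplexity of about 4 (it is "choosing among 4 options").
uniform = [math.log(0.25)] * 8
print(perplexity(uniform))
```

Lower perplexity means the model assigns higher probability to the reference tokens.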

## Installation

### From PyPI

```bash
pip install transfer-llm
```

### From source

```bash
git clone https://github.com/deduu/transfer.git
cd transfer
pip install .
```

## Quick Start

### Single-turn SFT

```python
from datasets import Dataset
from transfer import Trainer, SFTConfig

# Prepare data
dataset = Dataset.from_dict({
    "prompt": ["What is the capital of France?", "Explain gravity."],
    "response": ["The capital of France is Paris.", "Gravity is ..."],
})

# Configure
config = SFTConfig(
    model_name="google/gemma-2b",
    num_epochs=3,
    batch_size=2,
    learning_rate=5e-5,
    output_dir="./gemma-sft",
)

# Train
trainer = Trainer(task="sft", config=config, train_dataset=dataset)
trainer.train()
trainer.save_model()
```

### Multi-turn SFT with completion masking and LoRA

```python
from datasets import Dataset
from transfer import Trainer, SFTConfig

# Each row contains a list of message dicts
conversations = [
    {"messages": [
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What is the weather in SF?"},
        {"role": "assistant", "content": "It is 62 F and foggy."},
    ]},
    # ... more conversations
]
dataset = Dataset.from_list(conversations)

config = SFTConfig(
    model_name="meta-llama/Llama-3.2-3B-Instruct",
    messages_column="messages",           # read from chat-style column
    train_on_completions_only=True,       # loss on assistant tokens only
    num_epochs=3,
    batch_size=1,
    learning_rate=2e-5,
    max_length=1024,
    gradient_accumulation_steps=4,
    warmup_steps=10,
    scheduler_type="cosine",
    save_steps=50,
    logging_steps=5,
    use_lora=True,
    lora_r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    output_dir="./agent-sft-output",
)

trainer = Trainer(task="sft", config=config, train_dataset=dataset)
trainer.train()
trainer.save_model()
```
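
To make `train_on_completions_only` more concrete: completion-only training is commonly implemented by setting the label of every non-assistant token to `-100`, the default `ignore_index` of PyTorch's cross-entropy loss. The sketch below shows that idea on plain lists. The helper `mask_non_assistant` and the span format are hypothetical, and Transfer's internal implementation may differ.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss

def mask_non_assistant(input_ids, assistant_spans):
    """Keep token ids inside assistant spans as labels; ignore everything else.

    assistant_spans: list of (start, end) token index pairs, end exclusive
    (hypothetical format, for illustration only).
    """
    labels = [IGNORE_INDEX] * len(input_ids)
    for start, end in assistant_spans:
        labels[start:end] = input_ids[start:end]
    return labels

# Toy sequence: tokens 0-4 are system/user text, tokens 5-7 are the assistant reply.
input_ids = [11, 12, 13, 14, 15, 21, 22, 23]
labels = mask_non_assistant(input_ids, [(5, 8)])
print(labels)  # [-100, -100, -100, -100, -100, 21, 22, 23]
```

Only the positions that keep a real token id contribute to the loss, so the model is not trained to reproduce system or user text.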

### DPO

```python
from datasets import load_dataset
from transfer import Trainer, DPOConfig

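# hh-rlhf rows provide "chosen" and "rejected" columns,
# which match the DPOConfig column defaults.
dataset = load_dataset("Anthropic/hh-rlhf", split="train[:1%]")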

config = DPOConfig(
    model_name="google/gemma-2b",
    num_epochs=1,
    batch_size=2,
    beta=0.1,
    output_dir="./gemma-dpo",
)

trainer = Trainer(task="dpo", config=config, train_dataset=dataset)
trainer.train()
trainer.save_model()
```

## CLI Usage

The `transfer` command is installed automatically with the package.

### Training

```bash
transfer train \
  --task sft \
  --model_name google/gemma-2b \
  --dataset_path data.jsonl \
  --output_dir ./my-sft-model \
  --num_epochs 3 \
  --batch_size 4 \
  --learning_rate 2e-4 \
  --use_lora \
  --gradient_accumulation_steps 4 \
  --scheduler_type cosine \
  --warmup_steps 50 \
  --save_steps 200 \
  --logging_steps 10
```

For multi-turn datasets, add `--messages_column messages --train_on_completions_only` to the training command.
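
The training command above reads `data.jsonl`, but this README does not spell out the file layout. The sketch below assumes a JSON Lines file with one object per line whose keys match the default `prompt_column` and `response_column` values. Treat that layout as an assumption and check it against the loader before relying on it.

```python
import json
from pathlib import Path

# Keys follow the SFTConfig defaults "prompt" and "response" (assumed layout).
rows = [
    {"prompt": "What is the capital of France?", "response": "The capital of France is Paris."},
    {"prompt": "Explain gravity.", "response": "Gravity is ..."},
]

path = Path("data.jsonl")
with path.open("w", encoding="utf-8") as f:
    for row in rows:
        # One JSON object per line; ensure_ascii=False keeps non-ASCII text readable.
        f.write(json.dumps(row, ensure_ascii=False) + "\n")

# Read the file back to confirm that every line parses on its own.
loaded = [json.loads(line) for line in path.read_text(encoding="utf-8").splitlines()]
print(len(loaded), "rows written to", path)
```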

### Inference

```bash
transfer infer \
  --model_name google/gemma-2b \
  --adapter_path ./my-sft-model \
  --prompt "Explain quantum computing" \
  --use_chat_template \
  --max_new_tokens 256 \
  --temperature 0.7 \
  --do_sample
```

## Configuration

### SFTConfig

| Parameter | Default | Description |
|---|---|---|
| `model_name` | `"google/gemma-2b"` | Hugging Face model name or path |
| `num_epochs` | `30` | Number of training epochs |
| `batch_size` | `1` | Training batch size |
| `learning_rate` | `5e-5` | Learning rate |
| `max_length` | `256` | Maximum sequence length |
| `output_dir` | `"./sft_finetuned_model"` | Output directory |
| `prompt_column` | `"prompt"` | Column name for prompts |
| `response_column` | `"response"` | Column name for responses |
| `messages_column` | `None` | Column holding a list of message dicts with `role` and `content` keys (multi-turn mode) |
| `train_on_completions_only` | `False` | Only compute loss on assistant tokens |
| `system_prompt` | `None` | System prompt for single-turn mode |
| `use_lora` | `False` | Enable LoRA |
| `lora_r` | `8` | LoRA rank |
| `lora_alpha` | `16` | LoRA alpha |
| `lora_dropout` | `0.1` | LoRA dropout |
| `quantize` | `True` | Enable 4-bit quantization |
| `quantization_type` | `"nf4"` | Quantization type (`nf4` or `fp4`) |
| `gradient_accumulation_steps` | `1` | Gradient accumulation steps |
| `warmup_steps` | `0` | LR scheduler warmup steps |
| `scheduler_type` | `"cosine"` | LR scheduler type |
| `max_grad_norm` | `1.0` | Gradient clipping norm (0 to disable) |
| `save_steps` | `0` | Save checkpoint every N steps (0 to disable) |
| `logging_steps` | `10` | Log metrics every N steps |
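
Two of these settings interact in ways that are easy to misjudge: `gradient_accumulation_steps` multiplies the number of examples per optimizer step, and `warmup_steps` with `scheduler_type="cosine"` shapes the learning rate over training. The sketch below uses one common formulation (linear warmup followed by cosine decay to zero). The function `lr_at_step` is hypothetical, and the exact curve Transfer produces may differ.

```python
import math

def lr_at_step(step, base_lr, warmup_steps, total_steps):
    """Linear warmup to base_lr, then cosine decay to zero (one common formulation)."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))

# Examples consumed per optimizer step on a single device.
batch_size = 1
gradient_accumulation_steps = 4
effective_batch_size = batch_size * gradient_accumulation_steps

print("effective batch size:", effective_batch_size)
for step in (0, 9, 10, 55, 100):
    print(step, lr_at_step(step, base_lr=2e-5, warmup_steps=10, total_steps=100))
```

With these numbers the rate climbs to `2e-5` by the end of warmup, is roughly halved midway through the decay, and reaches zero at the final step.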

### DPOConfig

Accepts all the `SFTConfig` parameters listed above, with the following additions and changed defaults:

| Parameter | Default | Description |
|---|---|---|
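| `beta` | `0.1` | DPO temperature; higher values keep the policy closer to the reference model |
| `learning_rate` | `1e-6` | Learning rate (lower than the `SFTConfig` default of `5e-5`) |
| `max_length` | `512` | Maximum sequence length (higher than the `SFTConfig` default of `256`) |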
| `chosen_column` | `"chosen"` | Column name for chosen responses |
| `rejected_column` | `"rejected"` | Column name for rejected responses |
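
To show what `beta` controls, the sketch below computes the standard per-pair DPO loss, `-log sigmoid(beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)))`, from summed sequence log-probabilities. The function `dpo_loss` and its arguments are hypothetical names used for illustration and do not describe Transfer's internals.

```python
import math

def dpo_loss(policy_chosen_lp, policy_rejected_lp, ref_chosen_lp, ref_rejected_lp, beta=0.1):
    """Per-pair DPO loss from sequence log-probabilities under policy and reference."""
    chosen_ratio = policy_chosen_lp - ref_chosen_lp        # log(pi / pi_ref) for chosen
    rejected_ratio = policy_rejected_lp - ref_rejected_lp  # log(pi / pi_ref) for rejected
    margin = beta * (chosen_ratio - rejected_ratio)
    # -log(sigmoid(x)) == log(1 + exp(-x)); branch to avoid overflow for large |x|.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# When the policy equals the reference, the margin is 0 and the loss is log(2).
print(dpo_loss(-5.0, -7.0, -5.0, -7.0))

# Raising the chosen log-probability relative to the reference lowers the loss.
print(dpo_loss(-4.0, -8.0, -5.0, -7.0))
```

A larger `beta` scales the margin, so the same shift away from the reference model changes the loss more strongly.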

## Development

```bash
git clone https://github.com/deduu/transfer.git
cd transfer
pip install -e ".[dev]"
pytest tests/ -v
```

## License

Released under the MIT License. See the `LICENSE` file for details.
