Metadata-Version: 2.4
Name: grades
Version: 1.0.6
Summary: Gradient-based Early Stopping for Efficient Fine-tuning
Home-page: https://github.com/IXZZZ9/GradES
Maintainer-email: Xi Zeng <zengxi0115@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/IXZZZ9/GradES
Project-URL: Documentation, https://github.com/IXZZZ9/GradES#readme
Project-URL: Repository, https://github.com/IXZZZ9/GradES.git
Project-URL: Bug Tracker, https://github.com/IXZZZ9/GradES/issues
Project-URL: Paper, https://arxiv.org/abs/2509.01842
Keywords: machine learning,deep learning,transformers,early stopping,gradient-based,fine-tuning,llm,pytorch,huggingface,efficiency
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=1.9.0
Requires-Dist: transformers>=4.20.0
Requires-Dist: numpy>=1.21.0
Provides-Extra: wandb
Requires-Dist: wandb>=0.12.0; extra == "wandb"
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov>=2.0; extra == "dev"
Requires-Dist: black>=22.0; extra == "dev"
Requires-Dist: isort>=5.0; extra == "dev"
Requires-Dist: flake8>=4.0; extra == "dev"
Requires-Dist: mypy>=0.910; extra == "dev"
Provides-Extra: examples
Requires-Dist: jupyter>=1.0.0; extra == "examples"
Requires-Dist: notebook>=6.0.0; extra == "examples"
Requires-Dist: ipywidgets>=7.0.0; extra == "examples"
Requires-Dist: trl>=0.7.0; extra == "examples"
Requires-Dist: unsloth; extra == "examples"
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# 🎯 GradES: Gradient-based Early Stopping

[![PyPI version](https://badge.fury.io/py/grades.svg)](https://badge.fury.io/py/grades)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![arXiv](https://img.shields.io/badge/arXiv-2509.01842-b31b1b.svg)](https://arxiv.org/abs/2509.01842)

Official implementation of **GradES**, a gradient-based selective training method that dynamically freezes modules whose gradients have converged during fine-tuning, achieving **40-50% computational savings** without sacrificing model performance.
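The freezing idea can be pictured with a small, framework-free sketch. It assumes that each component (for example, one weight matrix) reports a scalar gradient magnitude per step, here the mean absolute gradient, and that `alpha` is the fraction of total steps that must pass before anything can be frozen. `should_freeze`, `update_frozen` and `mean_abs` are hypothetical helpers, not the `grades` API, and the library may use a different gradient statistic.

```python
from typing import Dict, List, Set


def should_freeze(grad_magnitude: float, step: int, total_steps: int,
                  tau: float, alpha: float) -> bool:
    """Return True if a component may be frozen at this step.

    Freezing is only considered once at least `alpha * total_steps` steps
    have elapsed, and only when the gradient magnitude is below `tau`.
    """
    past_grace_period = step >= alpha * total_steps
    return past_grace_period and grad_magnitude < tau


def update_frozen(frozen: Set[str], grad_magnitudes: Dict[str, float],
                  step: int, total_steps: int, tau: float, alpha: float) -> Set[str]:
    """Add newly converged components to `frozen` (freezing is permanent here)."""
    for name, magnitude in grad_magnitudes.items():
        if name not in frozen and should_freeze(magnitude, step, total_steps, tau, alpha):
            frozen.add(name)
    return frozen


def mean_abs(values: List[float]) -> float:
    """Mean absolute value, used here as the gradient-magnitude statistic."""
    return sum(abs(v) for v in values) / len(values)


# Toy example: two components, 100 total steps, tau=0.023, alpha=0.55.
grads = {
    "layer0.q_proj": mean_abs([0.01, -0.02, 0.015]),  # small gradients
    "layer0.v_proj": mean_abs([0.2, -0.3, 0.25]),     # still learning
}
frozen: Set[str] = set()
update_frozen(frozen, grads, step=40, total_steps=100, tau=0.023, alpha=0.55)
print(sorted(frozen))  # [] (still inside the grace period)
update_frozen(frozen, grads, step=60, total_steps=100, tau=0.023, alpha=0.55)
print(sorted(frozen))  # ['layer0.q_proj']
```

Keeping the decision rule separate from the bookkeeping makes it easy to swap in another magnitude statistic without touching the freezing logic.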

## 📄 Paper
**GradES: Significantly Faster Training in Transformers with Gradient-Based Early Stopping**
*Qifu Wen, Xi Zeng, Zihan Zhou, Shuaijun Liu, Mehdi Hosseinzadeh, Ningxin Su and Reza Rawassizadeh*
📖 [arXiv:2509.01842](https://arxiv.org/abs/2509.01842)

## 🚀 Quick Installation

### From PyPI
```bash
pip install grades
```

### From Source
```bash
git clone https://github.com/IXZZZ9/GradES.git
cd GradES
pip install -e .
```

### For Development
```bash
git clone https://github.com/IXZZZ9/GradES.git
cd GradES
pip install -e ".[dev,wandb,examples]"
```

## 💡 Quick Start

### Basic Usage with Transformers

```python
from grades import GradEarlyStoppingCallback, GradEarlyStoppingConfig
from transformers import Trainer, TrainingArguments

# Configure GradES
config = GradEarlyStoppingConfig(
    tau=0.023,                  # Convergence threshold on gradient magnitude
    alpha=0.55,                 # Minimum fraction of training before freezing
    enable_wandb_logging=True,  # Requires the optional [wandb] extra
)

# Create callback
callback = GradEarlyStoppingCallback(config)

# Use with any Transformers Trainer; model, training_args and
# train_dataset are assumed to be defined beforehand
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    callbacks=[callback]
)

trainer.train()
```

### Integration with Unsloth (Recommended)

GradES plugs into [Unsloth](https://github.com/unslothai/unsloth) pipelines as a standard trainer callback for fast LLM fine-tuning:

#### 🔥 LoRA Fine-tuning
```python
from grades import GradEarlyStoppingCallback, GradEarlyStoppingConfig
from trl import SFTTrainer, SFTConfig

# GradES configuration for LoRA
config = GradEarlyStoppingConfig(
    tau=0.021637,
    alpha=0.55,
    enable_wandb_logging=True,
)
callback = GradEarlyStoppingCallback(config)

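# TRL SFTTrainer with GradES; model and tokenizer come from Unsloth's
# FastLanguageModel, and combined_dataset is assumed to be prepared already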
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=combined_dataset,
    callbacks=[callback],
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_ratio=0.05,
        max_steps=60,
        learning_rate=2e-4,
        logging_steps=1,
        optim="adamw_torch",
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        seed=3407,
        report_to="wandb",
        gradient_checkpointing=True,
        dataloader_pin_memory=True,
        dataloader_num_workers=0,
        remove_unused_columns=False,
    ),
)
```
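To see what these LoRA settings imply, the sketch below computes the effective batch size, the number of warmup steps, and the first optimizer step at which GradES could freeze a component. It assumes that `alpha` is measured as a fraction of `max_steps` (optimizer steps) and that warmup steps are rounded up from `warmup_ratio`, as Hugging Face `TrainingArguments` does; `schedule_summary` is a hypothetical helper, not part of `grades`.

```python
import math


def schedule_summary(per_device_batch: int, grad_accum: int, num_devices: int,
                     max_steps: int, warmup_ratio: float, alpha: float) -> dict:
    """Summarize the step budget implied by an SFTConfig-style setup."""
    # Sequences consumed per optimizer step.
    effective_batch = per_device_batch * grad_accum * num_devices
    # Warmup steps rounded up from the ratio.
    warmup_steps = math.ceil(max_steps * warmup_ratio)
    # First global step satisfying step >= alpha * max_steps (assumed grace period).
    first_freeze_step = math.ceil(alpha * max_steps)
    return {
        "effective_batch": effective_batch,
        "examples_seen": effective_batch * max_steps,
        "warmup_steps": warmup_steps,
        "first_freeze_step": first_freeze_step,
    }


print(schedule_summary(per_device_batch=1, grad_accum=4, num_devices=1,
                       max_steps=60, warmup_ratio=0.05, alpha=0.55))
# {'effective_batch': 4, 'examples_seen': 240, 'warmup_steps': 3, 'first_freeze_step': 33}
```

With `max_steps=60`, this reading of `alpha=0.55` leaves roughly the last 27 steps in which components can be frozen, so a short demo run exercises the freezing logic only briefly.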

#### 🚀 Full Fine-tuning (FFT)
```python
from grades import GradEarlyStoppingCallback, GradEarlyStoppingConfig
from trl import SFTTrainer, SFTConfig

# GradES configuration for FFT
config = GradEarlyStoppingConfig(
    tau=2.404167,  # Much larger threshold than the LoRA example
    alpha=0.55,    # Same grace period as the LoRA example
    enable_wandb_logging=True,
)
callback = GradEarlyStoppingCallback(config)

# Load the model with FastLanguageModel.from_pretrained(..., full_finetuning=True)
# and skip the FastLanguageModel.get_peft_model(...) LoRA setup

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=combined_dataset,
    callbacks=[callback],
    args=SFTConfig(
        dataset_text_field="text",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,
        warmup_ratio=0.05,
        num_train_epochs=1,
        learning_rate=2e-5,
        logging_steps=1,
        optim="adamw_torch",
        weight_decay=0.01,
        lr_scheduler_type="cosine",
        seed=3407,
        report_to="wandb",
        gradient_checkpointing=True,
        dataloader_pin_memory=True,
        dataloader_num_workers=0,
        remove_unused_columns=False,
    ),
)
```

### Vision-Language Models (VLMs)
```python
from grades import VLMGradEarlyStoppingCallback, VLMGradEarlyStoppingConfig

# Separate convergence thresholds for the vision and language components
vlm_config = VLMGradEarlyStoppingConfig(
    vision_tau=1e-4,
    language_tau=1e-3,
    alpha=0.3,
    enable_wandb_logging=True
)

vlm_callback = VLMGradEarlyStoppingCallback(vlm_config)
```
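Because `VLMGradEarlyStoppingConfig` takes separate `vision_tau` and `language_tau` values, one way to picture it is a lookup that picks a threshold from a parameter's name. This is a sketch under stated assumptions: the `vision_encoder.` / `vision_model.` prefixes and the `tau_for` helper are hypothetical, real VLMs name their modules differently, and here everything that is not vision (decoder, projector, etc.) falls back to `language_tau`.

```python
from typing import Sequence


def tau_for(param_name: str, vision_tau: float, language_tau: float,
            vision_prefixes: Sequence[str] = ("vision_encoder.", "vision_model.")) -> float:
    """Pick the convergence threshold for a parameter based on its name.

    Names starting with one of `vision_prefixes` (hypothetical prefixes) use
    `vision_tau`; every other parameter uses `language_tau`.
    """
    if param_name.startswith(tuple(vision_prefixes)):
        return vision_tau
    return language_tau


for name in ["vision_encoder.blocks.0.attn.qkv.weight",
             "decoder.blocks.0.mlp.up_proj.weight"]:
    print(name, tau_for(name, vision_tau=1e-4, language_tau=1e-3))
```

A per-tower threshold lets the vision encoder and the language model converge on different schedules instead of sharing a single cutoff.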

## 🎯 Try it Now!

### Google Colab Integration

Ready-to-use notebooks with minimal setup:

#### 🦙 LLM Fine-tuning with Unsloth
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb)

**Quick Setup:**
1. Open the [Unsloth Llama3.1 notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb)
2. Add a cell that installs GradES (for example, right after the notebook's installation cell):
   ```python
   !pip install grades
   ```
3. Replace the notebook's `SFTTrainer` setup with the LoRA example above
4. Run and enjoy 40-50% faster training! 🚀

#### 🖼️ VLM Fine-tuning (Coming Soon)
- Hugging Face VLM + LoRA notebook
- Hugging Face VLM + FFT notebook

## 📊 Key Results

- ✅ **40-50% computational savings** compared to standard fine-tuning
- ✅ **Maintains or improves** model performance across multiple benchmarks
- ✅ **Tested on**: Qwen3, Phi4, Llama-3.1, and Mistral models (0.6B to 14B parameters)
- ✅ **Compatible with**: LoRA, Full Fine-tuning, and Vision-Language Models
- ✅ **Framework support**: Transformers, TRL, Unsloth

## ⚙️ Configuration Options

### GradEarlyStoppingConfig
```python
from grades import GradEarlyStoppingConfig

config = GradEarlyStoppingConfig(
    tau=1e-4,                    # Convergence threshold
    alpha=0.3,                   # Min training progress before freezing
    max_frozen_ratio=1.0,        # Max fraction of components to freeze
    compute_interval=1,          # Steps between gradient computations
    history_maxlen=1000,         # Gradient history buffer size
    enable_wandb_logging=False,  # WandB logging
    log_interval=10,             # Logging frequency
    save_stats=True,             # Save component statistics
    output_dir="./grades_output" # Output directory
)
```
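To make the interplay of `compute_interval`, `history_maxlen` and `max_frozen_ratio` concrete, here is a toy tracker. It is an assumption-laden sketch, not the library's code: it records a gradient magnitude only every `compute_interval` steps, keeps at most `history_maxlen` values per component in a `collections.deque`, and stops freezing once `max_frozen_ratio` of the components are frozen. `ToyGradTracker` is hypothetical.

```python
from collections import deque
from typing import Deque, Dict, Iterable, Set


class ToyGradTracker:
    """Hypothetical tracker illustrating a few GradEarlyStoppingConfig knobs."""

    def __init__(self, components: Iterable[str], tau: float = 1e-4, alpha: float = 0.3,
                 max_frozen_ratio: float = 1.0, compute_interval: int = 1,
                 history_maxlen: int = 1000) -> None:
        self.components = list(components)
        self.tau = tau
        self.alpha = alpha
        self.max_frozen_ratio = max_frozen_ratio
        self.compute_interval = compute_interval
        # Bounded per-component history: the oldest entries drop off automatically.
        self.history: Dict[str, Deque[float]] = {
            name: deque(maxlen=history_maxlen) for name in self.components
        }
        self.frozen: Set[str] = set()

    def step(self, step: int, total_steps: int, grad_magnitudes: Dict[str, float]) -> None:
        # Only measure gradients every `compute_interval` steps.
        if step % self.compute_interval != 0:
            return
        # Cap on the total number of frozen components.
        max_frozen = int(self.max_frozen_ratio * len(self.components))
        for name, magnitude in grad_magnitudes.items():  # visited in dict order
            if name in self.frozen:
                continue
            self.history[name].append(magnitude)
            if step < self.alpha * total_steps or magnitude >= self.tau:
                continue
            if len(self.frozen) < max_frozen:
                self.frozen.add(name)


tracker = ToyGradTracker(["a", "b", "c", "d"], tau=0.05, alpha=0.5,
                         max_frozen_ratio=0.5, compute_interval=2, history_maxlen=3)
for step in range(10):
    tracker.step(step, total_steps=10,
                 grad_magnitudes={"a": 0.01, "b": 0.02, "c": 0.03, "d": 0.9})
print(sorted(tracker.frozen))  # ['a', 'b']: the 50% cap stops 'c' from freezing
```

In this toy run `c` also falls below `tau`, but the cap of two frozen components (50% of four) keeps it trainable, which is the kind of safety margin `max_frozen_ratio` appears intended to provide.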

## 📖 Citation

If you find GradES useful in your research, please cite:

```bibtex
@misc{wen2025gradessignificantlyfastertraining,
      title={GradES: Significantly Faster Training in Transformers with Gradient-Based Early Stopping}, 
      author={Qifu Wen and Xi Zeng and Zihan Zhou and Shuaijun Liu and Mehdi Hosseinzadeh and Ningxin Su and Reza Rawassizadeh},
      year={2025},
      eprint={2509.01842},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2509.01842}, 
}
```

## 🙏 Acknowledgments

In this research, we use nanoVLM, Hugging Face, Unsloth, and PyTorch as core platforms for modeling, dataset handling, and deep learning infrastructure.

Wiedmann, L., Roy Gosthipaty, A., & Marafioti, A. (2025). nanoVLM. GitHub repository. https://github.com/huggingface/nanoVLM

Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., Davison, J., Shleifer, S., von Platen, P., Ma, C., Jernite, Y., Plu, J., Xu, C., Le Scao, T., Gugger, S., Drame, M., Lhoest, Q., & Rush, A. M. (2020). Transformers: State-of-the-Art Natural Language Processing. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, 38–45. Association for Computational Linguistics. https://www.aclweb.org/anthology/2020.emnlp-demos.6

Han, D., Han, M., & Unsloth Team. (2023). Unsloth. GitHub repository. https://github.com/unslothai/unsloth

Paszke, A., Gross, S., Massa, F., Lerer, A., Bradbury, J., Chanan, G., Killeen, T., Lin, Z., Gimelshein, N., Antiga, L., Desmaison, A., Kopf, A., Yang, E., DeVito, Z., Raison, M., Tejani, A., Chilamkurthy, S., Steiner, B., Fang, L., Bai, J., & Chintala, S. (2019). PyTorch: An Imperative Style, High-Performance Deep Learning Library. Advances in Neural Information Processing Systems, 32, 8024–8035.

## 📝 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🔗 Links

- 📖 **Paper**: [arXiv:2509.01842](https://arxiv.org/abs/2509.01842)
- 🐙 **GitHub**: [IXZZZ9/GradES](https://github.com/IXZZZ9/GradES)
- 📦 **PyPI**: [grades](https://pypi.org/project/grades/)
- 🤗 **Hugging Face**: Coming soon
- 🦙 **Unsloth Integration**: [Colab Notebook](https://colab.research.google.com/github/unslothai/notebooks/blob/main/nb/Llama3.1_(8B)-Alpaca.ipynb)

---

**Made with ❤️ by the GradES Team**
