Metadata-Version: 2.3
Name: singlora
Version: 0.0.1
Summary: SingLoRA - Pytorch
License: MIT
Keywords: artificial intelligence,deep learning,optimizers,Prompt Engineering,SingLoRA,LLM,Pytorch,Pytorch Lightning
Author: Kye Gomez
Author-email: kye@apac.ai
Requires-Python: >=3.10,<4.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: torch
Requires-Dist: transformers
Project-URL: Documentation, https://github.com/kyegomez/singlora
Project-URL: Homepage, https://github.com/kyegomez/singlora
Project-URL: Repository, https://github.com/kyegomez/singlora
Description-Content-Type: text/markdown

# SingLoRA: A Minimal Implementation

This repository provides a minimal, single-file implementation of SingLoRA (Single Matrix Low-Rank Adaptation) as described in the paper ["SingLoRA: Low Rank Adaptation Using a Single Matrix"](https://arxiv.org/abs/2507.05566) by Bensaïd et al.

## Overview

SingLoRA is a parameter-efficient fine-tuning method that simplifies the LoRA architecture by using a single trainable matrix instead of two. This implementation demonstrates how to apply SingLoRA to transformer models using PyTorch and the Hugging Face Transformers library.
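
To make the single-matrix idea concrete, the sketch below wraps a frozen `nn.Linear` with one trainable matrix `A` and adds the symmetric update `(alpha / rank) * u(t) * A @ A.T`, which is how we read the paper's formulation. The class name `SingLoRALinearSketch`, the `steps_seen` counter, the initialisation of `A`, and the restriction to square layers are all assumptions made for this illustration; the code shipped in `singlora` may be organised differently.

```python
import torch
import torch.nn as nn


class SingLoRALinearSketch(nn.Module):
    """Hypothetical wrapper: a frozen square nn.Linear plus a single-matrix update."""

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 8.0,
                 ramp_up_steps: int = 1000):
        super().__init__()
        if base.in_features != base.out_features:
            raise ValueError("this sketch only handles square layers")
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # keep the pretrained weights frozen
        self.scale = alpha / rank
        self.ramp_up_steps = ramp_up_steps
        # Assumption: small random init; the paper's init scheme may differ.
        self.A = nn.Parameter(torch.randn(base.in_features, rank) * 0.01)
        self.register_buffer("steps_seen", torch.zeros((), dtype=torch.long))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # u(t) = min(t / T, 1): the update is switched on gradually.
        u = min(self.steps_seen.item() / self.ramp_up_steps, 1.0)
        if self.training:
            self.steps_seen += 1  # count one training forward pass
        # A @ A.T is symmetric, so x @ (A A^T)^T equals (x @ A) @ A.T.
        delta = (x @ self.A) @ self.A.T
        return self.base(x) + self.scale * u * delta


layer = SingLoRALinearSketch(nn.Linear(768, 768), rank=8)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 768 * 8 = 6144, only A is trainable
```

Because `u(0) = 0`, the wrapped layer starts out identical to the pretrained one, and the adapter's contribution grows linearly over the first `ramp_up_steps` training steps.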

## Features

- Simple, self-contained implementation in a single Python file
- Compatible with Hugging Face Transformers models
- Includes a working example with DistilBERT
- Demonstrates parameter reduction compared to full fine-tuning

## Installation

```bash
pip install -r requirements.txt
```

## Usage

### Basic Example

Here's a simple example of how to apply SingLoRA to a transformer model:

```python
import torch
from singlora import apply_singlora_to_model
from transformers import AutoModelForSequenceClassification

# Load your model
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased")

# Apply SingLoRA
apply_singlora_to_model(
    model=model,
    rank=8,              # Low-rank dimension (r in the paper)
    alpha=8.0,           # Scaling factor
    ramp_up_steps=1000,  # Steps for ramp-up function u(t)
    target_modules=["q_lin", "k_lin", "v_lin"]  # Target attention layers
)

# Now only the SingLoRA parameters are trainable
optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=1e-3
)
```
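
The right values for `target_modules` depend on how a given model names its layers. One way to find candidates is to list the `nn.Linear` submodules with `named_modules()`. The sketch below does this on a toy module so that it runs without downloading any weights. The helper `linear_layer_names` and the `TinyAttention` class are hypothetical and not part of `singlora`; matching on the last component of the dotted name is likewise an assumption about how targets are selected.

```python
from collections import Counter

import torch.nn as nn


class TinyAttention(nn.Module):
    """Toy stand-in that uses DistilBERT-style projection names."""

    def __init__(self, dim: int = 16):
        super().__init__()
        self.q_lin = nn.Linear(dim, dim)
        self.k_lin = nn.Linear(dim, dim)
        self.v_lin = nn.Linear(dim, dim)
        self.out_lin = nn.Linear(dim, dim)


def linear_layer_names(model: nn.Module) -> Counter:
    """Count nn.Linear submodules by the last component of their dotted name."""
    return Counter(
        name.rsplit(".", 1)[-1]
        for name, module in model.named_modules()
        if isinstance(module, nn.Linear)
    )


toy = nn.Sequential(TinyAttention(), TinyAttention())
print(linear_layer_names(toy))
```

Running the same helper on a real Hugging Face model should surface names such as `q_lin` or `q_proj`, which can then be passed as `target_modules`.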

### Configuration Parameters

- `rank`: The dimension of the low-rank adaptation (r). Lower values mean fewer parameters.
- `alpha`: Scaling factor for the adaptation. Higher values allow larger updates.
- `ramp_up_steps`: Number of steps (T) for the ramp-up function u(t) = min(t/T, 1).
- `target_modules`: List of layer names to apply SingLoRA to. Common targets:
  - `["query", "key", "value"]` for BERT-style models
  - `["q_lin", "k_lin", "v_lin"]` for DistilBERT
  - `["q_proj", "k_proj", "v_proj"]` for LLaMA models
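
The ramp-up schedule is simple enough to write out directly. The function below is a stand-alone illustration of `u(t) = min(t / T, 1)`; the name `ramp_up` and the handling of a non-positive `T` are choices made for this example, not necessarily how `singlora` implements it.

```python
def ramp_up(step: int, ramp_up_steps: int) -> float:
    """Linear warm-up of the adapter strength: u(t) = min(t / T, 1)."""
    if ramp_up_steps <= 0:
        return 1.0  # assumption: treat a non-positive T as "no ramp-up"
    return min(step / ramp_up_steps, 1.0)


# With T = 1000 the adapter reaches full strength at step 1000 and stays there.
for t in (0, 250, 1000, 5000):
    print(t, ramp_up(t, ramp_up_steps=1000))
```

A larger `ramp_up_steps` keeps the model closer to its pretrained behaviour for longer, which is why the value is usually scaled with the length of the training run.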

### Parameter Efficiency

SingLoRA trains only the low-rank adapter matrices, so the number of trainable parameters is a small fraction of what full fine-tuning requires. To measure the reduction on your own model:

```python
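# `original_model`: an untouched copy of the model, loaded before SingLoRA is applied
# `model`: the same architecture after calling apply_singlora_to_model(...)
original_params = sum(p.numel() for p in original_model.parameters() if p.requires_grad)
singlora_params = sum(p.numel() for p in model.parameters() if p.requires_grad)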

reduction = 100 * (1 - singlora_params / original_params)
print(f"Parameter reduction: {reduction:.2f}%")
```
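
For intuition about where the savings come from, the per-layer arithmetic can be worked out by hand. The helper below is a back-of-the-envelope sketch for a single square `d x d` projection, assuming standard LoRA trains two `d x r` factors while SingLoRA trains one. The function name and the example sizes (`d = 768`, DistilBERT's hidden size, and `r = 8`) are illustrative only.

```python
def adapter_param_counts(d: int, r: int) -> dict[str, int]:
    """Trainable parameters for one square d x d weight under each scheme."""
    return {
        "full": d * d,        # full fine-tuning updates every weight
        "lora": 2 * d * r,    # two factors of shape d x r
        "singlora": d * r,    # one factor of shape d x r
    }


counts = adapter_param_counts(d=768, r=8)
for name, n in counts.items():
    print(f"{name}: {n:,} trainable parameters")
```

Under these assumptions a rank-8 SingLoRA adapter trains 6,144 parameters for this layer, half of LoRA's 12,288 and about 1% of the 589,824 weights in the full matrix.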

For a complete working example, see `example.py` in the repository.

### LLaMA Example

Here's how to apply SingLoRA to LLaMA models:

```python
from singlora import apply_singlora_to_model
from transformers import LlamaForCausalLM, LlamaTokenizer
import torch

# Load LLaMA model and tokenizer
model_name = "meta-llama/Llama-2-7b-hf"  # or your local path
model = LlamaForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # Use float16 for efficiency
    device_map="auto"           # Automatically handle model placement
)
tokenizer = LlamaTokenizer.from_pretrained(model_name)

# Apply SingLoRA to attention layers
apply_singlora_to_model(
    model=model,
    rank=16,              # Can use larger rank for bigger models
    alpha=16.0,           # Increased alpha for stronger adaptation
    ramp_up_steps=2000,   # More steps for larger datasets
    target_modules=[      # LLaMA-specific attention layer names
        "q_proj",
        "k_proj",
        "v_proj"
    ]
)

# Example training setup
optimizer = torch.optim.AdamW(
    filter(lambda p: p.requires_grad, model.parameters()),
    lr=1e-4  # Lower learning rate for LLaMA
)

# Example inference
prompt = "Once upon a time"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_length=100,
        temperature=0.7,
        do_sample=True
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Key differences for LLaMA models:
- Load the model with `LlamaForCausalLM` (a causal language model) rather than `AutoModelForSequenceClassification`
- Target the LLaMA-specific projection layers (`q_proj`, `k_proj`, `v_proj`)
- Consider using `float16` for memory efficiency
- Adjust hyperparameters (`rank`, `alpha`, learning rate) for larger models
- Use `device_map="auto"` to place the model automatically, sharding it across multiple GPUs when available (this option requires the `accelerate` package)

## Citation

If you use this implementation in your research, please cite the original paper:

```bibtex
@misc{bensaïd2025singloralowrankadaptation,
      title={SingLoRA: Low Rank Adaptation Using a Single Matrix}, 
      author={David Bensaïd and Noam Rotstein and Roy Velich and Daniel Bensaïd and Ron Kimmel},
      year={2025},
      eprint={2507.05566},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2507.05566}, 
}
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.

