Metadata-Version: 2.1
Name: html-login-field-detector
Version: 0.1.0
Summary: A library for detecting login fields in HTML using DistilBERT.
Author-email: Victor Delaplaine <vdelaplainess@gmail.com>
License: MIT
Project-URL: homepage, https://github.com/ByVictorrr
Project-URL: repository, https://github.com/ByVictorrr/LoginFieldDetector
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.10
Description-Content-Type: text/markdown

Yes, there are several ways to speed up training on your Arch Linux box by optimizing the utilization of your system's resources and fine-tuning your training setup. Here are the key strategies:

---

### **1. Optimize Resource Utilization**

#### **Use Your GPU Fully**
1. **Ensure CUDA and cuDNN Are Installed:**
   - Install the NVIDIA proprietary drivers:
     ```bash
     sudo pacman -S nvidia nvidia-utils
     ```
   - Install CUDA:
     ```bash
     sudo pacman -S cuda
     ```
   - Install cuDNN:
     ```bash
     sudo pacman -S cudnn
     ```

2. **Check GPU Utilization**:
   - Use `nvidia-smi` to monitor GPU usage and ensure the training job utilizes the GPU:
     ```bash
     nvidia-smi
     ```

3. **Mixed Precision Training (FP16)**:
   - Enable mixed precision to speed up computations while reducing memory usage:
     ```python
     from transformers import TrainingArguments

     training_args = TrainingArguments(
         output_dir="./results",
         per_device_train_batch_size=16,
         num_train_epochs=3,
         fp16=True,  # Enables mixed precision
     )
     ```

4. **Pin GPU Memory**:
   - Add this line to prevent other processes from taking GPU memory:
     ```python
     torch.backends.cudnn.benchmark = True
     ```

---

### **2. Optimize Training Setup**

#### **Increase Batch Size**
- Larger batches process more samples in parallel, reducing the number of steps.
  ```python
  TrainingArguments(per_device_train_batch_size=32)
  ```

#### **Gradient Accumulation**
- If GPU memory limits the batch size, use gradient accumulation to simulate larger batches:
  ```python
  TrainingArguments(gradient_accumulation_steps=2)
  ```

#### **Use Smaller Models**
- If your workload doesn't require a large model, use smaller architectures (e.g., `distilbert` instead of `bert`).

#### **Reduce Sequence Length**
- Truncate sequences to reduce unnecessary computation:
  ```python
  tokenizer(max_length=128, truncation=True, padding=True)
  ```

#### **Use Faster Optimizers**
- Optimizers like `AdamW` with proper configuration can reduce overhead.

---

### **3. Parallelize and Utilize All Resources**

#### **Utilize CPU Cores**
- If the data loader is slow, ensure it uses all CPU cores:
  ```python
  DataLoader(dataset, num_workers=<number_of_cores>)
  ```

#### **Use Distributed Data Parallel (DDP)**
- If you have multiple GPUs, use PyTorch DistributedDataParallel:
  ```bash
  python -m torch.distributed.launch --nproc_per_node=2 script.py
  ```

#### **Data Preprocessing Optimization**
- Preprocess data offline and cache it to avoid bottlenecks during training.
  ```python
  dataset = dataset.map(preprocess_function, cache_file_name="cached_data.arrow")
  ```

---

### **4. Optimize Disk I/O**

#### **Use SSDs**
- Store your dataset and training outputs on SSDs to reduce I/O latency.

#### **Prefetch Data**
- Use a data loader with prefetching to minimize data transfer delays:
  ```python
  DataLoader(dataset, prefetch_factor=2, pin_memory=True)
  ```

---

### **5. Adjust Training Configuration**

#### **Reduce Epochs**
- Shorten training epochs if your dataset is small or if the model converges early:
  ```python
  TrainingArguments(num_train_epochs=3)
  ```

#### **Reduce Evaluation Frequency**
- Evaluate less frequently to save time:
  ```python
  TrainingArguments(evaluation_strategy="epoch")
  ```

#### **Use Checkpoint Resumption**
- If training crashes, resume from the last checkpoint instead of restarting:
  ```bash
  trainer.train(resume_from_checkpoint=True)
  ```

---

### **6. System Optimizations on Arch Linux**

#### **Enable High-Performance Mode**
- Switch to a high-performance CPU governor:
  ```bash
  sudo cpupower frequency-set -g performance
  ```

#### **Monitor and Kill Resource-Heavy Background Processes**
- Use `htop` to identify and kill unnecessary processes:
  ```bash
  htop
  ```

#### **Ensure Enough Swap Space**
- If memory is limited, increase swap space:
  ```bash
  sudo fallocate -l 4G /swapfile
  sudo chmod 600 /swapfile
  sudo mkswap /swapfile
  sudo swapon /swapfile
  ```

---

### **7. Alternative Lightweight Training Frameworks**
- Use efficient libraries for faster training:
  - **DeepSpeed**: Optimizes large-scale training.
  - **Accelerate**: Hugging Face's library for fast and simple distributed training.

Install and configure:
```bash
pip install deepspeed
```

Use with Hugging Face:
```python
TrainingArguments(deepspeed="ds_config.json")
```

---

### **torch.compile(model)*
- Use PyTorch’s profiler to identify bottlenecks:
  ```python
    self.model = self.model.to(self.device)
    self.model = torch.compile(self.model)
    log.info("Model successfully compiled with torch.compile.")
  ```
#### Install prerequisites
```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

```

---

### Summary of Quick Wins
1. **Ensure GPU is fully utilized** (use `nvidia-smi` to monitor).
2. **Enable mixed precision training** (`fp16=True`).
3. **Increase batch size** or use **gradient accumulation**.
4. **Optimize system performance** (high-performance mode, SSDs).
5. **Reduce evaluation frequency** and sequence length.

Let me know if you’d like help implementing any of these optimizations!

