Metadata-Version: 2.4
Name: bilgeai
Version: 0.1.0
Summary: Smart LLM Training Orchestrator
Project-URL: Homepage, https://github.com/bugrabilge/bilge-ai
Project-URL: Repository, https://github.com/bugrabilge/bilge-ai
Project-URL: Issues, https://github.com/bugrabilge/bilge-ai/issues
Project-URL: Changelog, https://github.com/bugrabilge/bilge-ai/blob/main/CHANGELOG.md
Author: BilgeAI Team
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: cli,fine-tuning,llm,optimization,training
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Requires-Dist: jinja2>=3.1
Requires-Dist: litellm>=1.0
Requires-Dist: nvidia-ml-py>=12.0
Requires-Dist: psutil>=5.9
Requires-Dist: pydantic>=2.0
Requires-Dist: python-dotenv>=1.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0
Requires-Dist: typer[all]>=0.9
Provides-Extra: all
Requires-Dist: accelerate>=0.28; extra == 'all'
Requires-Dist: datasets>=2.18; extra == 'all'
Requires-Dist: deepspeed>=0.14; extra == 'all'
Requires-Dist: huggingface-hub>=0.20; extra == 'all'
Requires-Dist: matplotlib>=3.7; extra == 'all'
Requires-Dist: peft>=0.10; extra == 'all'
Requires-Dist: spacy>=3.7; extra == 'all'
Requires-Dist: torch>=2.0; extra == 'all'
Requires-Dist: transformers>=4.40; extra == 'all'
Requires-Dist: trl>=0.8; extra == 'all'
Requires-Dist: wandb>=0.16; extra == 'all'
Provides-Extra: charts
Requires-Dist: matplotlib>=3.7; extra == 'charts'
Provides-Extra: deepspeed
Requires-Dist: deepspeed>=0.14; extra == 'deepspeed'
Provides-Extra: dev
Requires-Dist: black>=24.0; extra == 'dev'
Requires-Dist: hypothesis>=6.0; extra == 'dev'
Requires-Dist: mypy>=1.8; extra == 'dev'
Requires-Dist: pre-commit>=3.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=4.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.3; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.0; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.24; extra == 'docs'
Provides-Extra: hub
Requires-Dist: huggingface-hub>=0.20; extra == 'hub'
Provides-Extra: ml
Requires-Dist: accelerate>=0.28; extra == 'ml'
Requires-Dist: datasets>=2.18; extra == 'ml'
Requires-Dist: peft>=0.10; extra == 'ml'
Requires-Dist: torch>=2.0; extra == 'ml'
Requires-Dist: transformers>=4.40; extra == 'ml'
Requires-Dist: trl>=0.8; extra == 'ml'
Provides-Extra: ner
Requires-Dist: spacy>=3.7; extra == 'ner'
Provides-Extra: tracking
Requires-Dist: wandb>=0.16; extra == 'tracking'
Description-Content-Type: text/markdown

<p align="center">
  <img src="BilgeAI.png" alt="BilgeAI Logo" width="600">
</p>

<p align="center">
  <img src="https://img.shields.io/badge/python-3.10%2B-blue?style=for-the-badge&logo=python&logoColor=white" alt="Python 3.10+">
  <img src="https://img.shields.io/badge/license-Apache%202.0-green?style=for-the-badge" alt="License">
  <img src="https://img.shields.io/badge/tests-308%20passing-brightgreen?style=for-the-badge" alt="Tests">
  <img src="https://img.shields.io/badge/CLI%20commands-12-blueviolet?style=for-the-badge" alt="CLI Commands">
  <img src="https://img.shields.io/badge/GPU-NVIDIA%20%7C%20AMD%20%7C%20Intel%20%7C%20Apple-orange?style=for-the-badge" alt="GPU Support">
  <a href="https://buymeacoffee.com/bilgeai"><img src="https://img.shields.io/badge/Buy%20Me%20a%20Coffee-ffdd00?style=for-the-badge&logo=buy-me-a-coffee&logoColor=black" alt="Buy Me a Coffee"></a>
</p>

# BilgeAI

**Smart LLM Training Orchestrator**

> Read your hardware. Understand your data. Decide. Explain. Train.

BilgeAI analyzes your environment, dataset, and tokenizer to automatically generate an optimized fine-tuning configuration and production-ready training code. Every decision is explained transparently, with no black box.

```bash
pip install bilgeai
bilge init && bilge profile && bilge analyze --data ./train.jsonl && bilge run
```

---

## The Problem

Fine-tuning an LLM means juggling dozens of hyperparameters: quantization, batch size, learning rate, LoRA rank, sequence length, distributed strategy... Get one wrong and you waste hours on OOM errors or suboptimal training.

Existing tools (Unsloth, Axolotl, AutoTrain) expect **you** to figure all this out. BilgeAI is the only tool that **closes the full loop**:

```
Hardware Profiling -> Data Analysis -> Config Optimization -> Code Generation -> Training -> Experiment Tracking
```

## Why BilgeAI?

| Capability | BilgeAI | Axolotl | Unsloth | AutoTrain |
|-----------|---------|---------|---------|-----------|
| Auto hardware profiling | **Yes** | No | No | No |
| Auto dataset analysis + PII scan | **Yes** | Limited | No | Limited |
| Decision explanation (why each param?) | **Yes** | No | No | No |
| LLM-powered optimization | **Yes** | No | No | No |
| Multi-GPU auto-config (DDP/FSDP/DeepSpeed) | **Yes** | Manual | No | Limited |
| Experiment memory + diff + replay | **Yes** | No | No | No |
| Academic export (LaTeX) | **Yes** | No | No | No |
| Plugin architecture | **Yes** | Limited | No | No |
| Self-update | **Yes** | No | No | No |
| Apple Silicon (MPS) support | **Yes** | Limited | Yes | No |

---

## Quick Start

```bash
# Install core (no PyTorch needed for profiling/analysis)
pip install bilgeai

# Initialize project - creates bilge.yaml + .bilge/ directory
bilge init

# Profile your hardware (CPU, RAM, GPU, CUDA, disk)
bilge profile

# Analyze your dataset (format detection, token stats, quality, PII)
bilge analyze --data ./train.jsonl

# Pre-flight checks (8 validation checks)
bilge doctor

# Generate optimized training script (dry-run by default)
bilge run

# See what BilgeAI decided and why (full LLM explanations are available after `bilge run`)
bilge explain

# Actually start training
bilge run --start
```

### Full Training Lifecycle

```bash
# After training, review your experiments
bilge history                         # List all runs
bilge diff run_20240301_1 run_20240301_2  # Compare two runs
bilge export --format latex -o report.tex  # Export for paper
bilge replay run_20240301_1 --start   # Reproduce exact run
bilge chart --metric loss             # Plot training curves
bilge update                          # Self-update to latest
```

---

## 12 CLI Commands

| Command | Description |
|---------|-------------|
| `bilge init` | Interactive project setup, creates `bilge.yaml` + `.bilge/` |
| `bilge profile` | Hardware profiling — CPU, RAM, GPU (NVIDIA/AMD/Intel/Apple), CUDA |
| `bilge analyze` | Dataset analysis — format detection, token stats, quality checks, PII scanning |
| `bilge doctor` | 8 pre-flight validation checks with pass/warn/fail status |
| `bilge explain` | Explain every config decision with confidence scores and alternatives |
| `bilge run` | Generate + execute training scripts (SFT/DPO/RLHF), dry-run default |
| `bilge history` | List past training runs with filtering (status, model, limit, JSON) |
| `bilge diff` | Compare two runs side by side (config + results diff) |
| `bilge export` | Export run report as Markdown, JSON, or LaTeX (to stdout or file) |
| `bilge replay` | Replay a previous run using stored config (full reproducibility) |
| `bilge chart` | Generate loss/metric charts (matplotlib PNG or ASCII fallback) |
| `bilge update` | Self-update via PyPI; `--check` only checks for a newer version, `--force` reinstalls |

---

## How It Works

### LLM-First Optimization with Hardware Guardrails

```
                     bilge run
                         |
                         v
               GuardrailEngine (3 safety rules)
               - VRAMQuantizationRule
               - VRAMBatchSizeRule
               - DistributedStrategyRule
                         |
                         v
               HardwareConstraints
               (max_batch_size, quantization, ...)
                         |
                         v
               LLMOptimizer.optimize()
               - Constraints + HW + Dataset + Overrides -> LLM
               - Returns full TrainingConfig (JSON)
               - Post-validation: clamp to guardrail bounds
                         |
                         v
               CodeGenerator (Jinja2 template)
               - File type detection (parquet/csv/json)
               - data_files vs data_dir handling
                         |
                         v
               LLM Script Review (mandatory)
               - Post-processing safety fixes
               - ast.parse() validation
                         |
                         v
               train.py + requirements.txt
```

### 4 LLM Providers

| Provider | Default Model | Notes |
|----------|--------------|-------|
| **Anthropic** | claude-sonnet-4-20250514 | Balanced quality/cost |
| **OpenAI** | gpt-4.1 | Powerful, fast |
| **Google** | gemini-2.5-flash | Fast, affordable |
| **Local** | Ollama/vLLM/LM Studio | Free, private, offline |

### Safety Guarantees

- **Guardrail clamping**: LLM output is always validated against hardware constraints
- **Post-processing fixes**: Regex-based correction of LLM-generated code (file types, parameters)
- **ast.parse() validation**: Generated scripts are syntax-checked before saving
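- **OOM protection**: Because the final configuration is clamped to the guardrail bounds, the LLM cannot select a batch size or quantization level outside the VRAM-derived limits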
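
The clamping and syntax-check steps listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not BilgeAI's actual implementation: `HardwareConstraints`, its fields, and `clamp_config` are hypothetical names chosen for the example; only `ast.parse` is the standard-library call named in the list.

```python
import ast
from dataclasses import dataclass
from typing import Optional


@dataclass
class HardwareConstraints:
    """Hypothetical guardrail bounds; the field names are assumptions."""
    max_batch_size: int
    required_quantization: Optional[str] = None  # e.g. "4bit" when VRAM is tight


def clamp_config(config: dict, limits: HardwareConstraints) -> dict:
    """Return a copy of an LLM-proposed config forced inside the guardrail bounds."""
    clamped = dict(config)
    proposed = config.get("batch_size", 1)
    clamped["batch_size"] = max(1, min(proposed, limits.max_batch_size))
    if limits.required_quantization is not None:
        clamped["quantization"] = limits.required_quantization
    return clamped


def is_valid_python(source: str) -> bool:
    """Syntax-check a generated script without executing it."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True
```

Note that a successful `ast.parse` only proves the script is syntactically valid; it says nothing about whether the script runs correctly.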

---

## Example Output

### `bilge profile`

```
Hardware Profile
+-----------+-------------------------------+
| Component | Details                       |
+-----------+-------------------------------+
| CPU       | AMD Ryzen 7 5800X (8 cores)   |
| RAM       | 32 GB (24 GB available)       |
| GPU       | NVIDIA RTX 4060 Laptop (8 GB) |
| CUDA      | 12.6                          |
| PyTorch   | 2.6.0                         |
| Disk      | 120 GB free                   |
+-----------+-------------------------------+
```
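
A profile like the one above can be approximated without PyTorch. The sketch below is an assumption about how such a probe might look, not BilgeAI's profiler: `profile_hardware` is a hypothetical helper that uses the standard library for CPU and disk, and falls back to an empty GPU list when the `pynvml` module (shipped by the `nvidia-ml-py` dependency) or an NVIDIA driver is unavailable.

```python
import os
import platform
import shutil

GIB = 1024 ** 3


def profile_hardware(path: str = ".") -> dict:
    """Collect a small hardware summary; GPU info only if NVML is usable."""
    info = {
        "cpu": platform.processor() or platform.machine(),
        "cpu_cores": os.cpu_count(),
        "disk_free_gib": round(shutil.disk_usage(path).free / GIB, 1),
        "gpus": [],
    }
    try:
        import pynvml  # provided by the nvidia-ml-py package

        pynvml.nvmlInit()
        try:
            for index in range(pynvml.nvmlDeviceGetCount()):
                handle = pynvml.nvmlDeviceGetHandleByIndex(index)
                name = pynvml.nvmlDeviceGetName(handle)
                if isinstance(name, bytes):  # older bindings return bytes
                    name = name.decode()
                memory = pynvml.nvmlDeviceGetMemoryInfo(handle)
                info["gpus"].append(
                    {"name": name, "vram_gib": round(memory.total / GIB, 1)}
                )
        finally:
            pynvml.nvmlShutdown()
    except Exception:
        # No NVIDIA driver or bindings: leave the GPU list empty.
        pass
    return info
```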

### `bilge explain`

```
Configuration Decisions

1. batch_size = 1, gradient_accumulation = 32
   Reason: VRAM 8 GB. Model ~4.0 GB (4bit), LoRA ~1.5 GB.
   Effective batch = 32. [LLM, confidence: 85%]

2. quantization = "4bit"
   Reason: GPU VRAM (8 GB) < 12.0 GB. 4-bit required.
   [Rule Engine, confidence: 90%]

3. num_epochs = 1
   Reason: Dataset has 1.2M rows - multiple epochs risks overfitting.
   [LLM, confidence: 95%]

4. max_seq_length = 4096
   Reason: Dataset P90 is 3949 tokens. 4096 captures most samples.
   [LLM, confidence: 85%]

5. packing = true
   Reason: Average tokens (2006) < max_seq_length * 0.6.
   Packing improves efficiency by ~2x. [LLM, confidence: 95%]
```
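
The arithmetic behind decisions 1 and 5 can be checked directly. The values below are copied from the example output; the `0.6` factor is the one quoted in decision 5, and whether BilgeAI uses exactly this rule internally is an assumption.

```python
# Values copied from the example output above.
batch_size = 1
gradient_accumulation = 32
avg_tokens = 2006
max_seq_length = 4096

# Effective batch size on one GPU: micro-batch size times accumulation steps.
effective_batch = batch_size * gradient_accumulation

# Packing heuristic as stated in decision 5.
packing_threshold = max_seq_length * 0.6
use_packing = avg_tokens < packing_threshold

print(effective_batch)    # 32
print(packing_threshold)  # 2457.6
print(use_packing)        # True
```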

### `bilge chart` (ASCII fallback)

```
  loss curve (150 steps)
  2.3410 |
           |####                                              |
           |########                                          |
           |############                                      |
           |################                                  |
           |#####################                             |
           |###########################                       |
           |##################################                |
           |#########################################         |
  0.4521 |################################################# |
           +--------------------------------------------------+
           0                                               150
```

---

## Supported Hardware

| Platform | Detection | Training |
|----------|-----------|----------|
| **NVIDIA** (CUDA) | `nvidia-ml-py` (no torch needed) + `torch.cuda` fallback | Full support |
| **AMD** (ROCm) | `torch` ROCm backend | Full support |
| **Intel** (XPU) | `torch.xpu` + intel-extension-for-pytorch | Full support |
| **Apple Silicon** (MPS) | `torch.backends.mps` + chip detection (M1-M4) | Full support |
| **CPU only** | `psutil` | Supported (small models) |
| **Multi-GPU** | Auto-detection, per-GPU + total VRAM summary | DDP / FSDP / DeepSpeed |
| **Multi-Node** | `torch.distributed.run` with rendezvous | Full support |
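
For the multi-GPU and multi-node rows, the launch command might be assembled roughly as follows. `build_launch_command` and its parameters are hypothetical names for this sketch; the flags are standard `torch.distributed.run` (torchrun) options, and the default rendezvous endpoint is a placeholder.

```python
import sys


def build_launch_command(script: str, num_nodes: int, gpus_per_node: int,
                         rdzv_endpoint: str = "localhost:29500") -> list[str]:
    """Assemble a torch.distributed.run invocation as an argv list."""
    if num_nodes == 1 and gpus_per_node <= 1:
        return [sys.executable, script]  # single process, no launcher needed
    cmd = [
        sys.executable, "-m", "torch.distributed.run",
        f"--nnodes={num_nodes}",
        f"--nproc_per_node={gpus_per_node}",
    ]
    if num_nodes > 1:
        # Every node must be started with the same rendezvous endpoint.
        cmd += ["--rdzv_backend=c10d", f"--rdzv_endpoint={rdzv_endpoint}"]
    else:
        cmd.append("--standalone")
    return cmd + [script]
```

Returning a list rather than a shell string lets the caller pass it to `subprocess.run` without quoting issues.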

---

## Training Templates

BilgeAI generates production-ready training scripts using Jinja2 templates:

| Template | Method | Features |
|----------|--------|----------|
| **SFT + LoRA** | Supervised Fine-Tuning | QLoRA 4bit/8bit, gradient checkpointing, DDP/FSDP/DeepSpeed |
| **DPO** | Direct Preference Optimization | Reference model setup, LoRA + quantization |
| **RLHF** | PPO via TRL | Value head, reward model integration, LoRA support |

All templates include:
- Full reproducibility block (random/numpy/torch/cudnn/mps seed setup)
- Alpaca and ShareGPT format support
- Dataset file type auto-detection (parquet, csv, json)
- Directory dataset support (data_dir parameter)
- Multi-GPU/node support (torchrun command generation)
- Experiment tracking (Wandb/MLflow/TensorBoard)
- Auto-formatting with `black` (if installed)
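
A reproducibility block of the kind listed first might look like the sketch below. It is an illustration, not the text of BilgeAI's templates: `set_seed` is a hypothetical helper, NumPy and PyTorch are treated as optional so the function also runs in a core-only install, and MPS-specific seeding is left out.

```python
import random


def set_seed(seed: int = 42) -> None:
    """Seed every RNG that is available in the current environment."""
    random.seed(seed)
    try:
        import numpy as np

        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch

        torch.manual_seed(seed)  # seeds PyTorch's default generator
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
            # Trade some speed for deterministic cuDNN kernels.
            torch.backends.cudnn.deterministic = True
            torch.backends.cudnn.benchmark = False
    except ImportError:
        pass
```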

---

## Experiment Memory

Every `bilge run` automatically saves a record to a local SQLite database:

```bash
# List past experiments
bilge history
bilge history --status completed --model llama --limit 10 --json

# Compare two runs
bilge diff run_20240301_143022_abc123 run_20240302_091544_def456

# Export for publication
bilge export --format markdown
bilge export --format latex -o results.tex
bilge export --format json -o results.json

# Reproduce any previous run exactly
bilge replay run_20240301_143022_abc123 --start

# Visualize training metrics
bilge chart --metric loss --output loss_curve.png
bilge chart --metric learning_rate  # ASCII fallback if no matplotlib
```

### What's Stored Per Run

```
run_id, timestamp, config_hash, model_name, status
config_json (full snapshot), hardware_json, env_json (pip freeze)
final_loss, best_loss, total_steps, training_time_seconds
output_dir, notes, tags
```
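
A store with a subset of these fields can be sketched with the standard `sqlite3` module. The table name, column types, and the `config_hash` and `save_run` helpers are assumptions made for illustration; BilgeAI's actual schema may differ.

```python
import hashlib
import json
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS runs (
    run_id      TEXT PRIMARY KEY,
    timestamp   TEXT NOT NULL,
    config_hash TEXT NOT NULL,
    model_name  TEXT,
    status      TEXT,
    config_json TEXT,
    final_loss  REAL
)
"""


def config_hash(config: dict) -> str:
    """Stable hash: serialize with sorted keys so key order does not matter."""
    payload = json.dumps(config, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def save_run(conn, run_id, timestamp, model_name, status, config, final_loss=None):
    """Insert one run record, storing the full config snapshot as JSON."""
    conn.execute(
        "INSERT INTO runs VALUES (?, ?, ?, ?, ?, ?, ?)",
        (run_id, timestamp, config_hash(config), model_name, status,
         json.dumps(config, sort_keys=True), final_loss),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")  # a real store would use a file path
conn.execute(SCHEMA)
save_run(conn, "run_example", "2024-03-01T14:30:22", "meta-llama/Llama-3.1-8B",
         "completed", {"batch_size": 1, "quantization": "4bit"}, final_loss=0.4521)
```

Hashing the sorted-key JSON gives two runs with identical settings the same `config_hash`, which makes exact duplicates easy to spot.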

---

## Data Analysis & Privacy

### Format Detection
Automatically detects: **Alpaca** (instruction/output), **ShareGPT** (conversations), CSV, Parquet, custom
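
For JSONL input, key-based detection could work along these lines. This is a sketch under assumptions: `detect_format` and `detect_jsonl_format` are hypothetical helpers, and the key names follow the Alpaca (`instruction`/`output`) and ShareGPT (`conversations`) conventions mentioned above.

```python
import json


def detect_format(record: dict) -> str:
    """Classify one record by the keys it carries."""
    if "conversations" in record:
        return "sharegpt"
    if "instruction" in record and "output" in record:
        return "alpaca"
    return "custom"


def detect_jsonl_format(lines, sample_size: int = 100) -> str:
    """Classify a JSONL dataset by majority vote over the first records."""
    votes = {}
    seen = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fmt = detect_format(json.loads(line))
        votes[fmt] = votes.get(fmt, 0) + 1
        seen += 1
        if seen >= sample_size:
            break
    return max(votes, key=votes.get) if votes else "custom"
```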

### Quality Checks
- Empty/blank rows
- Duplicate detection
- Encoding issues (null bytes, replacement chars)
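
The three checks above can be illustrated on a list of raw text rows. `quality_report` is a hypothetical helper, and "duplicate" here means an exact string match, which may be stricter than what BilgeAI applies.

```python
def quality_report(rows: list[str]) -> dict:
    """Count empty rows, exact duplicates, and rows with suspicious characters."""
    seen = set()
    report = {"empty": 0, "duplicates": 0, "encoding_issues": 0}
    for row in rows:
        if not row.strip():
            report["empty"] += 1
            continue
        if row in seen:
            report["duplicates"] += 1
        seen.add(row)
        # U+0000 is a null byte; U+FFFD is the Unicode replacement character.
        if "\x00" in row or "\ufffd" in row:
            report["encoding_issues"] += 1
    return report
```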

### PII Scanner (regex-based)
- Email addresses
- Turkish phone numbers (+90)
- International phone numbers
- TC Kimlik numbers (11-digit)
- Credit card numbers
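
A regex scanner for three of these categories might look like the sketch below. The patterns are illustrative assumptions, not BilgeAI's own: they are deliberately simple, skip international phone and credit card numbers, and treat any standalone 11-digit number as a TC Kimlik candidate without verifying its check digits.

```python
import re

# Illustrative patterns only; real-world PII detection needs broader coverage.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "tr_phone": re.compile(r"\+90[\s-]?\d{3}[\s-]?\d{3}[\s-]?\d{2}[\s-]?\d{2}\b"),
    # 11 digits, no leading zero, not part of a longer number or a +90 phone.
    "tc_kimlik": re.compile(r"(?<![\d+])[1-9]\d{10}(?!\d)"),
}


def scan_pii(text: str) -> dict:
    """Return the number of matches per PII category."""
    return {name: len(pattern.findall(text)) for name, pattern in PII_PATTERNS.items()}
```

Reporting counts rather than the matched strings keeps the scan result itself free of PII.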

### NER-based PII Scanner (optional)
- spaCy NER or Transformers NER (dslim/bert-base-NER)
- Detects: PERSON, ORG, GPE, LOC entities
- Install: `pip install "bilgeai[ner]"`

---

## Configuration

```yaml
# bilge.yaml
schema_version: "0.1.0"

model:
  name: "meta-llama/Llama-3.1-8B"

data:
  path: "./train.jsonl"
  format: auto  # auto-detect: alpaca, sharegpt, custom

training:
  # Override any auto-detected value (optional)
  # batch_size: 4
  # learning_rate: 2e-4
  # num_epochs: 3
  # lora_rank: 16
  # quantization: "4bit"
  # num_nodes: 2          # multi-node training
  # report_to: wandb      # experiment tracking

llm:
  provider: anthropic        # anthropic | openai | google | local
  model: claude-sonnet-4-20250514
  api_key_env: ANTHROPIC_API_KEY
  # base_url: null           # Only for local provider

privacy:
  scan_pii: true           # Scan dataset for PII
```

**Override priority:** User overrides > LLM suggestions > Guardrail Rules > Defaults
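
The stated priority order can be expressed as a layered lookup. The sketch below is one way to do that with `collections.ChainMap`; `resolve_config` is a hypothetical helper, and any hardware clamping described under Safety Guarantees would be a separate step not shown here.

```python
from collections import ChainMap


def resolve_config(user: dict, llm: dict, guardrails: dict, defaults: dict) -> dict:
    """Merge layers so that earlier arguments win over later ones."""

    def present(layer: dict) -> dict:
        # Treat None as "not set" so an unset override falls through.
        return {key: value for key, value in layer.items() if value is not None}

    return dict(
        ChainMap(present(user), present(llm), present(guardrails), present(defaults))
    )
```

For example, a user-supplied `batch_size` replaces the LLM's suggestion, while a key that only the defaults define is carried through unchanged.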

---

## Installation

### Core (lightweight, no PyTorch)

```bash
pip install bilgeai
```

### With specific extras

```bash
pip install "bilgeai[ml]"        # PyTorch + Transformers + PEFT + TRL
pip install "bilgeai[hub]"       # HuggingFace Hub integration
pip install "bilgeai[charts]"    # matplotlib for chart export
pip install "bilgeai[tracking]"  # Wandb experiment tracking
pip install "bilgeai[deepspeed]" # DeepSpeed ZeRO support
pip install "bilgeai[ner]"       # NER-based PII scanning (spaCy)
pip install "bilgeai[all]"       # Everything
```

### Development

```bash
git clone https://github.com/bugrabilge/bilge-ai.git
cd bilge-ai
pip install -e ".[dev]"
pytest  # 308 tests
```

---

## Plugin System

Extend BilgeAI with custom plugins via 10 hook points:

```python
from bilgeai.plugins.base import BilgePlugin, HookType

class MyPlugin(BilgePlugin):
    @property
    def name(self) -> str:
        return "my-plugin"

    @property
    def version(self) -> str:
        return "1.0.0"

    def get_hooks(self) -> dict:
        return {
            HookType.POST_TRAIN: self.on_train_complete,
            HookType.POST_ANALYZE: self.on_analysis_done,
        }

    def on_train_complete(self, **kwargs):
        print("Training finished!")

    def on_analysis_done(self, **kwargs):
        print("Dataset analysis complete.")
```

Register via entry points in `pyproject.toml`:
```toml
[project.entry-points."bilgeai.plugins"]
my-plugin = "my_package:MyPlugin"
```

### Available Hooks

| Hook | When |
|------|------|
| `PRE_PROFILE` / `POST_PROFILE` | Before/after hardware profiling |
| `PRE_ANALYZE` / `POST_ANALYZE` | Before/after dataset analysis |
| `PRE_OPTIMIZE` / `POST_OPTIMIZE` | Before/after config optimization |
| `PRE_TRAIN` / `POST_TRAIN` | Before/after training execution |
| `PRE_GENERATE` / `POST_GENERATE` | Before/after code generation |
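
How a host application might collect and fire such hooks is sketched below. This is not BilgeAI's plugin manager: `HookType` here is a two-member stand-in for the real enum, and `HookRegistry` and `discover_plugins` are hypothetical names. Only `importlib.metadata.entry_points(group=...)` is a real standard-library call, available with this signature on Python 3.10+.

```python
from collections import defaultdict
from enum import Enum
from importlib.metadata import entry_points


class HookType(Enum):
    """Stand-in for bilgeai's HookType; only two members are shown."""
    POST_ANALYZE = "post_analyze"
    POST_TRAIN = "post_train"


class HookRegistry:
    def __init__(self):
        self._callbacks = defaultdict(list)

    def register(self, hooks: dict) -> None:
        """Accept a {HookType: callable} mapping, as returned by get_hooks()."""
        for hook, callback in hooks.items():
            self._callbacks[hook].append(callback)

    def fire(self, hook: HookType, **kwargs) -> list:
        """Call every callback registered for a hook and collect the results."""
        return [callback(**kwargs) for callback in self._callbacks[hook]]


def discover_plugins(group: str = "bilgeai.plugins") -> list:
    """Load plugin classes advertised under the entry-point group."""
    return [entry_point.load() for entry_point in entry_points(group=group)]
```

Passing context as keyword arguments matches the `**kwargs` signature of the callbacks in the plugin example, so callbacks can ignore fields they do not need.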

---

## Architecture

```
src/bilgeai/
  cli/           # Typer CLI layer (12 commands)
  core/          # Config, constants, exceptions, logging, telemetry
  profiler/      # Hardware detection (CPU/RAM/GPU/CUDA)
  analyzer/      # Dataset analysis, quality, privacy, NER, LLM evaluator
  optimizer/     # GuardrailEngine + LLMOptimizer = HybridEngine, explainer
  generator/     # Jinja2 code generation (SFT/DPO/RLHF templates)
  llm/           # LLM client (litellm), router, rate limiter,
                 # model selector, error analyzer, cost tracker
  hub/           # HuggingFace Hub client, pusher, model card generator
  memory/        # SQLite experiment store (ExperimentStore + RunRecord)
  validation/    # Pre-flight checks, VRAM estimator, compatibility
  plugins/       # Plugin base class, hook system, plugin manager

tests/
  unit/          # Unit test files
  integration/   # E2E CLI pipeline tests
  conftest.py    # Shared fixtures
```

**308 tests** | **ruff clean**

---

## Telemetry

BilgeAI includes **opt-in** anonymous telemetry that is **disabled by default**.

```bash
# Enable (optional)
export BILGE_TELEMETRY=1
```

When enabled, it collects only the command name, OS, Python version, GPU vendor, BilgeAI version, and a random session UUID. **No PII, no model names, no dataset info.** Data is stored locally in `.bilge/telemetry.json`; nothing is sent remotely.
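
The opt-in behaviour can be sketched as an environment check followed by a local append. `build_event` and `record_event` are hypothetical helpers and the JSON key names are assumptions; the environment variable and the listed fields come from the description above.

```python
import json
import os
import platform
import uuid
from pathlib import Path

SESSION_ID = str(uuid.uuid4())  # random per process, not tied to the user


def build_event(command: str, gpu_vendor: str, version: str):
    """Return a telemetry event, or None unless BILGE_TELEMETRY=1 is set."""
    if os.environ.get("BILGE_TELEMETRY") != "1":
        return None
    return {
        "command": command,
        "os": platform.system(),
        "python": platform.python_version(),
        "gpu_vendor": gpu_vendor,
        "version": version,
        "session": SESSION_ID,
    }


def record_event(event, path: Path) -> None:
    """Append an event to a local JSON file; nothing leaves the machine."""
    if event is None:
        return
    events = json.loads(path.read_text()) if path.exists() else []
    events.append(event)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(events, indent=2))
```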

---

## Requirements

- **Python** 3.10+
- **GPU** recommended (NVIDIA, AMD ROCm, Intel XPU, or Apple Silicon) but not required
- **PyTorch** 2.0+ (only needed for training, not for profiling/analysis)
- **LLM API key** required for `bilge run` (Anthropic, OpenAI, Google, or local endpoint)
- **Disk** ~500 MB for core install, more for ML dependencies

---

## Roadmap

- [x] **v0.1.0** — Full release: Profiler, Analyzer, LLM-first HybridEngine, 12 CLI commands, 4 providers, 308 tests
- [ ] **Next** — Community run history (anonymous config sharing)
- [ ] **Next** — Recommendation engine ("best result with similar hardware")

---

## Contributing

We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for details.

```bash
git clone https://github.com/bugrabilge/bilge-ai.git
cd bilge-ai
pip install -e ".[dev]"
pytest                # Run all 308 tests
ruff check src/       # Lint
```

---

## License

Apache 2.0 — See [LICENSE](LICENSE) for details.

---

## Acknowledgments

BilgeAI stands on the shoulders of giants:

- [HuggingFace Transformers](https://github.com/huggingface/transformers) / [PEFT](https://github.com/huggingface/peft) / [TRL](https://github.com/huggingface/trl)
- [Typer](https://github.com/tiangolo/typer) + [Rich](https://github.com/Textualize/rich)
- [litellm](https://github.com/BerriAI/litellm)
- [DeepSpeed](https://github.com/microsoft/DeepSpeed)

---

<p align="center">
  <strong>BilgeAI</strong> — Stop guessing hyperparameters. Let the machine decide, and understand why.
</p>
