Metadata-Version: 2.4
Name: obsideo-cloud
Version: 0.1.0
Summary: CLI for capturing and restoring ML training artifacts with encrypted cloud storage
License: MIT
Project-URL: Homepage, https://obsideo.io/mlvault
Project-URL: Bug Tracker, https://github.com/Regan-Milne/ml-training-vault/issues
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: click>=8.0
Requires-Dist: cryptography>=41.0
Requires-Dist: requests>=2.28

# mlvault

**Immutable, encrypted snapshots of ML training runs.**

mlvault captures your model checkpoints, metrics, and logs after every training run — encrypted and stored in the cloud. Restore any run on any machine with a single command.

```
pip install mlvault mlvault-mlflow
mlvault init
```

## Quick start

### 1. Install

```bash
pip install mlvault mlvault-mlflow
```

### 2. Configure your API key

Get a key at [obsideo.io/mlvault-beta](https://obsideo.io/mlvault-beta), then:

```bash
mlvault init
# Paste your mlvault API key: mlv1_...
# Validating key... ok
# You're all set.
```

### 3. Run and archive a training job

```bash
mlvault run --collect ./outputs python train.py
```

mlvault runs your script, then encrypts and uploads everything in `./outputs` to cloud storage.

```
Run ID : myproject_20240307_143022_abc123
Collect: /home/user/project/outputs
Command: python train.py

-- Training started --
...
-- Collecting artifacts --
  3 artifact(s) -> myproject_20240307_143022_abc123.tar.gz
-- Encrypting --
  myproject_20240307_143022_abc123.tar.gz.enc (258.4 KB)
-- Uploading --

Done. Run archived: myproject_20240307_143022_abc123
Remote : myproject_20240307_143022_abc123_bundle.enc
```

### 4. Restore a run

```bash
mlvault restore myproject_20240307_143022_abc123
```

Works on any machine where `mlvault init` has been run with the same API key.

### 5. List runs

```bash
mlvault list
```

---

## MLflow integration

mlvault integrates with MLflow as a custom artifact store. Artifacts logged via `mlf.log_artifact()` are staged locally, then pushed to cloud storage with `mlvault commit`.

### Setup

```python
import mlflow

# Set the artifact store once when creating an experiment
mlflow.create_experiment("my-experiment", artifact_location="mlvault://my-project")
mlflow.set_experiment("my-experiment")
```

Or set it via environment variable before training:

```bash
export MLFLOW_ARTIFACT_URI=mlvault://my-project
```

### Training loop

```python
import mlflow

with mlflow.start_run() as run:
    mlflow.log_param("lr", 0.001)
    mlflow.log_metric("loss", 0.42)
    mlflow.log_artifact("checkpoint.pt")

    run_id = run.info.run_id

# After training, push artifacts to cloud storage:
# mlvault commit <run_id>
print(f"Run ID: {run_id}")
```

### Commit artifacts

```bash
mlvault commit <mlflow_run_id>
```

### Restore MLflow artifacts

```bash
mlvault restore <run_id>
```

---

## Commands

| Command | Description |
|---|---|
| `mlvault init` | Configure API key |
| `mlvault run [cmd]` | Run training command and archive artifacts |
| `mlvault commit <mlflow_run_id>` | Push staged MLflow artifacts to storage |
| `mlvault restore <run_id>` | Download and decrypt a stored run |
| `mlvault list` | List locally tracked runs |
| `mlvault log <run_id>` | Show run details |

---

## Security

Artifacts are encrypted with AES-256-GCM before leaving your machine. Encryption keys are derived from your API key — only you can decrypt your artifacts.

---

## License

MIT
