Metadata-Version: 2.4
Name: vertical
Version: 0.1.0a2026021701
Summary: Terminal-first live training monitor for Python ML workloads across frameworks.
Author: vertical contributors
License: MIT License
        
        Copyright (c) 2026 Alazer Manakelew
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Keywords: machine-learning,monitoring,pytorch,jax,flax,metrics
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: rich>=13.9.4
Provides-Extra: pytorch
Requires-Dist: torch>=2.2; extra == "pytorch"
Provides-Extra: jax
Requires-Dist: jax>=0.4.30; extra == "jax"
Provides-Extra: flax
Requires-Dist: flax>=0.8.0; extra == "flax"
Requires-Dist: jax>=0.4.30; extra == "flax"
Provides-Extra: all
Requires-Dist: torch>=2.2; extra == "all"
Requires-Dist: jax>=0.4.30; extra == "all"
Requires-Dist: flax>=0.8.0; extra == "all"
Dynamic: license-file
Dynamic: requires-python

# vertical

`vertical` is a training-side metrics transport layer built around localhost-only services and SSH reverse tunneling.

## Quick Start (uv)

```bash
# Create and sync an environment from pyproject.toml
uv sync

# Run the demo monitor
uv run vertical-demo

# Or point the standalone terminal viewer at any running endpoint
uv run vertical-tui --endpoint http://127.0.0.1:9100
```

## Install as a library

```bash
# with pip
pip install vertical

# or with uv
uv pip install vertical

# local editable install
uv pip install -e .
```

Framework extras:

```bash
pip install "vertical[pytorch]"
pip install "vertical[jax]"
pip install "vertical[flax]"
# install all framework extras
pip install "vertical[all]"

# uv equivalents
uv pip install "vertical[pytorch]"
uv pip install "vertical[jax]"
uv pip install "vertical[flax]"
uv pip install "vertical[all]"
```

## Minimal usage

```python
from vertical import TrainingMonitor

with TrainingMonitor(title="My Training Run") as monitor:
    for step in range(1, 101):
        monitor.log(
            step=step,
            epoch=((step - 1) // 20) + 1,
            loss=1 / step,
            learning_rate=1e-3,
            metrics={"accuracy": step / 100},
        )
```

## Framework-first API (JAX + Flax + PyTorch)

Use `vertical.init(...)` to define run defaults once (for example `learning_rate` and `epoch`) and then track any per-step numeric signals such as perplexity, gradient norm, or accuracy.

```python
import vertical
from vertical import HTTPMetricLogger

logger = HTTPMetricLogger("http://127.0.0.1:9100")
run = vertical.init(
    framework="pytorch",
    logger=logger,
    learning_rate=3e-4,
    epoch=1,
    device="cuda",  # falls back to cpu when CUDA is unavailable
)

for _ in range(100):
    # one JSON metric event per forward pass
    run.forward(
        loss=1.0,
        perplexity=20.0,
        grad_norm=0.12,
        training_info={"framework": "pytorch", "phase": "train"},
    )
```

Framework adapters are loaded lazily. If you set `framework="jax"`, only JAX-specific setup code runs.

Framework integrations are split under `vertical.frameworks` and exposed via framework-specific wrappers.

PyTorch users can use the dedicated wrapper and module-aware helper:

```python
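import vertical
from vertical import HTTPMetricLogger

logger = HTTPMetricLogger("http://127.0.0.1:9100")
run = vertical.init(framework="pytorch", logger=logger, device="cuda")

# loader, model, optimizer, train_step, acc, and grad_norm come from your own training code
for step, batch in enumerate(loader, start=1):
    loss = train_step(batch)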
    run.pytorch.module_step(
        module=model,
        optimizer=optimizer,
        step=step,
        loss=loss,
        metrics={"accuracy": acc},
        grad_norm=grad_norm,
        training_info={"phase": "train"},
    )
```

JAX users can use the dedicated wrapper for forward-pass logging:

```python
import vertical
from vertical import HTTPMetricLogger

logger = HTTPMetricLogger("http://127.0.0.1:9100")
run = vertical.init(framework="jax", logger=logger, backend="cpu")
run.jax.forward(loss=loss_value, perplexity=perplexity_value, grad_norm=grad_norm_value)
```

`vertical.init(...)` can also bootstrap the reverse tunnel directly, which is useful for Colab and hosted training providers:

```python
import vertical

with vertical.init(
    framework="jax",
    backend="cpu",
    remote={
        "ssh_host": "your-laptop-host",
        "ssh_user": "your-user",
        "run_id": "exp-001",
    },
) as run:
    print("endpoint:", run.remote_url)
    print("token:", run.auth_token)
    print("public key:", run.remote_session.public_key_path)
    run.jax.forward(loss=1.0, perplexity=20.0)
```

You can also configure the remote setup through environment variables, so that scripts only need a plain `vertical.init(...)` call:

```bash
export VERTICAL_SSH_HOST=your-laptop-host
export VERTICAL_SSH_USER=your-user
export VERTICAL_RUN_ID=exp-001
# optional:
export VERTICAL_AUTH_TOKEN=your-static-token
```


Flax users can integrate with `TrainState` directly:

```python
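import vertical
from vertical import HTTPMetricLogger

logger = HTTPMetricLogger("http://127.0.0.1:9100")
run = vertical.init(framework="flax", logger=logger, backend="gpu")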

# inside your train step loop
run.flax.train_state_step(
    state=train_state,
    loss=loss_value,
    metrics={"perplexity": ppl_value},
    grad_norm=grad_norm_value,
)
```

## Reverse SSH Architecture

Training machine:
- Runs a metrics server bound to `127.0.0.1` only.
- Updates the current run state continuously from the training loop.
- Starts an SSH reverse tunnel to the laptop.

Laptop:
- Reads only the locally forwarded endpoint at `http://127.0.0.1:PORT/metrics`.
- Never connects directly to the training machine.

Tunnel command shape:

```bash
ssh -N -R 127.0.0.1:PORT:127.0.0.1:METRICS_PORT you@your-laptop
```

`vertical` enforces this model with:
- Local-only binding (`127.0.0.1`) for metrics server and reverse bind host.
- SSH keepalive options.
- Key-based auth defaults (`BatchMode=yes`, `PasswordAuthentication=no`).
- An auto-reconnect supervisor that restarts the tunnel if it drops.
- Deterministic `run_id -> remote_port` mapping when `remote_port` is omitted.
- Optional endpoint auth token (`Authorization: Bearer ...`) for tunnel consumers.
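To illustrate what a deterministic `run_id -> remote_port` mapping can look like, here is a minimal sketch. It is an assumption for illustration only: the helper name `port_for_run`, the hash function, and the port range are all hypothetical and may differ from what `vertical` actually does.

```python
import hashlib


def port_for_run(run_id: str, low: int = 20000, high: int = 40000) -> int:
    """Map a run ID to a stable port in [low, high) (hypothetical scheme)."""
    # hashlib is used instead of the built-in hash(), which is salted per
    # process for strings and would give a different port on every run.
    digest = hashlib.sha256(run_id.encode("utf-8")).digest()
    return low + int.from_bytes(digest[:4], "big") % (high - low)


port = port_for_run("exp-001")
print(port)
```

Because the mapping depends only on the run ID, the training machine and the laptop can each compute the same port without exchanging it.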

## SSH Key Setup (Required)

`vertical.init(..., remote=...)` auto-generates a local keypair by default if one is missing:
- private key: `~/.ssh/vertical_ed25519`
- public key: `~/.ssh/vertical_ed25519.pub`

This removes the manual `mkdir/chmod/ssh-keygen` step from training scripts.

One manual step remains, and it only needs to happen once:
- Add the generated public key to `~/.ssh/authorized_keys` on your laptop/terminal host.

For Colab or any third-party training machine, you can do that setup like this:

1. Generate a dedicated keypair on the training machine (skip this step if `vertical` has already generated one):

```bash
mkdir -p ~/.ssh
chmod 700 ~/.ssh
ssh-keygen -t ed25519 -f ~/.ssh/vertical_ed25519 -N ""
cat ~/.ssh/vertical_ed25519.pub
```

2. On your laptop/terminal host, append that public key to `~/.ssh/authorized_keys`:

```bash
mkdir -p ~/.ssh
chmod 700 ~/.ssh
echo "<PASTE_PUBLIC_KEY_FROM_TRAINING_MACHINE>" >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

3. Confirm that your laptop's SSH server is enabled and reachable from the training machine.

4. In the training environment, provide:
- `ssh_host` (required)
- `ssh_user` (optional)
- `identity_file` (path to the private key, for example `~/.ssh/vertical_ed25519`)
- `ssh_port` (optional, needed only if your laptop's SSH server is not on port `22`)

Example env configuration:

```bash
export VERTICAL_SSH_HOST=<your-laptop-host-or-ip>
export VERTICAL_SSH_USER=<your-laptop-user>
export VERTICAL_SSH_IDENTITY_FILE=~/.ssh/vertical_ed25519
export VERTICAL_SSH_PORT=22
```
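For scripts that prefer explicit configuration, the variables above correspond to the settings listed in step 4. The sketch below collects them into a dict of the shape passed as `remote=...` elsewhere in this README. `remote_from_env` is a hypothetical helper, and the assumption that `identity_file` and `ssh_port` are accepted as `remote` keys is inferred from the list above rather than confirmed.

```python
import os
from collections.abc import Mapping


def remote_from_env(env: Mapping[str, str] = os.environ) -> dict[str, object]:
    """Collect VERTICAL_* variables into a remote-style dict (hypothetical helper)."""
    host = env.get("VERTICAL_SSH_HOST")
    if not host:
        raise ValueError("VERTICAL_SSH_HOST is required")

    remote: dict[str, object] = {"ssh_host": host}
    if env.get("VERTICAL_SSH_USER"):
        remote["ssh_user"] = env["VERTICAL_SSH_USER"]
    if env.get("VERTICAL_SSH_IDENTITY_FILE"):
        # Expand a leading ~ so the path works outside a shell as well.
        remote["identity_file"] = os.path.expanduser(env["VERTICAL_SSH_IDENTITY_FILE"])
    if env.get("VERTICAL_SSH_PORT"):
        remote["ssh_port"] = int(env["VERTICAL_SSH_PORT"])
    return remote


example = remote_from_env({"VERTICAL_SSH_HOST": "your-laptop-host", "VERTICAL_SSH_PORT": "2222"})
print(example)
```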

Auth token and key generation behavior:
- If you do not set `VERTICAL_AUTH_TOKEN`, `vertical` generates a secure token automatically.
- Use that same token when querying metrics (`curl` or `vertical-tui --token ...`).
- You can disable automatic local key generation with `VERTICAL_AUTO_SSH_KEYGEN=false`.

## Reverse Tunnel Usage

```python
import vertical

with vertical.init(
    framework="pytorch",
    remote={"ssh_host": "your-terminal-host", "ssh_user": "your-user", "run_id": "exp-001"},
) as run:
    run.forward(loss=0.5, accuracy=0.8)
```

This creates:
- Training-side local metrics server on `127.0.0.1:METRICS_PORT`.
- Reverse tunnel exposing that service on the laptop at `127.0.0.1:PORT`.

From your terminal host, read metrics at:

```bash
curl -H "Authorization: Bearer $VERTICAL_AUTH_TOKEN" "http://127.0.0.1:PORT/metrics"
# quote the URL so the shell does not treat ? as a glob character
curl -H "Authorization: Bearer $VERTICAL_AUTH_TOKEN" "http://127.0.0.1:PORT/metrics/history?limit=20"
```
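The same requests can be made from Python with only the standard library. This is a minimal sketch: `build_metrics_request` and `fetch_metrics` are hypothetical helpers, port `9100` is a placeholder, and decoding the response as JSON is an assumption about the endpoint's format.

```python
import json
import urllib.request


def build_metrics_request(port: int, token: str, path: str = "/metrics") -> urllib.request.Request:
    """Build a request against the locally forwarded endpoint."""
    url = f"http://127.0.0.1:{port}{path}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})


def fetch_metrics(port: int, token: str, path: str = "/metrics", timeout: float = 5.0):
    """Send the request and decode the body, assuming a JSON response."""
    request = build_metrics_request(port, token, path)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request needs no network; fetch_metrics needs a live tunnel.
req = build_metrics_request(9100, "your-token", "/metrics/history?limit=20")
print(req.full_url)
```

Call `fetch_metrics(PORT, token)` once the tunnel is up; it raises `urllib.error.URLError` if nothing is listening on the forwarded port.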

Notebook example:
- `examples/vertical_remote_tunnel_colab.ipynb`

## Framework compatibility scripts

Small training scripts for PyTorch, TensorFlow, and JAX live in:

- `tests/framework_scripts/train_pytorch_linear.py`
- `tests/framework_scripts/train_pytorch_classifier.py`
- `tests/framework_scripts/train_tensorflow_linear.py`
- `tests/framework_scripts/train_jax_linear.py`

## Development

```bash
uv run pytest
uv run ruff check .
```
