Metadata-Version: 2.4
Name: one-run1
Version: 0.1.2
Summary: ML run wrapper for reproducible experiments and documentation
Requires-Python: >=3.10
Requires-Dist: pydantic<3,>=2
Requires-Dist: pyyaml<7,>=6
Provides-Extra: dev
Requires-Dist: pytest-cov<6,>=5; extra == 'dev'
Requires-Dist: pytest<9,>=8; extra == 'dev'
Description-Content-Type: text/markdown

# one-run

CLI wrapper for job management on local machines and SLURM

## install

```bash
pip install -e .
```

published package install:

```bash
pip install one-run1
```

## commands

- `one-run --version`: print installed package version
- `one-run --automation ...`: make json the default output for commands that support `--json`
- `one-run run <manifest.yaml>`: execute locally (blocking)
- `one-run run --watch <manifest.yaml>`: execute locally in background
- `one-run run <manifest.yaml> --json`: machine-readable run result
- `one-run submit <manifest.yaml>`: submit to SLURM
- `one-run submit <manifest.yaml> --json`: machine-readable submit result
- `one-run status <run_id>`: inspect run status
- `one-run status -l <run_id>`: extended run status (metadata/runtime/paths)
- `one-run status <run_id> --json`: machine-readable status payload for scripts/CI
- `one-run status <run_id> --wait --timeout 30m`: wait for terminal state (non-zero on timeout)
- `one-run cancel <run_id>`: cancel a running local watch run or SLURM run
- `one-run cancel --force <run_id>`: force-kill the run (SIGKILL for a local run, KILL signal for a SLURM job)
- `one-run cancel <run_id> --json`: machine-readable cancel result
- `one-run ls`: list runs
- `one-run ls --long`: detailed run list
- `one-run ls --state success`: filter by state
- `one-run ls --tag gpu`: filter runs by tag (repeatable)
- `one-run ls --group-by-tag`: group listed runs by tag
- `one-run ls --json`: machine-readable run list payload
- `one-run ps`: show active runs (`pending`/`running`) with process/job identity
- `one-run ps --tag train`: filter active runs by tag
- `one-run ps --json`: machine-readable active runs payload
- `one-run ps -w --interval 2`: live refreshing process table
- `one-run validate <manifest.yaml>`: validate manifest + placeholders
- `one-run validate <manifest.yaml> --json`: machine-readable validation result
- `one-run tail <run_id> [-n 30] [--stderr] [-f] [--since 10m]`: tail run logs
- `one-run gc [--state failed] [--older-than 7d] [--dry-run]`: cleanup run directories
- `one-run gc ... --json`: machine-readable cleanup result
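
The `status --wait` and `tail` commands above can be combined into a small helper. This is a minimal sketch, not part of the package: the function name `wait_and_tail` and the `RUNNER` variable are hypothetical, and it relies only on the documented behavior that `status --wait` exits non-zero on timeout. Exit codes for other outcomes are not described here.

```bash
# Hypothetical helper. RUNNER is only a hook for swapping the executable,
# it is not a one-run setting.
RUNNER="${RUNNER:-one-run}"

wait_and_tail() {
  run_id="$1"
  timeout="${2:-30m}"
  if "$RUNNER" status "$run_id" --wait --timeout "$timeout"; then
    # Zero exit: show the last stdout lines.
    "$RUNNER" tail "$run_id" -n 30
  else
    # Non-zero exit (documented at least for timeout): show the last
    # stderr lines and propagate the failure.
    "$RUNNER" tail "$run_id" -n 30 --stderr
    return 1
  fi
}

# Usage: wait_and_tail <run_id> [timeout]
```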

automation defaults:
- set `ONE_RUN_AUTOMATION=1` to make json the default output mode for commands that support `--json`
- or pass `--automation` before the command, e.g. `one-run --automation status <run_id>`
- explicit `--json` still works the same

docker local execution:
- fixture example: `tests/fixtures/manifest.docker.yaml`
- set `environment.kind: docker`
- set `environment.image`
- optional `environment.workdir` and `environment.mounts`
- optional `environment.env` and `environment.env_file`
- optional `environment.pull_policy`: `always | if_missing | never`
- optional `environment.labels` for container traceability
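
A fragment for the settings above might look like the following sketch. It assumes the dotted names map to nested YAML keys; the image name, working directory, and output file name are placeholders. Fields whose value shape is not described here (`mounts`, `env`, `env_file`, `labels`) are left out, and `tests/fixtures/manifest.docker.yaml` remains the authoritative example.

```bash
# Write an illustrative environment fragment (not a complete manifest).
cat > environment.fragment.yaml <<'EOF'
environment:
  kind: docker
  image: python:3.12-slim   # placeholder image
  workdir: /workspace       # placeholder path
  pull_policy: if_missing   # one of: always | if_missing | never
EOF
```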

slurm retry policy (optional):
- `backend.slurm.retry_max_attempts`: retry count after first failure (e.g. `1`)
- `backend.slurm.retry_backoff_sec`: delay before retry submit (e.g. `30`)
- `backend.slurm.retry_on_states`: subset of `failed | timeout | cancelled`
- retries are triggered during `status` refresh (`one-run status ...`)
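
As a sketch, the retry keys above could be written as follows. The nesting is inferred from the dotted names, and `retry_on_states` is assumed to be a YAML list; the output file name is a placeholder.

```bash
# Write an illustrative retry fragment (not a complete manifest).
cat > slurm-retry.fragment.yaml <<'EOF'
backend:
  slurm:
    retry_max_attempts: 1      # one retry after the first failure
    retry_backoff_sec: 30      # wait 30 seconds before resubmitting
    retry_on_states:           # assumed to be a YAML list
      - failed
      - timeout
EOF
```

Because retries are triggered during `status` refresh, a run that is never queried with `one-run status` would presumably not be retried.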

ls color from manifest:
- `experiment.color`: `black | red | green | yellow | blue | magenta | cyan | white`
- `one-run ls` colors the `experiment` column for runs that define `experiment.color`

## run directory

```text
runs/<run_id>/
  config.snapshot.yaml
  metadata.json
  status.json
  runtime.json
  summary.json
  summary_min.json
  events.jsonl
  local.pid
  local.exit_code
  logs/stdout.log
  logs/stderr.log
  artifacts/
  metrics/
```
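
The layout above can be inspected with plain shell tools. The helper below is a hypothetical sketch: `show_run` is not part of the package, and it assumes `local.exit_code` holds the exit code as plain text. The format of the JSON files is not described here, so they are not parsed.

```bash
# Hypothetical helper: summarize a local run from the files listed above.
show_run() {
  run_dir="$1"
  if [ -f "$run_dir/local.exit_code" ]; then
    # Assumption: the file contains the exit code as plain text.
    echo "exit code: $(cat "$run_dir/local.exit_code")"
  else
    echo "exit code: not recorded"
  fi
  if [ -f "$run_dir/logs/stderr.log" ]; then
    # Print the last few stderr lines for quick triage.
    tail -n 5 "$run_dir/logs/stderr.log"
  fi
}

# Usage: show_run runs/<run_id>
```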
