Metadata-Version: 2.4
Name: one-run1
Version: 0.1.3
Summary: CLI lifecycle manager for long-running computational jobs
Requires-Python: >=3.10
Requires-Dist: pydantic<3,>=2
Requires-Dist: pyyaml<7,>=6
Provides-Extra: completion
Requires-Dist: argcomplete<4,>=3; extra == 'completion'
Provides-Extra: dev
Requires-Dist: pytest-cov<6,>=5; extra == 'dev'
Requires-Dist: pytest<9,>=8; extra == 'dev'
Description-Content-Type: text/markdown

# one-run

CLI lifecycle manager for long-running computational jobs

## install

```bash
pip install -e .
```

published package install:

```bash
pip install one-run1
```

optional shell completion (bash/zsh):

```bash
pip install -e ".[completion]"
activate-global-python-argcomplete --user
```

## commands

- `one-run --version`: print installed package version
- `one-run --automation ...`: default to json output for commands that support `--json`
- `one-run run <manifest.yaml>`: execute locally (blocking)
- `one-run run --watch <manifest.yaml>`: execute locally in background
- `one-run run <manifest.yaml> --json`: machine-readable run result
- `one-run submit <manifest.yaml>`: submit to SLURM
- `one-run submit <manifest.yaml> --json`: machine-readable submit result
- `one-run status <run_id>`: inspect run status
- `one-run status -l <run_id>`: extended run status (metadata/runtime/paths)
- `one-run status <run_id> --json`: machine-readable status payload for scripts/CI
- `one-run status <run_id> --wait --timeout 30m`: wait for terminal state (non-zero on timeout)
- `one-run cancel <run_id>`: cancel a running local watch or SLURM run
- `one-run cancel --force <run_id>`: force-kill the run (SIGKILL for a local run, KILL signal for a SLURM job)
- `one-run cancel <run_id> --json`: machine-readable cancel result
- `one-run rm <run_id>`: stop active run if needed and remove `runs/<run_id>`
- `one-run rm <run_id> --force --json`: force-stop before removal, with machine-readable result
- `one-run rm [--state failed] [--older-than 7d] [--dry-run]`: bulk cleanup mode
- `one-run rm ... --json`: machine-readable cleanup result
- `one-run ls`: list runs
- `one-run ls --long`: detailed run list
- `one-run ls --state success`: filter by state
- `one-run ls --tag gpu`: filter runs by tag (repeatable)
- `one-run ls --group-by-tag`: group listed runs by tag
- `one-run ls --json`: machine-readable run list payload
- `one-run ps`: show active runs (`pending`/`running`) with process/job identity
- `one-run ps --tag train`: filter active runs by tag
- `one-run ps --json`: machine-readable active runs payload
- `one-run ps -w --interval 2`: live refreshing process table
- `one-run validate <manifest.yaml>`: validate manifest + placeholders
- `one-run validate <manifest.yaml> --json`: machine-readable validation result
- `one-run tail <run_id> [-n 30] [--stderr] [-f] [--since 10m]`: tail run logs
- `one-run listen [run_id ...] [--event ...] [--state ...] [--tag ...] [--handler module:function]`: inspect or watch lifecycle events for all runs or selected run ids
- `one-run listen --manifest tests/fixtures/listeners/listener.local.yaml`: load local listener routes from manifest
- `one-run listen` shows only new events after start by default; add `-h/--history` to include past events

automation defaults:
- set `ONE_RUN_AUTOMATION=1` to make json the default output mode for commands that support `--json`
- or pass `--automation` before the command, e.g. `one-run --automation status <run_id>`
- explicit `--json` still works the same
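
A script can opt into json output through the environment variable instead of passing flags on every call. The following is a minimal sketch, assuming `one-run` is on `PATH` and that the json payload is printed to stdout; the helper names (`automation_env`, `status_argv`, `run_status`) are hypothetical, and the payload is returned as-is because its fields are not documented here.

```python
import json
import os
import subprocess


def automation_env(base=None):
    """Return a copy of the environment with ONE_RUN_AUTOMATION=1 set."""
    env = dict(os.environ if base is None else base)
    env["ONE_RUN_AUTOMATION"] = "1"
    return env


def status_argv(run_id, timeout=None):
    """Build the argv for `one-run status <run_id>`, optionally waiting."""
    argv = ["one-run", "status", run_id]
    if timeout is not None:
        argv += ["--wait", "--timeout", timeout]
    return argv


def run_status(run_id, timeout=None):
    """Run the status command and parse stdout as json.

    Assumption: the payload is written to stdout. The exit code is returned
    too, since `--wait` exits non-zero on timeout.
    """
    proc = subprocess.run(
        status_argv(run_id, timeout),
        env=automation_env(),
        capture_output=True,
        text=True,
    )
    payload = json.loads(proc.stdout) if proc.stdout.strip() else None
    return proc.returncode, payload
```

Returning the exit code next to the payload lets the caller tell a timeout apart from a finished run without guessing at payload fields.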

docker local execution:
- fixture example: `tests/fixtures/manifests/manifest.docker.yaml`
- set `environment.kind: docker`
- set `environment.image`
- optional `environment.workdir` and `environment.mounts`
- optional `environment.env` and `environment.env_file`
- optional `environment.pull_policy`: `always | if_missing | never`
- optional `environment.labels` for container traceability
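
The required and enumerated fields above can be pre-checked before writing a manifest. This is only an illustrative sketch: `check_docker_environment` is a hypothetical helper, the image name is a placeholder, and `one-run validate` remains the authoritative check. The value shapes of `mounts`, `env`, and `labels` are not documented here, so the sketch leaves them out.

```python
ALLOWED_PULL_POLICIES = {"always", "if_missing", "never"}


def check_docker_environment(environment):
    """Return a list of problems found in a docker `environment` mapping."""
    problems = []
    if environment.get("kind") != "docker":
        problems.append("environment.kind must be 'docker'")
    if not environment.get("image"):
        problems.append("environment.image is required")
    policy = environment.get("pull_policy")
    if policy is not None and policy not in ALLOWED_PULL_POLICIES:
        allowed = ", ".join(sorted(ALLOWED_PULL_POLICIES))
        problems.append(f"environment.pull_policy must be one of: {allowed}")
    return problems


# Illustrative values only; the image name is a placeholder.
example_environment = {
    "kind": "docker",
    "image": "python:3.12-slim",
    "pull_policy": "if_missing",
}
```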

slurm retry policy:
- `backend.slurm.retry_max_attempts`: retry count after first failure (e.g. `1`)
- `backend.slurm.retry_backoff_sec`: delay before retry submit (e.g. `30`)
- `backend.slurm.retry_on_states`: subset of `failed | timeout | cancelled`
- retries are triggered during `status` refresh (`one-run status ...`)
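
One way to read these three settings together is as a small decision rule evaluated on each status refresh. The sketch below is not one-run's implementation: `RetryPolicy` and `should_retry` are hypothetical names, the defaults are assumptions, and `retry_max_attempts` is interpreted as the number of resubmissions allowed after the first failure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    # Field names mirror the backend.slurm.* keys; defaults are assumptions.
    retry_max_attempts: int = 0
    retry_backoff_sec: float = 0.0
    retry_on_states: frozenset = frozenset()


def should_retry(policy, state, retries_done):
    """Return True if a run in `state` qualifies for another submit."""
    if state not in policy.retry_on_states:
        return False
    return retries_done < policy.retry_max_attempts


def earliest_resubmit(policy, failed_at_sec):
    """Earliest time (seconds) at which the retry submit may happen."""
    return failed_at_sec + policy.retry_backoff_sec
```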

ls color from manifest:
- `experiment.color`: `black | red | green | yellow | blue | magenta | cyan | white`
- ls colors the `experiment` column for runs that define `experiment.color`

event manager:
- by default, all lifecycle events are written to `runs/<run_id>/events.jsonl`
- you can attach extra subscribers in python via `one_run.event_manager.register_event_sink_factory(...)`
- each emitted event includes `schema_version`, `timestamp`, `run_id`, `event`, and optionally `state` and `message`
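
Because `events.jsonl` holds one json object per line, it can be read with the standard library alone. A minimal sketch, assuming the file is UTF-8 and each non-empty line is a complete event; `read_events` is a hypothetical helper, not part of the package.

```python
import json
from pathlib import Path


def read_events(run_dir, event=None):
    """Yield event dicts from <run_dir>/events.jsonl, optionally filtered by event name."""
    path = Path(run_dir) / "events.jsonl"
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if event is None or record.get("event") == event:
                yield record
```

Since `state` and `message` are optional, read them with `record.get("state")` and `record.get("message")` rather than indexing.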

manifest listeners (v0):
- optional top-level `listeners:` section
- supported: `kind: local | webhook`
- local listener fields: `name`, `events`, `states`, `tags`, optional `handler`
- webhook listener fields: `webhook_url`, optional `action`, optional `timeout_sec`

- `listen --manifest` accepts a minimal listener-only yaml (only the `listeners:` block); full run manifest fields are not required
- optional top-level `interval_sec` controls listen poll period for that manifest (default `1.0`)
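
The `events`, `states`, and `tags` fields of a local listener act as filters. How they combine is not specified here, so the sketch below assumes that an omitted or empty filter matches everything and that every filter present must match. `listener_matches` is a hypothetical helper, not one-run's routing code.

```python
def listener_matches(listener, event, run_tags=()):
    """Return True if `event` passes the listener's filters.

    Assumed semantics: empty filter = match all; present filters are AND-ed;
    a tag filter matches when the run carries at least one listed tag.
    """
    events = listener.get("events") or []
    states = listener.get("states") or []
    tags = listener.get("tags") or []
    if events and event.get("event") not in events:
        return False
    if states and event.get("state") not in states:
        return False
    if tags and not set(tags) & set(run_tags):
        return False
    return True
```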

## run directory

```text
runs/<run_id>/
  config.snapshot.yaml
  metadata.json
  status.json
  runtime.json
  summary.json
  summary_min.json
  events.jsonl
  local.pid
  local.exit_code
  logs/stdout.log
  logs/stderr.log
  artifacts/
  metrics/
```
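
A script that inspects a run can check which of these entries exist before reading them. A small sketch under the assumption that not every entry is present for every run (for example, `local.pid` presumably only appears for local runs); `present_run_files` is a hypothetical helper.

```python
from pathlib import Path

# Entries from the layout above, relative to runs/<run_id>/.
RUN_FILES = [
    "config.snapshot.yaml",
    "metadata.json",
    "status.json",
    "runtime.json",
    "summary.json",
    "summary_min.json",
    "events.jsonl",
    "local.pid",
    "local.exit_code",
    "logs/stdout.log",
    "logs/stderr.log",
    "artifacts",
    "metrics",
]


def present_run_files(run_dir):
    """Return the known layout entries that exist under `run_dir`, in layout order."""
    root = Path(run_dir)
    return [name for name in RUN_FILES if (root / name).exists()]
```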
