Metadata-Version: 2.4
Name: br-scratch-keepalive
Version: 0.1.2
Summary: CLI for keeping large BR200 scratch datasets warm with resumable refresh jobs.
Author: Amit Subhash
License: MIT
Keywords: hpc,slurm,scratch,br200,keepalive
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: System :: Systems Administration
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file
Dynamic: requires-python

# br-scratch-keepalive

`br-scratch-keepalive` is a Python `3.9+` package that installs the `scratch-keepalive` CLI for use inside BR200 shell sessions. It manages large datasets under `/N/scratch/$USER/...`, refreshes them on a recurring schedule, and keeps resumable checkpoint state outside scratch.

This is a best-effort anti-purge tool. It reduces risk for scratch datasets; it does not make scratch archival.

## Experimental disclaimer

This package is experimental and provided for experimentation only.

- use it entirely at your own risk
- you are solely responsible for any loss of data, missed refreshes, deletion, corruption, or other consequences arising from use or non-use of this package
- it is not meant for production or real-world operational use
- it is not guaranteed to comply with the policies, terms, or operating expectations of any HPC provider
- it is not an endorsed IU, UITS, or BR200 workflow
- it is not a production retention system
- it is not intended to override, defeat, or work around the normal lifecycle of temporary scratch storage
- scratch storage is temporary and may be deleted by the provider at any time under site policy

For BR200 specifically, treat this package as an experiment rather than an endorsed workflow. If IU, UITS, or BR200 administrators indicate that this usage is not allowed, do not use it. If there is any uncertainty, ask Research Technologies before using it.

## Shared cluster compliance

This package is intended to stay conservative on BR200:

- the `br200` profile enforces a minimum refresh interval of `14` days, so refreshes cannot be scheduled more often than that
- it keeps only one future scheduled refresh job in the chain
- recurring runs use a small resource request on `general` (`1 CPU`, `2G`, `2:00:00` by default)
- it should not be used to evade, bypass, or test the limits of explicit IU or BR200 storage policy
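
As an illustration of the cadence floor described above, here is a minimal sketch of how such a check might look. The function name `validate_cadence` and the constant `BR200_MIN_CADENCE_DAYS` are hypothetical and are not taken from the package source; only the 14-day value comes from this README.

```python
from datetime import timedelta

# Assumption: the br200 profile's floor is 14 days, as stated in this README.
BR200_MIN_CADENCE_DAYS = 14


def validate_cadence(days: int, minimum: int = BR200_MIN_CADENCE_DAYS) -> timedelta:
    """Return the cadence as a timedelta, rejecting values below the floor."""
    if days < minimum:
        raise ValueError(
            f"cadence of {days} days is below the profile minimum of {minimum} days"
        )
    return timedelta(days=days)
```

Under this sketch, `validate_cadence(14)` and `validate_cadence(30)` are accepted, while `validate_cadence(7)` raises `ValueError`.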

If IU or BR200 admins tell you not to use this workflow, stop using it immediately.

## Policy caution

Use of BR200 and other IU resources remains subject to IU and UITS policy. This repository does not grant permission to retain scratch data longer than IU intends, and it should not be relied on as a loophole, workaround, or entitlement.

## What it does

- registers large datasets under your BR200 scratch space
- keeps a `keep-until` policy per dataset
- runs metadata-oriented refreshes, not bytewise rereads
- checkpoints partial refresh progress so the next run resumes instead of starting over
- stores logs, registry state, and checkpoint files outside scratch
- installs a recurring scheduler entry for the current BR200 user

## What it does not do

- it does not make scratch permanent
- it does not archive to Slate, Slate-Project, or SDA
- it does not run from your laptop
- it does not require or use a personal SSH alias
- it does not redownload missing data

## Install

From inside BR200:

```bash
python -m pip install br-scratch-keepalive
```

Or from a cloned repo:

```bash
python -m pip install .
```

## BR200 quickstart

```bash
# Run from a cloned repo (or use: python -m pip install br-scratch-keepalive)
python -m pip install .
scratch-keepalive init --profile br200
scratch-keepalive add \
  --name mr-rate \
  --path /N/scratch/$USER/datasets/Forithmus/MR-RATE \
  --keep-until 2026-07-31
scratch-keepalive refresh --name mr-rate
scratch-keepalive install-cron
scratch-keepalive status --name mr-rate
```

## Recommended workflow

1. Log into BR200 normally.
2. Install the package into your BR200 Python environment.
3. Run `scratch-keepalive init --profile br200`.
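4. Add one or more datasets under `/N/scratch/$USER/...` with `scratch-keepalive add`.
5. Run one manual refresh with `scratch-keepalive refresh` to verify permissions and state layout.
6. Install the recurring scheduler entry with `scratch-keepalive install-cron`.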
7. Use `status` and `doctor` to inspect health.

## Commands

```text
scratch-keepalive init
scratch-keepalive add
scratch-keepalive list
scratch-keepalive status
scratch-keepalive refresh
scratch-keepalive extend
scratch-keepalive enable
scratch-keepalive disable
scratch-keepalive remove
scratch-keepalive install-cron
scratch-keepalive uninstall-cron
scratch-keepalive doctor
scratch-keepalive repair
```

## State layout

The BR200 profile keeps control-plane state outside scratch, apart from a small sentinel file placed in each dataset root:

- registry: persistent dataset state
- checkpoints: resumable partial-refresh state
- logs: per-run refresh logs
- sentinel: a small tool-owned file in the dataset root
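
The "outside scratch" requirement for registry, checkpoint, and log locations could be checked with something like the sketch below. The helper name `is_outside_scratch` and the example paths are hypothetical; the actual state directory used by the package is not documented here. The comparison is purely lexical and does not resolve symlinks.

```python
from pathlib import PurePosixPath


def is_outside_scratch(state_dir: str, scratch_root: str) -> bool:
    """Return True when state_dir is neither scratch_root nor a descendant of it."""
    state = PurePosixPath(state_dir)
    root = PurePosixPath(scratch_root)
    # Compare whole path components so "/N/scratch/alice2" is not mistaken
    # for a child of "/N/scratch/alice".
    return state != root and root not in state.parents
```

For example, with a hypothetical user `alice`, `is_outside_scratch("/home/alice/.keepalive", "/N/scratch/alice")` is `True`, while `is_outside_scratch("/N/scratch/alice/state", "/N/scratch/alice")` is `False`.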

## Resume semantics

Refreshes are split into deterministic units. If a refresh run fails or times out:

- completed units stay recorded in checkpoint state
- remaining units are retried on the next run
- the checkpoint is deleted only after the full dataset refresh completes
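
The resume behavior above can be sketched roughly as follows. This is an illustrative model, not the package's implementation: the function `refresh_with_checkpoint`, the JSON checkpoint format, and the `refresh_unit` callback are all assumptions made for the example.

```python
import json
from pathlib import Path
from typing import Callable, Iterable


def refresh_with_checkpoint(
    units: Iterable[str],
    refresh_unit: Callable[[str], None],
    checkpoint: Path,
) -> None:
    """Refresh each unit once, recording progress so a later run can resume."""
    # Load the units completed by an earlier, interrupted run (if any).
    done = set(json.loads(checkpoint.read_text())) if checkpoint.exists() else set()
    for unit in sorted(units):  # sorted order keeps the unit sequence deterministic
        if unit in done:
            continue  # finished in an earlier run
        refresh_unit(unit)  # may raise; earlier progress is already on disk
        done.add(unit)
        checkpoint.write_text(json.dumps(sorted(done)))
    # Delete the checkpoint only after every unit has been refreshed.
    checkpoint.unlink(missing_ok=True)
```

If `refresh_unit` raises partway through, the checkpoint file still lists the completed units, so calling the function again with the same arguments skips them and continues with the remainder.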

## Publishing

Public names:

- package: `br-scratch-keepalive`
- CLI command: `scratch-keepalive`

## Notes

- Run this from inside BR200, not from your laptop.
- The tool does not rely on a local SSH alias like `br200`.
- Default recurring cadence is every 14 days.
- The `br200` profile will not allow a cadence below 14 days.
- Default recurring job request is `general`, `1 CPU`, `2G`, `2:00:00`.
- Logs and checkpoint state live outside scratch.
- When `scrontab` is disabled on BR200, `install-cron` falls back to a self-resubmitting Slurm job.
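
To make the fallback in the last note more concrete, here is a hedged sketch of how a deferred Slurm submission could be assembled. It only builds the argument list and does not run anything. The function name `build_resubmit_command` and the script name `keepalive-job.sh` are hypothetical, and mapping `general` to `--partition` and `1 CPU` to `--cpus-per-task=1` is an assumption; the package may construct its jobs differently.

```python
from typing import List


def build_resubmit_command(
    cadence_days: int = 14,
    script: str = "keepalive-job.sh",  # hypothetical job script name
) -> List[str]:
    """Build (but do not run) an sbatch command that defers the next run."""
    return [
        "sbatch",
        "--partition=general",  # assumption: `general` is a Slurm partition
        "--cpus-per-task=1",    # assumption: "1 CPU" maps to this option
        "--mem=2G",
        "--time=2:00:00",
        f"--begin=now+{cadence_days}days",  # start no earlier than the next cadence
        script,
    ]
```

A self-resubmitting job would run its refresh and then submit a command like this one for the next cycle, which matches the "one future scheduled refresh job" behavior described earlier.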
