================================================================================
                         KUBEROKU — THE NORTH STAR
                    The Complete Specification & Blueprint
================================================================================

Version:  living
Status:   Living document — update as decisions are made


================================================================================
 1. VISION & POSITIONING
================================================================================

WHAT KUBEROKU IS
────────────────
Kuberoku is an open-source Python CLI + SDK that gives developers a Heroku-like
experience on top of vanilla Kubernetes. You bring a K8s cluster, Kuberoku
gives you `apps:create`, `deploy`, `config:set`, `logs --tail`, and everything
else you loved about Heroku — without vendor lock-in.

Kuberoku is NOT a hosting platform — it doesn't provide infrastructure.
It uses YOUR cluster. It can optionally deploy common dependencies
(postgres, redis) as addons, but these run on your cluster, not ours.
It is a CLI/SDK only. Kubernetes is the source of truth: all state (app
config, formation, releases, secrets) is stored in standard K8s resources
(ConfigMaps, Secrets, Deployments). There are zero server-side components:
no operators, no CRDs, no control plane pods. Just a Python package that
talks to the K8s API.

Implications of K8s-as-state-store:
    - ConfigMaps are limited to ~1 MiB each (hence one-release-per-ConfigMap)
    - Secrets are base64-encoded, NOT encrypted, unless the cluster has
      encryption-at-rest enabled (EncryptionConfiguration)
    - No transactional guarantees across resources (see Operation Atomicity)
    - Backup story: standard K8s backup tools (Velero, etcd snapshots) back up
      Kuberoku state automatically — no special Kuberoku backup needed
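
To make the base64 point concrete, here is a minimal sketch showing that the
encoding used for Secret values is reversible without any key. The helper
names and the sample value are illustrative assumptions, not part of Kuberoku.

```python
import base64


def encode_secret_value(plaintext: str) -> str:
    """Encode a value the way it appears in a Secret's `data` field."""
    return base64.b64encode(plaintext.encode("utf-8")).decode("ascii")


def decode_secret_value(encoded: str) -> str:
    """Reverse the encoding. No key or password is involved."""
    return base64.b64decode(encoded).decode("utf-8")


stored = encode_secret_value("hunter2")   # what the API returns in `data`
recovered = decode_secret_value(stored)   # anyone who can read it gets this
```

Anyone who can read the Secret object can recover the value this way, which
is why encryption-at-rest and RBAC remain the cluster's job.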

SECURITY MODEL — WHAT KUBEROKU DOES AND DOES NOT PROTECT
─────────────────────────────────────────────────────────
    Kuberoku ASSUMES:
    - Cluster RBAC is correctly configured by the cluster admin
    - Users with namespace access are trusted within that namespace
    - K8s API server TLS is enabled (standard in all managed K8s)

    Kuberoku PROVIDES:
    - CLI-level masking of secret values (--reveal to see them)
    - Separation of ConfigMap (non-sensitive) vs Secret (sensitive) storage
    - A guarantee that secret values are never logged or printed in debug
      output, error messages, or crash reports
    - Auditable: every change creates a release with author + timestamp

    Kuberoku DOES NOT PROVIDE:
    - Encryption of secrets at rest (cluster responsibility:
      EncryptionConfiguration)
    - Access control within a namespace (K8s RBAC cannot select resources
      by label, so anyone allowed to read Secrets in the namespace can
      read every app's Secrets there)
    - Protection against cluster admins (they can read everything)
    - Network-level encryption (K8s controls this: mTLS via service mesh)
    - Audit logging (K8s audit logs cover API access already)

    MASKING ≠ SECURITY. Masking prevents accidental exposure (screenshots,
    shared terminals, CI logs). It does NOT prevent determined access.
    Real security comes from K8s RBAC + encryption-at-rest + network
    policies — all of which are the cluster admin's responsibility, not
    Kuberoku's.

If you uninstall Kuberoku, your apps keep running.

WHO IT'S FOR
────────────
Heroku refugees who want the same DX on their own Kubernetes. Teams that
already have K8s clusters (EKS, GKE, AKS, on-prem, k3s) and want to stop
writing YAML.

THE EMPTY CELL
──────────────
                    Managed Only          Self-Hosted
                   ─────────────         ──────────────
  Amazing DX    │  Railway, Vercel   │   NOBODY  <-- Kuberoku fills this
  OK DX         │  Heroku, Fly.io    │   Coolify, Kubero
  Raw           │  AWS/GCP/Azure     │   Raw K8s, Rancher

KEY DIFFERENTIATORS VS HEROKU
─────────────────────────────
1. Non-HTTP services    — Deploy SMTP, DNS, game servers, anything TCP/UDP
2. Multi-port           — One app can listen on 25/tcp AND 587/tcp AND 993/tcp
3. Any K8s cluster      — EKS, GKE, AKS, k3s, on-prem, Raspberry Pi
4. Open-source          — Apache 2.0 licensed (patent grant + attribution required)
5. Multi-cluster        — Manage staging + production from one CLI
6. Importable as SDK    — `from kuberoku import Kuberoku` works in Python
7. Plugin ecosystem     — Anyone publishes `kuberoku-*` packages to extend it
8. Zero footprint       — No server-side components, no CRDs, no operators
9. White-label ready    — Tool name + resource prefix configurable from one place


================================================================================
 2. BRANDING & NAMING — SINGLE-SOURCE CONFIGURABILITY
================================================================================

PRINCIPLE
─────────
The name "kuberoku" appears in many places: CLI command, K8s resource names,
label domains, config directories, env vars, Python imports. ALL of these
are derived from a single file — branding.py — so that rebranding is a
controlled, mechanical process rather than a find-and-replace prayer.

There are TWO LAYERS of naming:

    LAYER 1 — RUNTIME BRANDING (branding.py, one-file change)
    Everything the user SEES and everything written to K8s. Changing
    branding.py alone gives you a fully rebranded CLI + K8s footprint.

    LAYER 2 — PACKAGE IDENTITY (directory names, pyproject.toml)
    The Python import path and the PyPI distribution name. These are
    structural: they require renaming directories and updating
    pyproject.toml. Not hard, but not a one-liner.

THE SINGLE SOURCE: src/kuberoku/branding.py
───────────────────────────────────────────
    # src/kuberoku/branding.py
    # ── EVERY BRANDED STRING IN THE CODEBASE DERIVES FROM THIS FILE ──

    # Tool identity (affects CLI command name, help text, user-facing strings)
    TOOL_NAME: str = "kuberoku"

    # Default K8s resource prefix (affects labels, resource names, annotations)
    # NO trailing hyphen — concatenation functions add the "-" separator.
    # e.g., resource name = f"{PREFIX}-app-{name}" → "kuberoku-app-myapi"
    DEFAULT_RESOURCE_PREFIX: str = TOOL_NAME

    # Default label domain (affects label keys on all K8s resources)
    # NOT derived from TOOL_NAME — requires owning the domain.
    DEFAULT_LABEL_DOMAIN: str = "app.kuberoku.com"

    # Plugin PyPI prefix (affects plugin discovery convention)
    PLUGIN_PREFIX: str = f"{TOOL_NAME}-"

    # Plugin entry_points group name
    PLUGIN_GROUP: str = f"{TOOL_NAME}.plugins"

    # Config directory name (~/.kuberoku/)
    CONFIG_DIR_NAME: str = f".{TOOL_NAME}"

    # Project marker file name (.kuberoku)
    PROJECT_FILE_NAME: str = f".{TOOL_NAME}"

    # Environment variable prefix (KUBEROKU_APP, KUBEROKU_CLUSTER, etc.)
    ENV_PREFIX: str = TOOL_NAME.upper() + "_"

    # Default K8s namespace
    DEFAULT_NAMESPACE: str = TOOL_NAME

ALL of these except DEFAULT_LABEL_DOMAIN are derived from TOOL_NAME.
Change TOOL_NAME = "k8u" and:
    DEFAULT_RESOURCE_PREFIX → "k8u"     (k8u-app-myapi)
    CONFIG_DIR_NAME  → ".k8u"           (~/.k8u/config.yaml)
    PROJECT_FILE_NAME → ".k8u"          (.k8u project file)
    ENV_PREFIX       → "K8U_"           (K8U_APP, K8U_CLUSTER, etc.)
    DEFAULT_NAMESPACE → "k8u"           (default K8s namespace)
    PLUGIN_PREFIX    → "k8u-"           (k8u-postgres on PyPI)
    PLUGIN_GROUP     → "k8u.plugins"    (entry_points group)

    DEFAULT_LABEL_DOMAIN must be set manually — it's a real domain you own.
    Change "app.kuberoku.com" to your own domain (e.g. "app.k8u.io").
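
The derivation rules above can be sketched as a small value object. This is
only an illustration: the real branding.py uses module-level constants, and
the `Branding` class and its attribute names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branding:
    """Hypothetical mirror of branding.py: one name in, everything derived."""

    tool_name: str
    label_domain: str  # NOT derived: must be a domain you own

    @property
    def resource_prefix(self) -> str:
        return self.tool_name  # no trailing hyphen

    @property
    def plugin_prefix(self) -> str:
        return f"{self.tool_name}-"

    @property
    def plugin_group(self) -> str:
        return f"{self.tool_name}.plugins"

    @property
    def config_dir_name(self) -> str:
        return f".{self.tool_name}"

    @property
    def project_file_name(self) -> str:
        return f".{self.tool_name}"

    @property
    def env_prefix(self) -> str:
        return self.tool_name.upper() + "_"

    @property
    def default_namespace(self) -> str:
        return self.tool_name


k8u = Branding(tool_name="k8u", label_domain="app.k8u.io")
app_env_var = f"{k8u.env_prefix}APP"
```

Only `label_domain` has to be supplied by hand, which matches the note above.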

EVERY other file imports from branding.py:

    # In labels.py
    from kuberoku.branding import DEFAULT_LABEL_DOMAIN, DEFAULT_RESOURCE_PREFIX

    # In cli/main.py
    from kuberoku.branding import TOOL_NAME

    # In cli/output.py (error messages, help text)
    from kuberoku.branding import TOOL_NAME

    # In config/paths.py
    from kuberoku.branding import CONFIG_DIR_NAME, PROJECT_FILE_NAME

    # In config/env.py
    from kuberoku.branding import ENV_PREFIX
    APP_ENV_VAR = f"{ENV_PREFIX}APP"           # → "KUBEROKU_APP"
    CLUSTER_ENV_VAR = f"{ENV_PREFIX}CLUSTER"   # → "KUBEROKU_CLUSTER"
    # etc.

    # In plugins/loader.py
    from kuberoku.branding import PLUGIN_PREFIX, PLUGIN_GROUP

    # In sdk/context.py (namespace resolution)
    from kuberoku.branding import DEFAULT_NAMESPACE

NO file outside branding.py contains a hardcoded "kuberoku" string for
tool identity, resource naming, config paths, env vars, or conventions.
Not cli/main.py, not labels.py, not models.py, not context.py.

The ONLY place "kuberoku" is structurally hardcoded is the Python package
name itself (the `src/kuberoku/` directory and `import kuberoku` statements).
This is Layer 2 — see "HOW TO REBRAND" below.

PER-CLUSTER RESOURCE PREFIX OVERRIDE
─────────────────────────────────────
The resource prefix and label domain can be overridden per-cluster in the
global config. This lets you run different "brands" on different clusters,
or avoid collisions if multiple teams share a cluster.

    # ~/.kuberoku/config.yaml
    current_cluster: production
    clusters:
      production:
        context: prod-eks
        # Uses defaults (kuberoku, app.kuberoku.com)

      staging:
        context: staging-gke
        resource_prefix: k8u               # Resources named k8u-myapp-web
        label_domain: app.k8u.io          # Labels: app.k8u.io/managed-by

      legacy:
        context: old-cluster
        resource_prefix: heroku-compat     # Resources named heroku-compat-myapp-web
        label_domain: app.heroku-compat.dev

RESOLUTION ORDER (for resource prefix and label domain)
───────────────────────────────────────────────────────
    1. CLI flag           --resource-prefix k8u
    2. Environment var    {ENV_PREFIX}RESOURCE_PREFIX (e.g., KUBEROKU_RESOURCE_PREFIX)
    3. Cluster config     resource_prefix in config.yaml
    4. branding.py        DEFAULT_RESOURCE_PREFIX (the built-in default)
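
A minimal sketch of this resolution order for the resource prefix. The
function name and argument shapes are assumptions for illustration, and the
sketch treats an empty string the same as "not set".

```python
from typing import Mapping, Optional

# Stand-ins for the values defined in branding.py.
DEFAULT_RESOURCE_PREFIX = "kuberoku"
ENV_PREFIX = "KUBEROKU_"


def resolve_resource_prefix(
    cli_flag: Optional[str],
    environ: Mapping[str, str],
    cluster_config: Mapping[str, str],
) -> str:
    """Return the first value that is set, in priority order."""
    candidates = (
        cli_flag,                                     # 1. --resource-prefix
        environ.get(f"{ENV_PREFIX}RESOURCE_PREFIX"),  # 2. environment variable
        cluster_config.get("resource_prefix"),        # 3. config.yaml cluster entry
    )
    for candidate in candidates:
        if candidate:  # assumption: empty strings count as unset
            return candidate
    return DEFAULT_RESOURCE_PREFIX                    # 4. branding.py default
```

The label domain would follow the same chain with its own flag, variable, and
config key.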

WHAT EACH SETTING CONTROLS
──────────────────────────

    Setting               Affects                            Example (default)        Example (k8u)
    ──────────────────    ─────────────────────────────────  ─────────────────────    ──────────────────
    TOOL_NAME             CLI binary name                    kuberoku                 k8u
                          Help text ("Usage: kuberoku ...")  kuberoku                 k8u
                          Error messages                     kuberoku                 k8u
                          User-facing strings                kuberoku                 k8u

    resource_prefix       ConfigMap names                    kuberoku-app-myapi       k8u-app-myapi
                          Deployment names                   kuberoku-myapi-web       k8u-myapi-web
                          Service names                      kuberoku-myapi-web       k8u-myapi-web
                          Ingress names                      kuberoku-myapi           k8u-myapi
                          Release ConfigMap names            kuberoku-release-*       k8u-release-*
                          Job names                          kuberoku-run-*           k8u-run-*
                          managed-by label VALUE             kuberoku                 k8u

    label_domain          Label key prefix                   app.kuberoku.com/*       app.k8u.io/*
                          Annotation key prefix              app.kuberoku.com/*       app.k8u.io/*

    PLUGIN_PREFIX         PyPI package convention            kuberoku-*               k8u-*
    PLUGIN_GROUP          entry_points group name            kuberoku.plugins         k8u.plugins

    CONFIG_DIR_NAME       Global config directory            ~/.kuberoku/             ~/.k8u/
    PROJECT_FILE_NAME     Project marker file                .kuberoku                .k8u
    ENV_PREFIX            Environment variable prefix        KUBEROKU_*               K8U_*
    DEFAULT_NAMESPACE     Default K8s namespace              kuberoku                 k8u
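
The naming patterns in the table can be sketched as plain string helpers. The
function names are hypothetical. The patterns are read off the example
columns above.

```python
def app_manifest_name(prefix: str, app: str) -> str:
    """ConfigMap holding the app manifest, e.g. kuberoku-app-myapi."""
    return f"{prefix}-app-{app}"


def workload_name(prefix: str, app: str, process_type: str) -> str:
    """Deployment and Service name, e.g. kuberoku-myapi-web."""
    return f"{prefix}-{app}-{process_type}"


def ingress_name(prefix: str, app: str) -> str:
    """Ingress name, e.g. kuberoku-myapi."""
    return f"{prefix}-{app}"


def label_key(label_domain: str, name: str) -> str:
    """Label or annotation key, e.g. app.kuberoku.com/managed-by."""
    return f"{label_domain}/{name}"
```

Because the prefix carries no trailing hyphen, each helper adds the "-"
separator itself.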

HOW TO REBRAND (FORK SCENARIO)
──────────────────────────────

    LAYER 1 — RUNTIME ONLY (branding.py, one file, 2 minutes)
    ──────────────────────────────────────────────────────────
    Change TOOL_NAME and DEFAULT_LABEL_DOMAIN in branding.py. Everything
    else derives from TOOL_NAME.

    Edit src/kuberoku/branding.py:
        TOOL_NAME = "k8u"
        DEFAULT_LABEL_DOMAIN = "app.k8u.io"
        # Everything else auto-derives from TOOL_NAME:
        # DEFAULT_RESOURCE_PREFIX → "k8u"
        # PLUGIN_PREFIX → "k8u-"
        # PLUGIN_GROUP  → "k8u.plugins"
        # CONFIG_DIR_NAME → ".k8u"
        # PROJECT_FILE_NAME → ".k8u"
        # ENV_PREFIX → "K8U_"
        # DEFAULT_NAMESPACE → "k8u"

    Result: CLI says "k8u", K8s resources say "k8u-*", config lives at
    ~/.k8u/, env vars are K8U_APP, project file is .k8u.
    Python imports still say `from kuberoku import ...` — this is fine.

    LAYER 2 — FULL REBRAND (package rename, 10 minutes)
    ────────────────────────────────────────────────────
    If you also want `from k8u import Kuberoku` and `pip install k8u`:

    1. Do Layer 1 above.

    2. Rename the package directory:
           mv src/kuberoku src/k8u
           # Update all internal imports (GNU sed shown):
           grep -rlE '(from|import) kuberoku' src | \
               xargs sed -i -E 's/(from|import) kuberoku/\1 k8u/g'

    3. Update pyproject.toml:
           [project]
           name = "k8u"
           [project.scripts]
           k8u = "k8u.cli.main:run"
           [project.entry-points."k8u.plugins"]

    4. Update test imports similarly.

    Layer 1 is sufficient for most forks. Layer 2 is only needed if you
    want clean PyPI packaging and SDK import paths.

    These DON'T have to match. You could ship the CLI as "k8u" but keep
    the Python package as "kuberoku" for import stability. TOOL_NAME
    controls UX; the package directory controls SDK imports.

NOTE ON EXISTING RESOURCES
──────────────────────────
If you change the prefix on a cluster that already has resources, the old
resources become "invisible" to the new prefix (different label selectors).
Kuberoku will NOT auto-migrate. This is intentional — changing a prefix is
a "new identity" operation. If you need migration, run:

    kuberoku apps --resource-prefix old-prefix    # list old apps
    # Then re-create under new prefix

This prevents accidental data loss from prefix typos.


================================================================================
 3. END-TO-END USER EXPERIENCE
================================================================================

FIRST-TIME INSTALL
──────────────────
    $ pipx install kuberoku            # recommended
    $ kuberoku --version
    kuberoku 0.1.0

    # Or: pip install kuberoku
    # Or: curl -fsSL https://github.com/amanjain/kuberoku/releases/latest/download/install.sh | sh
    # Or: brew install kuberoku/tap/kuberoku  (planned)

CREATING FIRST APP
──────────────────
    $ kuberoku apps:create myapi
    Creating myapi... done
    App myapi created on cluster default

    $ cd ~/projects/myapi
    $ kuberoku apps:link:add myapi
    Linked to myapi (wrote .kuberoku)

    # Same project, multiple environments:
    $ kuberoku apps:link:add myapi-staging --as staging
    $ kuberoku apps:link:add myapi-prod --as prod --cluster production
    $ kuberoku services:logs                   # uses default
    $ kuberoku services:logs --app prod        # resolves "prod" from .kuberoku

DEPLOY FLOW
───────────
    $ kuberoku config:set DATABASE_URL=postgres://db:5432/myapi SECRET_KEY=hunter2
    Setting config vars... done (no running pods to restart)

    $ kuberoku deploy --image myapi:v1
    Deploying myapi:v1... done
    Recording release v1... done
    web.1: up (running for 3s)
    App URL: https://myapi.apps.mycluster.com

    $ kuberoku ps
    === web (1 dyno)
    web.1: up 2025/02/09 15:30:01 (running for 45s)

    $ kuberoku services:logs --tail
    web.1 | Listening on :8080

DAY-TO-DAY OPERATIONS
─────────────────────
    $ kuberoku ps:scale web=3 worker=2
    Scaling web to 3, worker to 2... done

    $ kuberoku config:set NEW_FEATURE=true
    Setting config vars and restarting... done

    $ kuberoku releases
    v3  Deploy myapi:v1               user@host  2025/02/09 15:35
    v2  Set NEW_FEATURE               user@host  2025/02/09 15:33
    v1  Deploy myapi:v1               user@host  2025/02/09 15:30

    $ kuberoku releases:rollback v1
    Rolling back to v1... done
    Release v4 created (rollback to v1)

NON-WEB APP — FULL LIFECYCLE (catch-all SMTP, multi-port, addons)
─────────────────────────────────────────────────────────────────
The same experience works when your main service isn't HTTP:

    $ kuberoku apps:create catchall-smtp

    $ kuberoku addons:create postgres
    ✓ postgres created (postgresql 16, dev)
      DATABASE_URL → injected

    $ kuberoku addons:create postgres --as analytics
    ✓ analytics created (postgresql 16, dev)
      ANALYTICS_URL → injected

    $ kuberoku addons:create redis --as sessions
    ✓ sessions created (redis 7, dev)
      SESSIONS_URL → injected

    $ kuberoku deploy --image catchall:v1 \
        --port 25/tcp --port 465/tcp --port 587/tcp --port 2525/tcp
    Deploying catchall:v1...
    Release v1 created
    To expose: kuberoku services:expose:on smtp

    $ kuberoku services:expose:on smtp
    External IP: 34.123.45.67
      :25   :465   :587   :2525

    $ kuberoku addons:expose:on analytics
    External: 34.123.45.68:5432
    ⚠ Database accessible from internet

    $ kuberoku ps:scale smtp=3
    Scaling smtp to 3... done

    $ kuberoku apps:status
    === catchall-smtp (production)
    Release:     v1 — Deploy catchall:v1 — 5m ago

    Processes:
      smtp       3 dynos (3/3 up)     25, 465, 587, 2525/tcp
                 EXPOSED              34.123.45.67

    Addons:
      postgres   postgresql 16  dev   running    DATABASE_URL     internal
      analytics  postgresql 16  dev   running    ANALYTICS_URL    EXPOSED 34.123.45.68:5432
      sessions   redis 7        dev   running    SESSIONS_URL     internal

    Later — add a port without redeploying:
    $ kuberoku services:ports:add 993/tcp
    Added 993/tcp. Rolling restart...

    Later — unexpose the analytics database:
    $ kuberoku addons:expose:off analytics
    Internal only.

    Connect to any addon for debugging:
    $ kuberoku addons:connect postgres
    localhost:5432 → postgres:5432
    Hint: psql postgres://localhost:5432/app

    $ kuberoku addons:connect analytics
    localhost:5433 → analytics:5432         ← auto-avoids port conflict
    Hint: psql postgres://localhost:5433/app

MIXED APP — HTTP + TCP ON SAME APP (multiple process types)
───────────────────────────────────────────────────────────
    Procfile:
        web: gunicorn api:app --bind 0.0.0.0:8080
        grpc: python grpc_server.py --port 50051

    $ kuberoku apps:create myplatform
    $ kuberoku addons:create postgres
    $ kuberoku deploy
    Building from commit abc1234 (main)...
      Creating web... done
      Creating grpc... done (port 50051/tcp)
    Release v1 created
    App URL: https://myplatform.apps.mycluster.com   ← auto (web)

    $ kuberoku domains:add api.myplatform.com         ← custom domain for web
    $ kuberoku services:expose:on grpc                ← LoadBalancer for gRPC
    External IP: 34.5.6.7:50051

    Result:
    - api.myplatform.com → web:8080 (via Ingress, shared IP)
    - 34.5.6.7:50051 → grpc (via LoadBalancer, own IP)
    - postgres addon → internal only
    - All from one `kuberoku deploy`

WHAT THE USER NEVER HAS TO TOUCH
────────────────────────────────
- Kubernetes YAML manifests
- Deployment/Service/Ingress definitions
- ConfigMap/Secret boilerplate
- kubectl commands
- Helm charts (for core features)
- Namespace management
- Label/selector wiring
- Rolling update configuration
- TLS certificate management (handled by cert-manager annotation)


================================================================================
 4. COMPLETE COMMAND REFERENCE
================================================================================

Every command supports `--app <name>` (or `-a`) to target a specific app.
If omitted, Kuberoku resolves the app from the .kuberoku file in the current
directory tree, or presents an interactive picker.
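
A minimal sketch of the "current directory tree" lookup, assuming it means
walking upward from the working directory until a .kuberoku file is found.
The function name is hypothetical and the file's contents are not parsed here.

```python
from pathlib import Path
from typing import Optional

PROJECT_FILE_NAME = ".kuberoku"  # stand-in for branding.PROJECT_FILE_NAME


def find_project_file(start: Path) -> Optional[Path]:
    """Search `start` and each parent directory for the project marker file."""
    current = start.resolve()
    for directory in (current, *current.parents):
        candidate = directory / PROJECT_FILE_NAME
        # is_file() skips the global config DIRECTORY (~/.kuberoku/),
        # which shares the same name as the project marker file.
        if candidate.is_file():
            return candidate
    return None
```

A caller would try `--app` first, then this lookup, and fall back to the
interactive picker when the function returns None.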

Global flags (all commands):
    --app, -a TEXT       Target app (name or alias from .kuberoku)
    --cluster TEXT       Target cluster name
    --format TEXT        Output format: table (default), json, yaml
    --no-color           Disable colored output
    --verbose, -v        Verbose output
    --help               Show help


4.1  APPS
─────────
Heroku equivalent: heroku apps:*
K8s resources: Namespace (labels), ConfigMaps (app manifest + env)

    kuberoku apps                          List all apps
        --cluster TEXT                     Filter by cluster (default: current)
        --all                              Show apps across all clusters

    kuberoku apps:create NAME              Create a new app
        --cluster TEXT                     Target cluster (default: current)
        --region TEXT                      Hint label (no enforcement)

    kuberoku apps:info [NAME]              Show detailed app info
                                           (dynos, config var count, domains,
                                            addons, last release, created at)

    kuberoku apps:destroy NAME             Permanently delete app and all resources
        --confirm NAME                     Skip confirmation prompt

    kuberoku apps:rename OLD NEW           Rename an app
        IMPORTANT: K8s resource names are IMMUTABLE. Rename cannot change
        resource names like "kuberoku-oldname-web". Instead, rename:
        1. Updates the app label ({DOMAIN}/app=newname) on all resources
        2. Creates new app manifest ConfigMap with new name
        3. Deletes old app manifest ConfigMap
        4. Resource names keep the old app name embedded (cosmetic only)
        This is the same approach as Heroku: the internal name changes,
        the infrastructure identifiers may retain the old name.

    kuberoku apps:link                     Show current project links
    kuberoku apps:link:add NAME            Link current directory to app
        [--as TEXT]                        Alias (e.g. staging, prod)
        [--cluster TEXT]                   Cluster for this alias
                                           (writes .kuberoku file)

    kuberoku apps:link:remove              Remove .kuberoku file
        [--as TEXT]                        Remove only this alias
                                           (no --as = remove entire file)

    K8s mapping:
        apps:create  →  ConfigMap kuberoku-app-{name} (app manifest)
                        ConfigMap kuberoku-env-{name} (env vars)
                        ConfigMap kuberoku-formation-{name} (formation)
                        Labels: app.kuberoku.com/managed-by=kuberoku
                                app.kuberoku.com/app={name}

        App manifest ConfigMap schema ({PREFIX}-app-{name}):
            {
              "name": "myapi",
              "created_at": "2025-02-09T15:30:01Z",
              "created_by": "user@host",
              "current_release_version": 7,
              "namespace": "kuberoku"
            }
        apps         →  List ConfigMaps with label managed-by=kuberoku,type=app-manifest
        apps:info    →  Read manifest ConfigMap + list Deployments + Pods
        apps:destroy →  Delete ALL resources matching BOTH labels:
                        {DOMAIN}/managed-by={PREFIX} AND {DOMAIN}/app={name}
                        Both labels MUST match. This prevents deleting resources
                        that happen to share an app label but were not created
                        by Kuberoku.
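
The two-label rule for apps:destroy can be sketched as follows. In a K8s
label selector, comma-separated requirements are ANDed, so a single selector
string expresses "both labels MUST match". The function names are
illustrative assumptions.

```python
from typing import Mapping


def destroy_selector(label_domain: str, resource_prefix: str, app_name: str) -> str:
    """Build a label selector that requires BOTH labels to match."""
    return (
        f"{label_domain}/managed-by={resource_prefix},"
        f"{label_domain}/app={app_name}"
    )


def is_owned(
    labels: Mapping[str, str],
    label_domain: str,
    resource_prefix: str,
    app_name: str,
) -> bool:
    """Client-side version of the same check, for a resource's label dict."""
    return (
        labels.get(f"{label_domain}/managed-by") == resource_prefix
        and labels.get(f"{label_domain}/app") == app_name
    )
```

A resource that carries only the app label fails the check and is left alone.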


4.2  PS (PROCESS MANAGEMENT)
─────────────────────────────
Heroku equivalent: heroku ps:*
K8s resources: Deployment (replicas), Pods

    kuberoku ps                            List all dynos (grouped by type)
                                           Output: web.1, web.2, worker.1, etc.

    DYNO NUMBERING (presentation-only, NOT stable identifiers):
        Dyno names like web.1, web.2 are computed at display time by sorting
        the Pods of each process type by creation time and numbering them.
        They are NOT stable:
        - If web.1 restarts, the replacement Pod has a different K8s name
        - If an older Pod is replaced, the remaining Pods shift down and
          the replacement takes the highest number
        - Scaling events can renumber dynos
        Under the hood, `ps:restart --dyno web.1` resolves to "the Nth Pod
        of that process type sorted by creation time" at the moment the
        command runs. This matches Heroku's display convention.
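
A minimal sketch of display-time numbering, assuming Pods are ordered by
creation time within each process type. `PodInfo` and `number_dynos` are
hypothetical names, and the tie-break on Pod name is an added assumption.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List


@dataclass(frozen=True)
class PodInfo:
    name: str              # K8s Pod name
    process_type: str      # value of the process-type label
    created_at: datetime   # Pod creation timestamp


def number_dynos(pods: List[PodInfo]) -> Dict[str, str]:
    """Map display names such as 'web.1' to the Pod they currently refer to."""
    by_type: Dict[str, List[PodInfo]] = {}
    for pod in pods:
        by_type.setdefault(pod.process_type, []).append(pod)

    dynos: Dict[str, str] = {}
    for process_type, group in by_type.items():
        # Oldest Pod first; Pod name breaks ties (assumption).
        ordered = sorted(group, key=lambda p: (p.created_at, p.name))
        for index, pod in enumerate(ordered, start=1):
            dynos[f"{process_type}.{index}"] = pod.name
    return dynos
```

Because the mapping is recomputed on every call, `web.1` can point at a
different Pod from one command to the next.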

    kuberoku ps:scale TYPE=N [TYPE=N ...]  Scale process types
        Examples:
            kuberoku ps:scale web=3
            kuberoku ps:scale web=3 worker=2
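
Parsing the TYPE=N arguments might look like the sketch below. The function
name and the error wording are assumptions. The sketch rejects anything that
is not a non-negative integer.

```python
import re
from typing import Dict, List

_COUNT = re.compile(r"[0-9]+")


def parse_scale_args(args: List[str]) -> Dict[str, int]:
    """Turn ['web=3', 'worker=2'] into {'web': 3, 'worker': 2}."""
    formation: Dict[str, int] = {}
    for arg in args:
        process_type, separator, count = arg.partition("=")
        if not separator or not process_type:
            raise ValueError(f"expected TYPE=N, got {arg!r}")
        if _COUNT.fullmatch(count) is None:
            raise ValueError(f"replica count must be a non-negative integer: {arg!r}")
        formation[process_type] = int(count)
    return formation
```

Scaling to 0 is accepted here, which is consistent with `ps:stop` being
described as scaling a process type to 0.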

    kuberoku ps:restart                    Restart all dynos
        --type TEXT                        Restart only this process type
        --dyno TEXT                        Restart specific dyno (e.g., web.1)
                                           Resolved at command time (see above)

    kuberoku ps:stop TYPE                  Scale a process type to 0

    kuberoku ps:type                       Show/change dyno resources
        --cpu TEXT                         CPU request/limit (e.g., "250m/500m")
        --memory TEXT                      Memory request/limit (e.g., "256Mi/512Mi")
        --type TEXT                        Target process type
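
The "request/limit" notation could be parsed as in this sketch. The names are
hypothetical, and requiring both halves is an assumption: the text above does
not say what a bare value such as "250m" would mean. The resulting dict
follows the shape of a K8s container `resources` field.

```python
from typing import Dict, NamedTuple


class RequestLimit(NamedTuple):
    request: str
    limit: str


def parse_request_limit(value: str) -> RequestLimit:
    """Split '250m/500m' into request '250m' and limit '500m'."""
    request, separator, limit = value.partition("/")
    if not separator or not request or not limit:
        raise ValueError(f"expected REQUEST/LIMIT, got {value!r}")
    return RequestLimit(request, limit)


def container_resources(cpu: str, memory: str) -> Dict[str, Dict[str, str]]:
    """Build the `resources` block for a container spec."""
    cpu_spec = parse_request_limit(cpu)
    memory_spec = parse_request_limit(memory)
    return {
        "requests": {"cpu": cpu_spec.request, "memory": memory_spec.request},
        "limits": {"cpu": cpu_spec.limit, "memory": memory_spec.limit},
    }
```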

    kuberoku ps:set TYPE=CMD [TYPE=CMD ...]   Set the command for process types
        Examples:
            kuberoku ps:set web="gunicorn app:app --bind 0.0.0.0:8080"
            kuberoku ps:set worker="celery -A myapp worker --loglevel=info"
            kuberoku ps:set web="gunicorn app:app" worker="celery -A myapp worker"
        --no-restart                       Don't trigger rolling restart

        This is the Procfile equivalent. On Heroku, the Procfile tells each
        process type which command to run. In Kuberoku, `ps:set` does the same:

            Heroku Procfile:
                web: gunicorn app:app --bind 0.0.0.0:$PORT
                worker: celery -A myapp worker
                clock: celery -A myapp beat

            Kuberoku equivalent:
                kuberoku ps:set \
                    web="gunicorn app:app --bind 0.0.0.0:8080" \
                    worker="celery -A myapp worker" \
                    clock="celery -A myapp beat"

        Commands are stored in the formation ConfigMap (see Section 9) alongside
        replicas. They are applied as the container `command` field in the
        Deployment spec. In K8s, `command` replaces the image ENTRYPOINT, and
        the image CMD is then ignored.

        Setting a command triggers a rolling restart (unless --no-restart) because
        K8s must recreate Pods to apply the new command.

    kuberoku ps:commands                   Show the command for each process type
        Output:
            === Process Commands for myapp
            web:     gunicorn app:app --bind 0.0.0.0:8080
            worker:  celery -A myapp worker --loglevel=info
            clock:   (not set — uses image default)

    PROCFILE SUPPORT
    ────────────────
    Kuberoku reads a `Procfile` at deploy time if present.
    This is for Heroku compatibility and convenience — NOT a requirement.

        $ cat Procfile
        web: gunicorn app:app --bind 0.0.0.0:8080
        worker: celery -A myapp worker
        clock: celery -A myapp beat

        $ kuberoku deploy
        Reading Procfile (from commit abc1234)...
          web: gunicorn app:app --bind 0.0.0.0:8080
          worker: celery -A myapp worker
          clock: celery -A myapp beat
        Deploying myapp:abc1234 to all process types...
          ...
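
Reading a Procfile like the one above might be sketched as follows. This is
an assumption-laden illustration: the function name is hypothetical, and the
accepted process-type characters and the handling of `#` comment lines are
guesses, not a documented grammar.

```python
import re
from typing import Dict

# "<type>: <command>", splitting on the FIRST colon only, so commands such as
# "gunicorn app:app" keep their own colons.
_PROCFILE_LINE = re.compile(r"^([A-Za-z0-9_-]+):\s*(.+)$")


def parse_procfile(text: str) -> Dict[str, str]:
    """Return a mapping of process type to command string."""
    commands: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments (assumption)
        match = _PROCFILE_LINE.match(line)
        if match is None:
            raise ValueError(f"invalid Procfile line: {raw_line!r}")
        commands[match.group(1)] = match.group(2)
    return commands
```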

    WHERE PROCFILE IS READ FROM (important — two cases):
        - Build-from-git (kuberoku deploy):
          Procfile is read from the GIT ARCHIVE (committed version).
          This matches the build context — only committed code deploys,
          only the committed Procfile applies. Uncommitted Procfile
          changes are ignored, just like uncommitted code changes.

        - Pre-built image (kuberoku deploy --image):
          Procfile is read from the CURRENT DIRECTORY (working tree).
          There's no git archive in this case. This lets CI pipelines
          or users deploying pre-built images still benefit from Procfile.

    PROCFILE vs ps:set PRECEDENCE (deterministic, no surprises)
    ──────────────────────────────────────────────────────────
    The formation ConfigMap tracks HOW each command was set:

        {
          "web": {
            "replicas": 3,
            "command": "gunicorn app:app",
            "command_source": "manual"        ← set via ps:set
          },
          "worker": {
            "replicas": 2,
            "command": "celery -A myapp worker",
            "command_source": "procfile"       ← set via Procfile
          }
        }

    command_source values:
        "manual"   — set via `ps:set` (explicit CLI command)
        "procfile" — set via Procfile at deploy time
        "default"  — never set, using image CMD/ENTRYPOINT (command is null)

    The rule: Procfile ONLY updates commands where command_source != "manual".
    This means:
        - `ps:set` always wins — it's an explicit override
        - Procfile updates "procfile" and "default" entries
        - Removing a line from Procfile does NOT erase a manual override
        - To go back to Procfile-managed: `ps:set web --reset` clears the
          manual override (sets command_source back to "default")

    Example:
        $ kuberoku ps:set web="gunicorn app:app"     # command_source → "manual"
        $ cat Procfile
        web: uvicorn app:main                         # different from ps:set!
        worker: celery -A myapp worker
        $ kuberoku deploy
        Reading Procfile...
          web: skipped (manual override via ps:set)   ← ps:set wins
          worker: celery -A myapp worker              ← Procfile applies
        Deploying...
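
The precedence rule can be sketched as a pure function over the formation
data. The function name is hypothetical, and the initial replica count of 1
for a newly seen process type is an assumption. The command_source values
come from the text above.

```python
import copy
from typing import Any, Dict, List, Tuple

Formation = Dict[str, Dict[str, Any]]


def apply_procfile(
    formation: Formation, procfile: Dict[str, str]
) -> Tuple[Formation, List[str]]:
    """Merge Procfile commands into a formation, never touching manual entries.

    Returns the updated formation and the process types that were skipped.
    """
    updated = copy.deepcopy(formation)  # leave the caller's data untouched
    skipped: List[str] = []
    for process_type, command in procfile.items():
        entry = updated.setdefault(
            process_type,
            # New process type; replicas=1 is an assumed starting point.
            {"replicas": 1, "command": None, "command_source": "default"},
        )
        if entry.get("command_source") == "manual":
            skipped.append(process_type)  # ps:set always wins
            continue
        entry["command"] = command
        entry["command_source"] = "procfile"
    # Entries absent from the Procfile are deliberately left as they are.
    return updated, skipped
```

Only entries named in the Procfile are visited, so deleting a Procfile line
cannot erase a manual override.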

    Rules:
        - Process types in Procfile that don't have Deployments yet are
          auto-created (same as `deploy --type` for a new process type)
        - The Procfile itself is NOT persisted. Only the resulting commands
          are stored in the formation ConfigMap as the source of truth.
        - --no-procfile flag on deploy to explicitly skip Procfile reading
        - If no Procfile exists and no commands are set, the container image's
          default CMD/ENTRYPOINT runs (standard Docker/K8s behavior)

    K8s mapping:
        ps          →  List Pods by label, group by process-type label
        ps:scale    →  Patch Deployment replicas
        ps:restart  →  Delete Pods (the Deployment's ReplicaSet recreates them)
        ps:stop     →  Scale Deployment to 0 replicas
        ps:type     →  Patch Deployment container resources
        ps:set      →  Update formation ConfigMap + Patch Deployment container
                        command field (spec.template.spec.containers[0].command)
        ps:commands →  Read formation ConfigMap, display commands


4.3  CONFIG (ENVIRONMENT VARIABLES)
────────────────────────────────────
Heroku equivalent: heroku config:*
K8s resources: ConfigMap (env vars), Secrets (sensitive values)

    kuberoku config                        Show all config vars
        --shell                            Output as KEY=VALUE (for eval)
        --json                             Output as JSON object
        --reveal                           Show secret values (default: masked)

    kuberoku config:set KEY=VAL [K=V ...]  Set one or more config vars
        --no-restart                       Don't trigger rolling restart
        --secret                           Store in K8s Secret instead of ConfigMap

        Shell quoting: values with spaces or special chars need quotes:
            kuberoku config:set GREETING="hello world" DB="postgres://u:p@host/db"

    kuberoku config:unset KEY [KEY ...]    Remove config vars
        --no-restart                       Don't trigger rolling restart

    kuberoku config:get KEY                Get a single config var value
        --reveal                           Show secret value (default: masked)

    SECRET MASKING (safer than Heroku)
    ──────────────────────────────────
    Config vars stored in K8s Secrets (via --secret) are MASKED by default
    in all output. This prevents accidental exposure in screenshots, logs,
    or shared terminals.

        $ kuberoku config
        DATABASE_URL:   postgres://user:pass@host/db     (ConfigMap — visible)
        SECRET_KEY:     ********                         (Secret — masked)
        API_TOKEN:      ********                         (Secret — masked)
        APP_NAME:       myapi                            (ConfigMap — visible)

        $ kuberoku config --reveal
        ⚠ WARNING: Revealing secret values.
        DATABASE_URL:   postgres://user:pass@host/db
        SECRET_KEY:     hunter2
        API_TOKEN:      sk-abc123def456
        APP_NAME:       myapi

        $ kuberoku config:get SECRET_KEY
        ********

        $ kuberoku config:get SECRET_KEY --reveal
        hunter2

    Rules:
        - ConfigMap values: always visible (not sensitive by default)
        - Secret values: masked as "********" unless --reveal is passed
        - --shell and --json also mask secrets unless --reveal is passed
        - config:set echoes the KEY name but never the VALUE for secrets
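
    Sketch (illustrative only): the masking rules can be pictured as a
    merge step that runs before any formatter. The function names here are
    hypothetical; the inputs are assumed to be the decoded ConfigMap and
    Secret data as plain dicts.

```python
import json

MASK = "********"


def render_config(configmap_vars, secret_vars, reveal=False):
    """Merge both sources; Secret values are masked unless reveal=True."""
    rendered = dict(configmap_vars)  # ConfigMap values are always visible
    for key, value in secret_vars.items():
        rendered[key] = value if reveal else MASK
    return rendered


def as_json(rendered):
    """--json output built from the already-masked mapping."""
    return json.dumps(rendered, indent=2, sort_keys=True)
```

    Masking before formatting means --shell and --json cannot leak a value
    that the default table view would have hidden.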

    CONFIG VAR LIMITS
    ─────────────────
    ConfigMaps and Secrets are limited to ~1 MiB each by K8s. In real-world
    enterprise apps, env vars can grow large. Kuberoku enforces hard limits
    so that oversized config fails early with a clear error instead of an
    opaque K8s error:

        Max config vars per app:     256
        Max key length:              256 characters
        Max value length:            128 KiB (131,072 bytes)
        Max total config size:       512 KiB (sum of all key+value bytes)

    These are enforced at config:set time with clear errors:

        Error: Config var CERT_CHAIN exceeds max value size (128 KiB).
        Consider storing large values as K8s Secrets directly or using
        a mounted volume.

        Error: App myapi has 256 config vars (max). Remove unused vars
        with config:unset before adding more.

    Why these numbers:
    - 256 vars: Heroku caps total config var data at 32 KB per app. We're
      more generous but still bounded.
    - 128 KiB per value: Covers base64-encoded certificates, JWT keys, etc.
      Anything larger should be a file/volume, not an env var.
    - 512 KiB total: ~50% of the 1 MiB ConfigMap limit, leaving headroom for
      K8s metadata overhead and future fields.

    The limits are constants in branding.py (overridable per-cluster for
    enterprise users who know what they're doing).
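
    Sketch (illustrative only): a validation pass that could run at
    config:set time. Constant names, the exception class, and the error
    wording are hypothetical. It also assumes the limits apply to the
    merged set of vars after the update, which the spec does not state.

```python
MAX_VARS = 256
MAX_KEY_CHARS = 256
MAX_VALUE_BYTES = 128 * 1024   # 128 KiB
MAX_TOTAL_BYTES = 512 * 1024   # 512 KiB


class ConfigLimitError(ValueError):
    """Raised before anything is written to the cluster."""


def check_limits(app, current, updates):
    """Return the merged config, or raise ConfigLimitError."""
    merged = {**current, **updates}
    for key, value in updates.items():
        if len(key) > MAX_KEY_CHARS:
            raise ConfigLimitError(
                f"Config var key exceeds max length ({MAX_KEY_CHARS} chars).")
        if len(value.encode("utf-8")) > MAX_VALUE_BYTES:
            raise ConfigLimitError(
                f"Config var {key} exceeds max value size (128 KiB).")
    if len(merged) > MAX_VARS:
        raise ConfigLimitError(
            f"App {app} would have {len(merged)} config vars (max {MAX_VARS}).")
    total = sum(len(k.encode("utf-8")) + len(v.encode("utf-8"))
                for k, v in merged.items())
    if total > MAX_TOTAL_BYTES:
        raise ConfigLimitError(
            f"App {app} config would total {total} bytes (max {MAX_TOTAL_BYTES}).")
    return merged
```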

    RELEASES ON CONFIG CHANGE (Heroku-like)
    ────────────────────────────────────────
    Every config:set and config:unset creates a NEW RELEASE, just like Heroku.
    This means:
        - `kuberoku releases` shows config changes alongside deploys
        - `kuberoku releases:rollback` can revert a config change
        - Every config change has a timestamp, author, and description

        $ kuberoku config:set NEW_FEATURE=true
        Setting config vars and restarting... done, v8

        $ kuberoku releases
        v8  Set NEW_FEATURE              user@host  2025-02-09 15:30
        v7  Deploy abc1234 (main)        user@host  2025-02-09 14:00
        v6  Unset OLD_FLAG               user@host  2025-02-09 12:00

    The release created by config:set stores:
        - images: carried forward unchanged (same as previous release)
        - formation: carried forward unchanged
        - env_diff: {"NEW_FEATURE": "true"} (only the changed keys)
        - description: "Set NEW_FEATURE"
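
    Sketch (illustrative only): building the release record for a
    config:set, assuming releases are plain dicts. The "version" field and
    the comma-joined description for multiple keys are assumptions; the
    other field names follow the list above.

```python
import copy


def build_config_set_release(previous, changed):
    """Create the next release for a config:set of `changed` keys."""
    return {
        "version": previous["version"] + 1,
        # Images and formation are carried forward unchanged.
        "images": dict(previous["images"]),
        "formation": copy.deepcopy(previous["formation"]),
        # Only the keys touched by this command.
        "env_diff": dict(changed),
        "description": "Set " + ", ".join(sorted(changed)),
    }
```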

    K8s mapping:
        config      →  Read ConfigMap kuberoku-env-{app} + Secret kuberoku-secret-{app}
                       Merge both, mask Secret values unless --reveal
        config:set  →  Patch ConfigMap + CREATE Release ConfigMap + trigger
                       rolling restart via annotation change
                       (kubectl.kubernetes.io/restartedAt)
                       If --secret: store in Secret kuberoku-secret-{app}
        config:unset → Remove key from ConfigMap/Secret + CREATE Release ConfigMap
                       + rolling restart

    RESTART BEHAVIOR (consistent rule):
        - If Deployments exist (app has been deployed) → rolling restart of
          ALL process types (web, worker, smtp, etc.) immediately.  Config
          vars are shared across the entire app, so every type must pick up
          the new values.  Config change takes effect within seconds.
          Output: "Setting config vars and restarting... done"
        - If NO Deployments exist (config:set before first deploy) → no
          restart (nothing to restart). Config is stored, will be picked
          up by the first deploy.
          Output: "Setting config vars... done (no running pods to restart)"
        - --no-restart flag: store config but skip restart. User must
          manually run `ps:restart` or wait for next deploy.
          Output: "Setting config vars... done (restart skipped)"
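
    Sketch (illustrative only): the restart decision as a pure function
    that returns the CLI message plus the patch body per Deployment. The
    function name is hypothetical. Checking --no-restart before the
    "no Deployments" case is an assumption; the spec does not order them.
    The annotation lives on the pod template, which is what makes the
    Deployment controller roll the pods.

```python
from datetime import datetime, timezone

RESTART_ANNOTATION = "kubectl.kubernetes.io/restartedAt"


def restart_plan(deployment_names, no_restart=False, now=None):
    """Return (message, {deployment_name: patch_body})."""
    if no_restart:
        return "Setting config vars... done (restart skipped)", {}
    if not deployment_names:
        return "Setting config vars... done (no running pods to restart)", {}
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    body = {
        "spec": {"template": {"metadata": {
            "annotations": {RESTART_ANNOTATION: stamp},
        }}}
    }
    # Every process type gets the same patch: config is app-wide.
    return ("Setting config vars and restarting... done",
            {name: body for name in deployment_names})
```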


4.4  DEPLOY
───────────
Heroku equivalent: git push heroku main / heroku container:release
K8s resources: Deployment (image update), ConfigMap (release record)

    kuberoku deploy                        Build from git & deploy (default)
        --ref TEXT                         Git ref to build (default: HEAD)
                                           Examples: v1.2.0, abc1234, feat/payments
        --image TEXT                       Deploy a pre-built image (skips build)
                                           Example: registry.io/myapp:v3
        --type TEXT                        Only update this process type (default: ALL)
        --port TEXT                        Port/protocol (repeatable)
                                           Examples: 8080/tcp, 53/udp
                                           Default for web: 8080/tcp
        --replicas INT                     Initial replica count (default: 1)
                                           Only used when CREATING a new process type
        --wait                             Wait for rollout to complete (default: true)
        --no-wait                          Don't wait for rollout
        --timeout INT                      Rollout timeout in seconds (default: 300)
        --env KEY=VAL                      Set env vars with this deploy (repeatable)
        --cpu TEXT                         CPU request/limit
        --memory TEXT                      Memory request/limit
        --no-procfile                      Don't read Procfile from git archive

    TWO MODES: BUILD-FROM-GIT vs PRE-BUILT IMAGE
    ──────────────────────────────────────────────
    Kuberoku's core job is deployment. It includes a lightweight local
    build wrapper (git archive → docker build → push) as a convenience
    so that `kuberoku deploy` is one command on a developer laptop.
    In production, CI does the build and passes `--image` — Kuberoku
    only handles the deploy.

        Scenario                    Command                                Who builds
        ─────────────────────────   ──────────────────────────────────     ──────────────────────
        Developer laptop            kuberoku deploy                        Builds from HEAD locally
        Tagged release              kuberoku deploy --ref v1.2.0           Builds from git tag
        Specific commit             kuberoku deploy --ref abc1234          Builds from that commit
        Test another branch         kuberoku deploy --ref feat/payments    Builds from branch tip
        CI/CD pipeline              kuberoku deploy --image reg/app:v3     CI already built it

    BUILD-FROM-GIT (default, no --image):
    ─────────────────────────────────────
    When you run `kuberoku deploy` without --image, Kuberoku builds from
    your git history. This is the `git push heroku main` equivalent — but
    the build happens locally, zero cluster footprint.

    How it works:
        1. Resolve git ref → commit SHA (default: HEAD of current branch)
        2. Run `git archive <sha>` → clean tarball of committed files ONLY
           - No .git/ directory
           - No untracked files
           - No uncommitted changes
           - No .gitignore'd files
        3. Run `docker build` from that archive
        4. Tag image as {app}:{short-sha} (e.g., myapp:abc1234)
        5. Push to configured registry
        6. Deploy the image (same as --image path from here)
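
    Sketch (illustrative only): the commands behind steps 1 to 5, built as
    argument lists without running anything. Function names are
    hypothetical, and the 7-character short SHA is an assumption taken from
    the examples. The archive is meant to be piped into `docker build -`,
    which accepts a tar build context on stdin.

```python
def resolve_command(ref="HEAD"):
    """Step 1: resolve any ref (branch, tag, SHA) to a commit SHA."""
    return ["git", "rev-parse", "--verify", f"{ref}^{{commit}}"]


def build_commands(app, sha, registry=None):
    """Steps 2-5 as argv lists, keyed by step name."""
    local_tag = f"{app}:{sha[:7]}"
    commands = {
        # Step 2: tar of committed files only, written to stdout.
        "archive": ["git", "archive", "--format=tar", sha],
        # Steps 3-4: build from the tar on stdin and tag the result.
        "build": ["docker", "build", "--tag", local_tag, "-"],
    }
    if registry:
        remote_tag = f"{registry}/{local_tag}"
        commands["tag"] = ["docker", "tag", local_tag, remote_tag]
        # Step 5: push to the configured registry.
        commands["push"] = ["docker", "push", remote_tag]
    return commands
```

    With no registry (local cluster), only the archive and build steps are
    produced and the image would be loaded directly instead of pushed.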

    WHY GIT ARCHIVE, NOT WORKING DIRECTORY:
        - Clean: no accidental .env, uncommitted debug print(), half-done code
        - Reproducible: same commit always yields the same build context
          (the image can still differ if base image tags or unpinned
          dependencies move)
        - Traceable: every release maps to a specific commit SHA
        - Safe: `kuberoku deploy` always deploys what you COMMITTED, not what's
          lying around in your editor

    REGISTRY AUTO-DETECTION (best-effort, zero config for most users)
    ─────────────────────────────────────────────────────────────────
    Kuberoku needs a container registry to push images. Instead of making
    the user configure one manually, it auto-detects the right registry
    from the cluster and asks once to confirm. Detection is best-effort —
    it may fail if the user lacks cloud CLI permissions or the cluster
    endpoint doesn't map cleanly to a registry. Fallback: ask the user.

    FIRST DEPLOY (registry not yet configured):

        $ kuberoku deploy
        Registry not configured. Detected: EKS (us-east-1)
        Use ECR (123456789.dkr.ecr.us-east-1.amazonaws.com)? [Y/n] y

        Saved. You won't be asked again.
        Building from commit abc1234 (main)...
        Pushing to 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:abc1234...
        Deploying... done
        Release v1 created

    SUBSEQUENT DEPLOYS (registry already saved):

        $ kuberoku deploy
        Building from commit def5678 (main)...
        Deploying... done
        Release v2 created

    How auto-detection works — read the cluster endpoint from kubeconfig:

        Cloud       Detection                           Registry (auto)
        ─────────   ─────────────────────────────────   ──────────────────────────────────
        AWS EKS     endpoint: *.eks.amazonaws.com       {account}.dkr.ecr.{region}.amazonaws.com
        GCP GKE     endpoint: *.gke.goog                {region}-docker.pkg.dev/{project}/kuberoku
        Azure AKS   endpoint: *.azmk8s.io               {name}.azurecr.io (AKS-linked ACR)
        Local       endpoint: localhost / 127.0.0.1     No registry — direct image load
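
    Sketch (illustrative only): matching the kubeconfig `server` URL
    against the suffixes in the table. The function name and the short
    labels it returns are hypothetical. Anything that does not match falls
    through to None, which corresponds to asking the user.

```python
from urllib.parse import urlparse

LOCAL_HOSTS = {"localhost", "127.0.0.1"}
CLOUD_SUFFIXES = {
    ".eks.amazonaws.com": "eks",
    ".gke.goog": "gke",
    ".azmk8s.io": "aks",
}


def detect_cluster_kind(server_url):
    """Return "local", "eks", "gke", "aks", or None if unrecognised."""
    host = (urlparse(server_url).hostname or "").lower()
    if host in LOCAL_HOSTS:
        return "local"
    for suffix, kind in CLOUD_SUFFIXES.items():
        if host.endswith(suffix):
            return kind
    return None
```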

    The user already has cloud credentials on their machine (that's how
    kubectl works). Kuberoku piggybacks on those same credentials to
    discover and authenticate to the registry. No extra setup.

    For local clusters (k3d, kind, minikube, Docker Desktop), no registry
    is needed at all. Images are loaded directly into the cluster:
        k3d:      k3d image import {image}
        kind:     kind load docker-image {image}
        minikube: minikube image load {image}

    RULES:
        1. KUBEROKU_REGISTRY already set?  → use it, no questions
        2. --image flag?                   → skip build entirely, no registry needed
        3. Local cluster detected?         → direct load, no registry needed
        4. Cloud cluster detected?         → auto-detect, confirm once, save
        5. Can't detect?                   → ask for registry URL once, save
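
    Sketch (illustrative only): the five rules evaluated top to bottom,
    first match wins. The function name and action labels are hypothetical.
    `configured_registry` stands for whatever KUBEROKU_REGISTRY resolves to
    (how the env var and saved cluster config combine is left to the
    caller), and `cluster_kind` is assumed to be one of "local", "eks",
    "gke", "aks", or None.

```python
def registry_decision(configured_registry, image_flag, cluster_kind):
    """Return (action, registry) following RULES 1-5 in order."""
    if configured_registry:                      # rule 1
        return ("use-configured", configured_registry)
    if image_flag:                               # rule 2
        return ("skip-build", None)
    if cluster_kind == "local":                  # rule 3
        return ("direct-load", None)
    if cluster_kind in {"eks", "gke", "aks"}:    # rule 4
        return ("auto-detect-confirm-save", None)
    return ("ask-user-save", None)               # rule 5
```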

    SETTING THE REGISTRY (two scopes — don't confuse them):

        # GLOBAL (all apps on this cluster) — cluster config
        kuberoku clusters:config production --registry my-registry.io/team
        # Saved in ~/.kuberoku/config.yaml under the cluster
        # NOTE: clusters:config is a future addition. Currently, set
        # KUBEROKU_REGISTRY env var or edit ~/.kuberoku/config.yaml directly.

        # ENV VAR (CI/CD) — overrides everything
        KUBEROKU_REGISTRY=my-registry.io/team kuberoku deploy

    NOTE: `kuberoku config:set KUBEROKU_REGISTRY=...` is WRONG — that
    sets an app env var that gets injected into your container, NOT the
    registry Kuberoku uses for pushing images. Don't confuse app config
    (container env vars) with tool config (how Kuberoku operates).

    Auto-detection is convenience only. Production users SHOULD set
    KUBEROKU_REGISTRY explicitly (via env var or cluster config) to avoid
    surprises from auto-detection heuristics.

    The registry URL is saved in cluster config. Set once per cluster,
    applies to all apps on that cluster.

    UNCOMMITTED CHANGES WARNING:
        $ kuberoku deploy
        Building from commit abc1234 (main)...
          ⚠ 3 uncommitted files will NOT be included.
        Building image myapp:abc1234...
        Deploying myapp:abc1234 to all process types...
          Updating web... done
        Release v5 created

        Warn but proceed — sometimes you're deploying committed code while
        having local WIP on something else. The warning is enough.
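
    Sketch (illustrative only): deriving the warning from the output of
    `git status --porcelain`, which prints one line per changed path.
    Counting untracked files as "uncommitted" is an assumption, and the
    function name is hypothetical.

```python
def uncommitted_warning(porcelain_output):
    """Return the warning line, or None when the tree is clean."""
    count = sum(1 for line in porcelain_output.splitlines() if line.strip())
    if count == 0:
        return None
    noun = "file" if count == 1 else "files"
    return f"⚠ {count} uncommitted {noun} will NOT be included."
```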

    USING --ref:
        $ kuberoku deploy --ref v1.2.0
        Building from commit e4f7a21 (tag: v1.2.0)...
        Building image myapp:e4f7a21...
        Deploying myapp:e4f7a21 to all process types...
          Updating web... done
        Release v6 created

    PREREQUISITES:
        - Git repository (must be inside a git repo)
        - Docker installed and running (for docker build)
        - Dockerfile in the repo (committed, at repo root or --dockerfile path)
        - For remote clusters: cloud CLI configured (already true if kubectl works)
        - For local clusters: nothing else (direct image load)
        Note: registry is auto-detected — user does NOT need to configure it manually.

    If Docker is not installed:
        $ kuberoku deploy
        Error: Docker not found. Either:
          1. Install Docker: https://docs.docker.com/get-docker/
          2. Build in CI and use: kuberoku deploy --image <pre-built-image>

    If not in a git repo:
        $ kuberoku deploy
        Error: Not a git repository. Either:
          1. Run from inside a git repo
          2. Use: kuberoku deploy --image <pre-built-image>

    PRE-BUILT IMAGE (--image, for CI/CD):
    ─────────────────────────────────────
    When --image is passed, Kuberoku skips all build/git logic and deploys
    the image directly. This is the path for CI/CD pipelines:

        $ kuberoku deploy --image registry.io/myapp:v3
        Deploying registry.io/myapp:v3 to all process types...
          Updating web... done
        Release v7 created

    No Docker, no git, no registry config needed. Just the image name.

    CI/CD INTEGRATION EXAMPLES:
        # In CI there's no .kuberoku file — pass --app or set KUBEROKU_APP.

        # GitHub Actions — copy-paste ready
        env:
          KUBEROKU_APP: myapp
          KUBEROKU_CLUSTER: production
        on:
          push:
            branches: [main]
        jobs:
          deploy:
            runs-on: ubuntu-latest
            steps:
              - uses: actions/checkout@v4
              - run: docker build -t registry.io/myapp:${{ github.sha }} .
              - run: docker push registry.io/myapp:${{ github.sha }}
              - run: pip install kuberoku
              - run: kuberoku deploy --image registry.io/myapp:${{ github.sha }}

        # GitLab CI — copy-paste ready
        deploy:
          stage: deploy
          variables:
            KUBEROKU_APP: myapp
            KUBEROKU_CLUSTER: production
          script:
            - docker build -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
            - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
            - pip install kuberoku
            - kuberoku deploy --image $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA

    WHY NO IN-CLUSTER BUILDS (NO PLUGIN):
        Kuberoku wraps `docker build` locally for convenience, but it does
        NOT do in-cluster builds. Local Docker + CI are solved problems.
        In-cluster builds (BuildKit, Kaniko, Tekton) would:
          - Require server-side components (violates zero footprint)
          - Need constant maintenance (security patches, K8s API changes)
          - Create security risks (arbitrary Dockerfiles = arbitrary code)
          - Solve a problem CI/CD already solves better
        If you want automated builds on git push → use CI/CD + `--image`.

    CORE BEHAVIOR: UPDATE vs CREATE
    ────────────────────────────────
    Deploy is create-or-update. It does the right thing whether resources exist
    or not, and every run records a new release.

    FIRST DEPLOY (nothing exists yet):
        $ kuberoku deploy
        Building from commit abc1234 (main)...
        Building image myapp:abc1234...
        Deploying myapp:abc1234 to all process types...
          Creating web... done
        Release v1 created

    SUBSEQUENT DEPLOYS (resources already exist):
        $ kuberoku deploy
        Building from commit def5678 (main)...
        Deploying myapp:def5678 to all process types...
          Updating web... done
          Updating worker... done
        Release v8 created

    The rule: if a Deployment for a process type already exists, PATCH its
    image. If it doesn't exist, CREATE it.
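
    Sketch (illustrative only): the create-or-patch rule as a planning
    step, assuming the set of existing process types has already been read
    from the cluster. Function names and the step shape are hypothetical.

```python
def plan_deploy(existing_types, target_types, image):
    """One step per target process type: PATCH if present, else CREATE."""
    plan = []
    for ptype in target_types:
        action = "PATCH" if ptype in existing_types else "CREATE"
        plan.append({"type": ptype, "action": action, "image": image})
    return plan


def progress_line(step):
    """Render the CLI progress line for a completed step."""
    verb = "Updating" if step["action"] == "PATCH" else "Creating"
    return f"  {verb} {step['type']}... done"
```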

    DEFAULT: DEPLOY UPDATES ALL PROCESS TYPES (Heroku-style)
    ─────────────────────────────────────────────────────────
    This is how Heroku works: one slug, all dynos update. Your web, worker,
    and service processes typically all run from the SAME codebase image,
    just with different commands. A deploy should update them all.

        $ kuberoku deploy
        Building from commit abc1234 (main)...
        Deploying myapp:abc1234 to all process types...
          Updating web... done
          Updating worker... done
        Release v7 created

    What gets updated:
        - ALL app Deployments (web, worker, service): image patched to new image
        - If Procfile present in git archive: container commands updated
        - Release ConfigMap: records ALL images + formation snapshot (replicas + commands)
        - Formation: commands may change (from Procfile), replicas carried forward unchanged

    What NEVER gets touched by deploy:
        - Addons (postgres, redis) — these have their own images
          and lifecycle. Use `addons:scale` or `addons:migrate` for those.
        - Replicas — deploy doesn't change scale. Use `ps:scale` for that.

    SINGLE PROCESS TYPE (escape hatch):
        $ kuberoku deploy --image myapi-worker:v3 --type worker
        # Only updates worker Deployment, leaves web untouched
        # Release still records ALL images (web keeps its current image)

    This is useful when:
        - Worker has a genuinely different image than web
        - Canary deploys (update one process type first)
        - First-time addition of a new process type

    ADDING A NEW PROCESS TYPE:
        $ kuberoku deploy --type worker
        # App already has web, this ADDS worker
        # Creates: Deployment for worker (no Service — workers don't need one)

        $ kuberoku deploy --image myapp:v3 --type smtp --port 25/tcp --port 587/tcp
        # Adds an smtp process type with specific ports (same image as other types)
        # Creates: Deployment + Service (ClusterIP with specified ports)
        # To make it public: `kuberoku services:expose:on smtp`
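
    Sketch (illustrative only): turning repeated --port values into
    Service port entries. The function names, the generated port names, and
    defaulting to TCP when the protocol is omitted are assumptions; the
    dict keys are standard K8s ServicePort fields.

```python
def parse_port(spec):
    """Parse "25/tcp" or "53/udp" into a ServicePort-shaped dict."""
    number, _, proto = spec.partition("/")
    proto = (proto or "tcp").upper()
    if proto not in {"TCP", "UDP"}:
        raise ValueError(f"Unsupported protocol in {spec!r}")
    port = int(number)
    if not 1 <= port <= 65535:
        raise ValueError(f"Port out of range in {spec!r}")
    return {
        "name": f"{proto.lower()}-{port}",
        "port": port,
        "targetPort": port,
        "protocol": proto,
    }


def service_ports(specs):
    """Convert every --port value, preserving order."""
    return [parse_port(spec) for spec in specs]
```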

    K8s mapping:
        Build-from-git        → git archive + docker build + push to auto-detected
                                 registry (or direct load for local clusters)
        First deploy (web)    → CREATE Deployment + Service (ClusterIP, port 8080)
                                 + auto-domain Ingress if base_domain configured
                                 Otherwise: hint to use `domains:add`
        First deploy (worker) → CREATE Deployment only (no Service)
        First deploy (with ports) → CREATE Deployment + Service (ClusterIP, specified ports)
                                    + hint to use `services:expose:on` or `domains:add`
        Subsequent deploy     → PATCH Deployment(s) image field
        Procfile present      → PATCH Deployment(s) container command field
                                 + UPDATE formation ConfigMap with new commands
        All deploys           → CREATE Release ConfigMap {PREFIX}-release-{app}-v{N}

    DEPLOY PROGRESS OUTPUT
    ──────────────────────
    --wait is true by default. The CLI prints clean, real-time progress:

        $ kuberoku deploy
        Building from commit abc1234 (main)...
        Building image myapp:abc1234...
          Step 1/5 : FROM python:3.11-slim
          Step 2/5 : COPY . /app
          ...
        Pushing myapp:abc1234...
        Deploying myapp:abc1234 to all process types...
          Updating web... done
          Updating worker... done
        Waiting for rollout...
          web.1: starting
          web.1: up (running for 2s)
          web.2: up (running for 1s)
          worker.1: starting
          worker.1: up (running for 1s)
        Release v7 created

    POST-DEPLOY HINTS (shown after successful first deploy):

    With base_domain configured (web process detected):
        Release v1 created
        App URL: https://myapi.apps.mycluster.com

    Without base_domain (first deploy only):
        Release v1 created
        App is running (internal only).
        To expose:  kuberoku services:expose:on web
        Or add a domain:  kuberoku domains:add myapi.yourdomain.com

    Non-HTTP ports (TCP/UDP):
        Release v1 created
        To expose: kuberoku services:expose:on smtp

    Subsequent deploys — no hint, just the release:
        Release v7 created

    If rollout fails:
        $ kuberoku deploy
        Building from commit abc1234 (main)...
        Building image myapp:abc1234...
        Deploying myapp:abc1234 to all process types...
          Updating web... done
        Waiting for rollout...
          web.1: starting
          web.1: crashed (ImagePullBackOff)
        Error: Rollout failed — web.1 crashed.
        Run `kuberoku releases:rollback` to revert.

    With --no-wait:
        $ kuberoku deploy --no-wait
        Building from commit abc1234 (main)...
        Deploying myapp:abc1234 to all process types... done
        Release v7 created (rollout in progress)


4.5  RELEASES
─────────────
Heroku equivalent: heroku releases:*
K8s resources: ConfigMaps (release history as JSON)

    kuberoku releases                      List releases (newest first)
        --num INT                          Number of releases to show (default: 10)

    kuberoku releases:info VERSION         Show release details
                                           (image, env changes, timestamp, user)

    kuberoku releases:rollback [VERSION]   Roll back to a previous release
                                           Default: previous release (v{current-1})

    kuberoku releases:prune                Prune old releases
        --keep INT                         Keep this many (default: 50)
        --older-than TEXT                  Delete older than (e.g., "30d")

    IMPORTANT — Rollback semantics (Heroku-style):
        Rollback does NOT rewind history. It creates a NEW release with an
        incremented version number that happens to use the old image/config.

        Example:
            v1  Deploy myapi:abc
            v2  Deploy myapi:def
            v3  Deploy myapi:ghi     (current)
            $ kuberoku releases:rollback v1
            v4  Rollback to v1 (Deploy myapi:abc)   ← NEW release, not rewind

        This means the release log is append-only. You can always see that a
        rollback happened, when, and by whom.
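
    Sketch (illustrative only): rollback as an append to the release log,
    assuming releases are plain dicts with "version", "images", and
    "description" keys. The function name and dict shape are hypothetical.

```python
def rollback(releases, target_version):
    """Return a new log with a rollback release appended; never rewinds."""
    by_version = {r["version"]: r for r in releases}
    target = by_version[target_version]
    new_release = {
        "version": max(by_version) + 1,      # always increments
        "images": dict(target["images"]),    # reuse the old images
        "description": f"Rollback to v{target_version}",
    }
    return releases + [new_release]
```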

    K8s mapping:
        releases          → List ConfigMaps {PREFIX}-release-{app}-v* (one per release)
        releases:info     → Read specific release ConfigMap
        releases:rollback → Read image+config from target release ConfigMap,
                            PATCH Deployment(s) back to that image,
                            create NEW release ConfigMap with incremented version
                            (description: "Rollback to v{N}")


4.6  LOGS (services:logs)
────────────────────────
Heroku equivalent: heroku logs
K8s resources: Pod logs (streaming)
NOTE: Logs are accessed via `services:logs` to match the services command group.

    kuberoku services:logs                 Show recent logs
        --tail, -t                         Stream logs in real time (follow mode)
        --num INT, -n INT                  Number of lines (default: 100)
        --type TEXT                        Filter by process type (web, worker, etc.)
        --dyno TEXT                        Filter by specific dyno (e.g., web.1)
        --since TEXT                       Show logs since duration (e.g., "5m", "1h")
        --timestamps                       Prefix each line with K8s-provided timestamp
        --previous                         Show logs from previous container instance
                                           (useful for crash debugging)

    Output prefix format:
        {dyno} | {message}
        web.1 | Listening on :8080
        web.2 | Listening on :8080
        worker.1 | Processing job #42

    With --timestamps:
        2025-02-09T15:30:01+00:00 web.1 | Listening on :8080

    IMPORTANT — Log streaming is BEST-EFFORT:
        - Ordering is NOT guaranteed across pods. Each pod's log stream is
          ordered, but interleaving across pods is arrival-order, not
          timestamp-order. This is a fundamental K8s limitation.
        - Timestamps may be missing unless the app logs include them.
          Use --timestamps to get K8s-injected timestamps (from kubelet).
        - Pod restarts cause log gaps. When a pod dies and a new one starts,
          there is a brief window where no logs are emitted. Use --previous
          to see the dead container's last logs for crash debugging.
        - New pods (from scaling up) are auto-discovered and added to the
          stream. Terminated pods are removed gracefully.

    Implementation:
        - One thread per pod, each calling stream_pod_logs(follow=True)
        - Lines are prefixed with dyno name and merged into a single output
        - Pod list is polled periodically to discover new pods
          and remove terminated ones
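
    Sketch (illustrative only): the thread-per-pod merge, with each pod's
    log stream stood in for by any iterable of lines. This is not the real
    stream_pod_logs(); the function name is hypothetical and pod discovery
    is left out. Output is arrival-order across pods and in-order within
    each pod, matching the best-effort guarantees above.

```python
import queue
import threading


def multiplex(streams):
    """streams: {dyno_name: iterable of lines}. Yields "dyno | line"."""
    out = queue.Queue()
    done = object()  # sentinel: one per finished stream

    def pump(dyno, lines):
        try:
            for line in lines:
                out.put(f"{dyno} | {line}")
        finally:
            out.put(done)

    threads = [threading.Thread(target=pump, args=(dyno, lines), daemon=True)
               for dyno, lines in streams.items()]
    for thread in threads:
        thread.start()

    remaining = len(threads)
    while remaining:
        item = out.get()
        if item is done:
            remaining -= 1
        else:
            yield item
```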

    K8s mapping:
        services:logs            → stream_pod_logs() across all Pods matching app label
        services:logs --tail     → follow=True, multiplexed across Pods
        services:logs --since 5m → sinceSeconds=300
        services:logs --previous → previous=True (shows crashed container logs)
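
    Sketch (illustrative only): converting a --since value to the
    sinceSeconds integer. Only "5m" and "1h" appear in this section; the
    "s" and "d" units and the function name are assumptions.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}


def since_seconds(text):
    """Parse a duration such as "5m" or "1h" into whole seconds."""
    match = re.fullmatch(r"(\d+)([smhd])", text.strip())
    if not match:
        raise ValueError(f"Invalid duration: {text!r}")
    return int(match.group(1)) * _UNIT_SECONDS[match.group(2)]
```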


4.7  EXEC (services:exec)
─────────────────────────
Heroku equivalent: heroku run
NOTE: Semantics differ from Heroku. Heroku `run` spawns a NEW one-off dyno.
Kuberoku `services:exec` (interactive mode) execs into an EXISTING pod for
instant startup. Detached mode (`--detach`) spawns a Job, closer to Heroku's
behavior.

    kuberoku services:exec COMMAND [ARGS...]  Exec into a process pod
        --type TEXT                        Process type to use (default: web)
        --no-tty                           Don't allocate a TTY
        --detach                           Run in background (uses Job)
        --env KEY=VAL                      Extra env vars (repeatable)
        --timeout INT                      Max runtime in seconds (default: 3600)
        --dyno TEXT                        Specific dyno to exec into (e.g., web.1)

    Examples:
        kuberoku services:exec bash                       # exec into web.1
        kuberoku services:exec python manage.py migrate   # exec into web.1
        kuberoku services:exec --dyno worker.2 bash       # exec into specific dyno
        kuberoku services:exec --detach rake db:backup    # background Job

    TWO MODES OF OPERATION:

    1. INTERACTIVE (default) — exec into existing pod
       This is the Heroku experience. `services:exec bash` feels like
       "open a shell in my app", not "spawn a new container and wait".

       Implementation:
           a. Find a running pod for --type (default: web)
              - Pick web.1 by default, or --dyno to choose
           b. exec_in_pod() with a TTY (skipped when --no-tty is passed)
           c. Stream stdin/stdout/stderr between the terminal and the container
           d. Exit code from the command is returned

       Why exec, not Job:
           - Instant — no waiting for pod scheduling
           - Access to the same filesystem, env vars, mounted volumes
           - Feels like `heroku run bash` (sub-second startup)
           - No cleanup needed

    2. DETACHED (--detach) — spawns a Job
       For long-running background tasks that should survive disconnection.

       Implementation:
           a. Create a Job with the app's image + env from ConfigMap/Secret
           b. Print the job name and return immediately
           c. User can check with: kuberoku ps (shows one-off dynos)
           d. Job has activeDeadlineSeconds for timeout
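
    Sketch (illustrative only): what a detached Job manifest could look
    like as a plain dict. This is not the real build_job(). The function
    name, labels, generateName prefix, container name, backoffLimit, and
    the optional Secret reference are assumptions. The ConfigMap and Secret
    names follow the kuberoku-env-{app} / kuberoku-secret-{app} pattern
    used by config.

```python
def job_manifest(app, image, command, timeout=3600, extra_env=None):
    """Build a batch/v1 Job that runs `command` once with the app's env."""
    env = [{"name": k, "value": v} for k, v in (extra_env or {}).items()]
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"generateName": f"{app}-run-", "labels": {"app": app}},
        "spec": {
            "activeDeadlineSeconds": timeout,  # --timeout
            "backoffLimit": 0,                 # assumption: no retries
            "template": {"spec": {
                "restartPolicy": "Never",
                "containers": [{
                    "name": "run",
                    "image": image,
                    "command": list(command),
                    "envFrom": [
                        {"configMapRef": {"name": f"kuberoku-env-{app}"}},
                        {"secretRef": {"name": f"kuberoku-secret-{app}",
                                       "optional": True}},
                    ],
                    "env": env,                # --env KEY=VAL extras
                }],
            }},
        },
    }
```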

    K8s mapping:
        services:exec (interactive) → Find pod by label + exec_in_pod with TTY
        services:exec --detach      → build_job() + create_job(), return job name


4.8  ADDONS
───────────
Heroku equivalent: heroku addons:*
K8s resources: StatefulSet + PVC + Service + Secret + ConfigMap + NetworkPolicy

    WARNING — STATEFUL WORKLOADS
    Addons create stateful services (databases, caches). This is fundamentally
    dangerous without proper storage handling. Kuberoku treats this seriously.

    COMMANDS
    ────────
    kuberoku addons                        List addons for current app
    kuberoku addons:create TYPE            Attach an addon
        --as NAME                          Instance name (default: same as TYPE)
                                           Required when creating a second instance
                                           of the same type.
        --cpu TEXT                         CPU request=limit (default: type-specific)
        --memory TEXT                      Memory request=limit (default: type-specific)
        --storage TEXT                     PVC size (default: type-specific, e.g., 1Gi)
        --storage-class TEXT               K8s StorageClass (default: cluster default)
        --image TEXT                       Override addon image (e.g., postgres:17)
        --ephemeral                        No PVC — data lost on restart (caches only)
        --env-key TEXT                     Override injected env var name
        --external                         No K8s resources — inject URL only
        --url TEXT                         Connection URL (required with --external)
        --no-wait                          Don't wait for addon to be ready

    kuberoku addons:destroy NAME           Remove an addon by instance name
        --confirm TEXT                     Required confirmation
        --delete-data                      Also delete the PVC (PERMANENT DATA LOSS)
                                           DEFAULT: PVC is KEPT. No --keep-data flag
                                           exists — keeping data IS the default.

    kuberoku addons:info NAME              Show addon details, status, events
    kuberoku addons:scale NAME             Change addon resources
        --cpu TEXT                         New CPU request=limit
        --memory TEXT                      New memory request=limit
        --storage TEXT                     Grow PVC (requires expandable StorageClass)

    kuberoku addons:exec NAME              Execute a command in the addon container
                                           (default: psql for postgres, redis-cli for redis,
                                           /bin/sh fallback if no default command)

    kuberoku addons:backup NAME            Backup addon data
        --to FILE                          Output file (default: ./backup-{name}-{ts}.sql)

    kuberoku addons:credentials NAME       Show connection info
        --reveal                           Show actual password (default: masked)

    kuberoku addons:credentials:rotate NAME  Generate new password
                                           Updates Secret, app config, rolling restart

    kuberoku addons:expose:on NAME         Expose an addon externally
        --method TEXT                      loadbalancer (default), nodeport
        --app TEXT / -a TEXT               App (standard resolution)
        ⚠ Exposes database to the internet. Use with caution.

    kuberoku addons:expose:off NAME        Revert an addon to internal-only
        --app TEXT / -a TEXT               App (standard resolution)

    kuberoku addons:connect NAME           Port-forward to an addon locally
        --port INT                         Local port override (default: auto-assigned)
        --app TEXT / -a TEXT               App (standard resolution)
        Temporary connection. Dies on Ctrl+C. For local debugging
        (psql, redis-cli, etc.)

    NO PLANS — DIRECT RESOURCES
    ───────────────────────────
    Heroku uses plans (dev/standard/premium) because plans bundle billing.
    We're not a SaaS — there's no billing. Users control their cluster.

    Direct CPU/memory/storage, same model as ps:type:
        addons:create postgres                                        # sensible defaults
        addons:create postgres --cpu 500m --memory 1Gi --storage 10Gi # custom
        addons:scale postgres --memory 4Gi                             # change one thing

    Defaults per addon type:
        Type        CPU     Memory    Storage    Notes
        ─────────   ─────   ────────  ─────────  ─────────────────────────
        postgres    250m    256Mi     1Gi        Query engine needs RAM
        redis       100m    128Mi     —          In-memory, ephemeral default
        mysql       250m    256Mi     1Gi        Similar to postgres
        mongodb     250m    512Mi     1Gi        Bigger working set

    GUARANTEED QoS (not burstable):
    Unlike app processes (where burstable is fine), addons default to
    limits = requests. An OOMKilled postgres mid-write means crash recovery,
    dropped connections, and a risk of data corruption.
        --cpu 250m   → requests: 250m, limits: 250m  (guaranteed)
        --cpu 250m/500m → requests: 250m, limits: 500m (user override, burstable)
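
    A minimal sketch of how the two flag forms above could be parsed. The
    function names are hypothetical and not part of the Kuberoku SDK; only
    the "value" and "request/limit" syntax comes from the lines above.

```python
from __future__ import annotations


def parse_resource(value: str) -> dict[str, str]:
    """Split a --cpu/--memory value into request and limit.

    "250m"      -> request == limit (guaranteed, the addon default)
    "250m/500m" -> request below limit (burstable, explicit user override)
    """
    request, sep, limit = value.partition("/")
    if not request or (sep and not limit):
        raise ValueError(f"invalid resource value: {value!r}")
    return {"request": request, "limit": limit or request}


def qos_class(cpu: dict[str, str], memory: dict[str, str]) -> str:
    """Guaranteed only when every request equals its limit."""
    same = cpu["request"] == cpu["limit"] and memory["request"] == memory["limit"]
    return "guaranteed" if same else "burstable"
```

    With both --cpu and --memory given as single values, the pod lands in
    the guaranteed class; a single "request/limit" override on either flag
    makes it burstable.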

    MULTI-INSTANCE ADDONS
    ─────────────────────
    An app can have MULTIPLE instances of the same addon type. For example,
    two postgres databases (main + analytics) or three redis instances
    (cache + sessions + queues).

    First instance — type name IS the instance name:
        kuberoku addons:create postgres
        # Instance name: "postgres" (default, same as type)
        # Injects: DATABASE_URL=postgres://{PREFIX}-addon-{app}-postgres:5432/app

    Additional instances — MUST use --as:
        kuberoku addons:create postgres --as analytics-db
        # Instance name: "analytics-db"
        # Injects: ANALYTICS_DB_URL=postgres://{PREFIX}-addon-{app}-analytics-db:5432/app

        kuberoku addons:create redis --as sessions-cache
        # Instance name: "sessions-cache"
        # Injects: SESSIONS_CACHE_URL=redis://{PREFIX}-addon-{app}-sessions-cache:6379

    Without --as, second create of same type fails:
        kuberoku addons:create postgres
        Error: Addon "postgres" already exists for app "myapi".
        Use --as NAME to create a second instance.

    ENV VAR NAMING CONVENTION:
        Instance name        Env var injected          Convention
        ──────────────────   ──────────────────────    ──────────────────────
        postgres             DATABASE_URL              First postgres = standard name
        redis                REDIS_URL                 First redis = standard name
        analytics-db         ANALYTICS_DB_URL          {NAME uppercased, - → _}_URL
        sessions-cache       SESSIONS_CACHE_URL        {NAME uppercased, - → _}_URL
        my-redis             MY_REDIS_URL              {NAME uppercased, - → _}_URL
        (any)                --env-key CUSTOM_NAME     User override
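
    The naming table can be expressed as one small function. This is a
    sketch under assumptions: env_key_for and STANDARD_ENV_KEYS are
    hypothetical names, and only postgres and redis have a standard name
    here because those are the only ones the table lists.

```python
from __future__ import annotations

# Standard names for the FIRST instance of a type (instance name == type name).
STANDARD_ENV_KEYS = {"postgres": "DATABASE_URL", "redis": "REDIS_URL"}


def env_key_for(instance: str, addon_type: str, override: str | None = None) -> str:
    """Return the env var name injected for an addon instance."""
    if override:
        # --env-key always wins.
        return override
    if instance == addon_type and addon_type in STANDARD_ENV_KEYS:
        return STANDARD_ENV_KEYS[addon_type]
    # Named instances: uppercase, "-" becomes "_", then the _URL suffix.
    return instance.upper().replace("-", "_") + "_URL"
```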

    All addon commands except addons:create (which takes the TYPE) use the
    INSTANCE NAME as the argument:
        kuberoku addons:info analytics-db
        kuberoku addons:exec analytics-db
        kuberoku addons:backup analytics-db --to dump.sql
        kuberoku addons:destroy analytics-db

    List output shows all instances:
        $ kuberoku addons
        === myapi addons
        Name           Type      Status    CPU    Memory  Storage  Env Key
        ─────────────  ────────  ────────  ─────  ──────  ───────  ──────────────
        postgres       postgres  running   250m   256Mi   1Gi      DATABASE_URL
        analytics-db   postgres  running   500m   1Gi     10Gi     ANALYTICS_DB_URL
        redis          redis     running   100m   128Mi   —        REDIS_URL
        sessions-cache redis     running   100m   128Mi   —        SESSIONS_CACHE_URL

    Built-in addon types (Phase 7):
        postgres     PostgreSQL database
        redis        Redis key-value store

    ADDON LIFECYCLE STATE MACHINE
    ─────────────────────────────
    Status stored in addon metadata ConfigMap:

        creating → running → upgrading → running
            ↓         ↓                     ↓
          failed   backing-up → running   failed

    Operations check status before proceeding:
        - Can't backup a "creating" addon
        - Can't upgrade a "backing-up" addon
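
    One way to encode the status checks is a transition table. The table
    below is one reading of the diagram above (in particular, it assumes
    the rightmost "failed" is reachable from "running"), and the names are
    hypothetical.

```python
# Allowed status transitions, as interpreted from the lifecycle diagram.
TRANSITIONS = {
    "creating": {"running", "failed"},
    "running": {"upgrading", "backing-up", "failed"},
    "upgrading": {"running"},
    "backing-up": {"running"},
    "failed": set(),
}

# Hypothetical mapping from an operation to the status it enters.
OPERATION_STATUS = {"backup": "backing-up", "upgrade": "upgrading"}


def check_operation(current_status: str, operation: str) -> str:
    """Return the status the operation moves to, or raise if not allowed."""
    target = OPERATION_STATUS[operation]
    if target not in TRANSITIONS.get(current_status, set()):
        raise RuntimeError(f"Can't {operation} a {current_status!r} addon")
    return target
```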

    ADDON CREATION FLOW (10 steps)
    ──────────────────────────────
        Step 1:  Create metadata ConfigMap (status: "creating")
        Step 2:  Generate credentials (secrets.token_urlsafe)
        Step 3:  Create K8s Secret (username, password, database)
        Step 4:  Create PVC (if not ephemeral; reattach if existing PVC found)
        Step 5:  Create StatefulSet (mounts PVC, sets container env from Secret)
        Step 6:  Wait for pod ready (readiness probe passes)
        Step 7:  Create Service (ClusterIP, addon port)
        Step 8:  Create addon-allow NetworkPolicy (only app pods → addon port)
        Step 9:  Inject connection URL into app config (config:set)
        Step 10: Update metadata ConfigMap (status: "running")

    If any step fails: update metadata to status: "failed" with error.
    Don't clean up — user can debug via addons:info or destroy.
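
    The failure rule above (mark "failed", keep what was created) could be
    sketched as a step runner. run_creation and the metadata dict are
    stand-ins for the real metadata ConfigMap handling, which is not shown
    in this document.

```python
from __future__ import annotations

from typing import Callable

Step = tuple[str, Callable[[], None]]


def run_creation(steps: list[Step], metadata: dict) -> bool:
    """Run creation steps in order; stop at the first failure without cleanup."""
    metadata["status"] = "creating"
    for name, action in steps:
        try:
            action()
        except Exception as exc:
            # Record the error and stop. Resources created by earlier steps
            # are left in place so the user can inspect or destroy them.
            metadata["status"] = "failed"
            metadata["error"] = f"{name}: {exc}"
            return False
    metadata["status"] = "running"
    return True
```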

    STORAGE STRATEGY
    ────────────────
    Stateful addons (postgres, redis) use:
        - StatefulSet (NOT Deployment) — stable pod identity, stable PVC binding
        - PersistentVolumeClaim — data survives pod restart, node failure, drain
        - StorageClass — uses cluster default, overridable via --storage-class

    Two storage categories:
        Always persistent:  postgres, mysql, mongodb — error if --ephemeral
        User chooses:       redis — persistent by default, --ephemeral to skip PVC

    StorageClass validation:
        - If no default StorageClass and no --storage-class: error with available list
        - Storage grow (addons:scale --storage 10Gi): requires allowVolumeExpansion
        - Doctor checks: "Default StorageClass exists" as a diagnostic
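
    A sketch of the storage rules above as a single decision function.
    choose_storage_class and its parameters are hypothetical; the caller is
    assumed to have already looked up the cluster's default and available
    StorageClasses.

```python
from __future__ import annotations

ALWAYS_PERSISTENT = frozenset({"postgres", "mysql", "mongodb"})


def choose_storage_class(
    addon_type: str,
    ephemeral: bool,
    requested: str | None,
    cluster_default: str | None,
    available: list[str],
) -> str | None:
    """Return the StorageClass to use, or None when no PVC is created."""
    if ephemeral:
        if addon_type in ALWAYS_PERSISTENT:
            raise ValueError(f"{addon_type} is always persistent; --ephemeral is not allowed")
        return None
    chosen = requested or cluster_default
    if chosen is None:
        raise ValueError(
            "No default StorageClass. Pass --storage-class; available: "
            + ", ".join(sorted(available))
        )
    return chosen
```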

    PVC REATTACHMENT:
    If an addon was destroyed (PVC retained) and recreated:
        Found existing PVC kuberoku-addon-pvc-myapp-postgres (420Mi data)
        Reattach existing data? [y/N]
    If yes: StatefulSet binds to existing PVC, data preserved.
    If no: delete old PVC, create fresh.

    K8s RESOURCES PER ADDON INSTANCE
    ────────────────────────────────
    For `addons:create postgres` (instance name = "postgres"):
        1. ConfigMap:      {PREFIX}-addon-{app}-{instance} (metadata: type, status, etc.)
        2. Secret:         {PREFIX}-addon-secret-{app}-{instance} (credentials)
        3. PVC:            {PREFIX}-addon-pvc-{app}-{instance} (persistent storage)
        4. StatefulSet:    {PREFIX}-addon-{app}-{instance} (1 replica, mounts PVC)
        5. Service:        {PREFIX}-addon-{app}-{instance} (ClusterIP, addon port)
        6. NetworkPolicy:  {PREFIX}-netpol-addon-{app}-{instance} (addon-allow)

    Labels on ALL addon resources:
        {DOMAIN}/managed-by: {PREFIX}
        {DOMAIN}/app: {app}
        {DOMAIN}/resource-type: addon
        {DOMAIN}/addon-type: {type}           (postgres, redis, etc.)
        {DOMAIN}/addon-instance: {instance}   (postgres, analytics-db, etc.)

    External addons (--external) create only ConfigMap (#1) + config injection.
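
    The naming and label patterns above, written out as helpers. The
    function names are hypothetical, and the values passed for {PREFIX}
    and {DOMAIN} in any call are whatever the installation uses.

```python
from __future__ import annotations


def addon_resource_names(prefix: str, app: str, instance: str) -> dict[str, str]:
    """Names of the six K8s resources created for one addon instance."""
    base = f"{prefix}-addon-{app}-{instance}"
    return {
        "configmap": base,
        "secret": f"{prefix}-addon-secret-{app}-{instance}",
        "pvc": f"{prefix}-addon-pvc-{app}-{instance}",
        "statefulset": base,
        "service": base,
        "networkpolicy": f"{prefix}-netpol-addon-{app}-{instance}",
    }


def addon_labels(domain: str, prefix: str, app: str, addon_type: str, instance: str) -> dict[str, str]:
    """Labels applied to every resource of an addon instance."""
    return {
        f"{domain}/managed-by": prefix,
        f"{domain}/app": app,
        f"{domain}/resource-type": "addon",
        f"{domain}/addon-type": addon_type,
        f"{domain}/addon-instance": instance,
    }
```

    ConfigMap, StatefulSet, and Service share one name; that is valid
    because K8s names only need to be unique per resource kind.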

    SECURITY MODEL — THREE LAYERS
    ──────────────────────────────
    Layer 1: NETWORK ISOLATION (addon-allow NetworkPolicy)
        Only pods belonging to the owning app can reach the addon port.
        Other apps blocked by deny-all policy (created on apps:create).

        spec:
          podSelector:
            matchLabels:
              {DOMAIN}/addon-instance: postgres
              {DOMAIN}/app: myapp
          ingress:
          - from:
            - podSelector:
                matchLabels:
                  {DOMAIN}/app: myapp    # ONLY this app's pods
            ports:
            - port: 5432
              protocol: TCP

    Layer 2: AUTHENTICATION (generated credentials)
        Random password per instance (secrets.token_urlsafe(32)).
        Stored in K8s Secret. Injected into app config. Masked in CLI
        output by default (●●●●●●); shown only with addons:credentials --reveal.
        Rotatable via addons:credentials:rotate.
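
    A small sketch of credential generation and masking using only the
    standard library. secrets.token_urlsafe(32) is the call named above;
    mask and mask_url are hypothetical helpers showing how CLI output
    could hide the password unless the user asks for it.

```python
from __future__ import annotations

import secrets
from urllib.parse import urlsplit, urlunsplit

MASK = "●" * 6


def generate_password() -> str:
    """Random URL-safe password built from 32 bytes of entropy."""
    return secrets.token_urlsafe(32)


def mask(value: str, reveal: bool = False) -> str:
    """Return the value only when reveal is requested."""
    return value if reveal else MASK


def mask_url(url: str, reveal: bool = False) -> str:
    """Hide the password part of a connection URL, keeping everything else."""
    parts = urlsplit(url)
    if reveal or parts.password is None:
        return url
    host = parts.hostname or ""
    if parts.port:
        host += f":{parts.port}"
    netloc = f"{parts.username}:{MASK}@{host}"
    return urlunsplit(parts._replace(netloc=netloc))
```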

    Layer 3: NO EXTERNAL ACCESS
        ClusterIP only — no LoadBalancer, no NodePort, no external IP.
        Debug via addons:exec (exec) or addons:connect (port-forward).
        Deliberate exposure via addons:expose:on (with security warnings).

    CREDENTIAL ROTATION
    ───────────────────
        addons:credentials:rotate postgres --app myapp
        # 1. Generate new password
        # 2. Update database user password (ALTER USER via exec)
        # 3. Update K8s Secret
        # 4. Update app config (DATABASE_URL with new password)
        # 5. Rolling restart of ALL app process types
        # Zero downtime — old connections drain, new ones use new password

    BACKUP
    ──────
    Simple exec + pipe approach. No Jobs, no S3. Works everywhere.

    On-demand backup:
        addons:backup postgres --app myapp
        # Exec into pod → run pg_dump → stream to local file
        # Output: Backup saved to ./backup-postgres-myapp-20250210-153000.sql (12 MB)

    Per addon type:
        Type        Command          Format    Notes
        ─────────   ──────────       ───────   ───────────────────
        postgres    pg_dump          SQL       Full database dump
        mysql       mysqldump        SQL       Full database dump
        mongodb     mongodump        BSON      Binary dump
        redis       BGSAVE + cp      RDB       Point-in-time snapshot
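
    The default output path can be derived from the instance, app, and a
    timestamp. This sketch reproduces the filename in the example output
    above; default_backup_path is a hypothetical name, and only the .sql
    extension is covered because the document does not specify extensions
    for the BSON and RDB formats.

```python
from datetime import datetime


def default_backup_path(instance: str, app: str, now: datetime) -> str:
    """Build ./backup-{instance}-{app}-{YYYYMMDD-HHMMSS}.sql"""
    return f"./backup-{instance}-{app}-{now:%Y%m%d-%H%M%S}.sql"
```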

    EXEC INTO ADDON (addons:exec)
    ─────────────────────────────
    Execute a command inside the addon container. Zero local setup.

        addons:exec postgres    → psql -U kuberoku app
        addons:exec redis       → redis-cli -a <password>
        addons:exec mysql       → mysql -u kuberoku -p app
        addons:exec mongodb     → mongosh -u kuberoku app

    The addon container already ships its own tools.
    Each AddonDef has exec_command field; None falls back to /bin/sh.
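
    A sketch of how an exec_command template might be resolved, including
    the /bin/sh fallback. render_exec_command is a hypothetical helper;
    the {user}/{database} placeholders follow the AddonDef field comments
    elsewhere in this section.

```python
from __future__ import annotations


def render_exec_command(template: list[str] | None, **values: str) -> list[str]:
    """Fill placeholders in an exec_command; fall back to a shell if unset."""
    if template is None:
        return ["/bin/sh"]
    return [part.format(**values) for part in template]
```

    For example, the postgres template ["psql", "-U", "{user}", "{database}"]
    with user="kuberoku" and database="app" yields the psql invocation
    listed above.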

    VERSION MIGRATION (all version changes — minor and major)
    ──────────────────────────────────────────────────────────
    kuberoku addons:migrate NAME           Migrate addon to a different version
        --version TEXT                     Target version (required)
        --force                            Bypass migration path validation
        --app TEXT / -a TEXT               App (standard resolution)

    kuberoku addons:migrate-rollback NAME  Rollback to previous version
        --app TEXT / -a TEXT               App (standard resolution)

    Strategy per addon type:

    Postgres (any direction — upgrade or downgrade):
        1. pg_dump (backup current data)
        2. Scale StatefulSet to 0, wait for pod gone
        3. Delete PVC (wipe on-disk data)
        4. Swap image to target version, scale to 1
        5. Wait for pod ready (pg_isready)
        6. psql < backup_data (restore via stdin)
        → Data preserved across any version change

    Redis upgrade (7→8):
        1. Scale StatefulSet to 0, wait for pod gone
        2. Swap image, scale to 1 (no PVC wipe — newer Redis reads older RDB files)
        → Data preserved (RDB format backward compatible)

    Redis downgrade (8→7):
        1. Scale StatefulSet to 0, wait for pod gone
        2. Delete PVC (RDB v12 incompatible with Redis 7)
        3. Swap image, scale to 1
        → Data lost (no backup_command for Redis). Metadata: data_wiped=true

    Migration path validation:
        Postgres: sequential only (15→16, 16→17, 17→16, 16→15)
        Redis: any-to-any (migration_paths=None)
        --force: bypass path validation (e.g. 15→17 direct)
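
    Path validation reduces to a lookup. In this sketch MIGRATION_PATHS
    and is_migration_allowed are hypothetical names; the postgres pairs
    are exactly the ones listed above, and None stands for any-to-any as
    in migration_paths=None.

```python
from __future__ import annotations

# (current, target) pairs that are allowed without --force; None = any-to-any.
MIGRATION_PATHS: dict[str, frozenset[tuple[int, int]] | None] = {
    "postgres": frozenset({(15, 16), (16, 17), (17, 16), (16, 15)}),
    "redis": None,
}


def is_migration_allowed(addon_type: str, current: int, target: int, force: bool = False) -> bool:
    """Check a version change against the addon type's migration paths."""
    if current == target:
        return False  # nothing to migrate
    paths = MIGRATION_PATHS[addon_type]
    if force or paths is None:
        return True
    return (current, target) in paths
```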

    Examples:
        addons:migrate postgres --version 17   # 16→17 with backup+restore
        addons:migrate redis --version 8       # 7→8 simple swap
        addons:migrate-rollback postgres       # rollback to previous version

    ADDON INFO AND DIAGNOSTICS
    ──────────────────────────
        $ kuberoku addons:info postgres --app myapp
        === postgres (postgres:16) ===
        Status:     Running
        Pod:        kuberoku-addon-myapp-postgres-0 (Running, 0 restarts)
        CPU:        250m/250m (guaranteed)
        Memory:     256Mi/256Mi (guaranteed)
        Storage:    1Gi (420Mi used)
        Port:       5432 (internal: kuberoku-addon-myapp-postgres:5432)
        Env:        DATABASE_URL
        Created:    2025-02-09 14:30 UTC

        Recent events:
          (none — healthy)

    On failure:
        Status:     FAILED (2 restarts in last hour)
        Recent events:
          10m ago   OOMKilled — memory limit 256Mi exceeded
          10m ago   Pulled image postgres:16
        → Increase memory: addons:scale postgres --memory 512Mi

    DESTROY SAFETY
    ──────────────
    Default: PVC is KEPT. User must explicitly --delete-data.

        addons:destroy postgres --confirm postgres
        # Removes: StatefulSet, Service, NetworkPolicy, Secret, metadata ConfigMap
        # Removes: DATABASE_URL from app config → rolling restart
        # RETAINS: PVC (data preserved)
        # Output: "PVC kuberoku-addon-pvc-myapp-postgres retained."

        addons:destroy postgres --confirm postgres --delete-data
        # Same as above + deletes PVC (PERMANENT DATA LOSS)

    apps:destroy with addons:
        # Destroys all addon resources. PVCs retained by default.
        # Warning: "2 addon PVCs retained. Delete manually if not needed."

    APP RENAME WITH ADDONS
    ──────────────────────
    Renaming an app with addons is complex (DNS names change, URLs change).
    Default: refuse rename if addons exist.
        Error: App has 2 addons. Remove addons first, or use --force.

    EXTERNAL ADDONS (BRING YOUR OWN)
    ────────────────────────────────
    For managed cloud databases (RDS, CloudSQL, Azure Database):

        addons:create postgres --external \
            --url "postgres://user:pass@mydb.rds.amazonaws.com:5432/prod"

    No StatefulSet, no PVC, no Service. Just:
        1. Store URL in app config (config:set DATABASE_URL=...)
        2. Create metadata ConfigMap (so addons:list shows it)
        3. Rolling restart of all app process types

    Shows in addons list as:
        postgres   postgres  external  —      —       —        DATABASE_URL

    OPERATOR INTEGRATION (TIER 2)
    ─────────────────────────────
    If a K8s operator is installed for an addon type, use it instead of
    a plain StatefulSet. Get HA/replication for free.

        addons:create postgres --app myapp
        # Detected CloudNativePG operator
        # Creating 3-node postgres cluster with streaming replication...

    Without operator:
        # Creating postgres (single instance)...
        # Tip: Install CloudNativePG for HA replication.

    Each AddonDef has optional operator_crd field. If the CRD exists in
    the cluster, create the operator's Custom Resource instead of a
    StatefulSet. Same CLI interface to the user.

    doctor checks: "No database operators found. Addons run single-instance."

    No replica scaling without operator:
        addons:scale postgres --replicas 3
        Error: Scaling requires an operator. Install CloudNativePG for HA.

    ADDON TYPE DEFINITION (SCALABLE ARCHITECTURE)
    ──────────────────────────────────────────────
    Each addon type is a dataclass. Adding a new type = one new file,
    zero CLI changes, zero SDK changes.

        @dataclass(frozen=True)
        class AddonDef:
            type_name: str               # "postgres"
            image: str                   # "postgres:16"
            port: int                    # 5432
            protocol: str                # "TCP"
            default_cpu: str             # "250m"
            default_memory: str          # "256Mi"
            default_storage: str | None  # "1Gi" or None for ephemeral
            needs_storage: bool          # True for postgres, False for ephemeral
            storage_optional: bool       # True for redis (--ephemeral OK)
            needs_auth: bool             # True for postgres/redis
            env_key: str                 # "DATABASE_URL"
            url_template: str            # "postgres://{user}:{password}@{host}:{port}/{db}"
            container_env: Callable      # Generates env vars for the container
            readiness_probe: dict        # K8s probe spec
            exec_command: list[str] | None  # ["psql", "-U", "{user}", "{database}"]
            backup_command: list[str]    # ["pg_dump", "-U", "{user}", "{database}"]
            operator_crd: str | None     # "clusters.postgresql.cnpg.io"
            operator_builder: Callable | None

    Registry:
        addons/__init__.py  — ADDON_REGISTRY dict, register() function
        addons/postgres.py  — POSTGRES = AddonDef(...)
        addons/redis.py     — REDIS = AddonDef(...)

    Plugin addon types (Phase 10) register via entry_points:
        [project.entry-points."kuberoku.addons"]
        elasticsearch = "kuberoku_elasticsearch:ELASTICSEARCH"

    After `pip install kuberoku-elasticsearch`:
        kuberoku addons:create elasticsearch --app myapp
    Works immediately. Same CLI, same SDK. Plugin provides an AddonDef.
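
    A sketch of the registry and plugin discovery described above, using
    importlib.metadata from the standard library. The AddonDef here is
    trimmed to three fields for brevity, and load_plugins is a
    hypothetical name; the entry point group "kuberoku.addons" is the one
    shown above.

```python
from __future__ import annotations

from dataclasses import dataclass
from importlib.metadata import entry_points


@dataclass(frozen=True)
class AddonDef:
    # Trimmed for illustration; the full field list is given earlier.
    type_name: str
    image: str
    port: int


ADDON_REGISTRY: dict[str, AddonDef] = {}


def register(addon: AddonDef) -> None:
    """Add an addon type to the registry; duplicate type names are rejected."""
    if addon.type_name in ADDON_REGISTRY:
        raise ValueError(f"addon type {addon.type_name!r} already registered")
    ADDON_REGISTRY[addon.type_name] = addon


def load_plugins(group: str = "kuberoku.addons") -> None:
    """Register every AddonDef exposed through the entry point group."""
    eps = entry_points()
    # Newer Pythons return an object with .select(); older ones return a dict.
    selected = eps.select(group=group) if hasattr(eps, "select") else eps.get(group, [])
    for ep in selected:
        register(ep.load())
```

    Built-in modules would call register() at import time, and
    load_plugins() would run once at CLI startup, so a pip-installed
    plugin needs no changes to the CLI itself.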

    CONFIG INJECTION TIMING
    ───────────────────────
    Scenario A: App deployed, then addon added
        addons:create injects URL via config:set → rolling restart all types
        App pods restart with DATABASE_URL available.

    Scenario B: Addon before first deploy
        addons:create injects URL via config:set → no restart (nothing deployed)
        First deploy picks up DATABASE_URL automatically.

    Scenario C: Addon destroyed while app running
        addons:destroy removes URL via config:unset → rolling restart
        Warning: "Your app will lose access to postgres immediately."

    All three work via existing config:set/config:unset restart logic.

    K8s MAPPING SUMMARY
    ───────────────────
        addons:create TYPE     → Secret + PVC + StatefulSet + Service +
                                 NetworkPolicy (addon-allow) + metadata ConfigMap +
                                 config:set (inject URL) + rolling restart
        addons:destroy NAME    → Delete all except PVC (unless --delete-data)
                                 config:unset (remove URL) + rolling restart
        addons:scale NAME      → Patch StatefulSet resources (cpu/memory);
                                 patch PVC size (storage)
        addons:migrate NAME    → Version migration (backup → PVC wipe → restore)
        addons:backup NAME     → exec backup command in addon pod → local file
        addons:exec NAME       → exec command in addon container
        addons:credentials:rotate → ALTER USER + update Secret + config:set + restart

    ADDON NETWORKING MODEL
    ──────────────────────
    Addons are INTERNAL-ONLY by default (ClusterIP Service).
    This is a security decision.

    To CONNECT (temporary, debugging):
        kuberoku addons:connect postgres  # port-forward, dies on Ctrl+C
        kuberoku addons:connect analytics # auto-avoids local port conflicts
        See the addons:connect command reference above; Section 4.9 shows example output.

    To EXPOSE permanently (toggle on/off):
        kuberoku addons:expose:on analytics  # LoadBalancer, gets external IP
        kuberoku addons:expose:off analytics # back to ClusterIP
        See the addons:expose:on/off command reference above; Section 4.9 shows example output.

        ⚠ Exposing a database to the internet is a security risk.
        Ensure strong credentials and consider firewall rules.

    For app-to-addon connectivity:
        - The app's Secret has env vars pointing to internal ClusterIP DNS names
        - e.g., DATABASE_URL=postgres://{PREFIX}-addon-{app}-postgres:5432/app
        - e.g., ANALYTICS_DB_URL=postgres://{PREFIX}-addon-{app}-analytics-db:5432/app
        - This is automatic — injected by `addons:create`
        - Works because app and addon are in the same namespace
        - Multiple addons of same type have different instance names → different DNS names
          → different env vars → zero conflicts


4.9  SERVICES / PORTS (NETWORKING)
──────────────────────────────────
Heroku equivalent: (none — Heroku auto-exposes everything)
K8s resources: Service (type patching), Ingress

    Process types use `services` for expose:on/off, open, and port management.
    Addons use `addons:expose:on/off` and `addons:connect` (see Section 4.8).

    No `--addon`/`--process` flags needed — the command group IS the
    disambiguation. `services:expose:on web` = process. `addons:expose:on
    postgres` = addon.

    CORE PRINCIPLE: DEPLOY IS ALWAYS INTERNAL
    ──────────────────────────────────────────
    When you deploy ANY process with ports, it starts as ClusterIP (internal).
    Addons are always internal. Nothing is public until you explicitly say so.

        $ kuberoku deploy --image postfix:latest \
            --port 25/tcp --port 587/tcp
        # App is running. Internal only. ClusterIP Service created.

    ┌─────────────────────────────────────────────────────────────────────┐
    │  SERVICES (process types)                                           │
    │  kuberoku services                    List process types with       │
    │                                       ports and exposure status     │
    │      --app TEXT / -a TEXT             App (standard resolution)     │
    │  kuberoku services:expose:on NAME     Make a process type public    │
    │      --method TEXT                    loadbalancer (default),       │
    │                                       nodeport                      │
    │      --port INT                       Only expose specific port(s)  │
    │      --app TEXT / -a TEXT             App (standard resolution)     │
    │  kuberoku services:expose:off NAME    Revert process to internal    │
    │      --app TEXT / -a TEXT             App (standard resolution)     │
    │  kuberoku services:open [NAME]        Open process in browser       │
    │      --type TEXT                      Process type (default: web)   │
    │      --app TEXT / -a TEXT             App (standard resolution)     │
    │                                       HTTP only. Opens domain URL   │
    │                                       or falls back to port-forward │
    │                                                                     │
    │  ADDONS (see Section 4.8 for full list)                             │
    │  kuberoku addons:expose:on NAME       Expose an addon externally    │
    │  kuberoku addons:expose:off NAME      Revert addon to internal      │
    │  kuberoku addons:connect NAME         Port-forward to addon locally │
    │                                                                     │
    │  PORTS (per-process port management — subgroup of services)         │
    │  kuberoku services:ports              List ports                    │
    │      --app TEXT / -a TEXT             App (standard resolution)     │
    │  kuberoku services:ports:add PORT     Add a port (rolling restart)  │
    │  kuberoku services:ports:remove PORT  Remove port (rolling restart) │
    │      --type TEXT                      Process type (if multi)       │
    │      --app TEXT / -a TEXT             App (standard resolution)     │
    │                                                                     │
    │  LOGS + EXEC (runtime operations — subcommands of services)         │
    │  kuberoku services:logs               View recent process logs      │
    │      --tail, -t                       Stream in real time (follow)  │
    │      --num INT, -n INT                Lines to show (default: 100)  │
    │      --type TEXT                      Filter by process type        │
    │      --dyno TEXT                      Filter by dyno (e.g., web.1)  │
    │      --since TEXT                     Duration filter (5m, 1h, 30s) │
    │      --timestamps                     Prefix with K8s timestamps    │
    │      --previous                       Show crashed container logs   │
    │  kuberoku services:exec CMD [ARGS..]  Exec into a process pod       │
    │      --type TEXT                      Process type (default: web)   │
    │      --no-tty                         Don't allocate TTY            │
    │      --detach                         Run as background Job         │
    │      --env KEY=VAL                    Extra env vars (repeatable)   │
    │      --timeout INT                    Max seconds (default: 3600)   │
    │      --dyno TEXT                      Specific dyno (e.g., web.1)   │
    └─────────────────────────────────────────────────────────────────────┘

    EXPOSE: WHAT HAPPENS
    ────────────────────
    Expose ONLY patches the Service object's `.spec.type` field.
    Pods are NEVER restarted. No redeploy. Only the network path changes.
    LoadBalancer provisioning can take 30-60s on cloud providers.

    NOTE: services:expose:on/off = network-only, no pod restart.
    services:ports:add/remove = pod spec change, triggers rolling restart.
    Different operations, different blast radius.

        $ kuberoku services:expose:on smtp
        Exposing smtp...
          Patching Service: ClusterIP → LoadBalancer
          Waiting for external IP...
          External IP: 34.123.45.67
        Exposed:
          34.123.45.67:25/tcp
          34.123.45.67:587/tcp

        $ kuberoku services:expose:off smtp
        Unexposing smtp...
          Patching Service: LoadBalancer → ClusterIP
          External IP released.
        Internal only.

        $ kuberoku addons:expose:on analytics
        Exposing addon analytics...
          Patching Service: ClusterIP → LoadBalancer
          External: 34.123.45.68:5432
        ⚠ Database is now accessible from the internet.

        $ kuberoku addons:expose:off analytics
        Internal only.
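
    Because expose only changes `.spec.type`, the patch bodies are tiny.
    The sketch below builds them as plain dicts; expose_patch and
    unexpose_patch are hypothetical names. With the official kubernetes
    Python client, such a body could be passed to
    CoreV1Api.patch_namespaced_service(name, namespace, body), though how
    Kuberoku actually sends the patch is not stated here.

```python
# CLI --method value -> K8s Service type.
SERVICE_TYPES = {"loadbalancer": "LoadBalancer", "nodeport": "NodePort"}


def expose_patch(method: str = "loadbalancer") -> dict:
    """Patch body that makes a Service externally reachable."""
    try:
        service_type = SERVICE_TYPES[method.lower()]
    except KeyError:
        raise ValueError(f"unknown expose method: {method!r}") from None
    return {"spec": {"type": service_type}}


def unexpose_patch() -> dict:
    """Patch body that reverts a Service to internal-only."""
    return {"spec": {"type": "ClusterIP"}}
```

    Neither body touches the pod template, which is why no pods restart.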

    CONNECT (ADDONS ONLY): PORT-FORWARD FOR DEBUGGING
    ──────────────────────────────────────────────────
    Temporary local connection. Dies on Ctrl+C. For addon debugging
    (psql, redis-cli, etc.). Auto-assigns local ports to avoid conflicts.

        $ kuberoku addons:connect postgres
        Connecting to postgres...
        localhost:5432 → postgres:5432
        Hint: psql postgres://localhost:5432/app

        $ kuberoku addons:connect analytics
        localhost:5433 → analytics:5432         ← auto-avoids conflict
        Hint: psql postgres://localhost:5433/app
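
    The conflict-avoiding port choice (5432 taken, so 5433) could work by
    probing upward from the addon's own port. This is a sketch, not the
    actual implementation: pick_local_port and the attempts limit are
    assumptions.

```python
import socket


def pick_local_port(preferred: int, host: str = "127.0.0.1", attempts: int = 20) -> int:
    """Return the first bindable port at or above the preferred one."""
    for port in range(preferred, preferred + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            try:
                probe.bind((host, port))
            except OSError:
                continue  # port in use (or not permitted), try the next one
            return port
    raise RuntimeError(f"no free local port in {preferred}..{preferred + attempts - 1}")
```

    The probe socket is closed before the port-forward starts, so another
    process could still grab the port in between; a real implementation
    would need to handle that bind failure and retry.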

    OPEN (SERVICES ONLY): BROWSER ACCESS
    ─────────────────────────────────────
    Opens HTTP process in browser. If app has a domain → opens that URL.
    If no domain → falls back to port-forward + localhost URL.

        $ kuberoku services:open
        Opening https://myapi.apps.mycluster.com...

        $ kuberoku services:open --type admin
        Opening http://localhost:9090 (port-forwarding)...

    PORTS: MANAGE PORTS WITHOUT REDEPLOYING
    ────────────────────────────────────────
    Add or remove ports after deploy. Updates Service + Deployment.
    Triggers a rolling restart (pods need new containerPort).

    PORT OWNERSHIP: Ports are PER PROCESS TYPE, not per app.
    Each process type has its own Deployment + Service. Ports belong
    to that process type's Service.

        web  → Service kuberoku-myapi-web  → ports: 8080/tcp
        grpc → Service kuberoku-myapi-grpc → ports: 50051/tcp
        smtp → Service kuberoku-myapi-smtp → ports: 25/tcp, 587/tcp

    Two process types CAN have different ports (and usually do).
    Two process types can also use the SAME port number without conflict,
    because each has its own Service with its own ClusterIP.

    If app has ONE process type, --type is optional (inferred).
    If app has MULTIPLE process types, --type is REQUIRED:
        kuberoku services:ports:add 50051/tcp --type grpc     ← which Service?
        kuberoku services:expose:on grpc                 ← expose grpc's Service

        $ kuberoku services:ports
        === catchall-smtp
        25/tcp   465/tcp   587/tcp   2525/tcp

        $ kuberoku services:ports:add 993/tcp
        Added 993/tcp. Rolling restart...

        $ kuberoku services:ports:remove 2525/tcp
        Removed 2525/tcp. Rolling restart...

        $ kuberoku services:ports
        25/tcp   465/tcp   587/tcp   993/tcp

    For apps with multiple process types, specify which:
        kuberoku services:ports:add 9090/tcp --type admin
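
    A sketch of parsing a PORT argument and applying the --type rule
    described above. Both function names are hypothetical, and defaulting
    to tcp when the protocol is omitted is an assumption, not something
    this section states.

```python
from __future__ import annotations


def parse_port(spec: str) -> tuple[int, str]:
    """Parse "587/tcp" into (587, "tcp"); protocol defaults to tcp (assumed)."""
    number, _, proto = spec.partition("/")
    proto = (proto or "tcp").lower()
    if proto not in {"tcp", "udp"}:
        raise ValueError(f"unsupported protocol in {spec!r}")
    port = int(number)  # raises ValueError on non-numeric input
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range in {spec!r}")
    return port, proto


def resolve_process_type(process_types: list[str], requested: str | None) -> str:
    """Infer --type for single-process apps; require it otherwise."""
    if requested is not None:
        if requested not in process_types:
            raise ValueError(f"unknown process type: {requested!r}")
        return requested
    if len(process_types) == 1:
        return process_types[0]
    raise ValueError("--type is required: app has multiple process types")
```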

    EXPOSURE METHODS
    ────────────────
    Method          K8s Service Type    Cost     UX           When to Use
    ──────────────  ────────────────    ──────   ───────────  ──────────────────
    loadbalancer    LoadBalancer        $$$      Real IP      Production, cloud
    nodeport        NodePort            Free     Ugly port    Dev, on-prem, budget

    LoadBalancer (default):
        - Cloud provider allocates a real external IP
        - Best UX: stable IP, standard ports (25, 587, etc.)
        - Cost: ~$15-20/mo per LoadBalancer on AWS/GCP/Azure
        - Not available on bare-metal without MetalLB

    NodePort:
        - K8s allocates a high port (30000-32767) on every node
        - Free but ugly: clients connect to node-ip:30587 instead of :587
        - Often blocked by corporate firewalls
        - Always available, works everywhere

    EXPOSE + DOMAINS: INDEPENDENT AND COMPOSABLE
    ──────────────────────────────────────────────
    `services:expose:on` gives you a LoadBalancer IP (direct TCP/UDP access).
    `domains:add` gives you an Ingress rule (HTTP hostname routing).
    They are independent. Use either. Or both on the same service.

        # HTTP only (Ingress):
        kuberoku domains:add api.myapp.com --type web

        # TCP only (LoadBalancer):
        kuberoku services:expose:on smtp

        # BOTH on same service (gRPC via Ingress + direct TCP):
        kuberoku domains:add grpc.myapp.com --type grpc
        kuberoku services:expose:on grpc

    COST-SAVING: SHARED TCP PROXY (ADVANCED, CLUSTER-ADMIN)
    ────────────────────────────────────────────────────────
    For clusters running nginx-ingress-controller, multiple TCP/UDP services
    can share ONE LoadBalancer. This is configured via the nginx TCP services
    ConfigMap — but it requires cluster-admin access.

    Kuberoku does NOT manage this automatically (it's a cluster-level concern).
    Documentation will include a guide for cluster admins who want to set this up.

    FUTURE: Gateway API (V2)
    ────────────────────────
    Gateway API natively supports TCPRoute and UDPRoute, which could replace
    the LoadBalancer-per-service model with shared gateways.

    K8s mapping:
        services:expose:on NAME    → Patch process Service .spec.type to LoadBalancer
                                     or NodePort (no pod restart)
        services:expose:off NAME   → Patch process Service .spec.type back to ClusterIP
        services:open [NAME]       → Resolve domain URL or port-forward + open browser
        addons:expose:on NAME      → Patch addon Service .spec.type to LoadBalancer
        addons:expose:off NAME     → Patch addon Service .spec.type back to ClusterIP
        addons:connect NAME        → kubectl port-forward to addon Service
        services:ports:add PORT    → Patch Service .spec.ports + Deployment containerPorts
                                     (triggers rolling restart)
        services:ports:remove PORT → Patch Service .spec.ports + Deployment containerPorts
                                     (triggers rolling restart)
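
    The Service-type changes in this mapping reduce to a very small patch body.
    A minimal sketch, assuming a merge-style patch against the Service; the
    helper names `expose_patch` and `unexpose_patch` are hypothetical, and the
    method strings follow the `loadbalancer` / `nodeport` spellings used
    elsewhere in this document:

```python
# Hypothetical helpers: build the patch bodies described in the mapping above.
SERVICE_TYPES = {
    "loadbalancer": "LoadBalancer",
    "nodeport": "NodePort",
}


def expose_patch(method: str = "loadbalancer") -> dict:
    """Return a patch body that only touches .spec.type of a Service."""
    try:
        service_type = SERVICE_TYPES[method.lower()]
    except KeyError:
        raise ValueError(f"unknown expose method: {method!r}") from None
    return {"spec": {"type": service_type}}


def unexpose_patch() -> dict:
    """Return the patch body that puts a Service back to ClusterIP."""
    return {"spec": {"type": "ClusterIP"}}
```

    Because only .spec.type changes, the pod template is untouched, which is
    why no pod restart is expected for expose:on/off.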


4.10  DOMAINS
─────────────
Heroku equivalent: heroku domains:*
K8s resources: Ingress (V1 — NOT Gateway API)

    kuberoku domains                       List custom domains
    kuberoku domains:add DOMAIN            Add custom domain
        --type TEXT                        Process type to route to (default: web)
                                           Works for ANY process type with ports.
                                           E.g., --type admin routes to admin process.
        --port INT                         Which port to route to (default: first HTTP
                                           port, usually 8080)
        --cert TEXT                        TLS cert secret name (manual cert)
        --no-tls                           Skip TLS entirely (HTTP only)
        --issuer TEXT                      cert-manager ClusterIssuer name
                                           (default: letsencrypt-prod)
    kuberoku domains:remove DOMAIN         Remove custom domain
    kuberoku domains:clear                 Remove all custom domains

    DOMAINS WORK FOR ANY PROCESS TYPE (not just web):
        Ingress routes HTTP traffic by hostname. Any process type that serves
        HTTP (including gRPC over HTTP/2 and WebSocket upgrades) can use domains:add:

            kuberoku domains:add api.myapp.com                    # routes to web:8080 (default)
            kuberoku domains:add admin.myapp.com --type admin     # routes to admin:8080
            kuberoku domains:add grpc.myapp.com --type grpc       # routes to grpc:50051

        If you have one process type with BOTH HTTP and non-HTTP ports, use
        --port to specify which port gets the Ingress routing:

            kuberoku domains:add api.myapp.com --port 8080        # HTTP API
            # Port 50051 (gRPC) is NOT routed via Ingress — use `services:expose:on` for that

    PREREQUISITES (domains is a BEST-EFFORT feature):
        Domains require cluster-level components that Kuberoku does NOT install:

        1. An Ingress controller (nginx-ingress, traefik, etc.)
        2. cert-manager (for auto-TLS) + a configured ClusterIssuer
        3. DNS pointing the domain to the Ingress controller's external IP

        If any prerequisite is missing, `domains:add` will:
        - Check for Ingress controller: if missing → fail with error:
            Error: No Ingress controller detected in the cluster.
            Install one: https://kubernetes.github.io/ingress-nginx/deploy/
        - Check for cert-manager (if TLS requested): if missing → fail:
            Error: cert-manager not found. Auto-TLS is unavailable.
            Install: https://cert-manager.io/docs/installation/
            Or use --no-tls to skip, or --cert to provide your own.
        - If both exist → create Ingress + annotation, succeed.

        `kuberoku clusters:doctor` also checks for these and reports their status.

    V1 decision: Ingress API (networking.k8s.io/v1), NOT Gateway API.
        Gateway API is newer and more powerful but has less adoption.
        V2 may add Gateway API support behind a flag.

    AUTO-GENERATED DOMAINS (BASE_DOMAIN)
    ─────────────────────────────────────
    Like Heroku's myapp.herokuapp.com — every web app gets a URL instantly,
    zero config per app.

    Setup (one-time per cluster):
        # In ~/.kuberoku/config.yaml
        clusters:
          production:
            context: prod-eks
            namespace: kuberoku
            base_domain: apps.mycluster.com     # ← add this

    Prerequisites for base_domain:
        1. Wildcard DNS:  *.apps.mycluster.com → Ingress controller IP
        2. Wildcard cert: cert-manager with DNS-01 challenge solver,
           or a pre-provisioned wildcard cert (*.apps.mycluster.com)

    Behavior:
        On first web deploy, if base_domain is set, Kuberoku auto-creates
        an Ingress rule for {app}.{base_domain}:

            $ kuberoku deploy
            Building from commit abc1234 (main)...
            Deploying... done
            Release v1 created
            App URL: https://myapi.apps.mycluster.com  ← auto-generated

        The auto-generated domain coexists with custom domains:
            $ kuberoku domains
            === myapi domains
            myapi.apps.mycluster.com   (auto, TLS)     ← generated
            api.mycompany.com          (custom, TLS)    ← via domains:add

        Auto-generated domains are:
            - Created on first web deploy (not on apps:create)
            - Never overwritten by subsequent deploys
            - Removed on apps:destroy
            - NOT removable via domains:remove (that command handles custom domains only)

    Without base_domain configured:
        No auto-generated domain. User must manually `domains:add`.
        Deploy output just says "App running. Use domains:add for external access."

    K8s mapping:
        domains:add    → Ingress rule for domain
                         + annotation: cert-manager.io/cluster-issuer: {issuer}
                         + tls section with secretName: {PREFIX}-tls-{domain}
        domains:remove → Remove Ingress rule + tls entry (custom domains only)
        domains:clear  → Delete the entire Ingress resource (including auto-generated)
        auto-domain    → Same as domains:add but triggered by first web deploy,
                         uses wildcard cert secret: {PREFIX}-tls-wildcard
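
    The mapping above can be sketched as a plain manifest builder. This is an
    illustration under assumptions, not Kuberoku's actual code: the function
    names, the Ingress metadata name, the backend Service name and the
    `prefix` default (standing in for {PREFIX}) are all hypothetical. Only
    the annotation key, the secretName pattern and the {app}.{base_domain}
    rule come from this section.

```python
def auto_domain(app: str, base_domain: str) -> str:
    """Build the auto-generated hostname {app}.{base_domain}."""
    return f"{app}.{base_domain}"


def ingress_manifest(app, domain, service, port,
                     issuer="letsencrypt-prod", prefix="kuberoku", tls=True):
    """Return a networking.k8s.io/v1 Ingress dict routing `domain` to one Service port."""
    manifest = {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        # Hypothetical naming scheme for the Ingress object itself.
        "metadata": {"name": f"{prefix}-{app}", "annotations": {}},
        "spec": {
            "rules": [{
                "host": domain,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {"name": service,
                                            "port": {"number": port}}},
                }]},
            }],
        },
    }
    if tls:
        # cert-manager watches this annotation and fills the named secret.
        manifest["metadata"]["annotations"]["cert-manager.io/cluster-issuer"] = issuer
        manifest["spec"]["tls"] = [{
            "hosts": [domain],
            "secretName": f"{prefix}-tls-{domain}",
        }]
    return manifest
```

    With tls=False (the --no-tls case) the sketch emits neither the
    annotation nor the tls section, so the Ingress serves plain HTTP.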


4.11  MAINTENANCE
─────────────────
Heroku equivalent: heroku maintenance:*
K8s resources: Deployment (scale) + annotations on app manifest ConfigMap

    kuberoku apps:maintenance              Show maintenance status
    kuberoku apps:maintenance:on           Enable maintenance mode (all processes)
                                           Saves current replicas, scales all to 0
    kuberoku apps:maintenance:off          Disable maintenance mode (all processes)
                                           Restores saved replica counts
    kuberoku services:maintenance:on TYPE  Enable maintenance for one process type
                                           Saves replicas for TYPE, scales to 0
    kuberoku services:maintenance:off TYPE Disable maintenance for one process type
                                           Restores saved replicas for TYPE

    Storage: annotations on the app manifest ConfigMap:
        {DOMAIN}/maintenance           "true" / absent
        {DOMAIN}/maintenance-since     ISO 8601 timestamp
        {DOMAIN}/maintenance-saved     JSON: {"web": 3, "worker": 2}
        {DOMAIN}/maintenance-services  JSON: ["smtp"] (per-service maintenance)

    Mechanism: scale Deployment replicas to 0 (not maintenance page pods).
        HTTP processes → Ingress returns 503 (no backends)
        Non-HTTP processes → connection refused (no pods)

    K8s mapping:
        apps:maintenance:on        → Save all replica counts in annotation,
                                     scale ALL Deployments to 0
        apps:maintenance:off       → Read saved replicas, restore all Deployments
        services:maintenance:on    → Save one process type's replicas,
                                     scale that Deployment to 0
        services:maintenance:off   → Restore one process type's replicas
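
    The save/restore cycle for app-wide maintenance can be sketched as two
    pure functions over the annotation dict. Assumptions are labeled in the
    code: `PREFIX` is a stand-in for {DOMAIN}, the function names are
    hypothetical, and the guard against a repeated `on` is a design choice
    of this sketch rather than documented behavior.

```python
import json
from datetime import datetime, timezone

PREFIX = "kuberoku.example"  # stand-in for {DOMAIN}; the real value is defined elsewhere


def maintenance_on(annotations, replicas, now=None):
    """Return (new_annotations, target_replicas) for app-wide maintenance."""
    new = dict(annotations)
    if new.get(f"{PREFIX}/maintenance") != "true":
        # Save counts only on the first call, so a repeated `on`
        # cannot overwrite the saved values with zeros.
        now = now or datetime.now(timezone.utc)
        new[f"{PREFIX}/maintenance"] = "true"
        new[f"{PREFIX}/maintenance-since"] = now.isoformat()
        new[f"{PREFIX}/maintenance-saved"] = json.dumps(replicas, sort_keys=True)
    return new, {name: 0 for name in replicas}


def maintenance_off(annotations):
    """Return (new_annotations, restored_replicas); restores nothing if nothing was saved."""
    saved = json.loads(annotations.get(f"{PREFIX}/maintenance-saved", "{}"))
    keys = {
        f"{PREFIX}/maintenance",
        f"{PREFIX}/maintenance-since",
        f"{PREFIX}/maintenance-saved",
    }
    new = {k: v for k, v in annotations.items() if k not in keys}
    return new, saved
```

    The caller would then write the returned annotations to the app manifest
    ConfigMap and scale each Deployment to the returned replica count.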


4.12  STATUS (SINGLE-GLANCE OVERVIEW)
─────────────────────────────────────
Heroku equivalent: heroku apps:info (but richer)
K8s resources: reads all app resources

    kuberoku apps:status                   Show full app status at a glance

    Output:
    === myapi (production)
    Release:      v7 (Deploy myapi:abc123) — 2h ago by user@host
    Web:          3 dynos (3/3 up)
    Worker:       2 dynos (2/2 up)
    Domains:      myapi.apps.mycluster.com (auto, TLS)
                  api.mycompany.com (custom, TLS)
    Addons:       postgres (postgresql 16, dev, running)
                  analytics-db (postgresql 16, standard, running)
                  redis (redis 7, dev, running)
    Maintenance:  off
    Last deploy:  2h ago

    This is the "dashboard in your terminal" command. Shows everything
    relevant without separate commands for each piece.

    K8s mapping:
        apps:status → Read app manifest CM + formation CM + latest release CM +
                      list Pods + list Ingresses + list addon resources


4.13  CLUSTERS
──────────────
Heroku equivalent: (none — Heroku is single-platform)
K8s resources: (manages kubeconfig contexts)

    kuberoku clusters                      List configured clusters
    kuberoku clusters:add NAME             Register a cluster
        --context TEXT                     kubeconfig context name (required)
        --namespace TEXT                   K8s namespace for this cluster
        --default                          Set as default cluster
    kuberoku clusters:remove NAME          Unregister a cluster
        --confirm NAME                     Skip confirmation
    kuberoku clusters:switch NAME          Switch active cluster
    kuberoku clusters:current              Show current cluster
    kuberoku clusters:info NAME            Show cluster details
                                           (K8s version, node count, apps count)
    kuberoku clusters:doctor               Check cluster health
                                           (API reachable, RBAC, storage, CNI)
    kuberoku clusters:setup                Check + auto-fix cluster requirements
                                           (creates namespace, prints fix hints)

    Config file: ~/.kuberoku/config.yaml
    Stores: cluster name → kubeconfig context mapping

    Check registry (shared by doctor + setup):
    - api_reachable        (connectivity)  Can reach K8s API
    - namespace_access     (connectivity)  Namespace exists/can be created (fixable)
    - configmap_crud       (rbac)          Can CRUD ConfigMaps
    - secret_crud          (rbac)          Can CRUD Secrets
    - deployment_crud      (rbac)          Can CRUD Deployments
    - service_crud         (rbac)          Can CRUD Services
    - network_policy       (rbac)          Can CRUD NetworkPolicies
    - storage_class        (storage)       Default StorageClass exists
    - cni_enforcement      (infra)         NetworkPolicy-capable CNI detected
    - ingress_controller   (infra)         Ingress controller detected (nginx/traefik/etc.)
    - ingress_crud         (rbac)          Can CRUD Ingress resources (skipped if no controller)
    - cert_manager         (infra)         cert-manager installed (warning if absent, not blocking)
    - lb_support           (infra)         LoadBalancer support available (cloud/ServiceLB/MetalLB)

    Cluster resolution order (for all commands):
    1. .kuberoku project file cluster field
    2. KUBEROKU_CLUSTER env var
    3. ~/.kuberoku/config.yaml current_cluster
    4. Current kubeconfig context (backward compat)
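
    The resolution order above is a first-match-wins lookup. A minimal
    sketch; the function name and parameter names are hypothetical, and
    loading the project file, config.yaml and kubeconfig is left to the
    caller:

```python
import os


def resolve_cluster(project_cluster=None, config_current=None,
                    kube_context=None, env=None):
    """Return (cluster, source) using the documented precedence."""
    env = os.environ if env is None else env
    candidates = (
        (".kuberoku project file", project_cluster),
        ("KUBEROKU_CLUSTER env var", env.get("KUBEROKU_CLUSTER")),
        ("~/.kuberoku/config.yaml", config_current),
        ("kubeconfig context", kube_context),
    )
    for source, value in candidates:
        if value:  # empty strings count as "not set"
            return value, source
    raise LookupError("no cluster configured and no kubeconfig context found")
```

    Returning the source alongside the value makes it easy for a command
    such as clusters:current to say where the active cluster came from.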


4.14  PLUGINS
─────────────
Heroku equivalent: heroku plugins:*
Mechanism: Python entry_points + PyPI convention

    kuberoku plugins                       List installed plugins
    kuberoku plugins:install NAME          Install a plugin (pip install wrapper)
        Example: kuberoku plugins:install kuberoku-postgres
    kuberoku plugins:uninstall NAME        Uninstall a plugin
    kuberoku plugins:search QUERY          Search PyPI for kuberoku-* packages
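
    Discovery through entry points could look roughly like the sketch
    below. The group name "kuberoku.plugins" is an assumption made for
    illustration; this document does not specify it. The version branch
    covers both the older dict-style and the newer selectable return
    values of importlib.metadata.entry_points().

```python
from importlib.metadata import entry_points


def discover_plugins(group="kuberoku.plugins"):
    """Return {name: EntryPoint} for every installed plugin in `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Newer Pythons: entry points are selectable by group.
        selected = eps.select(group=group)
    else:
        # Older Pythons: entry_points() returns a dict keyed by group.
        selected = eps.get(group, [])
    return {ep.name: ep for ep in selected}
```

    Calling `.load()` on an entry point imports the plugin, so a listing
    command can stay cheap by reading names only.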


================================================================================
 5. RBAC & PERMISSIONS — WHAT HAPPENS WITH LIMITED CLUSTER ACCESS
================================================================================

Kuberoku talks directly to the K8s API. If the kubeconfig user/service-account
lacks permissions for a specific K8s operation, that Kuberoku command WILL FAIL
with a clear error explaining what permission is missing.

MINIMUM PERMISSIONS (RBAC)
──────────────────────────
Kuberoku requires these K8s API permissions at minimum:

    Resource        Verbs Needed              Used By
    ──────────────  ────────────────────────  ─────────────────────────
    namespaces      get, list                 apps (discovery)
    configmaps      get, list, create,        apps, config, releases
                    update, patch, delete
    deployments     get, list, create,        deploy, ps, maintenance
                    update, patch, delete
    pods            get, list, delete         ps, logs, run
    pods/log        get                       logs
    pods/exec       create                    run
    services        get, list, create,        deploy (web/service types),
                    update, patch, delete     addons, services:expose
    jobs            get, list, create,        run
                    delete

ELEVATED PERMISSIONS (OPTIONAL FEATURES)
────────────────────────────────────────
These are only needed if you use the corresponding features:

    Resource        Verbs Needed              Used By
    ──────────────  ────────────────────────  ─────────────────────────
    namespaces      create, delete            Only if namespace_mode: per-app (V2)
                                              NOT needed for V1 default
    secrets         get, list, create,        config:set --secret
                    update, patch, delete
    ingresses       get, list, create,        domains
                    update, delete
    statefulsets    get, list, create,        addons (stateful: postgres, redis)
                    update, patch, delete
    persistent      get, list, create,        addons (StatefulSet storage)
    volumeclaims    delete                    Namespace-scoped. Kuberoku creates
                                              PVCs, NOT PVs. PVs are cluster-scoped
                                              and created by the storage provisioner.
    horizontalpod   get, list, create,        ps:autoscale (future)
    autoscalers     update, delete

PERMISSION ERROR BEHAVIOR
─────────────────────────
When a K8s API call returns 403 Forbidden, Kuberoku will:

1. NOT silently fail or swallow the error
2. Show a clear, actionable error message:

    $ kuberoku apps:create myapp
    Error: Permission denied — cannot create namespaces.

    Your kubeconfig user "dev-user" lacks the "create" verb on "namespaces".

    Ask your cluster admin to grant this permission, or ask them to
    pre-create the namespace for you and use:
        kuberoku apps:create myapp --namespace existing-ns

    Required ClusterRole rule:
        apiGroups: [""]
        resources: ["namespaces"]
        verbs: ["create"]

3. Where possible, suggest workarounds (e.g., use existing namespace)
4. Exit with a non-zero exit code
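
The user, verb and resource in such a message can usually be recovered from
the API server's 403 text. A minimal sketch, assuming the message follows the
common `User "..." cannot VERB resource "..." in API group "..."` wording;
the class and function names are hypothetical:

```python
import re
from dataclasses import dataclass

FORBIDDEN_RE = re.compile(
    r'User "(?P<user>[^"]+)" cannot (?P<verb>\S+) resource "(?P<resource>[^"]+)"'
    r' in API group "(?P<api_group>[^"]*)"'
)


@dataclass(frozen=True)
class Forbidden:
    user: str
    verb: str
    resource: str
    api_group: str

    def summary(self) -> str:
        """One-line explanation in the style of the example above."""
        return (f'Your kubeconfig user "{self.user}" lacks the '
                f'"{self.verb}" verb on "{self.resource}".')

    def rule(self) -> dict:
        """The RBAC rule an admin would need to grant."""
        return {"apiGroups": [self.api_group],
                "resources": [self.resource],
                "verbs": [self.verb]}


def parse_forbidden(message: str):
    """Return a Forbidden, or None if the message has an unexpected shape."""
    match = FORBIDDEN_RE.search(message)
    return Forbidden(**match.groupdict()) if match else None
```

When parsing fails, falling back to the raw API message still satisfies
rule 1: the error is never swallowed.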

PERMISSION PREFLIGHT CHECK
──────────────────────────
    kuberoku clusters:doctor                Check cluster connectivity & permissions
        Runs the shared check registry (see Section 4.13).
        Each check reports pass/fail with fix hints.

    kuberoku clusters:setup                 Check + auto-fix cluster requirements
        Runs all checks, auto-fixes what's possible (e.g., creates namespace),
        then re-runs to verify. Prints manual fix instructions for the rest.

    $ kuberoku clusters:doctor
    [+]  1. Kubernetes API is reachable (PASS)
    [+]  2. Namespace 'kuberoku' is accessible (PASS)
    [+]  3. ConfigMap create/get/delete works (PASS)
    [+]  4. Secret create/get/delete works (PASS)
    [+]  5. Deployment create/get/delete works (PASS)
    [+]  6. Service create/get/delete works (PASS)
    [!]  7. NetworkPolicy CRUD failed (FAIL)
          Fix: Grant RBAC permissions for networkpolicies
    [+]  8. Default StorageClass exists (PASS)
    [+]  9. NetworkPolicy CNI detected (calico-node) (PASS)
    [+] 10. Ingress controller detected (PASS)
    [+] 11. Ingress create/get/delete works (PASS)
    [+] 12. cert-manager installed for auto-TLS (PASS)
    [+] 13. LoadBalancer support available (PASS)

    12/13 checks passed, 1 failed.

    $ kuberoku clusters:setup
    (same output + auto-fix + re-check + manual fix hints)
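
    A shared registry that produces this kind of report can be sketched
    as follows. The `Check` dataclass and `run_checks` function are
    hypothetical, and for brevity each check carries a single description
    rather than separate pass and fail texts as in the sample output.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Check:
    name: str                      # e.g. "configmap_crud"
    category: str                  # e.g. "rbac", "infra"
    description: str
    run: Callable[[], bool]        # returns True on pass
    fix_hint: Optional[str] = None


def run_checks(checks):
    """Run every check in order; return (report_lines, failed_count)."""
    lines, failed = [], 0
    for i, check in enumerate(checks, start=1):
        ok = check.run()
        mark, status = ("+", "PASS") if ok else ("!", "FAIL")
        lines.append(f"[{mark}] {i:>2}. {check.description} ({status})")
        if not ok:
            failed += 1
            if check.fix_hint:
                lines.append(f"      Fix: {check.fix_hint}")
    total = len(checks)
    lines.append("")
    lines.append(f"{total - failed}/{total} checks passed, {failed} failed.")
    return lines, failed
```

    Returning the failure count lets the CLI wrapper choose its exit code,
    and lets clusters:setup decide whether a re-check is needed.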

OPERATING WITH REDUCED PERMISSIONS
───────────────────────────────────
Kuberoku degrades gracefully:

    Missing Permission          What Still Works              What Breaks
    ────────────────────────    ────────────────────────      ──────────────────
    No namespace create         Everything (V1 default)       namespace_mode: per-app (V2)
    No secrets                  config:set (ConfigMap only)   config:set --secret
    No ingresses                Everything except domains     domains:add/remove
    No jobs                     Everything except run         run (one-off commands)
    No pods/exec                Everything except run (tty)   run bash (interactive)
    No services                 Worker deploys                expose, deploy (with ports)
    No statefulsets/PVCs        Everything except addons      addons:create (stateful)

Kuberoku NEVER requires cluster-admin. A narrowly-scoped Role or ClusterRole
is sufficient for all core features.

EXAMPLE: MINIMAL RBAC MANIFEST
───────────────────────────────
    # Role (namespace-scoped) — sufficient for V1 single-namespace mode
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: kuberoku-user
      namespace: kuberoku              # the namespace Kuberoku uses
    rules:
    - apiGroups: [""]
      resources: [configmaps, pods, pods/log, services]
      verbs: [get, list, create, update, patch, delete]
    - apiGroups: [""]
      resources: [pods/exec]
      verbs: [create]
    - apiGroups: ["apps"]
      resources: [deployments]
      verbs: [get, list, create, update, patch, delete]
    - apiGroups: ["batch"]
      resources: [jobs]
      verbs: [get, list, create, delete]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: ClusterRole
    metadata:
      name: kuberoku-admin          # Superset — adds ns + ingress + secret + addons
    rules:
    - apiGroups: [""]
      resources: [namespaces]
      verbs: [get, list, create]        # create for auto-namespace on first run
    - apiGroups: [""]
      resources: [configmaps, pods, pods/log, services, secrets,
                  persistentvolumeclaims]
      verbs: [get, list, create, update, patch, delete]
    - apiGroups: [""]
      resources: [pods/exec]
      verbs: [create]
    - apiGroups: ["apps"]
      resources: [deployments, statefulsets]
      verbs: [get, list, create, update, patch, delete]
    - apiGroups: ["batch"]
      resources: [jobs]
      verbs: [get, list, create, delete]
    - apiGroups: ["networking.k8s.io"]
      resources: [ingresses]
      verbs: [get, list, create, update, delete]


================================================================================
 6. COLON COMMAND IMPLEMENTATION
================================================================================

APPROACH: Nested Groups + resolve_command() Override
────────────────────────────────────────────────────
Both syntaxes work identically:
    kuberoku apps:create myapp
    kuberoku apps create myapp

This is NOT naive string splitting. It uses Click's `resolve_command()` —
the intended extension point for custom command resolution.

IMPLEMENTATION
──────────────
    import click

    class ColonCommandGroup(click.Group):
        """Click Group that supports colon-separated subcommands.

        Allows both 'apps:create' and 'apps create' syntax.
        """

        def resolve_command(self, ctx, args):
            # If the first arg contains a colon, split it into group + command
            cmd_name = args[0] if args else None
            if cmd_name and ":" in cmd_name:
                parts = cmd_name.split(":", 1)
                args = [parts[0], parts[1]] + list(args[1:])
            return super().resolve_command(ctx, args)
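
Because the override splits on the first colon only, a multi-colon name such
as services:expose:on is consumed one level at a time. The trace below
reproduces that rewrite without Click so the behavior can be checked in
isolation. It assumes every nested group is also a ColonCommandGroup, which
the snippet above does not state explicitly; `split_colon` and `trace` are
hypothetical helper names.

```python
def split_colon(args):
    """Apply the same rewrite as resolve_command() to one argument list."""
    if args and ":" in args[0]:
        head, rest = args[0].split(":", 1)
        return [head, rest] + list(args[1:])
    return list(args)


def trace(args):
    """Simulate group-by-group dispatch.

    Each level splits the first argument once, consumes the group name,
    and hands the remainder to the next level. Returns the consumed
    group names and the arguments left for ordinary resolution.
    """
    consumed = []
    args = list(args)
    while args and ":" in args[0]:
        args = split_colon(args)
        consumed.append(args[0])
        args = args[1:]
    return consumed, args
```

Tracing `services:expose:on smtp` consumes `services` and `expose` as groups
and leaves `on smtp` for the innermost group to resolve normally.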

WHY THIS APPROACH
─────────────────
- resolve_command() IS Click's extension point for this exact purpose
- Tab completion works because groups are real Click groups
- Help text works at all levels (--help on root, group, and command)
- Groups compose naturally (apps, config, ps are all click.Group instances)
- Single 6-line method override, no monkey-patching
- No dependency on string-splitting hacks that break edge cases

HELP TEXT BEHAVIOR
──────────────────
    $ kuberoku --help
    Usage: kuberoku [OPTIONS] COMMAND [ARGS]...

    Commands:
      apps          Manage apps
      config        Manage config vars
      deploy        Deploy an image
      ps            Manage dynos
      logs          Show app logs
      ...

    $ kuberoku apps --help
    Usage: kuberoku apps [OPTIONS] COMMAND [ARGS]...

    Commands:
      create        Create a new app
      destroy       Destroy an app
      info          Show app info
      rename        Rename an app

    $ kuberoku apps:create --help
    Usage: kuberoku apps create [OPTIONS] NAME
    ...


================================================================================
 7. SDK API DESIGN (IMPORTABLE IN PYTHON)
================================================================================

The SDK IS the product. The CLI is just one interface on top of it.
Every CLI command is a thin wrapper over an SDK method.

PRIMARY API — FACADE
─────────────────────
    from kuberoku import Kuberoku

    k = Kuberoku()                              # uses current kubeconfig context
    k = Kuberoku(context="production")          # specific kubeconfig context
    k = Kuberoku(kubeconfig="/path/to/config")  # explicit kubeconfig path

    # Apps
    app = k.apps.create("myapp")                # → App
    apps = k.apps.list()                        # → list[App]
    info = k.apps.info("myapp")                 # → App (with dynos, releases)
    k.apps.destroy("myapp")                     # → None
    k.apps.rename("old", "new")                 # → App

    # Config
    config = k.config.list("myapp")             # → dict[str, str]
    k.config.set("myapp", DATABASE_URL="postgres://...", SECRET_KEY="hunter2")
    k.config.unset("myapp", "OLD_VAR")          # → None
    val = k.config.get("myapp", "DATABASE_URL") # → str

    # Deploy (build from git — default)
    release = k.deploy("myapp")                 # → Release (builds from HEAD)
    release = k.deploy("myapp", ref="v1.2.0")   # → Release (builds from tag)
    release = k.deploy("myapp", ref="abc1234")   # → Release (builds from commit)

    # Deploy (pre-built image — for CI/CD)
    release = k.deploy(                         # → Release
        "myapp",
        image="registry.io/myapp:v3",           # skips build when image is given
        process_type="web",                     # web | worker | any custom name
        ports=["8080/tcp"],
        replicas=2,
    )

    # Process management
    dynos = k.ps.list("myapp")                  # → list[Dyno]
    k.ps.scale("myapp", web=3, worker=2)        # → dict[str, int]
    k.ps.restart("myapp")                       # → None
    k.ps.restart("myapp", type="web")           # → None
    k.ps.set_command("myapp",                   # → dict[str, str | None]
        web="gunicorn app:app",
        worker="celery -A myapp worker",
    )
    cmds = k.ps.commands("myapp")               # → dict[str, str | None]

    # Logs (via services — logs are per-process, so they live here)
    logs = k.services.get_logs("myapp")                  # → list[LogLine]
    logs = k.services.get_logs("myapp", process_type="web", num=100)  # filtered
    for line in k.services.stream_logs("myapp"):         # → Iterator[LogLine]
        print(line)

    # Run (via services — exec targets a running process pod)
    result = k.services.exec_interactive("myapp", ["bash"], process_type="web")        # → ExecResult
    result = k.services.exec_detached("myapp", ["python", "manage.py", "migrate"])     # → ExecResult

    # Releases
    releases = k.releases.list("myapp")         # → list[Release]
    info = k.releases.info("myapp", version=3)  # → Release
    k.releases.rollback("myapp", version=1)     # → Release
    k.releases.prune("myapp", keep=50)          # → int (num deleted)

    # Addons
    addon = k.addons.create("myapp", "postgres")                    # → Addon (instance="postgres")
    addon = k.addons.create("myapp", "postgres", as_name="analytics-db")  # → Addon (instance="analytics-db")
    addons = k.addons.list("myapp")                                # → list[Addon]
    k.addons.destroy("myapp", "analytics-db")                      # → None (by instance name)
    k.addons.backup("myapp", "analytics-db")                       # → bytes (dump data)
    k.addons.migrate("myapp", "postgres", target_version="16")     # → Addon (by instance name)
    k.addons.info("myapp", "analytics-db")                         # → Addon

    # Services (expose:on/off, open, connect — process types)
    k.services.expose_on("myapp", "smtp")                       # → ExposedEndpoint (process)
    k.services.expose_on("myapp", "smtp", method="nodeport")    # → ExposedEndpoint
    k.services.expose_off("myapp", "smtp")                     # → ExposedEndpoint (back to ClusterIP)
    k.services.resolve_connect("myapp", "web")               # → ConnectionTarget
    url = k.services.open_url("myapp")                       # → str (browser URL)

    # Addons networking (expose:on/off, connect)
    k.addons.expose_on("myapp", "analytics")                    # → ExposedEndpoint (addon)
    k.addons.expose_off("myapp", "analytics")                  # → ExposedEndpoint
    k.addons.connect("myapp", "postgres")                    # → ConnectionTarget

    # Ports (add/remove without redeploying)
    k.services.list_ports("myapp")                           # → dict[str, tuple[PortMapping]]
    k.services.add_port("myapp", "993/tcp")                  # → rolling restart
    k.services.remove_port("myapp", "2525/tcp")              # → rolling restart

    # Domains (HTTP exposure via Ingress)
    domain = k.domains.add("myapp", "myapp.com")        # → Domain
    domains = k.domains.list("myapp")                   # → list[Domain]
    k.domains.remove("myapp", "myapp.com")              # → None

    # Maintenance (app-wide + per-service)
    k.apps.maintenance_on("myapp")                      # → None (all processes)
    k.apps.maintenance_off("myapp")                     # → None (all processes)
    k.services.maintenance_on("myapp", "smtp")          # → None (one process)
    k.services.maintenance_off("myapp", "smtp")         # → None (one process)

    # Clusters
    k.clusters.add("prod", context="prod-eks")          # → ClusterInfo
    clusters = k.clusters.list()                        # → list[ClusterInfo]
    k.clusters.switch("prod")                           # → None
    k.clusters.remove("prod")                           # → None

    # Status (single-glance overview)
    overview = k.apps.status("myapp")                   # → AppStatus (release, dynos, domains, etc.)

    # Doctor
    report = k.doctor()                                 # → dict[str, PermissionStatus]

GRANULAR IMPORTS
────────────────
    from kuberoku.sdk.apps import AppsService
    from kuberoku.sdk.config import ConfigService
    from kuberoku.sdk.deploy import DeployService
    from kuberoku.sdk.build import BuildService
    from kuberoku.sdk.ps import PsService
    from kuberoku.sdk.releases import ReleasesService
    from kuberoku.sdk.addons import AddonsService
    from kuberoku.sdk.services import ServicesService  # includes logs + exec
    from kuberoku.sdk.domains import DomainsService
    # Maintenance methods live on AppsService and ServicesService (Phase 9.7)
    from kuberoku.sdk.clusters import ClustersService

    # Each service takes a K8sClientProtocol instance
    from kuberoku.k8s.client import K8sClient
    client = K8sClient(context="production")
    apps = AppsService(client)
    apps.create("myapp")

BUILD vs DEPLOY SEPARATION
──────────────────────────
    DeployService and BuildService are separate. This is critical:

    BuildService (sdk/build.py):
        - Depends on: git, docker (subprocess calls)
        - Handles: git archive, docker build, docker push, registry detection
        - NOT imported unless build-from-git is used

    DeployService (sdk/deploy.py):
        - Depends on: K8sClientProtocol only
        - Handles: Deployment create/update, release creation, rollout
        - BuildService is an optional dependency, injected when needed

    Why this matters:
        - `kuberoku deploy --image` → only DeployService → no git/docker needed
        - `kuberoku deploy` → BuildService + DeployService → needs git + docker
        - SDK users doing `k.deploy("app", image="...")` never import git/docker
        - CI environments can install kuberoku without Docker and use --image
        - Keeps the core SDK lightweight — build logic is opt-in

    Composition:
        # --image path (CI/CD)
        deploy_service = DeployService(client)
        deploy_service.deploy("myapp", image="reg/myapp:v3")

        # build-from-git path (developer)
        build_service = BuildService()  # no K8s client needed
        image = build_service.build("myapp", ref="HEAD")  # → "reg/myapp:abc1234"
        deploy_service.deploy("myapp", image=image)

        # The Kuberoku facade (k.deploy) composes both automatically
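
    One way the facade could compose the two services while keeping the
    build path opt-in is to take a factory and create the build service
    only on first use. This is a sketch under assumptions: `FacadeSketch`
    and both stub classes are hypothetical stand-ins, not the real
    Kuberoku, BuildService or DeployService.

```python
class StubBuildService:
    """Stand-in for BuildService; a real one would call git and docker."""

    def build(self, app, ref="HEAD"):
        return f"reg/{app}:{ref}"


class StubDeployService:
    """Stand-in for DeployService; records what it was asked to deploy."""

    def __init__(self):
        self.calls = []

    def deploy(self, app, image, **options):
        self.calls.append((app, image, options))
        return {"app": app, "image": image}


class FacadeSketch:
    def __init__(self, deploy_service, build_factory):
        self._deploy_service = deploy_service
        self._build_factory = build_factory   # called lazily, at most once
        self._build_service = None

    def deploy(self, app, image=None, ref="HEAD", **options):
        if image is None:
            # Build-from-git path: only now is the build service created.
            if self._build_service is None:
                self._build_service = self._build_factory()
            image = self._build_service.build(app, ref=ref)
        return self._deploy_service.deploy(app, image=image, **options)
```

    With a pre-built image the factory is never called, which mirrors the
    claim that the --image path needs neither git nor docker.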

CLI-TO-SDK MAPPING
──────────────────
Every CLI command is a 3-5 line function:

    @apps_group.command()
    @click.argument("name")
    @click.pass_context
    def create(ctx, name):
        """Create a new app."""
        k = ctx.obj["kuberoku"]        # facade instance stored on the Click context
        app = k.apps.create(name)
        click.echo(f"Created {app.name}")

No business logic in CLI. Ever. The CLI does:
1. Parse args/flags
2. Call SDK method
3. Format output (table/json)
4. Handle errors (translate exceptions to user messages)


================================================================================
 8. NON-HTTP SERVICES & MULTI-PORT (HEROKU CAN'T DO THIS)
================================================================================

PROCESS TYPES
─────────────
    Process type names are ARBITRARY — the user chooses them (Procfile keys
    or --type flag). Only "web" and "worker" have special defaults:

    Type        Has ports?      K8s Resources Created       Special behavior
    ─────────   ──────────────  ─────────────────────────   ──────────────────
    web         Yes (8080)      Deployment + Service        Default port 8080,
                                (ClusterIP)                 auto-domain if
                                                            base_domain set
    worker      No              Deployment only             No Service created
    (any name)  Yes (specify)   Deployment + Service        Must specify --port
                                (ClusterIP)

    "web" is just a convention — it gets a default port (8080) and auto-domain.
    Any other name with --port works identically.

    Exposure is SEPARATE from process type (see Section 4.9 and 4.10):
    - domains:add      → works for any process type (Ingress, HTTP routing)
    - services:expose:on  → works for any process type (LoadBalancer/NodePort)

MULTI-PORT SYNTAX
─────────────────
    kuberoku deploy --app mailserver \
        --image postfix:latest \
        --port 25/tcp \
        --port 587/tcp \
        --port 993/tcp

    kuberoku deploy --app dns \
        --image coredns:latest \
        --port 53/tcp \
        --port 53/udp

    kuberoku deploy --app gameserver \
        --image game:v1 \
        --port 7777/udp
    # Deploy creates ClusterIP (internal). To expose publicly:
    kuberoku services:expose:on --app gameserver --method loadbalancer

PROTOCOL SUPPORT
────────────────
    Protocol    Flag            K8s Service Port Protocol
    ────────    ────            ────────────────────────
    TCP         --port N/tcp    protocol: TCP
    UDP         --port N/udp    protocol: UDP

PORT MAPPING RULES (container → service)
────────────────────────────────────────
    K8s requires named ports when multiple ports are defined. Kuberoku
    auto-generates port names from the protocol and port number.

    --port flag     containerPort    Service port name    Service port
    ────────────    ─────────────    ─────────────────    ────────────
    8080/tcp        8080             tcp-8080             8080
    25/tcp          25               tcp-25               25
    587/tcp         587              tcp-587              587
    53/udp          53               udp-53               53

    Rules:
    - containerPort always matches the Service port (same number)
    - Port name format: {protocol}-{port} (e.g., tcp-8080, udp-53)
    - For web processes: default containerPort=8080, Service port=8080
    - Container must LISTEN on these ports (Kuberoku doesn't remap)
    - If no --port flag: web defaults to 8080/tcp; every other process type
      that needs a Service MUST specify --port
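
    The mapping rules above can be sketched as a small parser. This is an
    illustration only: PortSpec and parse_port_flag are hypothetical names,
    and rejecting a bare number without a protocol is an assumption (the
    spec only shows the N/tcp and N/udp forms).

```python
from dataclasses import dataclass

VALID_PROTOCOLS = ("tcp", "udp")


@dataclass(frozen=True)
class PortSpec:
    """One parsed --port value (hypothetical helper, not a name from the spec)."""

    port: int
    protocol: str  # lowercase "tcp" or "udp"

    @property
    def name(self) -> str:
        # Port name format from the rules: {protocol}-{port}
        return f"{self.protocol}-{self.port}"

    def container_port(self) -> dict:
        return {"name": self.name, "containerPort": self.port,
                "protocol": self.protocol.upper()}

    def service_port(self) -> dict:
        # Service port and targetPort are the same number (no remapping).
        return {"name": self.name, "port": self.port,
                "targetPort": self.port, "protocol": self.protocol.upper()}


def parse_port_flag(value: str) -> PortSpec:
    """Parse 'N/tcp' or 'N/udp' into a PortSpec; raise ValueError otherwise."""
    number, sep, protocol = value.strip().partition("/")
    protocol = protocol.lower()
    if not sep or protocol not in VALID_PROTOCOLS:
        raise ValueError(f"expected N/tcp or N/udp, got {value!r}")
    if not (number.isascii() and number.isdigit()) or not 1 <= int(number) <= 65535:
        raise ValueError(f"port must be 1-65535, got {number!r}")
    return PortSpec(port=int(number), protocol=protocol)
```

    Both dicts carry the same generated name, so a Service port can refer
    to its container port by number or by name.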

DEFAULT BEHAVIOR — ALL SERVICES START INTERNAL
───────────────────────────────────────────────
    Process Type    Default Port    Default Service Type    External Exposure
    ────────────    ────────────    ────────────────────    ──────────────────────
    web             8080/tcp        ClusterIP               Via domains:add (HTTP)
    worker          (none)          (no Service)            Never
    (any name)      (must specify)  ClusterIP               Via domains:add (HTTP)
                                                            or expose (TCP/UDP)

    Deploy NEVER creates external exposure. It's always a separate step.
    The exposure method is NOT tied to the process type — any process type
    with ports can use EITHER method:

    - domains:add           → Ingress rule (HTTP/gRPC/WebSocket). Use --type
                              to pick the target process type.
    - services:expose:on    → LoadBalancer/NodePort. Process types.
    - addons:expose:on      → LoadBalancer/NodePort. Addons.
    - addons:connect        → Temporary port-forward. Addons (debugging).
    - services:ports:add    → Add a port without redeploying (rolling restart).
    - services:ports:remove → Remove a port without redeploying (rolling restart).

    These are INDEPENDENT and COMPOSABLE:
        kuberoku domains:add api.myapp.com --type web    # HTTP via Ingress
        kuberoku services:expose:on grpc                 # TCP via LoadBalancer
        kuberoku addons:expose:on analytics              # addon via LoadBalancer

    Both exposure methods (Ingress rules and LoadBalancer/NodePort exposure)
    are reversible without restarting the application.
    See Section 4.9 for full networking commands.


8.1  NETWORKING MODEL — HOW EVERYTHING COEXISTS ON ONE CLUSTER
──────────────────────────────────────────────────────────────
A cluster may run dozens of apps (some HTTP, some TCP/UDP), each with
addons (postgres, redis, etc.). They NEVER fight for ports or IPs.
Here's how every layer works:

    INSIDE THE CLUSTER — NO CONFLICTS POSSIBLE
    ────────────────────────────────────────────
    Every K8s Service gets its own ClusterIP (virtual IP). This means:
    - App1's postgres on port 5432 and App2's postgres on port 5432
      are on DIFFERENT ClusterIPs. No conflict.
    - App1's web on port 8080 and App2's web on port 8080 are on
      DIFFERENT ClusterIPs. No conflict.
    - An addon's redis on port 6379 and another addon's redis on 6379
      are on DIFFERENT ClusterIPs. No conflict.

    Cluster-internal networking is unlimited. Ports only conflict when
    you try to expose things OUTSIDE the cluster on the same IP.

    LAYER 1: HTTP APPS (web process type) — HOSTNAME ROUTING
    ─────────────────────────────────────────────────────────
    All HTTP apps share ONE external IP: the Ingress controller's
    LoadBalancer. Routing is by hostname (HTTP Host header):

        Ingress Controller (1 LoadBalancer, 1 IP, port 443)
        ├── myapi.example.com    → kuberoku-myapi-web:8080
        ├── frontend.example.com → kuberoku-frontend-web:8080
        └── admin.example.com    → kuberoku-admin-web:8080

    One IP, one port (443/80), unlimited apps via hostnames.
    No port conflicts. This is standard web hosting architecture.

    LAYER 2: NON-HTTP (TCP/UDP) — GATEWAY SERVICE (shared LB, allocated ports)
    ────────────────────────────────────────────────────────────────────────────
    Non-HTTP services cannot use hostname routing (raw TCP has no Host
    header). The Gateway Service provides a shared LoadBalancer with
    Kuberoku-allocated ports (10000-10999), avoiding per-service LB costs.

    Default: Gateway (random allocated port)
        `services:expose:on smtp` or `addons:expose:on redis`
        Kuberoku allocates a unique port (e.g., 10000) on the shared gateway LB.
        Result: gateway_ip:10000 → smtp:25 (internal).
        No port conflicts — allocation tracked in ConfigMap with optimistic
        concurrency + re-read collision guard.

    With explicit ports (services only):
        `services:expose:on smtp --ports 25,587`
        Prompts: use gateway (random ports) or dedicated LoadBalancer (fixed ports)?
        `--method loadbalancer` forces dedicated LB without prompt.

    Dedicated LoadBalancer (escape hatch):
        Each service gets its OWN cloud LoadBalancer → its own IP.
        Mailserver on 34.1.2.3:25, Game server on 34.4.5.6:7777.
        No conflicts. Cost: ~$15-20/mo per LoadBalancer.

    Gateway storage: ConfigMap {PREFIX}-gateway in namespace:
        allocations: {"10000": {"app": "myapp", "service": "svc-name", "target_port": 6379}}
        next_port: "10001"

    Bottom line: non-HTTP services don't fight for ports because the
    gateway allocates unique ports, or dedicated LBs give separate IPs.
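
    A minimal sketch of the allocation step, operating on an already
    decoded copy of the gateway ConfigMap data. The function name, the
    wrap-around scan and the error on exhaustion are assumptions. The
    optimistic-concurrency write and the re-read collision guard mentioned
    above are not shown here; they would wrap this function.

```python
GATEWAY_PORT_MIN = 10000
GATEWAY_PORT_MAX = 10999


def allocate_gateway_port(data: dict, app: str, service: str, target_port: int) -> int:
    """Reserve the next free gateway port in `data` and return it.

    `data` mirrors the layout shown above:
    {"allocations": {"10000": {...}}, "next_port": "10001"}
    """
    allocations = data.setdefault("allocations", {})
    start = int(data.get("next_port", GATEWAY_PORT_MIN))
    span = GATEWAY_PORT_MAX - GATEWAY_PORT_MIN + 1
    for offset in range(span):
        # Scan from next_port and wrap around so released ports get reused
        # (wrap-around is an assumption of this sketch).
        candidate = GATEWAY_PORT_MIN + (start - GATEWAY_PORT_MIN + offset) % span
        if str(candidate) in allocations:
            continue
        allocations[str(candidate)] = {
            "app": app,
            "service": service,
            "target_port": target_port,
        }
        following = candidate + 1 if candidate < GATEWAY_PORT_MAX else GATEWAY_PORT_MIN
        data["next_port"] = str(following)
        return candidate
    raise RuntimeError("gateway port range 10000-10999 is exhausted")
```

    The caller would write `data` back with the ConfigMap's resourceVersion
    and retry from a fresh read if the write is rejected.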

    LAYER 3: ADDONS — INTERNAL BY DEFAULT, EXPOSABLE VIA GATEWAY
    ─────────────────────────────────────────────────────────────
    Addons (postgres, redis) default to ClusterIP (internal).
    They can be exposed externally via the gateway service.

    - Apps reach addons via internal DNS:
        postgres://kuberoku-addon-myapi-maindb:5432/app
    - Humans reach addons via temporary port-forward:
        kuberoku addons:connect maindb  (Ctrl+C to close)
    - External access (e.g., for external tools):
        kuberoku addons:expose:on maindb  → gateway_ip:10001

    You can run 50 postgres instances and 30 redis instances on one
    cluster. They all use port 5432 / 6379 internally. Zero conflicts
    because each has its own ClusterIP.

    LAYER 4: AUTO-GENERATED DOMAINS (the Heroku feel)
    ──────────────────────────────────────────────────
    If a BASE_DOMAIN is configured at the cluster level, every web app
    automatically gets a generated domain on first deploy:

        App "myapi" → myapi.apps.mycluster.com (auto-generated)

    This is like Heroku's myapp.herokuapp.com — instant URL, zero config.
    The user can ALSO add custom domains on top:

        kuberoku domains:add api.mycompany.com   (custom, in addition)

    Both work simultaneously. The generated domain is never removed unless
    the app is destroyed.

    See Section 4.10 for BASE_DOMAIN configuration details.

    SUMMARY TABLE — WHAT CONSUMES WHAT
    ────────────────────────────────────
    Resource            External IP?    External Port?         Conflicts?
    ──────────────────  ──────────────  ─────────────────────  ──────────────────
    HTTP app (web)      Shared (1)      443/80 (shared)        No (hostnames)
    TCP/UDP (gateway)   Shared (1)      Allocated 10000-10999  No (unique ports)
    TCP/UDP (LB)        Own IP ($$$)    Original port          No (separate IPs)
    TCP/UDP (NodePort)  Node IPs        Auto 30000-32767       No (K8s assigns)
    Addon (default)     None            None                   No (internal only)
    Addon (exposed)     Shared (1)      Allocated 10000-10999  No (unique ports)
    Worker              None            None                   No (no network)

    THE GOLDEN RULE: NOTHING IS AUTO-EXPOSED
    ─────────────────────────────────────────
    Everything starts internal (ClusterIP). The user explicitly chooses
    what to expose and how. Exposure is NOT tied to process type — it's
    an independent decision:

    - domains:add         → Ingress rule (HTTP routing). Any process type.
    - services:expose:on  → LoadBalancer/NodePort (direct IP). Process types.
    - services:open       → Port-forward + open browser. HTTP process types.
    - addons:expose:on    → LoadBalancer/NodePort (direct IP). Addons.
    - addons:connect      → Temporary port-forward. Addons (debugging).
    - Workers             → No network at all.

    One app can use BOTH methods, even on the same service:
        kuberoku domains:add api.myapp.com --type web    # HTTP via Ingress
        kuberoku services:expose:on grpc                 # TCP via LoadBalancer
        kuberoku addons:expose:on analytics              # addon via LoadBalancer

    Because exposure is always opt-in and explicit, there are no
    accidental port conflicts, no surprise cloud bills, no security
    surprises.


8.2  NETWORK ISOLATION — APP-LEVEL SECURITY VIA NETWORKPOLICY
─────────────────────────────────────────────────────────────
All apps share a single namespace (Section 9). Without network isolation,
any pod can reach any other pod on any port. A compromised web process in
App A could scan and connect to App B's postgres addon on port 5432.
NetworkPolicies prevent this.

    DESIGN: DEFAULT-DENY PER APP + EXPLICIT ALLOW PER COMPONENT
    ────────────────────────────────────────────────────────────
    Every app gets a blanket deny-all-ingress policy at creation time.
    Each component (process, addon) then adds a scoped allow rule for
    its declared ports from same-app pods only. External access is
    layered on top via services:expose:on/addons:expose:on/domains:add.

    THE FOUR POLICY TYPES
    ─────────────────────

    1. APP DENY POLICY — created on apps:create
       Selects ALL pods belonging to the app. Denies all ingress.
       No traffic from any source — not other apps, not same app,
       not external. Everything starts locked down.

           apiVersion: networking.k8s.io/v1
           kind: NetworkPolicy
           metadata:
             name: {PREFIX}-netpol-deny-{app}
             labels:
               {DOMAIN}/managed-by: {PREFIX}
               {DOMAIN}/app: {app}
               {DOMAIN}/resource-type: network-policy-deny
           spec:
             podSelector:
               matchLabels:
                 {DOMAIN}/app: {app}
             policyTypes: [Ingress]
             ingress: []                    # Empty = deny ALL ingress

    2. COMPONENT ALLOW POLICY — created on deploy / addons:create
       Allows ingress from same-app pods on declared ports only.
       NetworkPolicies are additive, so this punches a hole in the
       deny-all for specific traffic.

           # For a web process on port 8080:
           metadata:
             name: {PREFIX}-netpol-allow-{app}-{process}
             labels:
               {DOMAIN}/managed-by: {PREFIX}
               {DOMAIN}/app: {app}
               {DOMAIN}/resource-type: network-policy-allow
               {DOMAIN}/process-type: {process}
           spec:
             podSelector:
               matchLabels:
                 {DOMAIN}/app: {app}
                 {DOMAIN}/process-type: {process}
             policyTypes: [Ingress]
             ingress:
               - from:
                   - podSelector:
                       matchLabels:
                         {DOMAIN}/app: {app}      # Same-app pods ONLY
                 ports:
                   - port: 8080
                     protocol: TCP

           # For a postgres addon on port 5432:
           metadata:
             name: {PREFIX}-netpol-allow-{app}-{instance}
             labels:
               {DOMAIN}/managed-by: {PREFIX}
               {DOMAIN}/app: {app}
               {DOMAIN}/resource-type: network-policy-allow
               {DOMAIN}/addon-instance: {instance}
           spec:
             podSelector:
               matchLabels:
                 {DOMAIN}/app: {app}
                 {DOMAIN}/addon-instance: {instance}
             policyTypes: [Ingress]
             ingress:
               - from:
                   - podSelector:
                       matchLabels:
                         {DOMAIN}/app: {app}
                 ports:
                   - port: 5432
                     protocol: TCP

    3. EXTERNAL ALLOW POLICY — created on services:expose:on / addons:expose:on / domains:add
       Allows ingress from ALL sources (external clients, ingress
       controllers, other pods) on declared ports. No `from` field
       means "allow from everywhere."

           metadata:
             name: {PREFIX}-netpol-external-{app}-{process}
             labels:
               {DOMAIN}/managed-by: {PREFIX}
               {DOMAIN}/app: {app}
               {DOMAIN}/resource-type: network-policy-external
               {DOMAIN}/process-type: {process}
             annotations:
               {DOMAIN}/exposed-by: "expose"    # or "domains" or "expose,domains"
           spec:
             podSelector:
               matchLabels:
                 {DOMAIN}/app: {app}
                 {DOMAIN}/process-type: {process}
             policyTypes: [Ingress]
             ingress:
               - ports:                    # No `from` = allow from ALL
                   - port: 8080
                     protocol: TCP

       Both services:expose:on and domains:add create/share the same external
       policy (same name, idempotent). On services:expose:off, check if
       domains:add is still active before deleting. On domains:remove,
       check if services:expose:on is still active. Delete only when neither
       is active. Track via annotation: {DOMAIN}/exposed-by: expose,domains

    4. NO EGRESS RESTRICTION
       Only policyTypes: [Ingress] is set. Egress is unrestricted —
       pods can freely reach external APIs, DNS, other cluster
       services. Egress restriction would break DNS resolution and
       require explicit allow rules for kube-dns. Not worth the
       complexity for V1. Can be added later if needed.
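
    The first two policy types can be expressed as plain dict builders.
    The function names match those listed under IMPLEMENTATION FILES, but
    the signatures, the _labels helper and the example values are
    assumptions made for this sketch.

```python
from typing import Dict, Iterable, Tuple


def _labels(prefix: str, domain: str, app: str, resource_type: str) -> Dict[str, str]:
    # Hypothetical helper: the three labels every policy above carries.
    return {
        f"{domain}/managed-by": prefix,
        f"{domain}/app": app,
        f"{domain}/resource-type": resource_type,
    }


def build_app_deny_policy(prefix: str, domain: str, app: str) -> dict:
    """Policy type 1: select every pod of the app, allow no ingress."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": f"{prefix}-netpol-deny-{app}",
            "labels": _labels(prefix, domain, app, "network-policy-deny"),
        },
        "spec": {
            "podSelector": {"matchLabels": {f"{domain}/app": app}},
            "policyTypes": ["Ingress"],
            "ingress": [],  # empty list = deny all ingress
        },
    }


def build_process_allow_policy(
    prefix: str, domain: str, app: str, process: str,
    ports: Iterable[Tuple[int, str]],
) -> dict:
    """Policy type 2: same-app pods may reach `process` on its declared ports."""
    labels = _labels(prefix, domain, app, "network-policy-allow")
    labels[f"{domain}/process-type"] = process
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {
            "name": f"{prefix}-netpol-allow-{app}-{process}",
            "labels": labels,
        },
        "spec": {
            "podSelector": {"matchLabels": {
                f"{domain}/app": app,
                f"{domain}/process-type": process,
            }},
            "policyTypes": ["Ingress"],
            "ingress": [{
                # Same-app pods only
                "from": [{"podSelector": {"matchLabels": {f"{domain}/app": app}}}],
                "ports": [{"port": port, "protocol": protocol.upper()}
                          for port, protocol in ports],
            }],
        },
    }
```

    Because the builders return plain dicts, tests can compare them
    directly without a cluster.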

    LIFECYCLE: WHAT HAPPENS ON EACH COMMAND
    ────────────────────────────────────────
    Command                Action                 Policy
    ─────────────────────  ─────────────────────  ────────────────────────────────
    apps:create            Create deny-all        netpol-deny-{app}
    deploy                 Create process allow   netpol-allow-{app}-{process}
    addons:create          Create addon allow     netpol-allow-{app}-{instance}
    addons:destroy         Delete addon allow     netpol-allow-{app}-{instance}
    services:expose:on     Create external allow  netpol-external-{app}-{process}
    services:expose:off    Delete external allow  (if no domains:add active)
    addons:expose:on       Create external allow  netpol-external-{app}-{instance}
    addons:expose:off      Delete external allow
    domains:add            Create external allow  netpol-external-{app}-{process}
    domains:remove         Delete external allow  (if no services:expose:on active)
    services:ports:add     Update process allow   Add port to existing policy
    services:ports:remove  Update process allow   Remove port from policy
    apps:destroy           Delete ALL policies    Label-based: {DOMAIN}/app={app}
    apps:rename            Update labels          Same as other resources
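
    The "delete only when neither is active" rule for the shared external
    policy comes down to set arithmetic on the exposed-by annotation. A
    sketch with hypothetical function names; returning None to mean
    "delete the policy" is a convention chosen for this example.

```python
from typing import Optional, Set

VALID_SOURCES = ("expose", "domains")


def _parse(annotation: Optional[str]) -> Set[str]:
    parts = {part.strip() for part in (annotation or "").split(",")}
    return {part for part in parts if part in VALID_SOURCES}


def _format(sources: Set[str]) -> str:
    # Fixed order keeps the value stable: "expose", "domains", "expose,domains"
    return ",".join(source for source in VALID_SOURCES if source in sources)


def add_exposure(annotation: Optional[str], source: str) -> str:
    """Annotation value after services:expose:on ("expose") or domains:add ("domains")."""
    if source not in VALID_SOURCES:
        raise ValueError(f"unknown exposure source: {source!r}")
    return _format(_parse(annotation) | {source})


def remove_exposure(annotation: Optional[str], source: str) -> Optional[str]:
    """Annotation value after the matching off/remove command.

    Returns None when no source is left, meaning the external policy
    should be deleted.
    """
    remaining = _parse(annotation) - {source}
    return _format(remaining) if remaining else None
```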

    SECURITY PROPERTIES ACHIEVED
    ────────────────────────────
    1. App A cannot reach App B internally — deny-all blocks traffic from
       pods outside the app
    2. Addon only reachable on declared port — postgres accepts 5432 only
    3. Process only reachable on declared port — web accepts 8080 only
    4. expose:on opens public access on declared ports only — not all ports
    5. expose:off re-isolates — back to same-app only
    6. Workers (no ports) are fully isolated — deny-all, no allow rule
    7. Other apps CAN reach exposed endpoints — by design. Once exposed,
       the declared ports are public to everyone, including other apps on
       the same cluster. Property 1 applies only to ports that are not
       exposed.

    WHY NOT NAMESPACE-PER-APP?
    ──────────────────────────
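    We use a single shared namespace (Section 9). Namespace-per-app would
    give cleaner RBAC and quota boundaries but adds complexity: cross-namespace
    Service DNS, RBAC per namespace, resource quota management. It would not
    isolate traffic by itself either, because pods in different namespaces
    can reach each other unless a NetworkPolicy forbids it. The label-based
    NetworkPolicy approach gives equivalent network isolation with the
    simplicity of a single namespace.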

    CNI REQUIREMENT — NETWORKPOLICY ENFORCEMENT
    ────────────────────────────────────────────
    NetworkPolicies are a standard K8s API resource. The API server
    ALWAYS accepts them. But enforcement requires a CNI plugin that
    implements the NetworkPolicy spec. Without one, policies exist
    but have no effect.

    CNI is a cluster-level component chosen by the cluster operator,
    NOT by applications running on the cluster. Kuberoku is "zero
    server-side components" — we don't install or configure CNIs.
    We create standard NetworkPolicy resources and they work
    identically on any CNI that supports them.

    Environment         Default CNI      Enforces NetworkPolicy?
    ──────────────────  ───────────────  ─────────────────────────
    EKS                 Amazon VPC CNI   Yes (native policy support when
                                         enabled, or with Calico addon)
    GKE                 Dataplane V2     Yes (Cilium-based)
    AKS                 Azure CNI        Yes (with policy option)
    k3s / k3d           Flannel          Yes (embedded policy controller,
                                         unless disabled)
    kind                kindnet          No (add Calico addon)
    Colima              Flannel          No
    Bare metal          Varies           Depends on install

    The two main options cluster operators have:
    - Calico: most widely deployed, works on every K8s distribution
    - Cilium: eBPF-based, GKE default, excellent performance

    OUR APPROACH:
    1. ALWAYS create NetworkPolicies — zero runtime cost if not enforced,
       critical isolation when enforced. The K8s API accepts them regardless.
    2. doctor check — warn if CNI doesn't enforce NetworkPolicies.
       Check for known CNI DaemonSets (calico-node, cilium) in kube-system.
    3. Document clearly — "For network isolation in production, ensure
       your cluster uses a CNI that enforces NetworkPolicies."

    Security by default. NetworkPolicies cost nothing if the CNI doesn't
    enforce them, and provide critical isolation when it does. Always-on,
    not opt-in.
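
    The doctor check in step 2 can be sketched as a pure function over the
    DaemonSet names found in kube-system. Listing the DaemonSets is left to
    the K8s client. The function names and message wording are
    hypothetical, and a cluster whose enforcement comes from something
    other than these two DaemonSets would still get the warning.

```python
from typing import Iterable, Optional

# DaemonSet names taken from the doctor check described above.
KNOWN_ENFORCING_DAEMONSETS = {
    "calico-node": "Calico",
    "cilium": "Cilium",
}


def detect_policy_enforcing_cni(daemonset_names: Iterable[str]) -> Optional[str]:
    """Return the CNI name if a known enforcing DaemonSet is present, else None."""
    for name in daemonset_names:
        if name in KNOWN_ENFORCING_DAEMONSETS:
            return KNOWN_ENFORCING_DAEMONSETS[name]
    return None


def network_policy_check(daemonset_names: Iterable[str]) -> str:
    """One-line doctor result (wording is illustrative only)."""
    cni = detect_policy_enforcing_cni(daemonset_names)
    if cni is None:
        return ("WARN: no known NetworkPolicy-enforcing CNI found in kube-system; "
                "policies are created but may have no effect")
    return f"OK: {cni} detected; NetworkPolicies should be enforced"
```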

    IMPLEMENTATION FILES
    ────────────────────
    - K8sClientProtocol: create/get/update/delete/list_network_policies
    - k8s/resources.py: build_app_deny_policy(), build_process_allow_policy(),
      build_addon_allow_policy(), build_external_allow_policy()
    - k8s/labels.py: resource-type values: network-policy-deny,
      network-policy-allow, network-policy-external
    - sdk/apps.py: create/destroy/rename manage deny policies
    - sdk/services.py: services:expose:on/off manage external policies (process types)
    - sdk/addons.py: addons:create creates addon allow policies,
      addons:expose:on/off manage external policies (addons)
    - sdk/deploy.py: deploy creates process allow policies
    - FakeK8sClient: NetworkPolicy CRUD (same dict-based pattern)
    - tests/contract/test_network_policy_crud.py: contract tests
    - tests/networking/test_network_policy.py: per-command policy tests
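
    A sketch of what "same dict-based pattern" could look like for
    NetworkPolicy CRUD. The method names are inferred from
    create/get/update/delete/list_network_policies and are assumptions, as
    are the error types. Update is omitted to keep the example short.

```python
import copy
from typing import Dict, List, Optional, Protocol, Tuple


class NetworkPolicyClient(Protocol):
    """Structural type for the NetworkPolicy part of the K8s client."""

    def create_network_policy(self, namespace: str, body: dict) -> dict: ...
    def get_network_policy(self, namespace: str, name: str) -> dict: ...
    def delete_network_policy(self, namespace: str, name: str) -> None: ...
    def list_network_policies(
        self, namespace: str, labels: Optional[Dict[str, str]] = None
    ) -> List[dict]: ...


class FakeNetworkPolicyClient:
    """In-memory stand-in: policies live in a dict keyed by (namespace, name)."""

    def __init__(self) -> None:
        self._policies: Dict[Tuple[str, str], dict] = {}

    def create_network_policy(self, namespace: str, body: dict) -> dict:
        key = (namespace, body["metadata"]["name"])
        if key in self._policies:
            raise ValueError(f"NetworkPolicy {key[1]!r} already exists")
        # Deep copies keep callers from mutating stored state by accident.
        self._policies[key] = copy.deepcopy(body)
        return copy.deepcopy(body)

    def get_network_policy(self, namespace: str, name: str) -> dict:
        return copy.deepcopy(self._policies[(namespace, name)])  # KeyError if missing

    def delete_network_policy(self, namespace: str, name: str) -> None:
        del self._policies[(namespace, name)]

    def list_network_policies(
        self, namespace: str, labels: Optional[Dict[str, str]] = None
    ) -> List[dict]:
        wanted = labels or {}
        result = []
        for (ns, _name), body in self._policies.items():
            have = body["metadata"].get("labels", {})
            if ns == namespace and all(have.get(k) == v for k, v in wanted.items()):
                result.append(copy.deepcopy(body))
        return result
```

    The fake satisfies the Protocol structurally, so no inheritance is
    needed; label filtering is what makes the label-based apps:destroy
    cleanup testable.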


================================================================================
 9. ARCHITECTURE
================================================================================

HIGH-LEVEL FLOW
───────────────
    User → CLI (Click, thin) → SDK (business logic) → K8sClientProtocol
                                                            │
                                                    ┌───────┴───────┐
                                                    │               │
                                               K8sClient       FakeK8sClient
                                             (real cluster)      (tests)

ARCHITECTURE PRINCIPLES
───────────────────────
Distilled from analyzing 12+ production CLI tools, including Heroku CLI
(oclif), GitHub CLI (gh), kubectl, AWS CLI, Docker CLI, boto3/botocore,
opencode, Stripe CLI, Poetry, and Terraform CLI.

PRINCIPLE 1: THREE LAYERS, STRICT DEPENDENCY DIRECTION

    CLI → SDK → K8s (infrastructure)

    Dependencies flow DOWN only. Never:
    - SDK imports from CLI (no output formatting in business logic)
    - K8s imports from SDK (no business logic in the K8s client)
    - CLI imports from K8s (always goes through SDK)

    What lives where:

    Layer        Responsibility                       Examples
    ───────────  ───────────────────────────────────  ────────────────────────────
    CLI          Arg parsing, output formatting,      Click groups, Rich tables,
                 error presentation, user I/O         progress bars, prompts
    SDK          Business logic, state management,    AppsService.create(),
                 validation, orchestration            DeployService.deploy()
    K8s/Infra    I/O operations, external calls       K8sClient, Docker, git

    The SDK IS the product. The CLI is one consumer. Other consumers:
    Python scripts, Jupyter notebooks, a web UI, a Slack bot.

    Enforcement: code review. No linter needed for 13 services. If it grows
    to 50, consider import-linter (Python package for dependency checking).

PRINCIPLE 2: NO GLOBAL STATE

    Everything flows through the factory (see below). No module-level
    singletons, no global config object, no shared mutable state.

    Why: testability. Every test creates its own factory with its own
    FakeK8sClient. No test pollution. No ordering dependencies. No
    unittest.mock.patch() hell.

PRINCIPLE 3: SDK NEVER PRINTS

    SDK methods return data (dataclasses, lists, strings). They NEVER:
    - Print to stdout/stderr
    - Use Rich, click.echo, or any output library
    - Show progress bars or spinners

    Progress reporting uses callbacks:

        def deploy(self, app: str, image: str,
                   on_step: Callable[[str], None] | None = None) -> Release:
            if on_step: on_step("Creating release...")
            release = self._create_release(app, image)
            if on_step: on_step("Updating deployments...")
            self._update_deployments(app, release)
            return release

    The CLI passes a callback that shows Rich progress. Tests pass None.
    The SDK stays pure — importable, testable, composable.
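
    A self-contained illustration of the callback contract from the
    consumer side. The Release fields and the stand-in deploy body are
    assumptions; only the on_step signature comes from the snippet above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Release:
    app: str
    image: str
    version: int


def deploy(app: str, image: str,
           on_step: Optional[Callable[[str], None]] = None) -> Release:
    """Stand-in for DeployService.deploy(): reports steps, never prints."""
    def step(message: str) -> None:
        if on_step is not None:
            on_step(message)

    step("Creating release...")
    release = Release(app=app, image=image, version=1)  # placeholder for real work
    step("Updating deployments...")
    return release


# A script or notebook can collect the steps instead of rendering them.
steps: List[str] = []
release = deploy("myapi", "myapi:v1", on_step=steps.append)

# Tests (or any caller that does not care) simply pass nothing.
silent = deploy("myapi", "myapi:v1")
```

    Any callable that accepts a string works as on_step, including
    list.append, print, or a function that advances a progress bar.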

PRINCIPLE 4: ERRORS ARE TYPED, CAUGHT AT ONE BOUNDARY

    SDK raises typed exceptions (AppNotFoundError, DeployError, etc.).
    The CLI has ONE error handler at the top level that catches
    KuberokuError and formats it for the user. Individual commands do
    NOT have try/except for domain errors — that's the boundary's job.

        # cli/main.py — the single error boundary
        def main():
            try:
                cli()
            except KuberokuError as e:
                output.error(e)
                sys.exit(e.exit_code)

    Why: scattered try/except means inconsistent error formatting.
    One boundary = one format = consistent UX.
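
    A sketch of an exception hierarchy that supports this boundary. The
    exit code values, the message attribute and the run() helper are
    assumptions for illustration. run() returns the exit code instead of
    calling sys.exit() so the boundary itself can be tested.

```python
from typing import Callable


class KuberokuError(Exception):
    """Base class for all domain errors; the boundary catches only this."""

    exit_code = 1  # hypothetical default

    def __init__(self, message: str) -> None:
        super().__init__(message)
        self.message = message


class AppNotFoundError(KuberokuError):
    exit_code = 3  # hypothetical value

    def __init__(self, app: str) -> None:
        super().__init__(f"App {app!r} not found.")
        self.app = app


class DeployError(KuberokuError):
    exit_code = 4  # hypothetical value


def run(command: Callable[[], None],
        report: Callable[[str], None] = print) -> int:
    """The single boundary: format any KuberokuError, return its exit code."""
    try:
        command()
    except KuberokuError as error:
        report(f"Error: {error.message}")
        return error.exit_code
    return 0
```

    Errors that are not KuberokuError subclasses pass through untouched,
    so programming bugs still surface as tracebacks.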

PRINCIPLE 5: LAZY EVERYTHING

    CLI startup MUST be fast (<100ms for --help). This means:
    - Don't import kubernetes until a command actually needs K8s
    - Don't connect to a cluster until a command actually needs K8s
    - kuberoku --version, --help, plugins, completion all work offline

    Implementation:
    - Click lazy group loading (load command modules on demand)
    - Factory defers K8sClient creation until first .k8s access
    - Heavy imports (kubernetes, docker, yaml) inside functions, not at
      module top level in CLI layer

    (Learned from: gh CLI connects lazily, boto3 creates clients on
    demand, kubectl only loads kubeconfig when needed)
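
    The on-demand loading idea can be shown without any CLI framework.
    LazyCommandTable is a hypothetical name; with Click, the same effect is
    typically reached by subclassing click.Group and resolving commands in
    get_command(). Standard library modules stand in for command modules.

```python
import importlib
from typing import Any, Dict, List


class LazyCommandTable:
    """Maps command names to "module:attribute" and imports on first use."""

    def __init__(self, targets: Dict[str, str]) -> None:
        self._targets = dict(targets)
        self._loaded: Dict[str, Any] = {}

    def names(self) -> List[str]:
        # Listing commands (e.g. for --help) needs no imports at all.
        return sorted(self._targets)

    def is_loaded(self, name: str) -> bool:
        return name in self._loaded

    def get(self, name: str) -> Any:
        if name not in self._loaded:
            module_name, _, attribute = self._targets[name].partition(":")
            module = importlib.import_module(module_name)  # deferred import
            self._loaded[name] = getattr(module, attribute)
        return self._loaded[name]


# Stand-in targets from the standard library, for illustration only.
table = LazyCommandTable({
    "dump": "json:dumps",
    "mean": "statistics:mean",
})
```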

PRINCIPLE 6: KISS / YAGNI — BUILD WHAT YOU NEED

    What we DON'T build:
    - No event system (boto3 has one because it generates 419 services
      from JSON — we have 13 hand-written services)
    - No middleware chain or pipeline pattern
    - No abstract base classes (Protocol = structural typing, already decided)
    - No "extensible" hooks for core commands
    - No generic "resource" abstraction over K8s objects
    - No ORM-style active record pattern for K8s resources

    What we DO build:
    - Protocol for K8s client (proven need: 5 test backends)
    - Factory for dependency wiring (proven need: every test)
    - Plugin system via entry_points (proven need: addon ecosystem)
    - Progress callbacks (proven need: CLI vs SDK consumers differ)

    The test: "Do we have TWO concrete uses for this abstraction right
    now?" If yes, build it. If no, just write the code directly.

    Three similar lines of code > one premature abstraction.

    (Learned from: opencode has zero plugin system — YAGNI for their
    use case. boto3 only built the Resource layer for 9 of 419 services —
    they didn't pre-build for all. Apply each pattern only where justified.)


THE FACTORY (DEPENDENCY WIRING)
───────────────────────────────
Every SDK service needs: a K8s client, the current namespace, the resource
prefix, the label domain. Instead of passing these individually to every
method, a single Factory object provides them.

This is NOT a DI framework. It's a plain class with properties.

(Learned from: gh's cmdutil.Factory, kubectl's Factory, Docker's
command.Cli — all use this exact pattern.)

    from typing import Any, TypeVar

    T = TypeVar("T")  # service class type returned by _get_service()

    class KuberokuFactory:
        """Wires dependencies for SDK services.

        Created once per CLI invocation. Tests create their own
        with a FakeK8sClient.
        """

        def __init__(
            self,
            k8s_client: K8sClientProtocol | None = None,
            namespace: str | None = None,
            resource_prefix: str | None = None,
            label_domain: str | None = None,
            cluster_context: str | None = None,
        ):
            self._k8s_client = k8s_client
            self._namespace = namespace
            self._resource_prefix = resource_prefix
            self._label_domain = label_domain
            # Read by the lazy `k8s` property below
            self.cluster_context = cluster_context
            self._services: dict[str, Any] = {}  # cached service instances

        @property
        def k8s(self) -> K8sClientProtocol:
            """Lazy — only connects to K8s when first accessed."""
            if self._k8s_client is None:
                self._k8s_client = K8sClient.from_context(self.cluster_context)
            return self._k8s_client

        @property
        def namespace(self) -> str:
            if self._namespace is None:
                self._namespace = resolve_namespace()  # resolution order
            return self._namespace

        @property
        def prefix(self) -> str:
            return self._resource_prefix or branding.DEFAULT_RESOURCE_PREFIX

        @property
        def domain(self) -> str:
            return self._label_domain or branding.DEFAULT_LABEL_DOMAIN

        @property
        def apps(self) -> AppsService:
            return self._get_service("apps", AppsService)

        @property
        def config(self) -> ConfigService:
            return self._get_service("config", ConfigService)

        @property
        def deploy(self) -> DeployService:
            return self._get_service("deploy", DeployService)

        # ... one property per service

        def _get_service(self, name: str, cls: type[T]) -> T:
            if name not in self._services:
                self._services[name] = cls(self)
            return self._services[name]

    In production (CLI):
        factory = KuberokuFactory()  # lazy, resolves from config/env

    In tests:
        factory = KuberokuFactory(
            k8s_client=FakeK8sClient(),
            namespace="test-ns",
        )

    The public SDK facade (Kuberoku class from Section 7) wraps this:

        class Kuberoku:
            def __init__(self, context=None, namespace=None):
                self._factory = KuberokuFactory(...)

            @property
            def apps(self): return self._factory.apps
            @property
            def config(self): return self._factory.config

    This means the SDK public API (Section 7) and the CLI both use the
    same factory underneath. The only difference: CLI creates it from
    Click context, SDK creates it from constructor args.


SDK SERVICE PATTERN
───────────────────
Every service follows the same structure:

    class AppsService:
        """Business logic for app operations."""

        def __init__(self, factory: KuberokuFactory):
            self._factory = factory

        @property
        def _k8s(self) -> K8sClientProtocol:
            return self._factory.k8s

        @property
        def _ns(self) -> str:
            return self._factory.namespace

        @property
        def _prefix(self) -> str:
            return self._factory.prefix

        def create(self, name: str,
                   on_step: Callable[[str], None] | None = None) -> App:
            validate_app_name(name)
            if self._exists(name):
                raise AppAlreadyExistsError(name)
            if on_step: on_step("Creating app manifest...")
            self._k8s.create_configmap(
                self._ns, build_app_manifest(self._prefix, name)
            )
            if on_step: on_step("Creating env config...")
            self._k8s.create_configmap(
                self._ns, build_env_configmap(self._prefix, name)
            )
            if on_step: on_step("Creating formation...")
            self._k8s.create_configmap(
                self._ns, build_formation_configmap(self._prefix, name)
            )
            return App(name=name, ...)

        def list(self) -> list[App]:
            cms = self._k8s.list_configmaps(self._ns, labels={
                f"{self._factory.domain}/managed-by": self._factory.prefix,
                f"{self._factory.domain}/resource-type": "app-manifest",
            })
            return [App.from_configmap(cm) for cm in cms]

    Rules for every SDK service:
    - Constructor takes factory ONLY (not individual dependencies)
    - Convenience properties for _k8s, _ns, _prefix (avoid self._factory.k8s
      everywhere — readability matters)
    - Return dataclasses, never raw dicts
    - Raise typed exceptions (AppNotFoundError, etc.)
    - Never sys.exit(), never click.echo(), never import Rich
    - Progress via on_step callback, not print
    - No decorators, no metaclasses, no magic. Plain methods.

    WHY NOT A BASE CLASS?
    It's tempting to write:

        class BaseService:
            def __init__(self, factory): ...
            @property
            def _k8s(self): ...

    Don't. 13 services, each with 3 trivial properties, is 39 lines of
    "duplication." A base class saves those 39 lines but adds inheritance,
    makes the code harder to grep, and couples all services to one ancestor.
    Copy-paste wins here. (If you have 50+ services, revisit.)


CLI COMMAND PATTERN
───────────────────
Every CLI command follows the same structure:

    # cli/apps.py
    import click
    from .main import ColonCommandGroup

    @click.group(cls=ColonCommandGroup)
    def apps():
        """Manage apps."""
        pass

    @apps.command()
    @click.argument("name")
    @click.pass_context
    def create(ctx, name):
        """Create a new app."""
        factory = ctx.obj["factory"]
        with output.steps() as step:
            app = factory.apps.create(name, on_step=step)
        output.success(f"App {app.name} created.")

    @apps.command("list")
    @click.option("--output", "fmt", type=click.Choice(["table", "json"]),
                  default="table")
    @click.pass_context
    def list_apps(ctx, fmt):
        """List all apps."""
        factory = ctx.obj["factory"]
        apps = factory.apps.list()
        output.render(apps, fmt, columns=["name", "created_at"])

    Rules for every CLI command:
    1. Get factory from ctx.obj["factory"]
    2. Call ONE SDK method
    3. Format output via the output module (cli/output.py)
    4. NO business logic in CLI layer — no ifs over K8s resources, no
       loops assembling data, no "if this label then that"
    5. NO try/except for domain errors (boundary handler does this)
    6. Progress callbacks via output.steps() passed to SDK methods

    The main CLI entry point creates the factory and error boundary:

        # cli/main.py
        import sys
        import click

        @click.group(cls=ColonCommandGroup)
        @click.pass_context
        def cli(ctx):
            ctx.ensure_object(dict)
            ctx.obj["factory"] = KuberokuFactory()

        def main():
            try:
                cli()
            except KuberokuError as e:
                output.error(e)
                sys.exit(e.exit_code)

    OUTPUT FORMAT: Every list/info command supports --output json|table
    (default: table). This is critical for scripting and CI/CD:

        kuberoku apps --output json | jq '.[].name'
        kuberoku config --app myapi --output json

    The output module handles both. SDK returns data, output formats it.
    JSON mode = json.dumps(dataclasses.asdict(...)). Table mode = Rich table.


THINGS YOU MIGHT NOT HAVE THOUGHT OF YET
─────────────────────────────────────────
(Lessons from tools that learned the hard way)

(a) SIGNAL HANDLING (Ctrl+C DURING DEPLOY)

    If the user hits Ctrl+C during a multi-step deploy, Kuberoku must NOT
    leave partial state silently. Two options:

        Option A (simpler): Let the current K8s API call finish, then abort.
        Print what completed, what didn't. Idempotent re-run handles the rest.

        Option B (complex): Trap SIGINT, revert completed steps, then exit.

    V1: Option A. The idempotency rules (Rule 1 above) already handle this.
    A half-finished deploy is fixed by re-running deploy. Don't add revert-
    on-signal complexity for something idempotency already solves.

    What to print on Ctrl+C:
        ^C
        Interrupted after: Creating release v7... done, Updating web... done
        Not completed: Updating worker
        State may be inconsistent. Re-run to finish: kuberoku deploy --image myapi:v3
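
    A minimal sketch of Option A's reporting, assuming each deploy step is a
    (label, callable) pair. run_steps and its emit parameter are hypothetical
    names. The sketch only reports progress; it does not by itself guarantee
    that an in-flight API call finishes before the interrupt is raised.

```python
from typing import Callable, Sequence, Tuple

Step = Tuple[str, Callable[[], None]]


def run_steps(steps: Sequence[Step],
              emit: Callable[[str], None] = print) -> bool:
    """Run steps in order; on Ctrl+C report progress instead of reverting.

    Returns True if every step finished, False if interrupted.
    """
    completed = []
    for index, (label, action) in enumerate(steps):
        try:
            action()
        except KeyboardInterrupt:
            # Everything from the interrupted step onward is unfinished.
            remaining = [name for name, _ in steps[index:]]
            emit("Interrupted after: " + (", ".join(completed) or "nothing"))
            emit("Not completed: " + ", ".join(remaining))
            emit("State may be inconsistent. Re-run the command to finish.")
            return False
        completed.append(label)
    return True
```

    Because every step is idempotent, the recovery path is simply running the
    same command again.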

(b) SHELL COMPLETION

    Click auto-generates shell completions for commands and flags. For
    dynamic arguments (app names, addon names), provide custom completers:

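        def complete_app_name(ctx, param, incomplete):
            # ctx.obj can be None during completion: Click does not invoke the
            # root group callback that normally creates the factory.
            factory = (ctx.obj or {}).get("factory")
            if not factory:
                return []
            try:
                return [a.name for a in factory.apps.list()
                        if a.name.startswith(incomplete)]
            except Exception:
                return []  # silent fail; completion is best-effort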

    Install completions: kuberoku completion bash|zsh|fish
    (Phase 12, distribution)

(c) PERFORMANCE & SNAPPINESS — THE #1 UX DIFFERENTIATOR

    A CLI that feels slow is a CLI people avoid. Every millisecond matters.
    Heroku CLI is notoriously slow (~2-3s for simple commands). We beat that.

    STARTUP TIME BUDGET:
    ────────────────────
        kuberoku --help          <100ms (zero I/O, zero K8s)
        kuberoku --version       <50ms
        kuberoku apps            <300ms (K8s API = ~100-150ms network)
        kuberoku deploy          <500ms + build time (build is user-visible work)
        Tab completion            <200ms (best-effort, silent fail OK)

    THE IMPORT PROBLEM:
    ───────────────────
        `import kubernetes`    ~150-200ms  (biggest offender)
        `import rich`          ~50ms
        `import click`         ~30ms
        `import yaml`          ~20ms
        ─────────────────────────────
        Naive top-level import: ~250-300ms BEFORE your code even runs

    THE FIX: LAZY IMPORTS IN CLI LAYER
    ───────────────────────────────────
        # BAD — cli/apps.py top-level
        from kuberoku.sdk.apps import AppsService  # triggers kubernetes import

        # GOOD — cli/apps.py deferred
        @apps.command()
        @click.argument("name")
        @click.pass_context
        def create(ctx, name):
            factory = ctx.obj["factory"]  # factory is lazy
            app = factory.apps.create(name)  # K8s imported HERE, first time

    Rules:
    - CLI modules: NEVER import sdk/ or k8s/ at module level
    - SDK modules: CAN import k8s/ at module level (SDK is only loaded
      when a command runs, so the import cost is paid after startup)
    - Click groups: use lazy loading (click-plugins or manual LazyGroup)
      so kuberoku --help doesn't import ALL command modules
    - The factory defers K8s client creation until first .k8s access
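
    The last rule can be sketched with functools.cached_property. LazyFactory
    and build_fake_client are hypothetical names; in the real factory the
    builder would be the place where the kubernetes package is first imported.

```python
from functools import cached_property
from typing import Any, Callable


class LazyFactory:
    """Hypothetical factory: the client is built on first access, then cached."""

    def __init__(self, build_client: Callable[[], Any]) -> None:
        self._build_client = build_client

    @cached_property
    def k8s(self) -> Any:
        # The expensive import and client construction would happen here,
        # so commands that never touch .k8s never pay for it.
        return self._build_client()


calls = []


def build_fake_client() -> str:
    calls.append("built")
    return "client"


factory = LazyFactory(build_fake_client)
# Nothing has been built yet; the first `factory.k8s` access triggers it.
```

    Constructing the factory costs nothing, which is what lets the root group
    create one on every invocation, including `--help`.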

    LAZY GROUP IMPLEMENTATION:
    ──────────────────────────
        import importlib
        import click

        class LazyGroup(click.Group):
            """Load command modules only when invoked."""

            def __init__(self, *args, lazy_subcommands=None, **kwargs):
                super().__init__(*args, **kwargs)
                self._lazy = lazy_subcommands or {}

            def list_commands(self, ctx):
                base = super().list_commands(ctx)
                return sorted(base + list(self._lazy.keys()))

            def get_command(self, ctx, cmd_name):
                if cmd_name in self._lazy:
                    module_path, attr = self._lazy[cmd_name].rsplit(":", 1)
                    module = importlib.import_module(module_path)
                    return getattr(module, attr)
                return super().get_command(ctx, cmd_name)

        # cli/main.py (ColonCommandGroup must subclass LazyGroup to accept
        # lazy_subcommands)
        cli = ColonCommandGroup(
            lazy_subcommands={
                "apps":        "kuberoku.cli.apps:apps",
                "config":      "kuberoku.cli.config:config",
                "deploy":      "kuberoku.cli.deploy:deploy",
                # ... all groups loaded on demand
            }
        )

    K8S API CALL DISCIPLINE:
    ────────────────────────
    Each K8s API call costs ~50-150ms over the network. A command that makes
    5 sequential calls = ~500ms of pure wait time (at ~100ms each). Rules:

    1. MINIMIZE CALL COUNT
       BAD:  get_configmap("app-myapi") + get_configmap("env-myapi")
             + get_configmap("formation-myapi")  = 3 calls
       GOOD: list_configmaps(labels={"app": "myapi"})  = 1 call
             Then filter client-side.

    2. NEVER MAKE A CALL YOU DON'T NEED
       `kuberoku apps` needs app names + created_at.
       It does NOT need pod status, replica counts, or release info.
       One list_configmaps() with resource-type=app-manifest. Done.

    3. PARALLEL CALLS FOR INDEPENDENT DATA
       `kuberoku apps:status` needs: app manifest + deployments + latest release.
       These are independent reads. Use concurrent.futures:

           with ThreadPoolExecutor(max_workers=3) as pool:
               f_app = pool.submit(k8s.get_configmap, ns, app_cm)
               f_deps = pool.submit(k8s.list_deployments, ns, labels)
               f_rel = pool.submit(k8s.get_configmap, ns, release_cm)
               app_data, deps, release = f_app.result(), f_deps.result(), f_rel.result()

       3 calls in ~150ms instead of ~450ms.

    4. WATCH, DON'T POLL
       For deploy rollout watching, use the K8s watch API (one long-lived
       HTTP request on which the server streams JSON events), not polling.
       `logs --follow` likewise holds one streaming connection open via the
       pod log endpoint's follow option. Polling = N calls/second.
       Watch = 1 long-lived connection, server pushes updates.

           # K8sClientProtocol addition for watch:
           def watch_pods(self, namespace, labels, timeout) -> Iterator[WatchEvent]: ...

    CONNECTION REUSE:
    ─────────────────
    The kubernetes Python client uses urllib3 connection pooling internally.
    Create ONE K8sClient per CLI invocation (the factory does this). Never
    create-destroy-recreate. The pooled keep-alive HTTP/1.1 connection stays
    warm across all K8s calls within a single command.

    PLUGIN LOADING:
    ───────────────
    importlib.metadata.entry_points() scans all installed packages (~30-50ms).
    Don't load plugins eagerly at startup. Load them:
    - At CLI init: just discover NAMES (entry point names, not code)
    - On first access: load the actual plugin code

        import importlib.metadata

        def load_plugins(root_group):
            eps = importlib.metadata.entry_points(group="kuberoku.plugins")
            for ep in eps:
                # Register as lazy: name only, code loaded on invoke.
                # ep.value is the "module:attr" string LazyGroup expects.
                root_group._lazy[ep.name] = ep.value

    This means `kuberoku --help` shows plugin names without importing
    plugin code. The plugin module is only imported when you run it.

    CI ENFORCEMENT:
    ───────────────
    Add a benchmark test that fails if startup regresses:

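        # tests/cli/test_startup.py
        import subprocess
        import time

        def test_startup_time():
            start = time.monotonic()
            result = subprocess.run(["kuberoku", "--help"],
                                    capture_output=True, timeout=5)
            elapsed = time.monotonic() - start
            assert result.returncode == 0
            # Budget above is <100ms; 0.15 leaves headroom for CI runner noise.
            assert elapsed < 0.15, f"Startup too slow: {elapsed:.3f}s"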

    Run in CI on every PR. If it breaks, someone added a top-level import.

(d) CONFIG FILE VERSIONING

    ~/.kuberoku/config.yaml should have a version field:

        version: 1
        current_cluster: production
        clusters:
          production:
            context: prod-eks

    When the format changes, bump version and add migration. Never silently
    break people's configs. Reading version 1 config with version 2 code
    should auto-migrate (or warn with instructions).
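
    One way to structure auto-migration is a chain of single-step upgrade
    functions keyed by source version. Everything below is a sketch: version 2
    does not exist yet, and the `current_cluster` rename is invented purely to
    give the migration something to do.

```python
from typing import Callable, Dict

CURRENT_VERSION = 2


def _v1_to_v2(config: dict) -> dict:
    # Hypothetical change for illustration: v2 renames `current_cluster`.
    migrated = dict(config)  # copy so the caller's dict is not mutated
    migrated["current"] = migrated.pop("current_cluster", None)
    migrated["version"] = 2
    return migrated


MIGRATIONS: Dict[int, Callable[[dict], dict]] = {1: _v1_to_v2}


def migrate(config: dict) -> dict:
    """Upgrade a parsed config dict step by step to CURRENT_VERSION."""
    version = config.get("version", 1)  # assumption: unversioned files are v1
    if version > CURRENT_VERSION:
        raise ValueError(
            f"Config version {version} is newer than this tool supports "
            f"({CURRENT_VERSION}). Upgrade the CLI."
        )
    while version < CURRENT_VERSION:
        config = MIGRATIONS[version](config)
        version = config["version"]
    return config
```

    Each migration handles exactly one version hop, so a version 1 file reaches
    any later version by applying the hops in order.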

(e) PLUGIN SANDBOXING (WHAT PLUGINS CAN'T DO)

    Plugins CAN:
    - Add new CLI command groups (postgres:backup, redis:snapshot, etc.)
    - Access the factory (and thus K8s client, namespace, etc.)
    - Ship their own SDK services for plugin-specific logic

    Plugins CANNOT:
    - Modify core commands (no "hook into deploy")
    - Intercept K8s calls (no middleware on K8sClient)
    - Replace or wrap existing SDK services
    - Monkey-patch anything

    This is intentional. Plugins EXTEND Kuberoku; they don't MODIFY it.

    (Learned from: boto3's event system is powerful but debugging plugin
    interactions across 419 services is a nightmare. gh's extensions are
    isolated processes that can't break the core. Isolation > power.)

(f) DRY RUN (FUTURE — NOT V1)

    Don't build it now, but the architecture supports it: SDK services
    build K8s resource dicts (via k8s/resources.py) before submitting them
    to K8sClient. A --dry-run flag could intercept the dicts and print
    them without calling K8s. The separation of resources.py (construction)
    and client.py (submission) makes this trivial later.

(g) TTY DETECTION — SMART OUTPUT

    When piped or redirected, Kuberoku MUST behave differently:

        # Interactive terminal
        $ kuberoku apps
        ┌──────────┬─────────────┐   ← Rich table, colors, spinners
        │ Name     │ Created     │
        ├──────────┼─────────────┤
        │ myapi    │ 2 hours ago │
        └──────────┴─────────────┘

        # Piped to another command
        $ kuberoku apps | grep myapi
        myapi    2025-02-09T14:30:00Z  ← Plain text, no colors, no box drawing

        # Redirected to file
        $ kuberoku apps > list.txt
        (same plain text)

    Rules:
    - if sys.stdout.isatty(): use Rich (tables, colors, spinners, progress)
    - else: use plain text (tab-separated, ISO timestamps, no ANSI codes)
    - --output json always produces JSON regardless of TTY
    - --no-color forces plain output even in TTY
    - Spinners/progress bars: only in TTY mode. In pipe mode, print each
      step as a line ("Creating release... done")
    - KUBEROKU_NO_COLOR=1 env var, plus the standard NO_COLOR env var
      (https://no-color.org/), both disable color

    Implementation: the output module (cli/output.py) checks once at init
    and switches between RichRenderer and PlainRenderer. SDK doesn't care.
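
    The decision itself is small enough to sketch as one function.
    choose_renderer and its parameters are hypothetical names, and the sketch
    returns a label rather than constructing RichRenderer or PlainRenderer.

```python
import os
import sys
from typing import Mapping, Optional, TextIO


def choose_renderer(
    fmt: str = "table",
    no_color_flag: bool = False,
    stream: Optional[TextIO] = None,
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return "json", "rich", or "plain" following the rules above."""
    stream = sys.stdout if stream is None else stream
    env = os.environ if env is None else env
    if fmt == "json":
        return "json"  # JSON wins regardless of TTY
    if no_color_flag or env.get("KUBEROKU_NO_COLOR") or env.get("NO_COLOR"):
        return "plain"  # explicit opt-out, even on a TTY
    return "rich" if stream.isatty() else "plain"
```

    Passing stream and env explicitly keeps the function testable without a
    real terminal.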

(h) DEBUG MODE — WHEN THINGS GO WRONG

    Users WILL hit K8s permission issues, network timeouts, wrong contexts.
    They need visibility into what Kuberoku is actually doing.

        $ KUBEROKU_DEBUG=1 kuberoku deploy --image myapi:v3
        [DEBUG] Resolved cluster: production (context: prod-eks)
        [DEBUG] Namespace: kuberoku
        [DEBUG] K8s API: GET /api/v1/namespaces/kuberoku/configmaps/kuberoku-app-myapi (142ms)
        [DEBUG] K8s API: POST /api/v1/namespaces/kuberoku/configmaps (87ms)
          Creating release v7... done
        [DEBUG] K8s API: PATCH /apis/apps/v1/namespaces/kuberoku/deployments/kuberoku-myapi-web (93ms)
          Updating web... done
        Release v7 live.

    Also: kuberoku --verbose (same effect, flag form)

    What debug mode shows:
    - Resolved cluster context, namespace, prefix
    - Every K8s API call: verb, resource, response time
    - Config resolution: which file/env/flag provided each value
    - Plugin loading: which plugins found, load time

    What debug mode NEVER shows:
    - Secret values (always masked as ***)
    - Full request/response bodies (too verbose, use kubectl --v=8 for that)

    Implementation: logging module. SDK services use logging.getLogger(__name__).
    Debug mode sets level to DEBUG. Normal mode sets level to WARNING.
    No print() calls for debug — always logger.debug().
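
    A minimal sketch of that wiring, assuming all package loggers live under a
    parent logger named "kuberoku" (which follows from getLogger(__name__)
    inside the package). configure_logging is a hypothetical helper name.

```python
import logging
import os


def configure_logging(env=None) -> int:
    """Set the `kuberoku` logger level from KUBEROKU_DEBUG and return it."""
    env = os.environ if env is None else env
    level = logging.DEBUG if env.get("KUBEROKU_DEBUG") == "1" else logging.WARNING
    logger = logging.getLogger("kuberoku")
    logger.setLevel(level)
    if not logger.handlers:  # avoid stacking handlers on repeated calls
        handler = logging.StreamHandler()  # writes to stderr by default
        handler.setFormatter(logging.Formatter("[%(levelname)s] %(message)s"))
        logger.addHandler(handler)
    return level
```

    Child loggers such as kuberoku.sdk.apps inherit the level, so no service
    needs to know whether debug mode is on.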

(i) SECRET MASKING — NEVER LEAK CREDENTIALS

    Kuberoku handles Secrets (K8s Secret resources). Secret values MUST
    NEVER appear in:
    - Normal command output (config:get shows keys, not values by default)
    - Debug logs (mask as ***)
    - Error messages ("Failed to update Secret kuberoku-secret-myapi"
      NOT "Failed to update Secret with DATABASE_URL=postgres://...")
    - Crash reports / tracebacks

    Secret values are shown only with an explicit flag:
        $ kuberoku config --app myapi
        DATABASE_URL    [set]
        SECRET_KEY      [set]
        APP_NAME        myapp            ← non-secret ConfigMap values shown

        $ kuberoku config --app myapi --reveal
        DATABASE_URL    postgres://...    ← explicit opt-in
        SECRET_KEY      abc123

    How we know what's a secret: anything stored in K8s Secret (not ConfigMap).
    Kuberoku stores sensitive vars (containing URL, KEY, SECRET, PASSWORD,
    TOKEN in the name, or explicitly marked) in Secrets, rest in ConfigMaps.
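
    The name-based rule can be sketched as below. is_sensitive and
    display_value are hypothetical helper names, and plain substring matching
    is an assumption about how "in the name" is meant.

```python
SENSITIVE_MARKERS = ("URL", "KEY", "SECRET", "PASSWORD", "TOKEN")


def is_sensitive(name: str, explicitly_marked: bool = False) -> bool:
    """True if the variable belongs in a K8s Secret rather than a ConfigMap."""
    upper = name.upper()
    return explicitly_marked or any(marker in upper for marker in SENSITIVE_MARKERS)


def display_value(name: str, value: str, reveal: bool = False) -> str:
    """Value to print in config listings: masked unless --reveal is passed."""
    if is_sensitive(name) and not reveal:
        return "[set]"
    return value
```

    Substring matching is deliberately broad: a name like MONKEY_COUNT would
    also be treated as sensitive, which errs on the side of not leaking.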

(j) CRASH REPORTING — UNHANDLED EXCEPTIONS

    When Kuberoku hits an unexpected error (bug, not user error):

        $ kuberoku deploy --image myapi:v3
        Internal error: unexpected NoneType in release counter.

        kuberoku 0.3.1 | Python 3.12.1 | K8s context: prod-eks
        Please report: https://github.com/amanjain/kuberoku/issues

        Debug info: KUBEROKU_DEBUG=1 kuberoku deploy --image myapi:v3

    Rules:
    - NEVER show raw Python tracebacks to users (unless KUBEROKU_DEBUG=1)
    - Show: tool version, Python version, K8s context (not full kubeconfig)
    - Show: exact command to re-run with debug mode
    - Show: link to issue tracker
    - Exit code 2 (infrastructure error, not user error)

(k) UPGRADE NOTIFICATION — NON-BLOCKING, CACHED

    After command output, check for newer version (if not checked recently):

        $ kuberoku apps
        Name     Created
        myapi    2 hours ago

        Update available: 0.3.1 → 0.4.0
        Run: pip install --upgrade kuberoku

    Rules:
    - Check PyPI API (GET https://pypi.org/pypi/kuberoku/json) ONCE
    - Cache result in ~/.kuberoku/.update-check (timestamp + latest version)
    - Re-check at most once per 24 hours
    - NEVER block the main command — check AFTER output, or in background thread
    - NEVER show in pipe mode (not a TTY)
    - Disable with KUBEROKU_NO_UPDATE_CHECK=1

    Implementation: start a daemon thread that checks PyPI while the command
    runs. After cli() completes, print the notification only if the thread has
    already finished and found a newer version. If the thread is still
    running when main exits, abandon it (daemon thread).
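
    The caching rules can be sketched as follows. The JSON cache layout
    (checked_at, latest) and all function names are assumptions; the text only
    says the file holds a timestamp and the latest version.

```python
import json
from pathlib import Path
from typing import Optional, Tuple

CHECK_INTERVAL_SECONDS = 24 * 60 * 60


def read_cache(cache_file: Path) -> Optional[dict]:
    """Return the cached check result, or None if missing or corrupt."""
    try:
        data = json.loads(cache_file.read_text())
    except (OSError, ValueError):
        return None
    return data if isinstance(data, dict) else None


def should_check(cached: Optional[dict], now: float) -> bool:
    """True if there is no usable cache or the last check is 24h+ old."""
    if cached is None:
        return True
    try:
        return now - float(cached["checked_at"]) >= CHECK_INTERVAL_SECONDS
    except (KeyError, TypeError, ValueError):
        return True


def parse_version(text: str) -> Tuple[int, ...]:
    # Simplification: handles plain "X.Y.Z" only, not pre-release tags.
    return tuple(int(part) for part in text.split("."))


def is_newer(latest: str, installed: str) -> bool:
    return parse_version(latest) > parse_version(installed)
```

    Comparing integer tuples avoids the string-comparison trap where "0.10.0"
    sorts before "0.9.9".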

(l) TOKEN EXPIRATION — EKS, GKE, AKS

    Cloud K8s clusters use short-lived tokens:
    - EKS: 15-minute tokens from `aws eks get-token`
    - GKE: 1-hour tokens from `gcloud auth print-access-token`
    - AKS: tokens from `kubelogin`

    The kubernetes Python client handles token refresh IF the kubeconfig
    has an `exec` command configured (which cloud CLIs set up). Kuberoku
    does NOT need to handle token refresh itself — the K8s client library
    does it. But Kuberoku MUST:

    - Use the K8s client's built-in config loading (load_kube_config())
      which respects exec-based auth providers
    - NOT cache or store tokens separately
    - Handle 401 Unauthorized gracefully:

        Error: Authentication failed for cluster "production".
        Your token may have expired. Try:
          aws eks update-kubeconfig --name my-cluster   (EKS)
          gcloud container clusters get-credentials ...  (GKE)

(m) REGISTRY AUTHENTICATION

    `kuberoku deploy` (build mode) pushes images to a registry. Kuberoku
    does NOT implement its own registry auth. It uses Docker's credential
    helpers (docker login, credential stores, ~/.docker/config.json).

    If push fails with 401/403:

        Error: Registry push failed — authentication denied.
        Run: docker login {registry_url}
        Then retry: kuberoku deploy

    Kuberoku just runs `docker push`. Docker handles auth. This means
    ECR, GCR, ACR, Docker Hub, GitHub Container Registry all work if
    Docker is configured.

(n) MULTI-ARCH IMAGES — ARM vs x86

    `kuberoku deploy` (build mode) builds for the LOCAL architecture.
    If you build on an Apple Silicon Mac (ARM) and your cluster runs
    AMD64 nodes, the image won't work.

    V1: Document this clearly. Suggest workarounds:

        Warning: Built image for linux/arm64 but cluster nodes are linux/amd64.
        The deployment may fail. Options:
          1. Build in CI on matching architecture
          2. Use: kuberoku deploy --platform linux/amd64 (uses docker buildx)
          3. Use: kuberoku deploy --image <pre-built-multi-arch-image>

    V1 ships a --platform flag that passes through to `docker buildx build`.
    Detect mismatch: compare local arch (platform.machine(), normalized so
    that x86_64 → amd64 and aarch64 → arm64) with node labels
    (kubernetes.io/arch) from the cluster. Warn if different.
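
    platform.machine() and the kubernetes.io/arch label use different names
    for the same architecture, so the comparison needs a normalization step.
    A minimal sketch, with hypothetical function names and an alias table
    covering only the two architectures discussed here:

```python
import platform
from typing import Iterable, Optional, Set

# platform.machine() values mapped to kubernetes.io/arch label values.
_ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "arm64": "arm64",
    "aarch64": "arm64",
}


def normalize_arch(machine: str) -> str:
    """Map a platform.machine() value to a Kubernetes arch label value."""
    key = machine.lower()
    return _ARCH_ALIASES.get(key, key)  # unknown values pass through


def arch_mismatch(node_arches: Iterable[str],
                  machine: Optional[str] = None) -> Set[str]:
    """Return the node architectures that differ from the local build arch."""
    local = normalize_arch(platform.machine() if machine is None else machine)
    return {arch for arch in node_arches if arch != local}
```

    An empty result means every node matches the local architecture and no
    warning is needed.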

(o) GIT SUBMODULES & LARGE REPOS

    `git archive` does NOT include submodules. If the project uses
    submodules, the build will be missing code. Document clearly:

        Error: Build failed — missing files.
        If your project uses git submodules, `kuberoku deploy` (git mode)
        does not include submodule contents. Use:
          kuberoku deploy --image <pre-built-image>

    Large repos: `git archive` produces a clean tarball of committed files
    only (no .git directory, no gitignored files), so the Docker build
    context is always minimal. This is already better than `docker build .`
    which sends the entire directory.

(p) DETERMINISTIC OUTPUT — FOR SCRIPTING

    All list commands produce deterministic output:
    - Sort by name (alphabetical) by default
    - --sort flag for alternatives: --sort created, --sort name
    - No random ordering, no "last accessed" ordering
    - JSON output uses sorted keys (json.dumps(data, sort_keys=True))

    This matters for: diff-based testing, scripted pipelines that compare
    output, CI jobs that assert on command output.
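
    A sketch of deterministic JSON rendering under these rules. The App fields
    mirror the columns used in the `apps` examples; render_json and _encode are
    hypothetical names, and the sort argument here is an attribute name rather
    than the CLI's --sort value.

```python
import dataclasses
import json
from datetime import datetime, timezone
from typing import Iterable


@dataclasses.dataclass(frozen=True)
class App:
    name: str
    created_at: datetime


def _encode(value):
    # json.dumps cannot serialize datetime natively; emit ISO 8601 instead.
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def render_json(apps: Iterable[App], sort: str = "name") -> str:
    """Render apps as JSON with a stable row order and sorted keys."""
    rows = sorted(apps, key=lambda app: getattr(app, sort))
    payload = [dataclasses.asdict(app) for app in rows]
    return json.dumps(payload, sort_keys=True, default=_encode)


created = datetime(2025, 2, 9, 14, 30, tzinfo=timezone.utc)
example = render_json([App("web", created), App("api", created)])
```

    The same input in any order produces byte-identical output, which is what
    diff-based tests rely on.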

(q) RETRY WITH BACKOFF — K8s API RESILIENCE

    K8s API can return:
    - 429 Too Many Requests (rate limited)
    - 503 Service Unavailable (API server overloaded)
    - Network timeout (connection dropped)

    Strategy: exponential backoff with jitter for transient errors.

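        import random
        import time

        MAX_RETRIES = 3   # total attempts, so at most 2 retries
        BASE_DELAY = 0.5  # seconds

        def call_with_retry(k8s_api_call):
            for attempt in range(MAX_RETRIES):
                try:
                    return k8s_api_call()
                except (RateLimitError, ServiceUnavailable, Timeout):
                    if attempt == MAX_RETRIES - 1:
                        raise
                    # 0.5s, then 1.0s, each plus up to 0.5s of jitter
                    delay = BASE_DELAY * (2 ** attempt) + random.uniform(0, 0.5)
                    time.sleep(delay)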

    Only retry TRANSIENT errors. Never retry:
    - 400 Bad Request (our bug)
    - 403 Forbidden (RBAC)
    - 404 Not Found (resource doesn't exist)
    - 409 Conflict (handled separately by optimistic concurrency)
    - 422 Unprocessable (invalid resource spec)

    This lives in K8sClient, not in SDK services. Services don't know
    about retries — they just call k8s.create_deployment() and it works
    or raises after exhausting retries.

(r) CROSS-PLATFORM: WINDOWS + LINUX + macOS

    Supporting Windows is an INCONVENIENCE, not an impossibility.
    Python is cross-platform. Click is cross-platform. The kubernetes
    Python client is cross-platform. Docker Desktop works on Windows.
    The hard parts are edge cases, not fundamentals.

    WHAT'S EASY (works out of the box):
    - Click CLI parsing and argument handling → cross-platform
    - kubernetes Python client → cross-platform (kubeconfig, API calls)
    - Rich terminal output → cross-platform (detects Windows terminal)
    - JSON/YAML parsing → cross-platform
    - All SDK business logic → pure Python, no OS dependency
    - Docker commands → docker CLI works on Windows/macOS/Linux

    WHAT NEEDS CARE (code with if/else per platform):

    1. File paths
       - Use pathlib.Path EVERYWHERE, never hardcode "/" separators
       - Config dir: pathlib.Path.home() / ".kuberoku" (works on all OS)
       - Project file walk: Path.cwd().resolve().parents (works on all OS)

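    2. Shell completion
       - bash/zsh/fish → mainly Linux/macOS
       - PowerShell → mainly Windows
       - Click 8.x ships completion for bash, zsh, and fish only. PowerShell
         needs a custom completion class (click.shell_completion) or a
         third-party package
       - Detect shell: SHELL env var on Unix; on Windows COMSPEC only
         identifies cmd.exe, so take the shell as an explicit argument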

    3. Docker socket
       - Linux: /var/run/docker.sock (default)
       - macOS: same (Docker Desktop creates this)
       - Windows: npipe:////./pipe/docker_engine (Docker Desktop)
       - We don't need to handle this — we call `docker build` and
         `docker push` via subprocess, Docker CLI handles socket paths

    4. Signal handling
       - SIGINT (Ctrl+C) → works on ALL platforms (Python catches it)
       - SIGTERM → Linux/macOS only (Windows doesn't have it)
       - We only use SIGINT, so no issue

    5. TTY detection
       - sys.stdout.isatty() → works on ALL platforms
       - ANSI codes → Windows Terminal and modern PowerShell support them
       - Old cmd.exe → Rich auto-detects and falls back to plain output

    6. Subprocess calls (git, docker)
       - git → works on all platforms (Git for Windows)
       - docker → works on all platforms (Docker Desktop)
       - Use subprocess.run() with list args (not shell=True)
       - NEVER: subprocess.run("git archive | docker build", shell=True)
       - ALWAYS: subprocess.run(["git", "archive", ...])

    7. Line endings
       - Git handles this via .gitattributes (text/eol attributes) or the
         core.autocrlf config setting
       - YAML/JSON parsers handle both \n and \r\n
       - No action needed
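
    Item 6 rules out shell pipelines, so a pipe between two commands has to be
    wired up explicitly. A minimal sketch using subprocess.Popen follows; the
    pipe helper is a hypothetical name, and the demo uses two Python child
    processes as stand-ins for commands such as ["git", "archive", "HEAD"]
    feeding ["docker", "build", "-"].

```python
import subprocess
import sys
from typing import Sequence


def pipe(producer: Sequence[str], consumer: Sequence[str]) -> bytes:
    """Run `producer | consumer` without a shell and return consumer stdout."""
    first = subprocess.Popen(producer, stdout=subprocess.PIPE)
    second = subprocess.Popen(consumer, stdin=first.stdout,
                              stdout=subprocess.PIPE)
    # Close our copy so the producer sees a broken pipe if the consumer
    # exits early, instead of blocking forever.
    first.stdout.close()
    output, _ = second.communicate()
    if first.wait() != 0:
        raise subprocess.CalledProcessError(first.returncode, producer)
    if second.returncode != 0:
        raise subprocess.CalledProcessError(second.returncode, consumer)
    return output


# Demo: the first child prints a word, the second upper-cases its stdin.
demo = pipe(
    [sys.executable, "-c", "print('hello')"],
    [sys.executable, "-c",
     "import sys; sys.stdout.write(sys.stdin.read().upper())"],
)
```

    List arguments mean no quoting rules and no dependence on which shell the
    platform provides.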

    WHAT WE DO FOR V1:
    - Write cross-platform code from day 1 (pathlib, no shell=True, etc.)
    - CI: ubuntu-latest (macOS/Windows added later)
    - Two backends: fake (in-memory) and real (any K8s cluster)
    - k3d for fast CI integration, kind for weekly compat tests
    - --backend flag selects fake or real per pytest run

    CI MATRIX:
        matrix:
          python: ["3.11", "3.12", "3.13"]
        steps:
          - pytest tests/ -v --backend fake
          - pytest tests/ -v --backend real   # requires running cluster

    WHAT'S GENUINELY HARD ON WINDOWS (V2 concerns):
    - `kuberoku addons:connect` (port-forward) → may need special handling for
      Windows firewall rules
    - `kuberoku services:exec` (exec into pod) → TTY allocation differs on Windows
      (use winpty or Windows pseudo-console)
    - PyInstaller binaries → separate build per OS (already planned)
    - Shell completion install → PowerShell profile path differs

    BOTTOM LINE: Not hard, just needs discipline. Use pathlib, don't use
    shell=True, test on all three OS in CI (once macOS/Windows runners are
    added), and handle the 7 items above.
    Python makes this much easier than Go or Rust would.

(s) GRACEFUL SHUTDOWN IN DEPLOYMENTS

    When Kuberoku creates Deployments, it should set reasonable defaults:

        terminationGracePeriodSeconds: 30     (K8s default, keep it)
        strategy:
          type: RollingUpdate
          rollingUpdate:
            maxUnavailable: 25%               (K8s default)
            maxSurge: 25%                     (K8s default)

    Kuberoku does NOT customize these in V1. Use K8s defaults. Users who
    need custom values can use kubectl to patch. Future: `ps:configure`
    command for advanced Deployment settings.

    HEALTH CHECKS:
    Kuberoku does NOT add readiness/liveness probes in V1. Reason:
    - We don't know what protocol the app speaks (HTTP? TCP? gRPC?)
    - Wrong probes cause more harm than no probes (restart loops)
    - K8s works fine without probes — pods are "Ready" when container starts

    Future (V2): `kuberoku health:set --app myapi --type web --path /health`
    to configure HTTP readiness probes. Only for users who want it.

(t) WHAT HAPPENS WHEN KUBEROKU IS UNINSTALLED

    Nothing. Apps keep running. K8s doesn't care that the CLI that created
    the Deployments is gone. The resources are standard K8s objects.

    The user can still manage them with kubectl:
        kubectl get deployments -n kuberoku -l app.kuberoku.com/managed-by=kuberoku
        kubectl delete deployment kuberoku-myapi-web -n kuberoku

    If someone reinstalls Kuberoku later, it rediscovers all existing
    apps by scanning for ConfigMaps with the managed-by label. Zero
    migration needed.

(u) WHY LAYER-BASED, NOT FEATURE-BASED DIRECTORY STRUCTURE

    Alternative considered: organize by feature (apps/ has both cli and sdk,
    config/ has both cli and sdk, etc.). This is common in Go (gh, kubectl).

    Why we chose layer-based (cli/, sdk/, k8s/):
    - Python convention — import paths like `from kuberoku.sdk.apps import
      AppsService` are clear and standard
    - Layer enforcement is visible in the directory structure — you can see
      at a glance that sdk/ doesn't import from cli/
    - cli/ and sdk/ mirror each other 1:1 (apps.py in both) which makes
      navigation easy — want the SDK for apps? sdk/apps.py. Want the CLI?
      cli/apps.py.
    - Feature-based works in Go because packages are compilation units.
      In Python, there's no benefit — just deeper import paths.

    (Even gh and kubectl are layered at the top level: gh uses cmd/ +
    internal/ and kubectl uses pkg/ + cmd/, with feature grouping happening
    inside those layers. A layered top level is the norm.)


CANONICAL SOURCE TREE
─────────────────────
This is the SINGLE SOURCE OF TRUTH for the project structure. All other
references (phase file lists, test examples) must match this exactly.
If there's a conflict, THIS section wins.

    kuberoku/                              # Repository root
    ├── pyproject.toml                     # Build config (hatchling), deps, entry points
    ├── src/
    │   └── kuberoku/                      # The Python package
    │       │
    │       │  # ── ROOT MODULES ─────────────────────────────────
    │       │  # Importable as: from kuberoku import Kuberoku
    │       │
    │       ├── __init__.py                # Exports: Kuberoku, __version__
    │       ├── __main__.py                # `python -m kuberoku` support
    │       ├── _version.py                # __version__ = "0.1.0"
    │       ├── branding.py                # TOOL_NAME + all derived constants (Section 2)
    │       ├── factory.py                 # KuberokuFactory — dependency wiring
    │       ├── client.py                  # Kuberoku — PUBLIC SDK FACADE
    │       │                              #   This is the top-level class users import:
    │       │                              #   `from kuberoku import Kuberoku`
    │       │                              #   `k = Kuberoku(context="prod")`
    │       │                              #   It wraps KuberokuFactory. NOT in sdk/
    │       │                              #   because it IS the public entry point.
    │       ├── models.py                  # All frozen dataclasses (App, Release, etc.)
    │       ├── exceptions.py              # KuberokuError hierarchy (Section 11)
    │       │
    │       │  # ── K8S LAYER ────────────────────────────────────
    │       │  # Infrastructure: talks to Kubernetes. Pure I/O.
    │       │  # Importable as: from kuberoku.k8s.client import K8sClient
    │       │
    │       ├── k8s/
    │       │   ├── __init__.py
    │       │   ├── protocols.py           # K8sClientProtocol (typing.Protocol)
    │       │   ├── client.py              # K8sClient — real K8s (wraps kubernetes lib)
    │       │   │                          #   Handles: retry, resourceVersion, dict
    │       │   │                          #   normalization, connection pooling
    │       │   ├── labels.py              # Label/annotation builders: for_app(),
    │       │   │                          #   for_process(), for_addon(), selector()
    │       │   └── resources.py           # K8s resource dict builders: build_deployment(),
    │       │                              #   build_app_manifest(), safe_name(), etc.
    │       │
    │       │  # ── SDK LAYER ────────────────────────────────────
    │       │  # Business logic: orchestrates K8s operations.
    │       │  # Each service = one command group's logic.
    │       │  # Importable as: from kuberoku.sdk.apps import AppsService
    │       │
    │       ├── sdk/
    │       │   ├── __init__.py
    │       │   ├── apps.py                # AppsService (create, destroy, info, list, rename)
    │       │   ├── config.py              # ConfigService (set, get, unset)
    │       │   ├── deploy.py              # DeployService (orchestrates build + release + rollout)
    │       │   ├── build.py               # BuildService (git archive + docker build + push)
    │       │   ├── registry.py            # RegistryService (auto-detect, resolve registry URL)
    │       │   ├── releases.py            # ReleasesService (list, info, rollback, prune)
    │       │   ├── ps.py                  # PsService (scale, restart, set, commands, list)
    │       │   ├── services.py            # ServicesService (expose_on, expose_off, open_url,
    │       │   │                          #   list_ports, add_port, remove_port,
    │       │   │                          #   get_logs, stream_logs, exec_interactive,
    │       │   │                          #   exec_detached)
    │       │   ├── addons.py              # AddonsService (create, destroy, info, list, backup,
    │       │   │                          #   expose_on, expose_off, connect)
    │       │   ├── domains.py             # DomainsService (add, remove, list)
    │       │   ├── clusters.py            # ClustersService (add, switch, list, config)
    │       │   ├── checks.py              # Check registry (CLUSTER_REQUIREMENTS)
    │       │   └── doctor.py              # DoctorService (RBAC checks, consistency audit)
    │       │
    │       │  # ── CLI LAYER ────────────────────────────────────
    │       │  # User interface: Click commands, Rich output.
    │       │  # Mirrors sdk/ 1:1. Each file = one command group.
    │       │  # Importable as: from kuberoku.cli.main import cli
    │       │
    │       ├── cli/
    │       │   ├── __init__.py
    │       │   ├── main.py                # Root group, ColonCommandGroup, LazyGroup,
    │       │   │                          #   error boundary, factory creation
    │       │   ├── apps.py                # apps:create, apps:destroy, apps:info, etc.
    │       │   ├── config.py              # config, config:set, config:get, config:unset
    │       │   ├── deploy.py              # deploy command (--image and git modes)
    │       │   ├── releases.py            # releases, releases:info, releases:rollback
    │       │   ├── ps.py                  # ps, ps:scale, ps:restart, ps:set, ps:commands
    │       │   ├── services.py            # services:expose:on/off, services:open, services:ports:*,
    │       │   │                          #   services:logs (--tail, --follow, --type),
    │       │   │                          #   services:exec (interactive + --detach)
    │       │   ├── addons.py              # addons:create/destroy/info/expose:on/off/connect/etc.
    │       │   ├── domains.py             # domains:add, domains:remove, domains
    │       │   ├── clusters.py            # clusters:add, clusters:switch, clusters, etc.
    │       │   │                          #   + clusters:doctor, clusters:setup
    │       │   ├── plugins.py             # plugins, plugins:install, plugins:uninstall
    │       │   │                          # (apps:status, apps:maintenance:on/off are in apps.py)
    │       │   │                          # (apps:link:add, apps:link:remove are in apps.py)
    │       │   ├── output.py              # Rich/plain rendering, TTY detection,
    │       │   │                          #   table/JSON/--no-color, secret masking
    │       │   └── context.py             # App resolution: --app → env → .kuberoku → prompt
    │       │
    │       │  # ── CONFIG LAYER ─────────────────────────────────
    │       │  # File I/O for tool configuration (NOT app config).
    │       │  # App config = ConfigService (env vars in K8s).
    │       │  # Tool config = these files (how Kuberoku itself behaves).
    │       │
    │       ├── config/
    │       │   ├── __init__.py
    │       │   ├── user.py                # ~/.kuberoku/config.yaml — global user config
    │       │   │                          #   (clusters, current_cluster, preferences)
    │       │   └── project.py             # .kuberoku project file — per-directory link
    │       │                              #   (app name, environments, cluster override)
    │       │
    │       │  # ── ADDON DEFINITIONS ────────────────────────────
    │       │  # Each built-in addon type is a Python module exporting
    │       │  # an AddonDef dataclass. Plugin authors follow the same
    │       │  # pattern in their kuberoku-* packages.
    │       │
    │       ├── addons/
    │       │   ├── __init__.py            # BUILTIN_ADDONS registry: {"postgres": ..., ...}
    │       │   ├── _types.py              # AddonDef dataclass (the contract)
    │       │   ├── postgres.py            # PostgreSQL addon definition
    │       │   └── redis.py               # Redis addon definition
    │       │
    │       │  # ── PLUGIN SYSTEM ────────────────────────────────
    │       │
    │       └── plugins/
    │           ├── __init__.py
    │           └── loader.py              # entry_points discovery + lazy mounting
    │
    │  # ══════════════════════════════════════════════════════════════
    │  # ADDON DEFINITION CONTRACT (src/kuberoku/addons/_types.py)
    │  #
    │  # Plugin authors create the same structure. A plugin that adds
    │  # a "mongodb" addon ships:
    │  #   kuberoku_mongodb/addons/mongodb.py → exports AddonDef
    │  #   Registered via entry_points: kuberoku.addon_types = mongodb
    │  #
    │  # @dataclass(frozen=True)
    │  # class AddonDef:
    │  #     name: str                     # "postgres"
    │  #     image: str                    # "postgres:16"
    │  #     default_port: int             # 5432
    │  #     env_vars: dict[str, str]      # Multiple env vars injected into app:
    │  #                                   #   {"DATABASE_URL": "postgres://{user}:{pass}@{host}:{port}/{db}",
    │  #                                   #    "DATABASE_HOST": "{host}",
    │  #                                   #    "DATABASE_PORT": "{port}",
    │  #                                   #    "DATABASE_USER": "{user}",
    │  #                                   #    "DATABASE_PASSWORD": "{pass}"}
    │  #                                   # Templates use {host}, {port}, {user}, {pass}, {db}
    │  #                                   # With --as analytics: keys become ANALYTICS_URL, etc.
    │  #     plans: dict[str, AddonPlan]   # {"dev": ..., "standard": ...}
    │  #     default_plan: str             # "dev"
    │  #     health_cmd: list[str] | None  # ["pg_isready", "-U", "postgres"]
    │  #     volumes: dict[str, str]       # {"/var/lib/postgresql/data": "data"}
    │  #
    │  # @dataclass(frozen=True)
    │  # class AddonPlan:
    │  #     storage: str                  # "1Gi"
    │  #     cpu: str | None               # "500m" or None (no limit)
    │  #     memory: str | None            # "512Mi" or None
    │  # ══════════════════════════════════════════════════════════════
    ├── tests/
    │   ├── conftest.py                  # Backend matrix (k8s fixture), markers
    │   ├── backends/                    # Test backend implementations
    │   │   ├── fake.py                  # FakeK8sClient (in-memory)
    │   │   └── real.py                  # RealK8sClient (wraps K8sClient, normalizes errors)
    │   │
    │   │   # ── FEATURE TESTS ──────────────────────────────────
    │   │   # Each directory = one command group / feature.
    │   │   # Each file = one command (SDK + CLI tests together).
    │   │   # Add a command → add a test file. Delete a command →
    │   │   # delete the test file. No orphan tests.
    │   │
    │   ├── apps/                        # apps:* tests
    │   │   ├── conftest.py              # Shared fixtures (sample apps)
    │   │   ├── test_create.py           # apps:create (SDK + CLI)
    │   │   ├── test_destroy.py          # apps:destroy
    │   │   ├── test_info.py             # apps:info
    │   │   ├── test_list.py             # apps (list)
    │   │   ├── test_rename.py           # apps:rename
    │   │   └── test_link.py             # apps:link:add / apps:link:remove
    │   ├── config/                      # config:* tests
    │   │   ├── test_set.py              # config:set
    │   │   ├── test_get.py              # config:get (config)
    │   │   └── test_unset.py            # config:unset
    │   ├── deploy/                      # deploy + releases tests
    │   │   ├── conftest.py              # Shared (deployed app fixture)
    │   │   ├── test_image.py            # deploy --image
    │   │   ├── test_git.py              # deploy (build from git)
    │   │   ├── test_releases_list.py    # releases
    │   │   ├── test_releases_info.py    # releases:info
    │   │   └── test_rollback.py         # releases:rollback
    │   ├── ps/                          # ps:* tests
    │   │   ├── test_list.py             # ps
    │   │   ├── test_scale.py            # ps:scale
    │   │   ├── test_set.py              # ps:set (commands)
    │   │   └── test_restart.py          # ps:restart
    │   ├── services/                    # services:* tests
    │   │   ├── test_expose.py           # services:expose:on
    │   │   └── test_unexpose.py         # services:expose:off
    │   ├── ports/                       # services:ports:* tests
    │   │   ├── test_ports_list.py       # services:ports
    │   │   ├── test_ports_add.py        # services:ports:add
    │   │   └── test_ports_remove.py     # services:ports:remove
    │   ├── networking/                  # shared networking tests
    │   │   └── test_network_policy.py   # NetworkPolicy lifecycle (Section 8.2)
    │   ├── addons/                      # addons:* tests (incl. expose:on/off/connect)
    │   │   ├── conftest.py              # Shared (addon fixtures)
    │   │   ├── test_create.py           # addons:create
    │   │   ├── test_destroy.py          # addons:destroy
    │   │   ├── test_info.py             # addons:info
    │   │   └── test_list.py             # addons
    │   ├── domains/                     # domains:* tests
    │   │   ├── conftest.py              # Shared (domain fixtures)
    │   │   ├── test_add.py              # domains:add
    │   │   ├── test_remove.py           # domains:remove
    │   │   ├── test_list.py             # domains
    │   │   ├── test_clear.py            # domains:clear
    │   │   ├── test_auto_domain.py      # auto-domain assignment
    │   │   └── test_cli_domains.py      # CLI integration for domains
    │   │   # NOTE: run/exec and logs tests live in tests/services/ (test_exec.py,
    │   │   #   test_logs.py, test_logs_follow.py). No separate run/ or logs/ dirs.
    │   ├── maintenance/                 # maintenance tests (Phase 9.7)
    │   │   ├── test_on.py               # apps:maintenance:on, services:maintenance:on
    │   │   └── test_off.py              # apps:maintenance:off, services:maintenance:off
    │   ├── clusters/                    # clusters:* tests
    │   │   ├── test_add.py              # clusters:add
    │   │   └── test_list.py             # clusters
    │   │
    │   │   # ── INFRASTRUCTURE TESTS ───────────────────────────
    │   │   # Non-command tests: internal modules, helpers, models.
    │   │
    │   ├── infrastructure/              # Internal module tests
    │   │   ├── test_labels.py           # k8s/labels.py
    │   │   ├── test_resources.py        # k8s/resources.py (builders)
    │   │   ├── test_safe_name.py        # k8s/resources.py (naming)
    │   │   ├── test_branding.py         # branding.py
    │   │   ├── test_models.py           # models.py (dataclasses)
    │   │   ├── test_factory.py          # factory.py
    │   │   ├── test_context.py          # cli/context.py (app resolution)
    │   │   ├── test_output.py           # cli/output.py (TTY, JSON, table)
    │   │   ├── test_colon_commands.py   # ColonCommandGroup
    │   │   ├── test_help.py             # --help output
    │   │   ├── test_exit_codes.py       # exit code consistency
    │   │   └── test_startup.py          # startup time benchmark
    │   │
    │   │   # ── CROSS-CUTTING TESTS ────────────────────────────
    │   │
    │   ├── contract/                    # FakeK8sClient ≈ real K8s
    │   │   ├── test_fake_matches_real.py
    │   │   └── test_network_policy_crud.py  # NetworkPolicy CRUD contract
    │   ├── flows/                       # Multi-command user journeys
    │   │   ├── test_web_app_flow.py     # create → config → deploy → scale
    │   │   ├── test_smtp_flow.py        # create → deploy → expose TCP
    │   │   ├── test_multi_addon_flow.py # create → addons → config → deploy
    │   │   └── test_failure_recovery.py # partial fail → re-run → success
    │   ├── compat/                      # Backward compatibility
    │   │   └── test_v010_resources.py
    │   └── fixtures/                    # Saved K8s resource snapshots
    │       └── v0.1.0/
    └── .github/
        └── workflows/
            ├── tests.yml                # lint + fake + k3d, every push/PR
            ├── compat.yml               # kind, weekly + manual
            └── release.yml              # all checks, on tag v*
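
The AddonDef contract described in the tree comments above can be made concrete
with a small sketch. The dataclass fields follow the contract as written; the
`render_env` helper and the sample values (user, password, database name) are
assumptions added for illustration and are not part of the Kuberoku codebase.
The handling of `--as` aliases is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class AddonPlan:
    storage: str                      # "1Gi"
    cpu: Optional[str] = None         # "500m" or None (no limit)
    memory: Optional[str] = None      # "512Mi" or None


@dataclass(frozen=True)
class AddonDef:
    name: str
    image: str
    default_port: int
    env_vars: dict[str, str]          # env var name -> template
    plans: dict[str, AddonPlan]
    default_plan: str
    health_cmd: Optional[list[str]] = None
    volumes: dict[str, str] = field(default_factory=dict)


def render_env(addon: AddonDef, values: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: fill {host}, {port}, {user}, {pass}, {db}.

    format_map is used because "pass" is a Python keyword and cannot be
    written as a literal keyword argument to str.format().
    """
    return {key: tpl.format_map(values) for key, tpl in addon.env_vars.items()}


POSTGRES = AddonDef(
    name="postgres",
    image="postgres:16",
    default_port=5432,
    env_vars={
        "DATABASE_URL": "postgres://{user}:{pass}@{host}:{port}/{db}",
        "DATABASE_HOST": "{host}",
    },
    plans={"dev": AddonPlan(storage="1Gi")},
    default_plan="dev",
    health_cmd=["pg_isready", "-U", "postgres"],
    volumes={"/var/lib/postgresql/data": "data"},
)

env = render_env(POSTGRES, {
    "host": "kuberoku-addon-myapi-postgres",
    "port": "5432",
    "user": "app",          # sample value, not a Kuberoku default
    "pass": "s3cret",       # sample value
    "db": "myapi",          # sample value
})
```

A template that references a placeholder missing from `values` raises
KeyError, which surfaces a broken addon definition early.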

K8S ABSTRACTION VIA PROTOCOL (STRUCTURAL TYPING)
─────────────────────────────────────────────────
    from collections.abc import Iterator
    from typing import Any, Protocol

    class K8sClientProtocol(Protocol):
        """Structural type for K8s operations.

        Methods return dict[str, Any] (not K8s model objects) so the fake
        client doesn't need to construct real K8s API objects.
        """

        # Namespaces
        def get_namespace(self, name: str) -> dict[str, Any]: ...
        def namespace_exists(self, name: str) -> bool: ...
        def create_namespace(self, name: str) -> dict[str, Any]: ...

        # Deployments
        def create_deployment(self, namespace: str, body: dict) -> dict[str, Any]: ...
        def get_deployment(self, namespace: str, name: str) -> dict[str, Any]: ...
        def update_deployment(self, namespace: str, name: str, body: dict) -> dict[str, Any]: ...
        def delete_deployment(self, namespace: str, name: str) -> None: ...
        def list_deployments(self, namespace: str, labels: dict[str, str] | None = None) -> list[dict[str, Any]]: ...

        # Pods
        def list_pods(self, namespace: str, labels: dict[str, str] | None = None) -> list[dict[str, Any]]: ...
        def delete_pod(self, namespace: str, name: str) -> None: ...
        def exec_in_pod(self, namespace: str, name: str, command: list[str], tty: bool = False) -> str: ...
        def stream_pod_logs(self, namespace: str, name: str, follow: bool = False, tail_lines: int | None = None, since_seconds: int | None = None, previous: bool = False) -> Iterator[str]: ...

        # ConfigMaps
        def create_configmap(self, namespace: str, body: dict) -> dict[str, Any]: ...
        def get_configmap(self, namespace: str, name: str) -> dict[str, Any]: ...
        def update_configmap(self, namespace: str, name: str, body: dict) -> dict[str, Any]: ...
        def delete_configmap(self, namespace: str, name: str) -> None: ...
        def list_configmaps(self, namespace: str, labels: dict[str, str] | None = None) -> list[dict[str, Any]]: ...

        # Secrets
        def create_secret(self, namespace: str, body: dict) -> dict[str, Any]: ...
        def get_secret(self, namespace: str, name: str) -> dict[str, Any]: ...
        def update_secret(self, namespace: str, name: str, body: dict) -> dict[str, Any]: ...
        def delete_secret(self, namespace: str, name: str) -> None: ...

        # Services
        def create_service(self, namespace: str, body: dict) -> dict[str, Any]: ...
        def get_service(self, namespace: str, name: str) -> dict[str, Any]: ...
        def update_service(self, namespace: str, name: str, body: dict) -> dict[str, Any]: ...
        def delete_service(self, namespace: str, name: str) -> None: ...
        def list_services(self, namespace: str, labels: dict[str, str] | None = None) -> list[dict[str, Any]]: ...

        # Ingress
        def create_ingress(self, namespace: str, body: dict) -> dict[str, Any]: ...
        def get_ingress(self, namespace: str, name: str) -> dict[str, Any]: ...
        def update_ingress(self, namespace: str, name: str, body: dict) -> dict[str, Any]: ...
        def delete_ingress(self, namespace: str, name: str) -> None: ...

        # StatefulSets (for stateful addons)
        def create_statefulset(self, namespace: str, body: dict) -> dict[str, Any]: ...
        def get_statefulset(self, namespace: str, name: str) -> dict[str, Any]: ...
        def update_statefulset(self, namespace: str, name: str, body: dict) -> dict[str, Any]: ...
        def delete_statefulset(self, namespace: str, name: str) -> None: ...
        def list_statefulsets(self, namespace: str, labels: dict[str, str] | None = None) -> list[dict[str, Any]]: ...

        # PVCs (for stateful addon storage)
        def create_pvc(self, namespace: str, body: dict) -> dict[str, Any]: ...
        def get_pvc(self, namespace: str, name: str) -> dict[str, Any]: ...
        def delete_pvc(self, namespace: str, name: str) -> None: ...

        # Jobs
        def create_job(self, namespace: str, body: dict) -> dict[str, Any]: ...
        def wait_for_job(self, namespace: str, name: str, timeout: int = 300) -> dict[str, Any]: ...
        def delete_job(self, namespace: str, name: str) -> None: ...

        # HPA
        def create_hpa(self, namespace: str, body: dict) -> dict[str, Any]: ...
        def get_hpa(self, namespace: str, name: str) -> dict[str, Any]: ...
        def update_hpa(self, namespace: str, name: str, body: dict) -> dict[str, Any]: ...
        def delete_hpa(self, namespace: str, name: str) -> None: ...

        # NetworkPolicies (Section 8.2 — app isolation)
        def create_network_policy(self, namespace: str, body: dict) -> dict[str, Any]: ...
        def get_network_policy(self, namespace: str, name: str) -> dict[str, Any]: ...
        def update_network_policy(self, namespace: str, name: str, body: dict) -> dict[str, Any]: ...
        def delete_network_policy(self, namespace: str, name: str) -> None: ...
        def list_network_policies(self, namespace: str, labels: dict[str, str] | None = None) -> list[dict[str, Any]]: ...

        # Auth check
        def can_i(self, verb: str, resource: str, namespace: str = "") -> bool: ...
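
To show what structural typing buys here, the sketch below declares a reduced
protocol and an in-memory class that satisfies it by shape alone, with no
inheritance. `ConfigMapStore`, `InMemoryConfigMaps`, and `read_release_version`
are hypothetical names chosen for this example; only the two method signatures
are taken from the protocol above.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ConfigMapStore(Protocol):
    """Hypothetical two-method subset of K8sClientProtocol."""

    def create_configmap(self, namespace: str, body: dict) -> dict[str, Any]: ...
    def get_configmap(self, namespace: str, name: str) -> dict[str, Any]: ...


class InMemoryConfigMaps:
    """Satisfies ConfigMapStore structurally; it does not subclass it."""

    def __init__(self) -> None:
        self._items: dict[tuple[str, str], dict[str, Any]] = {}

    def create_configmap(self, namespace: str, body: dict) -> dict[str, Any]:
        key = (namespace, body["metadata"]["name"])
        if key in self._items:
            raise KeyError(f"configmap already exists: {key}")
        self._items[key] = body
        return body

    def get_configmap(self, namespace: str, name: str) -> dict[str, Any]:
        return self._items[(namespace, name)]


def read_release_version(k8s: ConfigMapStore, namespace: str, name: str) -> int:
    """Service-style code depends on the protocol, not on a concrete client."""
    data = k8s.get_configmap(namespace, name)["data"]
    return int(data["current_release_version"])


fake = InMemoryConfigMaps()
fake.create_configmap("kuberoku", {
    "metadata": {"name": "kuberoku-app-myapi"},
    "data": {"current_release_version": "0"},
})
version = read_release_version(fake, "kuberoku", "kuberoku-app-myapi")
```

Because bodies and return values are plain dicts, the fake needs no K8s model
classes, which is the point made in the protocol docstring.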

K8S CLIENT INTERNALS — WHAT LIVES INSIDE client.py
───────────────────────────────────────────────────
The real K8sClient wraps the `kubernetes` Python library. Key design:

    import kubernetes.client
    import kubernetes.config

    class K8sClient:
        """Real K8s client. Implements K8sClientProtocol."""

        def __init__(self, api_client: kubernetes.client.ApiClient):
            self._core = kubernetes.client.CoreV1Api(api_client)
            self._apps = kubernetes.client.AppsV1Api(api_client)
            self._networking = kubernetes.client.NetworkingV1Api(api_client)
            self._batch = kubernetes.client.BatchV1Api(api_client)
            self._autoscaling = kubernetes.client.AutoscalingV2Api(api_client)
            self._auth = kubernetes.client.AuthorizationV1Api(api_client)

        @classmethod
        def from_context(cls, context: str | None = None) -> "K8sClient":
            """Create client from kubeconfig context. Lazy — called once."""
            config = kubernetes.client.Configuration()
            kubernetes.config.load_kube_config(
                context=context, client_configuration=config
            )
            api_client = kubernetes.client.ApiClient(config)
            return cls(api_client)

    What client.py handles (SDK doesn't need to know):
    - Retry with exponential backoff for transient errors (429, 503, timeout)
    - resourceVersion tracking for optimistic concurrency
    - Converting K8s API model objects to plain dicts (dict normalization)
    - Connection pooling (via urllib3, automatic)
    - Token refresh (via kubeconfig exec providers, automatic)

    What client.py does NOT handle (SDK's responsibility):
    - Business logic (what to create, when, in what order)
    - Idempotency checks (does this already exist?)
    - Multi-step orchestration (create CM, then Deployment, then Service)
    - Revert on failure
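
The retry item in the first list can be sketched as follows. This is a minimal
illustration, not the actual client.py code: `ApiError` is a stand-in for the
exception raised by the K8s library, and `with_retry`, the attempt count, and
the base delay are all assumed names and values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Status codes the list above names as transient.
TRANSIENT_STATUSES = {429, 503}


class ApiError(Exception):
    """Hypothetical stand-in for the K8s library's API exception."""

    def __init__(self, status: int) -> None:
        super().__init__(f"API error {status}")
        self.status = status


def with_retry(call: Callable[[], T], attempts: int = 4,
               base_delay: float = 0.5,
               sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying transient failures with doubling delays."""
    for attempt in range(attempts):
        try:
            return call()
        except (ApiError, TimeoutError) as exc:
            transient = (isinstance(exc, TimeoutError)
                         or exc.status in TRANSIENT_STATUSES)
            last_attempt = attempt == attempts - 1
            if not transient or last_attempt:
                raise                          # caller (the SDK) decides what next
            sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    raise AssertionError("unreachable when attempts >= 1")


# Usage: two transient failures, then success. Delays are recorded, not slept.
delays: list[float] = []
outcomes = iter([ApiError(503), ApiError(429), "ok"])


def flaky() -> str:
    item = next(outcomes)
    if isinstance(item, Exception):
        raise item
    return item


result = with_retry(flaky, sleep=delays.append)
```

Non-transient errors such as 404 or 409 pass straight through, which keeps
idempotency decisions in the SDK layer as the second list requires.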

RESOURCE BUILDERS — k8s/resources.py
─────────────────────────────────────
Every K8s resource dict is built by a function in resources.py, NOT inline
in SDK services. This centralizes all K8s resource construction.

    # k8s/resources.py

    def build_app_manifest(prefix: str, domain: str, app: str,
                           created_by: str) -> dict:
        """Build the app manifest ConfigMap dict."""
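        return {
            "apiVersion": "v1",
            "kind": "ConfigMap",
            "metadata": {
                # safe_name(prefix, app, suffix): the structural "-app" part
                # belongs to the prefix, so only the app name is ever truncated.
                "name": safe_name(f"{prefix}-app", app),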
                "labels": labels.for_app(domain, prefix, app, "app-manifest"),
                "annotations": {
                    f"{domain}/created-at": utcnow_iso(),
                    f"{domain}/created-by": created_by,
                },
            },
            "data": {
                "current_release_version": "0",
            },
        }

    def build_deployment(prefix: str, domain: str, app: str,
                         process: str, image: str, replicas: int,
                         command: str | None, env_from: str,
                         ports: list[dict] | None = None) -> dict:
        """Build a Deployment dict for a process type."""
        container = {
            "name": process,
            "image": image,
            # "-env" / "-secret" are part of the structural prefix (see safe_name).
            "envFrom": [{"configMapRef": {"name": safe_name(f"{prefix}-env", app)}},
                        {"secretRef": {"name": safe_name(f"{prefix}-secret", app),
                                       "optional": True}}],
        }
        if command:
            container["command"] = ["/bin/sh", "-c", command]
        if ports:
            container["ports"] = ports
        return {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {
                "name": safe_name(prefix, app, process),
                "labels": labels.for_process(domain, prefix, app, process),
            },
            "spec": {
                "replicas": replicas,
                "selector": {"matchLabels": labels.selector(domain, app, process)},
                "template": {
                    "metadata": {"labels": labels.for_process(domain, prefix, app, process)},
                    "spec": {"containers": [container]},
                },
            },
        }

    # Similar builders for: build_env_configmap, build_formation_configmap,
    # build_secret, build_service, build_ingress, build_statefulset,
    # build_pvc, build_job, build_release_configmap

    Why this matters:
    - SDK services call build_deployment() then k8s.create_deployment(body).
      Resource construction is separate from submission.
    - All K8s resource structure is in ONE file — easy to audit, update,
      and test independently.
    - Enables future --dry-run: intercept the dict before submission.
    - Tests can assert on the built dict without calling K8s.
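
A rough sketch of the last two points: because a builder returns a plain dict,
a recording client can stand in for the real one. `RecordingClient` is a
hypothetical class, and the signature of `build_env_configmap` is a guess (the
text names the builder but does not define it).

```python
from typing import Any


def build_env_configmap(prefix: str, app: str,
                        env: dict[str, str]) -> dict[str, Any]:
    """Assumed shape of the env ConfigMap builder (signature is a guess)."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"{prefix}-env-{app}"},
        "data": dict(env),
    }


class RecordingClient:
    """Hypothetical dry-run client: records bodies instead of calling K8s."""

    def __init__(self) -> None:
        self.planned: list[tuple[str, str, dict[str, Any]]] = []

    def create_configmap(self, namespace: str, body: dict) -> dict[str, Any]:
        self.planned.append(("create", namespace, body))
        return body


k8s = RecordingClient()
body = build_env_configmap("kuberoku", "myapi", {"LOG_LEVEL": "debug"})
k8s.create_configmap("kuberoku", body)

# A test or a --dry-run printer can inspect the dicts; nothing was submitted.
planned_names = [b["metadata"]["name"] for _, _, b in k8s.planned]
```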

LABEL HELPERS — k8s/labels.py
──────────────────────────────
Don't scatter label dict construction across services. Centralize:

    # k8s/labels.py

    def for_app(domain: str, prefix: str, app: str,
                resource_type: str) -> dict[str, str]:
        """Labels for any resource belonging to an app."""
        return {
            f"{domain}/managed-by": prefix,
            f"{domain}/app": app,
            f"{domain}/resource-type": resource_type,
        }

    def for_process(domain: str, prefix: str, app: str,
                    process: str) -> dict[str, str]:
        """Labels for a process-type resource (Deployment, Service)."""
        return {
            **for_app(domain, prefix, app, "process"),
            f"{domain}/process-type": process,
        }

    def for_addon(domain: str, prefix: str, app: str,
                  instance: str, addon_type: str) -> dict[str, str]:
        """Labels for an addon resource."""
        return {
            **for_app(domain, prefix, app, "addon"),
            f"{domain}/addon-instance": instance,
            f"{domain}/addon-type": addon_type,
        }

    def selector(domain: str, app: str, process: str) -> dict[str, str]:
        """Minimal labels for Deployment selector (immutable after creation)."""
        return {
            f"{domain}/app": app,
            f"{domain}/process-type": process,
        }

    def managed_by(domain: str, prefix: str) -> dict[str, str]:
        """Labels to find ALL Kuberoku-managed resources."""
        return {f"{domain}/managed-by": prefix}

    Why: consistent label keys everywhere. One change in labels.py updates
    all resource builders. Typo in a label key is a silent, nasty bug —
    centralizing prevents this.
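
One place these dicts end up is the label selector passed to K8s list calls,
which takes comma-separated `key=value` pairs. The sketch below shows that
conversion; `to_label_selector` is a hypothetical helper, and `managed_by` is
copied from labels.py above only so the example runs on its own.

```python
def to_label_selector(labels: dict[str, str]) -> str:
    """Render a label dict as an equality-based selector string."""
    return ",".join(f"{key}={value}" for key, value in sorted(labels.items()))


def managed_by(domain: str, prefix: str) -> dict[str, str]:
    """Copied from labels.py above so this sketch is self-contained."""
    return {f"{domain}/managed-by": prefix}


selector = to_label_selector({
    **managed_by("app.kuberoku.com", "kuberoku"),
    "app.kuberoku.com/app": "myapi",
})
```

Sorting the keys makes the string deterministic, which helps when asserting on
it in tests.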

LABEL/ANNOTATION CONVENTION
───────────────────────────
All managed resources are tagged with these labels.
The {DOMAIN} and {PREFIX} values come from branding.py / per-cluster config
(see Section 2). Defaults shown below.

    Label key prefix: {DOMAIN}  (default: app.kuberoku.com)

    Label                           Value               Purpose
    ──────────────────────────────  ──────────────────  ─────────────────────
    {DOMAIN}/managed-by             {PREFIX}            Discovery (find all
                                                        managed resources)
    {DOMAIN}/app                    {app-name}          Associate resource
                                                        with an app
    {DOMAIN}/process-type           web|worker|service  Process classification
    {DOMAIN}/resource-type          app-manifest|       Resource classification
                                    env|release|addon|
                                    process|
                                    network-policy-deny|
                                    network-policy-allow|
                                    network-policy-external
    {DOMAIN}/addon-instance         {instance-name}     Addon instance identity
                                                        (e.g., "postgres",
                                                        "analytics-db")
    {DOMAIN}/addon-type             postgres|redis      Addon type (what software)

    Annotations:
    {DOMAIN}/created-at             ISO 8601 timestamp  When created
    {DOMAIN}/created-by             user@host           Who created
    {DOMAIN}/release-version        v{N}                Current release number
    {DOMAIN}/image                  image:tag           Current image

    CANONICAL LOCATION for current release version:
        App manifest ConfigMap ({PREFIX}-app-{name}) data key "current_release_version"
        This is the SOURCE OF TRUTH for what release an app is on.
        The {DOMAIN}/release-version annotation on Deployments is for
        visibility only (what release deployed this specific Deployment).
        These can diverge after a partial failure — `kuberoku apps:status` detects this.

    With defaults:  app.kuberoku.com/managed-by = kuberoku
    With override:  app.k8u.io/managed-by      = k8u
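
The divergence check described under CANONICAL LOCATION could look roughly like
this. It is a sketch, not the `apps:status` implementation: the function name
is hypothetical, and it assumes the manifest stores the bare number ("3")
while the annotation stores the "v{N}" form ("v3"), as the tables above suggest.

```python
from typing import Any


def find_diverged_deployments(manifest: dict[str, Any],
                              deployments: list[dict[str, Any]],
                              domain: str = "app.kuberoku.com") -> list[str]:
    """Names of Deployments whose release annotation differs from the
    manifest's current_release_version (the source of truth)."""
    expected = f"v{manifest['data']['current_release_version']}"
    key = f"{domain}/release-version"
    diverged = []
    for dep in deployments:
        meta = dep.get("metadata", {})
        actual = meta.get("annotations", {}).get(key)
        if actual != expected:
            diverged.append(meta.get("name", "<unnamed>"))
    return diverged


manifest = {"data": {"current_release_version": "3"}}
deployments = [
    {"metadata": {"name": "kuberoku-myapi-web",
                  "annotations": {"app.kuberoku.com/release-version": "v3"}}},
    {"metadata": {"name": "kuberoku-myapi-worker",
                  "annotations": {"app.kuberoku.com/release-version": "v2"}}},
]
stale = find_diverged_deployments(manifest, deployments)
```

A Deployment with no annotation at all is also reported, since it cannot be
matched to the manifest's release.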

RESOURCE NAMING CONVENTION
──────────────────────────
All resource names use {PREFIX} from branding.py / per-cluster config.
Default {PREFIX} = "kuberoku".

    Resource              Name Pattern                              Default Example                       With PREFIX=k8u
    ────────────────────  ────────────────────────────────────────  ────────────────────────────────────  ───────────────────────────────
    App manifest CM       {PREFIX}-app-{app}                        kuberoku-app-myapi                    k8u-app-myapi
    Env ConfigMap         {PREFIX}-env-{app}                        kuberoku-env-myapi                    k8u-env-myapi
    Formation CM          {PREFIX}-formation-{app}                  kuberoku-formation-myapi              k8u-formation-myapi
    Secret                {PREFIX}-secret-{app}                     kuberoku-secret-myapi                 k8u-secret-myapi
    Deployment            {PREFIX}-{app}-{process}                  kuberoku-myapi-web                    k8u-myapi-web
    Service               {PREFIX}-{app}-{process}                  kuberoku-myapi-web                    k8u-myapi-web
    Ingress               {PREFIX}-ingress-{app}                    kuberoku-ingress-myapi                k8u-ingress-myapi
    Release CM            {PREFIX}-release-{app}-v{N}               kuberoku-release-myapi-v3             k8u-release-myapi-v3
    Addon StatefulSet     {PREFIX}-addon-{app}-{instance}           kuberoku-addon-myapi-postgres         k8u-addon-myapi-postgres
    Addon PVC             {PREFIX}-addon-{app}-{instance}-data      kuberoku-addon-myapi-postgres-data    k8u-addon-myapi-postgres-data
    Addon Service         {PREFIX}-addon-{app}-{instance}           kuberoku-addon-myapi-postgres         k8u-addon-myapi-postgres
    Job (run)             {PREFIX}-run-{app}-{random}               kuberoku-run-myapi-a1b2c3             k8u-run-myapi-a1b2c3
    NetPol (deny)         {PREFIX}-netpol-deny-{app}                kuberoku-netpol-deny-myapi            k8u-netpol-deny-myapi
    NetPol (allow)        {PREFIX}-netpol-allow-{app}-{process}     kuberoku-netpol-allow-myapi-web       k8u-netpol-allow-myapi-web
    NetPol (allow addon)  {PREFIX}-netpol-allow-{app}-{instance}    kuberoku-netpol-allow-myapi-postgres  k8u-netpol-allow-myapi-postgres
    NetPol (external)     {PREFIX}-netpol-external-{app}-{process}  kuberoku-netpol-external-myapi-web    k8u-netpol-external-myapi-web

NAME LENGTH LIMITS & COLLISION AVOIDANCE
────────────────────────────────────────
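Kuberoku caps every generated resource name at 63 characters, the RFC 1123
DNS label limit that K8s enforces for Service names (label values share the
same 63-character cap). The pattern {PREFIX}-{app}-{process} can exceed this
when the prefix, app name, and process name are all long.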

    APP NAME RULES:
    - Max 30 characters (enforced at apps:create time)
    - Lowercase alphanumeric + hyphens only
    - Must start and end with alphanumeric
    - Validated at creation: reject invalid names with clear error
    - Regex: ^[a-z0-9]([a-z0-9-]{0,28}[a-z0-9])?$
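
The rules above translate into a small validator. The regex is the one given in
the list; the function name and error wording are assumptions for this sketch.

```python
import re

# Regex from the rules above: 1 to 30 chars, alphanumeric at both ends.
APP_NAME_RE = re.compile(r"^[a-z0-9]([a-z0-9-]{0,28}[a-z0-9])?$")


def validate_app_name(name: str) -> str:
    """Return name unchanged if valid, else raise ValueError with a reason."""
    if len(name) > 30:
        raise ValueError(
            f"App name {name!r} is {len(name)} characters; the maximum is 30."
        )
    # fullmatch avoids the "$ matches before a trailing newline" pitfall.
    if not APP_NAME_RE.fullmatch(name):
        raise ValueError(
            f"App name {name!r} must use lowercase letters, digits, and "
            "hyphens, and must start and end with a letter or digit."
        )
    return name
```

Checking the length separately from the pattern lets the error message say
which rule was broken.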

    GENERATED NAME RULES:
    - If a generated resource name exceeds 63 chars, truncate the {app}
      portion and append a stable 4-char hash suffix for uniqueness:

        {PREFIX}-{app_truncated}-{hash4}-{suffix}

        Example:
            App name: my-super-long-application-name  (30 chars)
            Full:     kuberoku-my-super-long-application-name-web  (43 chars, OK)
            Addon:    kuberoku-addon-my-super-long-application-name-postgres-data
                      → 59 chars, OK

    - Hash is SHA-256 of the full untruncated app name, first 4 hex chars
    - This ensures: same app → same hash → same resource name (stable)
    - The hash is only appended when truncation is needed

    KEY RULE: Only truncate the app portion. Prefix and suffix are
    structural and must stay intact.

        Pattern:  {prefix}-{app_truncated}-{hash4}-{suffix}
        Where:    prefix = "kuberoku", "kuberoku-addon", etc.
                  suffix = "web", "worker", "postgres-data", etc.

        Example (63-char limit):
            Input:  safe_name("kuberoku-addon", app="my-super-long-application-name",
                              suffix="postgres-data")
            Budget: 63 - len("kuberoku-addon-") - len("-postgres-data") = 63 - 15 - 14 = 34
            App fits (30 chars <= 34): kuberoku-addon-my-super-long-application-name-postgres-data
            App too long: "-{hash4}" takes 5 more chars, leaving 34 - 5 = 29 for the app.
                Shape (app shortened further and hash value made up, for illustration):
                kuberoku-addon-my-super-long-applic-7f3a-postgres-data

    IMPLEMENTATION:
        # In k8s/resources.py
        import hashlib

        def safe_name(prefix: str, app: str, suffix: str = "",
                      max_length: int = 63) -> str:
            """Build a K8s resource name, truncating only the app portion."""
            if suffix:
                full = f"{prefix}-{app}-{suffix}"
                shell = f"{prefix}--{suffix}"       # name without app
            else:
                full = f"{prefix}-{app}"
                shell = f"{prefix}-"

            if len(full) <= max_length:
                return full

            # Only the app portion gets truncated
            budget = max_length - len(shell) - 5     # 5 = "-" + 4-char hash
            hash4 = hashlib.sha256(app.encode()).hexdigest()[:4]
            app_trunc = app[:budget]
            if suffix:
                return f"{prefix}-{app_trunc}-{hash4}-{suffix}"
            return f"{prefix}-{app_trunc}-{hash4}"
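
The properties claimed for safe_name (stable, never over the limit, prefix and
suffix intact) can be checked with a short script. The function below repeats
the logic above in condensed form so the example runs on its own; the
`max_length=40` call is an artificial limit chosen only to force truncation.

```python
import hashlib


def safe_name(prefix: str, app: str, suffix: str = "",
              max_length: int = 63) -> str:
    """Same logic as the implementation above, condensed."""
    full = f"{prefix}-{app}-{suffix}" if suffix else f"{prefix}-{app}"
    if len(full) <= max_length:
        return full
    shell = f"{prefix}--{suffix}" if suffix else f"{prefix}-"
    budget = max_length - len(shell) - 5          # 5 = "-" + 4-char hash
    hash4 = hashlib.sha256(app.encode()).hexdigest()[:4]
    tail = f"-{suffix}" if suffix else ""
    return f"{prefix}-{app[:budget]}-{hash4}{tail}"


app = "my-super-long-application-name"

# Within the default 63-char limit: returned untouched.
fits = safe_name("kuberoku-addon", app, "postgres-data")

# An artificially small limit forces truncation of the app portion only.
short = safe_name("kuberoku-addon", app, "postgres-data", max_length=40)
again = safe_name("kuberoku-addon", app, "postgres-data", max_length=40)
```

Four hex characters give 65,536 possible hash values, so two long app names
that share the same truncated stem could still collide in principle. The
30-character app name cap keeps truncation rare with the default prefix.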

CONFIGURATION RESOLUTION ORDER
───────────────────────────────
Highest priority wins:

    1. CLI flags         --app myapi, --cluster prod
    2. Environment vars  (see full list below)
    3. Project file      .kuberoku (walk up directory tree, like .git)
                         --app checks .kuberoku aliases first, then literal names
    4. Global config     ~/.kuberoku/config.yaml
    5. Defaults          Current kubeconfig context, no app

    ALL KUBEROKU_* ENVIRONMENT VARIABLES:
    ─────────────────────────────────────
    KUBEROKU_APP                 Target app (name or .kuberoku alias)
    KUBEROKU_CLUSTER             Target cluster name (from config.yaml)
    KUBEROKU_NAMESPACE           K8s namespace override
    KUBEROKU_RESOURCE_PREFIX     Resource prefix (default from branding.py)
    KUBEROKU_REGISTRY            Docker registry URL override

    These map 1:1 to CLI flags or config.yaml settings. Env vars are
    especially useful in CI/CD where there's no .kuberoku project file.
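
The five-level precedence can be expressed as a single lookup. This is a
sketch under assumptions: `resolve_setting` is a hypothetical helper, and the
project file and global config are modelled as flat dicts with an "app" key,
which may not match the real file layouts.

```python
import os
from typing import Mapping, Optional


def resolve_setting(flag: Optional[str], env_var: str,
                    project: Mapping[str, str],
                    global_cfg: Mapping[str, str],
                    key: str, default: Optional[str] = None,
                    environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first value that is set, checking highest priority first."""
    candidates = (
        flag,                    # 1. CLI flag
        environ.get(env_var),    # 2. environment variable
        project.get(key),        # 3. .kuberoku project file
        global_cfg.get(key),     # 4. ~/.kuberoku/config.yaml
        default,                 # 5. built-in default
    )
    for value in candidates:
        if value:                # None and "" both count as "not set"
            return value
    return None


env = {"KUBEROKU_APP": "from-env"}
project_file = {"app": "from-project"}

app_from_flag = resolve_setting("from-flag", "KUBEROKU_APP", project_file, {},
                                "app", environ=env)
app_from_env = resolve_setting(None, "KUBEROKU_APP", project_file, {},
                               "app", environ=env)
app_from_project = resolve_setting(None, "KUBEROKU_APP", project_file, {},
                                   "app", environ={})
```

Passing `environ` explicitly keeps the function testable without touching the
real process environment.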

MULTI-CLUSTER SUPPORT
─────────────────────
Clusters are just named references to kubeconfig contexts:

    # ~/.kuberoku/config.yaml
    current_cluster: production
    clusters:
      production:
        context: prod-eks           # kubeconfig context name
      staging:
        context: staging-gke
      local:
        context: k3d-local

Each SDK/CLI operation targets exactly one cluster. The Kuberoku facade
instantiates the appropriate K8sClient based on the resolved cluster.

NAMESPACE STRATEGY
──────────────────
V1 DEFAULT: SINGLE SHARED NAMESPACE

All Kuberoku apps live in ONE namespace. Default: "kuberoku".
Apps are isolated by LABELS, not namespaces. This is critical for adoption:

    Why single namespace is the V1 default:
    - Most clusters won't give devs namespace create/delete permissions
    - Enterprise clusters have policies around namespace naming
    - GitOps teams manage namespace lifecycle externally
    - Kuberoku never needs "create namespaces" permission → instant adoption

    SECURITY WARNING — SHARED NAMESPACE = TRUSTED TEAM:
    Label isolation is a LOGICAL boundary, NOT a security boundary.
    Any user with RBAC access to the shared namespace can read/modify
    ConfigMaps and Secrets for ALL apps in that namespace. K8s RBAC
    cannot enforce label-level access control.

    Single namespace mode assumes a trusted team (same org, shared access).
    For multi-tenant isolation (untrusted teams, strict data separation),
    use namespace-per-app mode (V2) where each app gets its own namespace
    with separate RBAC bindings.

    kuberoku apps:create myapi
    # Creates resources in namespace "kuberoku" (or whatever is configured)
    # All resources tagged with label: app.kuberoku.com/app=myapi

CONFIGURING THE NAMESPACE AND BASE DOMAIN
──────────────────────────────────────────
    # Per-cluster in ~/.kuberoku/config.yaml
    clusters:
      production:
        context: prod-eks
        namespace: kuberoku           # default
        base_domain: apps.prod.mycompany.com   # auto-generated domains

      staging:
        context: staging-gke
        namespace: staging-apps       # custom namespace
        base_domain: apps.staging.mycompany.com

      shared:
        context: shared-cluster
        namespace: team-alpha         # team's pre-allocated namespace
        # no base_domain — manual domains:add only

    # Override via flag or env var
    kuberoku apps:create myapi --namespace my-ns
    KUBEROKU_NAMESPACE=my-ns kuberoku apps:create myapi

    Resolution order:
    1. --namespace flag
    2. KUBEROKU_NAMESPACE env var
    3. Cluster config namespace
    4. Default: "kuberoku"
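
    The resolution order above can be sketched as a first-match lookup. The
    function name `resolve_namespace` and the `env` parameter (injected for
    testability) are assumptions, as is treating an empty string as "unset".

```python
from __future__ import annotations

import os

DEFAULT_NAMESPACE = "kuberoku"


def resolve_namespace(
    flag: str | None,
    cluster_config: dict,
    env: dict | None = None,
) -> str:
    """Apply the four-step resolution order; the first non-empty value wins."""
    env = os.environ if env is None else env
    candidates = (
        flag,                              # 1. --namespace flag
        env.get("KUBEROKU_NAMESPACE"),     # 2. KUBEROKU_NAMESPACE env var
        cluster_config.get("namespace"),   # 3. cluster config namespace
        DEFAULT_NAMESPACE,                 # 4. built-in default
    )
    # Assumption: empty strings count as unset and fall through.
    return next(c for c in candidates if c)
```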

    AUTOMATIC NAMESPACE CREATION (Heroku-like first-run UX):

    On first use, if the namespace doesn't exist, Kuberoku TRIES to
    create it automatically:

        $ kuberoku apps:create myapi
          Namespace "kuberoku" not found — creating it... done
          Creating app manifest... done
          Creating env config... done
          Creating formation... done
        App myapi created.

    If the user doesn't have namespace-create permission (403):

        $ kuberoku apps:create myapi
        Error: Namespace "kuberoku" not found.
        You don't have permission to create namespaces.

        Ask your cluster admin:  kubectl create namespace kuberoku
        Or use an existing one:  kuberoku clusters:add prod --namespace existing-ns

    This gives Heroku-like DX for users with permissions (most personal /
    dev clusters) and a clear actionable error for enterprise users
    without namespace-create permission.

    CLARIFICATION — "zero footprint" means no CRDs, no operators, no
    server-side pods. A namespace is just organizational metadata (like
    a directory), not "footprint." Creating one is equivalent to
    `mkdir /apps` — it's setup, not infrastructure.

HOW ISOLATION WORKS WITHOUT SEPARATE NAMESPACES
────────────────────────────────────────────────
Every resource is tagged with `{DOMAIN}/app={name}`. All queries filter by
this label. Two apps' resources never collide because:

    1. Resource names include the app name: {PREFIX}-myapi-web, {PREFIX}-other-web
    2. Label selectors always include app={name}
    3. ConfigMaps are per-app: {PREFIX}-app-myapi, {PREFIX}-app-other

    $ kubectl get all -n kuberoku -l app.kuberoku.com/app=myapi
    # Shows exactly myapi's resources
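
    A small sketch of the naming and labelling helpers this implies. The
    helper names are hypothetical, and PREFIX is set to "kuberoku" only for
    illustration (the real default comes from branding.py).

```python
DOMAIN = "app.kuberoku.com"   # label domain used in the examples above
PREFIX = "kuberoku"           # assumed value for illustration only


def app_labels(app: str) -> dict:
    """Labels stamped on every resource that belongs to an app."""
    return {f"{DOMAIN}/app": app}


def label_selector(app: str) -> str:
    """Selector string in the key=value form used by `kubectl -l`."""
    return ",".join(f"{k}={v}" for k, v in sorted(app_labels(app).items()))


def deployment_name(app: str, process_type: str) -> str:
    """Per-process resource name: {PREFIX}-{app}-{process_type}."""
    return f"{PREFIX}-{app}-{process_type}"


def app_configmap_name(app: str) -> str:
    """Per-app manifest ConfigMap name: {PREFIX}-app-{app}."""
    return f"{PREFIX}-app-{app}"


print(label_selector("myapi"))          # app.kuberoku.com/app=myapi
print(deployment_name("myapi", "web"))  # kuberoku-myapi-web
```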

FUTURE: NAMESPACE-PER-APP MODE (V2)
────────────────────────────────────
For teams that want full namespace isolation:

    clusters:
      production:
        context: prod-eks
        namespace_mode: per-app       # creates {PREFIX}-{app} namespace per app

    How the two modes differ:

    Behavior              default mode          per-app mode (V2)
    ────────────────────  ────────────────────  ──────────────────────────
    Namespace creation    Auto-create ONE       Auto-create PER APP
                          shared namespace      ({PREFIX}-{app} each)
    RBAC required         namespace create      namespace create + delete
                          (one-time)            (ongoing, per apps:create/destroy)
    Isolation             Labels (logical)      Namespaces (K8s-enforced)
    Multi-tenant          Same team only        Separate teams possible
    RBAC per-app          Not possible          Possible (RoleBinding per NS)

    per-app mode is explicitly opt-in. It requires more RBAC permissions
    and is designed for multi-tenant clusters where teams need hard
    isolation boundaries.

OPERATION ATOMICITY — "ALL OR NOTHING" ACROSS MULTIPLE RESOURCES
─────────────────────────────────────────────────────────────────
K8s has NO transactions across resources. You cannot atomically update 3
ConfigMaps and 2 Deployments. If step 3 of 5 fails, you're in a partial
state. This is a fundamental K8s limitation.

Kuberoku handles this with step-by-step progress output (with revert on
failure) and three rules:

CLI OUTPUT: STEP-BY-STEP PROGRESS WITH REVERT ON FAILURE
    Every multi-step command shows each step as it happens. If a step
    fails, Kuberoku reverts the completed steps (best-effort cleanup)
    and tells you exactly what happened.

    SUCCESS — all steps complete:
        $ kuberoku apps:create myapi
          Creating app manifest... done
          Creating env config... done
          Creating formation... done
        App myapi created.

    FAILURE — step 3 fails, steps 1-2 reverted:
        $ kuberoku apps:create myapi
          Creating app manifest... done
          Creating env config... done
          Creating formation... FAILED (permission denied)
          Reverting...
            Removing env config... done
            Removing app manifest... done
        Error: Failed to create app — cannot create configmaps.

    FAILURE — revert also fails (rare, but handled):
        $ kuberoku apps:create myapi
          Creating app manifest... done
          Creating env config... done
          Creating formation... FAILED (permission denied)
          Reverting...
            Removing env config... done
            Removing app manifest... FAILED (network timeout)
        Error: Failed to create app. Partial cleanup failed.
        Orphaned resources may exist. Run: kuberoku apps:destroy myapi --force

    DEPLOY — multi-process update with revert:
        $ kuberoku deploy --image myapi:v3
          Updating web... done
          Updating worker... done
          Updating smtp... FAILED (deployment not found)
          Reverting...
            Restoring worker to myapi:v2... done
            Restoring web to myapi:v2... done
        Error: Deploy failed — smtp process type not found.
        No release created (all changes reverted).

    DEPLOY — success:
        $ kuberoku deploy --image myapi:v3
          Updating web... done
          Updating worker... done
          Waiting for rollout...
            web.1: up
            web.2: up
            worker.1: up
          Recording release v7... done
        Release v7 live.

    NOTE: Release is created LAST, after all Deployments succeed. This
    means release numbers never have "holes": if a deploy fails, no
    release is recorded. The release log is a clean, append-only history
    of successful deploys. See "WHY RELEASE IS LAST" under Rule 2 for
    the rationale.

    The pattern for every multi-step command:
        1. Show each step with "..." while running, "done" or "FAILED" after
        2. On failure: revert completed steps in reverse order
        3. On revert failure: warn about orphaned resources, suggest fix
        4. Always exit with clear error message + actionable next step
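
    The four-point pattern above can be sketched as a small step runner. The
    names `Step` and `run_steps` are hypothetical, and the output wording
    only approximates the CLI transcripts shown earlier.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    label: str                    # e.g. "Creating env config"
    undo_label: str               # e.g. "Removing env config"
    do: Callable[[], None]
    undo: Callable[[], None]


def _revert(completed: list, out: Callable[[str], None]) -> None:
    """Undo completed steps in reverse order, continuing past failures."""
    out("  Reverting...")
    orphaned = False
    for step in reversed(completed):
        try:
            step.undo()
            out(f"    {step.undo_label}... done")
        except Exception as exc:
            # Best-effort cleanup: keep reverting the remaining steps.
            orphaned = True
            out(f"    {step.undo_label}... FAILED ({exc})")
    if orphaned:
        out("Partial cleanup failed. Orphaned resources may exist.")


def run_steps(steps: list, out: Callable[[str], None] = print) -> bool:
    """Run steps in order; on the first failure, revert and return False."""
    completed = []
    for step in steps:
        try:
            step.do()
        except Exception as exc:
            out(f"  {step.label}... FAILED ({exc})")
            _revert(completed, out)
            return False
        out(f"  {step.label}... done")
        completed.append(step)
    return True
```

    Each command would supply its own list of steps; the runner itself knows
    nothing about ConfigMaps or Deployments, which keeps the revert logic in
    one place.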

RULE 1: EVERY OPERATION IS IDEMPOTENT
    If a command fails midway, re-running the SAME command finishes the
    job safely. It never creates duplicates or corrupts state.

    How: every step checks "does this already exist?" before creating,
    and "is this already at the desired state?" before updating.

        $ kuberoku deploy --image myapi:v3
        Updating web... done
        Updating worker... FAILED (network timeout)
        Reverting...
          Restoring web to myapi:v2... done
        Error: Deploy failed. No release created. Safe to re-run.

        $ kuberoku deploy --image myapi:v3      # safe to re-run
        Updating web... done
        Updating worker... done
        Recording release v7... done
        Release v7 live.

RULE 2: ORDER OPERATIONS TO MINIMIZE DAMAGE
    Each multi-step command is ordered so that if it fails partway, the
    system is in the least-harmful partial state:

    apps:create (3 resources):
        1. Create app manifest ConfigMap  (intent: "this app exists")
        2. Create env ConfigMap           (empty env)
        3. Create formation ConfigMap     (empty formation)
        If step 2 fails → re-run apps:create, step 1 is idempotent
        If ALL fail → nothing exists, clean state

    deploy (N deployments + 1 release):
        1. Update Deployment 1 (web)
        2. Update Deployment 2 (worker)
        3. ... (all process types)
        4. Wait for rollout healthy
        5. Create release ConfigMap LAST (records what succeeded)
        If step 1 fails → nothing changed, no release, clean state
        If step 2 fails → web on v3, worker on v2, no release recorded.
           Revert web to v2. Re-run deploy.
        If step 5 fails → Deployments updated but no release record.
           Re-run creates the release. (Rare — CM creation almost never fails.)

        WHY RELEASE IS LAST:
        A release is a RECORD of what happened, not an INTENT of what
        should happen. If deploy fails, there's no release — the history
        stays clean. Release numbers never skip (v6 → v7 → v8, no holes).
        Rollback targets a release, so every release must represent a
        state that actually existed on the cluster.

    apps:destroy (many resources):
        1. Delete Deployments (stops running pods first)
        2. Delete Services
        3. Delete ConfigMaps + Secrets (data last)
        If fails midway → re-run destroy, already-deleted resources are skipped

    config:set (1 ConfigMap update + rolling restart):
        1. Update ConfigMap (single atomic K8s operation)
        2. Trigger rolling restart (annotation patch)
        If step 2 fails → config is set, but restart didn't happen.
           User sees: "Config set. Warning: restart failed. Run ps:restart."
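
    A sketch of what the step-2 annotation patch could look like. The
    annotation key below is hypothetical (the spec does not name it); the
    technique is the same one `kubectl rollout restart` uses, which stamps a
    timestamp annotation on the pod template.

```python
from datetime import datetime, timezone

# Hypothetical annotation key; the spec does not say which one Kuberoku uses.
RESTART_ANNOTATION = "app.kuberoku.com/restarted-at"


def restart_patch(now: datetime) -> dict:
    """Build a patch body that changes only the pod template annotations.

    Any change to spec.template causes the Deployment controller to roll
    out new pods, which start with the updated config.
    """
    return {
        "spec": {
            "template": {
                "metadata": {
                    "annotations": {RESTART_ANNOTATION: now.isoformat()}
                }
            }
        }
    }


patch = restart_patch(datetime.now(timezone.utc))
```

    With the kubernetes Python client, a body like this could be sent with
    `AppsV1Api.patch_namespaced_deployment(name, namespace, body)`.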

RULE 3: DETECT AND REPAIR INCONSISTENT STATE
    If a command fails midway and the user doesn't immediately re-run it,
    the system can detect and report the inconsistency:

    kuberoku apps:status
        Shows warnings when state is inconsistent:
        === myapi (production)
        ⚠ WARNING: Deployment images don't match latest release v7.
          web: myapi:v3 (matches release)
          worker: myapi:v2 (STALE — release says myapi:v3)
        Run `kuberoku deploy --image myapi:v3` to fix.

    kuberoku doctor
        Checks ALL apps for inconsistencies:
        - Deployments that don't match their release
        - Orphaned ConfigMaps (app manifest exists, no Deployments)
        - Formation record out of sync with actual replicas

    There is NO automatic self-healing. Kuberoku detects and REPORTS;
    the user decides when to fix. This prevents surprise side effects.
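
    The image check behind the apps:status warning can be sketched as a pure
    comparison. `find_stale_images` is a hypothetical helper; it only
    reports, in line with the no-self-healing rule.

```python
def find_stale_images(release_images: dict, deployed_images: dict) -> dict:
    """Return {process_type: (deployed, expected)} for every mismatch.

    A process type with no Deployment is reported with deployed=None.
    """
    stale = {}
    for process_type, expected in release_images.items():
        deployed = deployed_images.get(process_type)
        if deployed != expected:
            stale[process_type] = (deployed, expected)
    return stale


release = {"web": "myapi:v3", "worker": "myapi:v3"}
cluster = {"web": "myapi:v3", "worker": "myapi:v2"}
for ptype, (deployed, expected) in find_stale_images(release, cluster).items():
    print(f"{ptype}: {deployed} (STALE: release says {expected})")
```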

WHAT THIS MEANS IN PRACTICE
────────────────────────────
    Scenario                      What happens                 Fix
    ────────────────────────────  ───────────────────────────  ─────────────────
    deploy fails on 2nd of 3      1st process updated, rest    Re-run deploy
    process types                 still on old image

    apps:create fails midway      Partial resources created    Re-run apps:create

    config:set succeeds but       Config updated, pods on      kuberoku ps:restart
    restart fails                 old config

    apps:destroy fails midway     Some resources deleted       Re-run apps:destroy

    Network drops during          Last confirmed step holds,   Re-run the command
    any command                   rest unchanged

    In ALL cases: re-running the same command is safe and finishes the job.


CONCURRENCY & CONFLICT RULES
─────────────────────────────
K8s ConfigMaps are the source of truth. Two people running commands
simultaneously MUST NOT silently clobber each other's changes.

RULE: All ConfigMap, Secret, and Deployment updates use optimistic concurrency.

    How it works:
    1. Read the resource (get its metadata.resourceVersion)
    2. Modify the data
    3. Write it back with the same resourceVersion
    4. If K8s returns 409 Conflict → re-read, re-apply, retry (up to 5 times)
    5. If still conflicting after 5 retries → fail with a clear error

    This is the standard K8s pattern. The kubernetes Python client supports it
    natively via the `resource_version` field on all API objects.

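    Implementation in K8sClient:
        from kubernetes.client.rest import ApiException

        MAX_RETRIES = 5

        def update_configmap(self, namespace, name, mutate):
            """Read-modify-write. `mutate(cm)` edits the ConfigMap in place."""
            for attempt in range(MAX_RETRIES):
                # Re-read on every attempt so the change is re-applied on top
                # of the latest data instead of overwriting a concurrent write.
                current = self._api.read_namespaced_config_map(name, namespace)
                mutate(current)
                try:
                    # replace sends current.metadata.resource_version, so the
                    # API server rejects a stale write with 409 Conflict.
                    return self._api.replace_namespaced_config_map(
                        name, namespace, current
                    )
                except ApiException as e:
                    if e.status == 409 and attempt < MAX_RETRIES - 1:
                        continue
                    raise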

    Affected operations:
        config:set / config:unset    → ConfigMap update with retry
        deploy                       → Deployment update with retry
        ps:scale                     → Deployment patch with retry
        releases:rollback            → Deployment update with retry
        apps:maintenance:on/off      → Annotation update with retry
        addons:create/destroy        → ConfigMap update with retry

FORMATION RECORD (DESIRED STATE)
─────────────────────────────────
Each app has a formation ConfigMap that stores desired replicas AND commands
per process type:

    Resource: {PREFIX}-formation-{app}
    Contents:
    {
      "web": {
        "replicas": 3,
        "command": "gunicorn app:app --bind 0.0.0.0:8080",
        "command_source": "manual"
      },
      "worker": {
        "replicas": 2,
        "command": "celery -A myapp worker --loglevel=info",
        "command_source": "procfile"
      },
      "clock": {
        "replicas": 1,
        "command": null,
        "command_source": "default"
      }
    }

    Each process type maps to an object with:
        "replicas": int — desired replica count
        "command": str | null — container command override. null means "use image
                                default" (the Dockerfile CMD/ENTRYPOINT)
        "command_source": "manual" | "procfile" | "default" — how the command was
                          set. Used for Procfile vs ps:set precedence (see Section 4.2)
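
    A minimal sketch of reading this record back into typed objects. The
    function name `parse_formation` is hypothetical, and defaulting a missing
    "command_source" to "default" is an assumption, not something the spec
    states.

```python
from __future__ import annotations

import json
from dataclasses import dataclass

VALID_SOURCES = {"manual", "procfile", "default"}


@dataclass(frozen=True)
class ProcessFormation:
    replicas: int
    command: str | None       # None = use image default
    command_source: str       # "manual" | "procfile" | "default"


def parse_formation(raw: str) -> dict:
    """Parse the formation JSON into {process_type: ProcessFormation}."""
    formation = {}
    for process_type, entry in json.loads(raw).items():
        # Assumption: a missing command_source is treated as "default".
        source = entry.get("command_source", "default")
        if source not in VALID_SOURCES:
            raise ValueError(f"{process_type}: unknown command_source {source!r}")
        replicas = entry["replicas"]
        if not isinstance(replicas, int) or replicas < 0:
            raise ValueError(f"{process_type}: replicas must be a non-negative int")
        formation[process_type] = ProcessFormation(
            replicas, entry.get("command"), source
        )
    return formation
```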

    Why "command" lives here (not in a separate ConfigMap):
        - Formation is already the "desired state" for process types
        - Commands are per-process-type, just like replicas
        - Keeping them together means one read to get the full picture
        - Rollback restores replicas AND commands atomically from the release snapshot

Why this exists:
    - apps:maintenance:on scales to 0, apps:maintenance:off restores from formation
    - rollback optionally restores the formation from the target release
    - ps:scale writes here AND patches the Deployment (single source of truth)
    - ps:set writes commands here AND patches the Deployment container command
    - If Deployment replicas diverge from formation (manual kubectl), Kuberoku
      uses formation as the "desired" state
    - If Deployment command diverges from formation, same: formation wins

Operations that read formation:
    - ps (shows desired vs actual)
    - ps:commands (shows commands per process type)
    - apps:maintenance:off (restores desired replicas)
    - releases:rollback (restores desired replicas + commands)

Operations that write formation:
    - ps:scale (updates replicas in formation + patches Deployment)
    - ps:set (updates commands in formation + patches Deployment container command)
    - apps:create (initializes with defaults:
      {"web": {"replicas": 1, "command": null, "command_source": "default"}})
    - deploy (if a Procfile is present, reads it and updates commands in
      formation; the resulting formation is recorded in the release snapshot)

RELEASE STORAGE & LIMITS
─────────────────────────
Releases are stored as individual ConfigMaps (one per release), NOT as a
single growing ConfigMap. This avoids the ~1 MiB ConfigMap size limit.

    Resource: {PREFIX}-release-{app}-v{N}   (one ConfigMap per release)
    Contents: Compact JSON, ~200-500 bytes per release

RELEASE VERSION COUNTER (monotonic, conflict-safe)
──────────────────────────────────────────────────
    The current release number N is stored in the app manifest ConfigMap:

        Resource: {PREFIX}-app-{name}
        Data key: "current_release_version"
        Value: "42"

    Creating a release:
        1. Read app manifest ConfigMap (get resourceVersion + current version N)
        2. Increment: next = N + 1
        3. Update app manifest with next version (CAS via resourceVersion)
        4. If 409 Conflict → re-read, re-increment, retry (up to 5 times)
        5. Create release ConfigMap {PREFIX}-release-{app}-v{next}

    This prevents race conditions: if two users deploy simultaneously, one
    gets v43 and the other gets v44. The resourceVersion CAS on the app
    manifest acts as the lock — same optimistic concurrency pattern as
    all other ConfigMap updates.
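
    Steps 1-4 above can be sketched against an in-memory store that mimics
    resourceVersion compare-and-swap. `FakeConfigMapStore`, `Conflict`, and
    `allocate_release_version` are illustrative names, not Kuberoku's actual
    classes.

```python
class Conflict(Exception):
    """Stand-in for a 409 Conflict from the API server."""


class FakeConfigMapStore:
    """In-memory store with resourceVersion-style compare-and-swap."""

    def __init__(self):
        self._items = {}  # name -> (resource_version, data)

    def create(self, name, data):
        if name in self._items:
            raise Conflict(name)
        self._items[name] = (1, dict(data))

    def read(self, name):
        version, data = self._items[name]
        return version, dict(data)

    def replace(self, name, data, resource_version):
        current, _ = self._items[name]
        if current != resource_version:
            raise Conflict(name)  # someone else wrote since our read
        self._items[name] = (current + 1, dict(data))


MAX_RETRIES = 5


def allocate_release_version(store, manifest_name):
    """Bump current_release_version with CAS and return the new value."""
    for _ in range(MAX_RETRIES):
        rv, data = store.read(manifest_name)
        nxt = int(data["current_release_version"]) + 1
        data["current_release_version"] = str(nxt)
        try:
            store.replace(manifest_name, data, rv)
            return nxt
        except Conflict:
            continue  # lost the race: re-read and re-increment
    raise RuntimeError("could not allocate a release version after retries")


store = FakeConfigMapStore()
store.create("kuberoku-app-myapi", {"current_release_version": "42"})
print(allocate_release_version(store, "kuberoku-app-myapi"))  # 43
print(allocate_release_version(store, "kuberoku-app-myapi"))  # 44
```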

    Example release ConfigMap contents:

    {
      "version": 42,
      "images": {
        "web": "myapi:abc123",
        "worker": "myapi-worker:abc123"
      },
      "formation": {
        "web": {"replicas": 3, "command": "gunicorn app:app --bind 0.0.0.0:8080"},
        "worker": {"replicas": 2, "command": "celery -A myapp worker"}
      },
      "commit": "abc123def456789",
      "ref": "main",
      "created_at": "2025-02-09T15:30:01Z",
      "created_by": "user@host",
      "description": "Deploy abc1234 (main)",
      "env_diff": {"NEW_VAR": "value", "OLD_VAR": null}
    }

    Key design decisions:
    - "images" stores ALL process type images, not just the one changed.
      Even if only "web" was deployed, the release records the full state.
      This makes rollback unambiguous — you restore the ENTIRE image set.
    - "formation" snapshots replica counts AND commands at release time.
      Rollback restores both replicas and commands atomically.
    - "command" is null when using the image default (no override).
    - "commit" + "ref" — the git commit SHA and ref used (null if --image).
      Enables full traceability: release → commit → code.
    - env_diff only stores the DELTA, not the full env. No bloat.
    - A deploy that only changes web image still records worker's current image.
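
    A sketch of how the "env_diff" delta could be computed and replayed,
    following the convention above that an unset key maps to null. Both
    function names are hypothetical.

```python
def env_diff(old: dict, new: dict) -> dict:
    """Return only the keys that changed; removed keys map to None."""
    diff = {}
    for key, value in new.items():
        if old.get(key) != value:
            diff[key] = value
    for key in old:
        if key not in new:
            diff[key] = None
    return diff


def apply_env_diff(env: dict, diff: dict) -> dict:
    """Replay a diff on top of an env to reconstruct the next state."""
    result = dict(env)
    for key, value in diff.items():
        if value is None:
            result.pop(key, None)
        else:
            result[key] = value
    return result


before = {"OLD_VAR": "x", "KEEP": "1"}
after = {"KEEP": "1", "NEW_VAR": "value"}
print(env_diff(before, after))  # {'NEW_VAR': 'value', 'OLD_VAR': None}
```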

RELEASE PRUNING
───────────────
Old releases accumulate. Kuberoku provides explicit pruning:

    kuberoku releases:prune              Prune old releases
        --keep INT                       Keep this many (default: 50)
        --older-than TEXT                Delete releases older than (e.g., "30d")

    Auto-prune: after creating a new release, if total releases > 100,
    automatically delete the oldest ones to keep 100. This cleanup is
    best-effort and does not block the deploy.

    Users can also set a per-app retention policy:
        kuberoku config:set KUBEROKU_RELEASE_RETENTION=50
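
    A sketch of selecting releases to prune. The spec does not say how
    --keep and --older-than combine; this example ASSUMES both must agree
    (a release is deleted only if it is outside the newest `keep` and, when
    a cutoff is given, also older than it). `select_prunable` is a
    hypothetical name.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def select_prunable(
    releases: list,                       # [(version, created_at), ...]
    keep: int = 50,
    older_than: timedelta | None = None,
    now: datetime | None = None,
) -> list:
    """Return the release versions to delete, oldest first."""
    now = now or datetime.now(timezone.utc)
    newest_first = sorted(releases, key=lambda r: r[0], reverse=True)
    candidates = newest_first[keep:]      # never touch the newest `keep`
    if older_than is not None:
        # Assumption: the age filter narrows the --keep result further.
        cutoff = now - older_than
        candidates = [r for r in candidates if r[1] < cutoff]
    return sorted(version for version, _ in candidates)
```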


================================================================================
 10. DATA MODELS
================================================================================

All models are frozen dataclasses (immutable after creation).

    from dataclasses import dataclass
    from datetime import datetime


    @dataclass(frozen=True)
    class ProcessFormation:
        replicas: int
        command: str | None                 # None = use image default
        command_source: str                 # "manual" | "procfile" | "default"

    @dataclass(frozen=True)
    class App:
        name: str
        cluster: str
        created_at: datetime
        created_by: str
        process_types: dict[str, ProcessFormation]  # {"web": ProcessFormation(2, "gunicorn...", "manual")}
        release_version: int
        maintenance: bool


    @dataclass(frozen=True)
    class Dyno:
        name: str                           # "web.1", "worker.2" (presentation-only)
        app: str
        process_type: str                   # "web", "worker", "service"
        command: str | None                 # command override (None = image default)
        state: str                          # "up", "starting", "crashed", "down"
        started_at: datetime | None
        image: str
        node: str                           # K8s node name

    DYNO STATE ← K8s POD STATUS MAPPING:
        K8s Pod Phase / Container Reason     Dyno State    Shown As
        ─────────────────────────────────    ──────────    ──────────────────
        Running + Ready                      "up"          up (running for 45s)
        Pending / ContainerCreating          "starting"    starting
        CrashLoopBackOff                     "crashed"     crashed (CrashLoopBackOff)
        ImagePullBackOff                     "crashed"     crashed (ImagePullBackOff)
        OOMKilled                            "crashed"     crashed (OOMKilled)
        Failed                               "crashed"     crashed
        Succeeded (Job)                      "down"        completed
        Deployment scaled to 0               "down"        down (scaled to 0)


    @dataclass(frozen=True)
    class Release:
        version: int                        # 1, 2, 3, ...
        app: str
        description: str                    # "Deploy myapi:v2", "Set DATABASE_URL"
        images: dict[str, str]              # {"web": "myapi:v3", "worker": "myapi-worker:v3"}
                                            # ALL process type images, not just the one changed
        formation: dict[str, ProcessFormation]  # full formation snapshot (replicas + commands)
        commit: str | None                  # git commit SHA (None if deployed via --image)
        ref: str | None                     # git ref used ("main", "v1.2.0", None if --image)
        created_at: datetime
        created_by: str
        env_diff: dict[str, str | None]     # {"KEY": "new_val"} or {"KEY": None} for unset


    @dataclass(frozen=True)
    class Addon:
        instance_name: str                  # "postgres", "analytics-db", "sessions-cache"
        addon_type: str                     # "postgres", "redis"
        app: str
        status: str                         # "running", "creating", "failed"
        image: str                          # "postgres:16", "redis:7"
        cpu: str                            # "250m"
        memory: str                         # "256Mi"
        storage: str | None                 # "1Gi" (None for ephemeral)
        env_key: str                        # "DATABASE_URL", "REDIS_URL"
        external: bool                      # True = external URL, no StatefulSet
        ephemeral: bool                     # True = no PVC
        created_at: datetime
        # NOTE: No "plan" field — Kuberoku uses direct resources, not plans.
        # Users set cpu/memory/storage directly. See "NO PLANS" section.


    @dataclass(frozen=True)
    class Domain:
        hostname: str                       # "myapp.com"
        app: str
        tls: bool                           # cert-manager active
        status: str                         # "active", "pending"
        created_at: datetime


    @dataclass(frozen=True)
    class LogLine:
        timestamp: datetime
        source: str                         # "app" or "system"
        dyno: str                           # "web.1"
        message: str


    @dataclass(frozen=True)
    class ClusterInfo:
        name: str                           # "production"
        context: str                        # kubeconfig context
        is_current: bool
        k8s_version: str | None             # "1.35.0"
        node_count: int | None
        app_count: int | None


    @dataclass(frozen=True)
    class ExposedEndpoint:
        """Result of services:expose:on/off or addons:expose:on/off."""
        name: str                           # "web", "smtp", "postgres"
        app: str
        kind: str                           # "process" or "addon"
        service_type: str                   # "LoadBalancer", "NodePort", "ClusterIP"
        method: str                         # "loadbalancer", "nodeport", "clusterip"
        external_ip: str | None             # IP after LoadBalancer provisioned
        hostname: str | None                # cloud hostname (ELB, etc.)
        ports: tuple[int, ...]              # exposed port numbers


    @dataclass(frozen=True)
    class PortMapping:
        """A port on a process type's Service."""
        port: int                           # port number (e.g. 8080, 25, 587)
        protocol: str                       # "TCP" or "UDP"


    @dataclass(frozen=True)
    class ConnectionTarget:
        """Info returned by addons:connect — used by CLI to run port-forward."""
        name: str                           # "web", "postgres"
        app: str
        kind: str                           # "process" or "addon"
        service_name: str                   # actual K8s Service name
        namespace: str                      # K8s namespace
        ports: tuple[PortMapping, ...]      # ports available for forwarding


    # NOTE: ServiceInfo is not a separate model. Service state is derived from
    # ExposedEndpoint (expose status) and PortMapping (declared ports).
    # Use k.services.expose_on/off() and k.services.list_ports() instead.


================================================================================
 11. EXCEPTION HIERARCHY
================================================================================

    class KuberokuError(Exception):
        """Base exception for all Kuberoku errors."""
        pass

    # App errors
    class AppNotFoundError(KuberokuError):
        """Raised when an app doesn't exist."""
        def __init__(self, name: str):
            super().__init__(f"App '{name}' not found.")
            self.name = name

    class AppAlreadyExistsError(KuberokuError):
        """Raised when creating an app that already exists."""
        def __init__(self, name: str):
            super().__init__(f"App '{name}' already exists.")
            self.name = name

    # Deploy errors
    class DeployError(KuberokuError):
        """Raised when a deployment fails."""
        pass

    class RolloutTimeoutError(DeployError):
        """Raised when a rollout doesn't complete in time."""
        pass

    class ImageNotFoundError(DeployError):
        """Raised when the specified image can't be pulled."""
        pass

    # Config errors
    class ConfigVarNotFoundError(KuberokuError):
        """Raised when a config var doesn't exist."""
        pass

    # Release errors
    class ReleaseNotFoundError(KuberokuError):
        """Raised when a release version doesn't exist."""
        pass

    # Cluster errors
    class ClusterNotFoundError(KuberokuError):
        """Raised when a cluster name isn't configured."""
        pass

    class ClusterUnreachableError(KuberokuError):
        """Raised when the K8s API server can't be reached."""
        pass

    # Permission errors
    class PermissionDeniedError(KuberokuError):
        """Raised when the K8s user lacks required RBAC permissions."""
        def __init__(self, verb: str, resource: str, suggestion: str = ""):
            msg = f"Permission denied: cannot {verb} {resource}."
            if suggestion:
                msg += f"\n{suggestion}"
            super().__init__(msg)
            self.verb = verb
            self.resource = resource

    # Addon errors
    class AddonNotFoundError(KuberokuError):
        """Raised when an addon doesn't exist."""
        pass

    class AddonTypeNotSupportedError(KuberokuError):
        """Raised when an unsupported addon type is requested."""
        pass

    # Domain errors
    class DomainAlreadyExistsError(KuberokuError):
        """Raised when a domain is already configured."""
        pass

    # Plugin errors
    class PluginError(KuberokuError):
        """Raised when a plugin fails to load."""
        pass

The CLI catches KuberokuError at the top level and prints a friendly message.
Non-KuberokuError exceptions are unexpected bugs and show a traceback.
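
A minimal sketch of that top-level handler. `run_cli` is a hypothetical name,
`KuberokuError` is redefined locally so the example is self-contained, and the
exit codes 1 and 2 are assumptions (the spec does not fix them).

```python
import sys
import traceback


class KuberokuError(Exception):
    """Local stand-in for the base exception defined above."""


def run_cli(command, stderr=sys.stderr) -> int:
    """Run a command callable and translate exceptions into exit codes."""
    try:
        command()
        return 0
    except KuberokuError as exc:
        # Expected failure: friendly one-line message, no traceback.
        print(f"Error: {exc}", file=stderr)
        return 1
    except Exception:
        # Unexpected failure: treat as a bug and show the traceback.
        traceback.print_exc(file=stderr)
        return 2
```

A real entry point would typically end with `sys.exit(run_cli(main))`.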


================================================================================
 12. TESTING STRATEGY
================================================================================

PRINCIPLE
─────────
Kuberoku talks to Kubernetes. K8s has versions, API deprecations, and
behavioral quirks. The testing strategy must guarantee:

    1. Every command works (happy path + error paths)
    2. Every K8s version we claim to support actually works
    3. New commands are testable without a real cluster (fast dev loop)
    4. Old resources created by older Kuberoku versions still work
    5. Multi-instance, multi-process, multi-addon scenarios are covered

TESTING PYRAMID
───────────────
    70% Unit tests       FakeK8sClient, no cluster, fast (<5s total)
    20% Integration      k3d cluster, real K8s API
    10% E2E              Full CLI invocation via CliRunner

    Why this ratio:
    - Unit tests (70%): Every SDK method, every error path, every edge
      case. Use FakeK8sClient. Run in <5s. Developers run these 100x/day.
    - Integration (20%): Prove that real K8s behaves like FakeK8sClient.
      Run against k3d. Catch API version issues, RBAC edge cases,
      label selector bugs that the fake doesn't model.
    - E2E (10%): Full CLI flows. Prove that the CLI → SDK → K8s pipeline
      works end-to-end. Catch argument parsing, output formatting,
      exit code issues.


TEST DIRECTORY STRUCTURE — FEATURE-BASED, COMMAND-PROXIMATE
───────────────────────────────────────────────────────────
Tests are organized by FEATURE, not by layer. Each directory = one command
group. Each file = one command. SDK + CLI tests live in the SAME file.

    WHY: If you delete a command, you delete its test file. No orphans.
    If you add a command, the test file goes in the obvious directory.
    You never have to wonder "is there a test for this?" — it's right
    next to its siblings.

    OLD (layer-based — rejected):
        tests/commands/test_apps.py     # CLI tests for ALL apps:* commands
        tests/sdk/test_apps_service.py  # SDK tests for AppsService
        → Two files in different directories. Easy to forget one. If you
          remove apps:rename from CLI, you might forget the SDK test.

    NEW (feature-based — chosen):
        tests/apps/test_create.py       # SDK + CLI tests for apps:create
        tests/apps/test_rename.py       # SDK + CLI tests for apps:rename
        → One file = one command = one mental unit.

    tests/
    ├── conftest.py                  # Backend matrix (k8s fixture), markers,
    │                                # namespace isolation, CLI runner, factory
    ├── backends/                    # Test backend implementations
    │   ├── __init__.py
    │   ├── fake.py                  # FakeK8sClient (in-memory)
    │   └── real.py                  # RealK8sClient (wraps K8sClient, normalizes errors)
    │
    │   # ── FEATURE TESTS ──────────────────────────────────
    │   # Each directory = one command group / feature.
    │   # Each file = one command. Both SDK + CLI tests inside.
    │   # conftest.py per directory for shared fixtures.
    │
    ├── apps/                        # apps:* tests
    │   ├── conftest.py              # Shared: sample_app fixture
    │   ├── test_create.py           # apps:create (SDK + CLI)
    │   ├── test_destroy.py          # apps:destroy
    │   ├── test_info.py             # apps:info
    │   ├── test_list.py             # apps (list)
    │   ├── test_rename.py           # apps:rename
    │   └── test_link.py             # apps:link:add / apps:link:remove
    │
    ├── config/                      # config:* tests
    │   ├── test_set.py              # config:set
    │   ├── test_get.py              # config:get
    │   └── test_unset.py            # config:unset
    │
    ├── deploy/                      # deploy + releases tests
    │   ├── conftest.py              # Shared: deployed_app fixture
    │   ├── test_image.py            # deploy --image
    │   ├── test_git.py              # deploy (build from git)
    │   ├── test_releases_list.py    # releases
    │   ├── test_releases_info.py    # releases:info
    │   └── test_rollback.py         # releases:rollback
    │
    ├── ps/                          # ps:* tests
    │   ├── test_list.py             # ps
    │   ├── test_scale.py            # ps:scale
    │   ├── test_set.py              # ps:set (process commands)
    │   └── test_restart.py          # ps:restart
    │
    ├── services/                    # services:* tests
    │   ├── test_expose.py           # services:expose:on
    │   └── test_unexpose.py         # services:expose:off
    │
    ├── ports/                       # services:ports:* tests
    │   ├── test_ports_list.py       # services:ports
    │   ├── test_ports_add.py        # services:ports:add
    │   └── test_ports_remove.py     # services:ports:remove
    │
    ├── networking/                  # shared networking tests
    │   └── test_network_policy.py   # NetworkPolicy lifecycle (Section 8.2)
    │
    ├── addons/                      # addons:* tests (incl. expose:on/off/connect)
    │   ├── conftest.py              # Shared: addon fixture
    │   ├── test_create.py           # addons:create (incl. multi-instance)
    │   ├── test_destroy.py          # addons:destroy (--delete-data)
    │   ├── test_info.py
    │   └── test_list.py
    │
    ├── domains/
    │   ├── conftest.py              # Shared: domain fixture
    │   ├── test_add.py
    │   ├── test_remove.py
    │   ├── test_list.py
    │   ├── test_clear.py            # domains:clear
    │   ├── test_auto_domain.py      # auto-domain assignment
    │   └── test_cli_domains.py      # CLI integration for domains
    │   # NOTE: run/exec and logs tests live in tests/services/ (test_exec.py,
    │   #   test_logs.py, test_logs_follow.py). No separate run/ or logs/ dirs.
    │
    ├── maintenance/                 # (Phase 9.7)
    │   ├── test_on.py
    │   └── test_off.py
    │
    ├── clusters/
    │   ├── test_add.py
    │   └── test_list.py
    │
    │   # ── INFRASTRUCTURE TESTS ───────────────────────────
    │   # Non-command tests: internal modules, helpers, models.
    │
    ├── infrastructure/
    │   ├── test_labels.py           # k8s/labels.py
    │   ├── test_resources.py        # k8s/resources.py (builders)
    │   ├── test_safe_name.py        # 63-char limit, truncation + hash
    │   ├── test_branding.py         # All constants derive from TOOL_NAME
    │   ├── test_models.py           # Dataclass construction, validation
    │   ├── test_factory.py          # Factory wiring, lazy K8s
    │   ├── test_context.py          # App resolution chain
    │   ├── test_output.py           # TTY detection, JSON/table, no-color
    │   ├── test_colon_commands.py   # ColonCommandGroup resolve_command()
    │   ├── test_help.py             # --help for every command
    │   ├── test_exit_codes.py       # 0/1/2 exit codes
    │   └── test_startup.py          # <100ms startup time benchmark
    │
    │   # ── CROSS-CUTTING TESTS ────────────────────────────
    │
    ├── contract/                    # Proves FakeK8sClient ≈ real K8s
    │   ├── test_fake_matches_real.py
    │   └── test_network_policy_crud.py  # NetworkPolicy CRUD contract
    │
    ├── flows/                       # Multi-command user journeys
    │   ├── test_web_app_flow.py     # create → config → deploy → rollback
    │   ├── test_smtp_flow.py        # multi-port TCP, expose, services:ports:add
    │   ├── test_multi_addon_flow.py # 3 postgres + 2 redis
    │   ├── test_mixed_app_flow.py   # HTTP + TCP + addons + domains
    │   ├── test_failure_recovery.py # deploy fail → re-run → success
    │   └── test_config_flow.py      # bulk set, --secret, unset, releases
    │
    ├── compat/                      # Backward compatibility
    │   ├── test_v010_resources.py   # Current code reads v0.1.0 format
    │   └── ...                      # One per version with format changes
    │
    └── fixtures/                    # Saved K8s resource snapshots
        ├── v0.1.0/
        │   ├── app_configmap.json
        │   ├── env_configmap.json
        │   ├── formation_configmap.json
        │   ├── release_configmap.json
        │   └── deployment.json
        └── ...

    KEY INSIGHTS:
    - Feature test files use the `k8s` fixture from conftest.py, which is
      parameterized over active backends automatically.
    - Each test file tests BOTH SDK and CLI layers of one command.
      SDK tests call the service directly. CLI tests use CliRunner.
    - If a test needs real pods (e.g., logs --follow), mark with
      @pytest.mark.needs("pods") — auto-skipped on fake backend.
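
    A minimal sketch of how the auto-skip decision could be made. The
    capability table and the `skip_reason` helper are hypothetical names,
    not part of Kuberoku. A root conftest.py hook such as
    pytest_collection_modifyitems could call the helper with each test's
    `needs` marker arguments and attach a skip marker when it returns a
    reason.

```python
from typing import Iterable, Optional

# Hypothetical capability table: what each test backend can actually do.
BACKEND_CAPABILITIES = {
    "fake": frozenset(),           # in-memory dicts, no pods ever run
    "real": frozenset({"pods"}),   # k3d/kind/real cluster
}


def skip_reason(backend: str, needs: Iterable[str]) -> Optional[str]:
    """Return a skip message if `backend` lacks a needed capability, else None."""
    missing = sorted(set(needs) - BACKEND_CAPABILITIES[backend])
    if missing:
        return f"backend '{backend}' does not provide: {', '.join(missing)}"
    return None
```

    Keeping the decision in a plain function means it can be unit-tested
    without running pytest's collection machinery.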

    EXAMPLE — HOW ONE COMMAND TEST FILE WORKS:
    ───────────────────────────────────────────
    # tests/apps/test_create.py

    """Tests for apps:create — both SDK and CLI layers."""

    import json

    import pytest
    from click.testing import CliRunner
    from kuberoku.cli.main import cli
    from kuberoku.exceptions import AppAlreadyExistsError, InvalidAppNameError


    # ── SDK LAYER ───────────────────────────────────────

    class TestCreateSDK:
        """SDK-level: call AppsService.create() directly."""

        def test_create_app(self, factory):
            app = factory.apps.create("myapi")
            assert app.name == "myapi"
            # Verify K8s resources created
            cm = factory.k8s.get_configmap(
                factory.namespace, "kuberoku-app-myapi"
            )
            assert cm is not None

        def test_create_duplicate_raises(self, factory):
            factory.apps.create("myapi")
            with pytest.raises(AppAlreadyExistsError):
                factory.apps.create("myapi")

        def test_create_invalid_name_raises(self, factory):
            with pytest.raises(InvalidAppNameError):
                factory.apps.create("UPPERCASE-BAD")

        def test_create_sets_labels(self, factory):
            factory.apps.create("myapi")
            cm = factory.k8s.get_configmap(
                factory.namespace, "kuberoku-app-myapi"
            )
            assert cm["metadata"]["labels"]["app.kuberoku.com/app"] == "myapi"
            assert cm["metadata"]["labels"]["app.kuberoku.com/managed-by"] == "kuberoku"


    # ── CLI LAYER ───────────────────────────────────────

    class TestCreateCLI:
        """CLI-level: full Click invocation via CliRunner."""

        def test_create_app(self, cli_runner):
            result = cli_runner.invoke(cli, ["apps:create", "myapi"])
            assert result.exit_code == 0
            assert "myapi created" in result.output

        def test_create_duplicate_shows_error(self, cli_runner):
            cli_runner.invoke(cli, ["apps:create", "myapi"])
            result = cli_runner.invoke(cli, ["apps:create", "myapi"])
            assert result.exit_code == 1
            assert "already exists" in result.output

        def test_create_json_output(self, cli_runner):
            result = cli_runner.invoke(
                cli, ["apps:create", "myapi", "--output", "json"]
            )
            assert result.exit_code == 0
            data = json.loads(result.output)
            assert data["name"] == "myapi"

    # When run with --backend fake:
    #   TestCreateSDK.test_create_app                → fake ✓
    #   TestCreateSDK.test_create_duplicate_raises   → fake ✓
    #   TestCreateCLI.test_create_app                → fake ✓
    # When run with --backend real (k3d/kind/colima/EKS):
    #   TestCreateSDK.test_create_app                → real ✓
    #   TestCreateSDK.test_create_duplicate_raises   → real ✓
    #   TestCreateCLI.test_create_app                → real ✓
    #   (all tests run on both backends — no @needs marker)

    WHY SDK + CLI IN THE SAME FILE:
    ────────────────────────────────
    - When you work on apps:create, ALL its tests are in ONE place
    - SDK tests verify business logic. CLI tests verify user-facing output.
    - If you change the SDK method signature, you immediately see if the
      CLI test still passes — same file, same pytest run.
    - If you delete apps:create, you delete test_create.py. Done.

    CONFTEST FIXTURES PER FEATURE DIRECTORY:
    ────────────────────────────────────────
    # tests/apps/conftest.py

    @pytest.fixture
    def sample_app(factory):
        """Pre-created app for tests that need one."""
        return factory.apps.create("test-app")

    # tests/deploy/conftest.py

    @pytest.fixture
    def deployed_app(factory):
        """App with one deploy for tests that need a deployed app."""
        factory.apps.create("test-app")
        factory.deploy.deploy("test-app", image="nginx:latest")
        return "test-app"

    These fixtures keep test setup DRY within a feature. Each feature
    directory is self-contained — shared fixtures live in its conftest.py,
    not in the root conftest.py.

    EXAMPLE — HOW A FLOW TEST WORKS:
    ─────────────────────────────────
    # tests/flows/test_web_app_flow.py

    def test_basic_web_app_lifecycle(factory):
        """Full lifecycle: create → config → deploy → rollback → destroy.
        Uses the factory — same wiring as production CLI."""

        # Create
        app = factory.apps.create("myapi")
        assert app.name == "myapi"

        # Config
        factory.config.set("myapi", {"DATABASE_URL": "postgres://localhost/test"})

        # Deploy v1
        r1 = factory.deploy.deploy("myapi", image="nginx:latest")
        assert r1.version == 1

        # Deploy v2
        r2 = factory.deploy.deploy("myapi", image="nginx:1.25")
        assert r2.version == 2

        # Rollback to v1
        r3 = factory.releases.rollback("myapi", 1)
        assert r3.version == 3  # rollback creates NEW release

        # Release history is append-only
        all_releases = factory.releases.list("myapi")
        assert len(all_releases) == 3

        # Destroy
        factory.apps.destroy("myapi")

        # Verify no orphaned resources (via the factory; this test takes no
        # k8s or test_namespace fixture)
        all_cms = factory.k8s.list_configmaps(factory.namespace,
            labels={"app.kuberoku.com/app": "myapi"})
        assert len(all_cms) == 0

    @pytest.mark.needs("pods")
    def test_web_app_with_real_pods(k8s, test_namespace):
        """Same flow, but verify pods actually start."""
        # This runs on k3d/kind/real only
        ...

    EXAMPLE — HOW A CLI TEST WORKS:
    ────────────────────────────────
    # tests/infrastructure/test_colon_commands.py
    # These DON'T use the k8s fixture; they test CLI parsing only

    from kuberoku.cli.main import cli

    def test_colon_and_space_both_work(cli_runner):
        r1 = cli_runner.invoke(cli, ["apps:create", "test1"])
        r2 = cli_runner.invoke(cli, ["apps", "create", "test2"])
        assert r1.exit_code == 0
        assert r2.exit_code == 0

    def test_error_suggests_fix(cli_runner):
        result = cli_runner.invoke(cli, ["config:set", "KEY=val"])
        # No app linked → should suggest --app
        assert result.exit_code == 1
        assert "Error:" in result.output
        assert "--app" in result.output
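
    A minimal sketch of the colon/space equivalence being tested, written
    as a pure function. `normalize_colon_args` is a hypothetical helper,
    not Kuberoku's actual code; a Click group's resolve_command() could
    apply a normalization like this before looking up the subcommand.

```python
from typing import List


def normalize_colon_args(args: List[str]) -> List[str]:
    """Expand a leading `group:command` token into separate tokens.

    Only the first token is split, so values such as KEY=a:b are untouched.
    """
    if not args or args[0].startswith("-") or ":" not in args[0]:
        return list(args)
    return args[0].split(":") + list(args[1:])
```

    With this shape, `apps:create test1` and `apps create test1` reach
    the same command lookup path.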


FAKE K8S CLIENT
───────────────
    import copy

    class FakeK8sClient:
        """In-memory K8s client implementing K8sClientProtocol.

        - Stores resources in dicts keyed by (namespace, name)
        - Supports label selector matching
        - Tracks all method calls in self.call_log for test assertions
        - Models resourceVersion for optimistic concurrency
        - Models 404 for get-nonexistent, 409 for stale updates
        """

        def __init__(self):
            self.namespaces: dict[str, dict] = {}
            self.deployments: dict[tuple[str, str], dict] = {}
            self.configmaps: dict[tuple[str, str], dict] = {}
            self.secrets: dict[tuple[str, str], dict] = {}
            self.services: dict[tuple[str, str], dict] = {}
            self.ingresses: dict[tuple[str, str], dict] = {}
            self.statefulsets: dict[tuple[str, str], dict] = {}
            self.pvcs: dict[tuple[str, str], dict] = {}
            self.pods: dict[tuple[str, str], dict] = {}
            self.jobs: dict[tuple[str, str], dict] = {}
            self.hpas: dict[tuple[str, str], dict] = {}
            self.call_log: list[tuple[str, tuple, dict]] = []

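        def create_deployment(self, namespace, body):
            name = body["metadata"]["name"]
            self.call_log.append(("create_deployment", (namespace,), {"body": body}))
            if (namespace, name) in self.deployments:
                raise FakeConflictError(f"409 AlreadyExists on {name}")
            stored = copy.deepcopy(body)  # never alias the caller's dict
            stored["metadata"]["resourceVersion"] = self._next_version()
            self.deployments[(namespace, name)] = stored
            return copy.deepcopy(stored)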

        def list_deployments(self, namespace, labels=None):
            results = [d for (ns, _), d in self.deployments.items() if ns == namespace]
            if labels:
                results = [d for d in results if self._labels_match(d, labels)]
            return results

        # OPTIMISTIC CONCURRENCY SIMULATION
        # FakeK8sClient MUST model resourceVersion for ConfigMaps and
        # Deployments. Without this, tests won't cover the most important
        # correctness behavior (conflict detection + retry).
        _version_counter: int = 0

        def _next_version(self) -> str:
            self._version_counter += 1
            return str(self._version_counter)

        def create_configmap(self, namespace, body):
            name = body["metadata"]["name"]
            if (namespace, name) in self.configmaps:
                raise FakeConflictError(f"409 AlreadyExists on {name}")
            stored = copy.deepcopy(body)  # never alias the caller's dict
            stored["metadata"]["resourceVersion"] = self._next_version()
            self.configmaps[(namespace, name)] = stored
            return copy.deepcopy(stored)

        def update_configmap(self, namespace, name, body):
            existing = self.configmaps.get((namespace, name))
            if existing is None:
                raise FakeNotFoundError(f"404 NotFound on {name}")
            if body["metadata"].get("resourceVersion") != existing["metadata"]["resourceVersion"]:
                raise FakeConflictError(f"409 Conflict on {name}")  # simulate 409
            stored = copy.deepcopy(body)
            stored["metadata"]["resourceVersion"] = self._next_version()
            self.configmaps[(namespace, name)] = stored
            return copy.deepcopy(stored)

        # WHAT THE FAKE MUST MODEL:
        # ─────────────────────────
        # 1. resourceVersion on create/update (optimistic concurrency)
        # 2. 404 NotFound when get/update/delete a non-existent resource
        # 3. 409 Conflict when update with stale resourceVersion
        # 4. 409 AlreadyExists when create a resource that exists
        # 5. Label selector filtering on list operations
        # 6. Namespace scoping (resources in wrong namespace = invisible)
        #
        # WHAT THE FAKE DOES NOT MODEL (tested in integration only):
        # ──────────────────────────────────────────────────────────
        # - Actual pod scheduling / container runtime
        # - Rolling update behavior (readiness probes, surge, etc.)
        # - LoadBalancer IP allocation
        # - Ingress controller behavior
        # - cert-manager certificate issuance
        # - PVC binding to PV
        # - RBAC enforcement
        # - Resource quotas / limit ranges
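
    Conflict detection matters because callers are expected to retry. A
    minimal sketch of a read-modify-write retry loop follows.
    `ConflictError`, `TinyStore`, and `update_with_retry` are hypothetical
    stand-ins for illustration, not Kuberoku code; the store holds a
    single ConfigMap to keep the example short.

```python
import copy


class ConflictError(Exception):
    """Stand-in for a 409 Conflict."""


class TinyStore:
    """Single-ConfigMap store with resourceVersion checks (illustration only)."""

    def __init__(self, data):
        self._obj = {"metadata": {"resourceVersion": "1"}, "data": dict(data)}

    def get_configmap(self):
        return copy.deepcopy(self._obj)

    def update_configmap(self, body):
        current_rv = self._obj["metadata"]["resourceVersion"]
        if body["metadata"]["resourceVersion"] != current_rv:
            raise ConflictError("409 Conflict")
        new = copy.deepcopy(body)
        new["metadata"]["resourceVersion"] = str(int(current_rv) + 1)
        self._obj = new
        return copy.deepcopy(new)


def update_with_retry(store, mutate, attempts=3):
    """Re-read, re-apply `mutate`, and retry when the write hits a conflict."""
    for _ in range(attempts):
        current = store.get_configmap()   # always start from a fresh read
        mutate(current)
        try:
            return store.update_configmap(current)
        except ConflictError:
            continue                      # another writer won; try again
    raise ConflictError(f"gave up after {attempts} attempts")
```

    The important property is that each attempt re-reads the resource,
    so the mutation is applied on top of the other writer's changes
    rather than overwriting them.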


CONTRACT TESTS — PROVING THE FAKE IS HONEST
────────────────────────────────────────────
The FakeK8sClient is the foundation of the entire test suite. If it
lies — if it behaves differently from real K8s — unit tests pass but
production breaks. Contract tests prevent this.

    # tests/contract/test_fake_matches_real.py
    #
    # This file runs the SAME test cases against both FakeK8sClient
    # and real K8sClient (k3d). If any test passes on fake but fails
    # on real, the fake has a bug.

    import copy
    import shutil

    import pytest

    @pytest.fixture(params=["fake", "real"])
    def client(request):
        if request.param == "fake":
            return FakeK8sClient()
        # k3d is a CLI binary, not a Python module, so importorskip cannot detect it
        if shutil.which("k3d") is None:
            pytest.skip("k3d not installed")
        return K8sClient(context="k3d-test")

    def test_create_then_get_configmap(client):
        body = {"metadata": {"name": "test-cm"}, "data": {"key": "val"}}
        client.create_configmap("default", body)
        result = client.get_configmap("default", "test-cm")
        assert result["data"]["key"] == "val"
        assert "resourceVersion" in result["metadata"]

    def test_update_with_stale_version_raises_409(client):
        body = {"metadata": {"name": "test-cm"}, "data": {"key": "v1"}}
        created = client.create_configmap("default", body)
        stale = copy.deepcopy(created)

        # Update once (advances resourceVersion)
        created["data"]["key"] = "v2"
        client.update_configmap("default", "test-cm", created)

        # Try updating with stale version → 409
        stale["data"]["key"] = "v3"
        with pytest.raises((FakeConflictError, ApiException)):
            client.update_configmap("default", "test-cm", stale)

    def test_get_nonexistent_raises_404(client):
        with pytest.raises((FakeNotFoundError, ApiException)):
            client.get_configmap("default", "does-not-exist")

    def test_label_selector_filters(client):
        cm1 = {"metadata": {"name": "cm1", "labels": {"app": "a"}}, "data": {}}
        cm2 = {"metadata": {"name": "cm2", "labels": {"app": "b"}}, "data": {}}
        client.create_configmap("default", cm1)
        client.create_configmap("default", cm2)
        results = client.list_configmaps("default", labels={"app": "a"})
        assert len(results) == 1
        assert results[0]["metadata"]["name"] == "cm1"

    # Contract tests run in integration CI only (need k3d).
    # If a contract test ever fails on real but passes on fake,
    # the fix goes in FakeK8sClient — never in the contract test.

    EXPANDED CONTRACT TEST SUITE:
    ─────────────────────────────
    The contract tests are the FOUNDATION of the entire testing pyramid.
    Every behavior the FakeK8sClient claims to model MUST have a contract
    test proving it matches real K8s. This is NOT optional — if the fake
    lies, every unit test that passes is a false positive.

    # ── CRUD OPERATIONS ──────────────────────────────────────

    def test_create_then_get_configmap(client):
        """Create → get returns the same data."""
        ...  # (shown above)

    def test_create_then_get_deployment(client):
        body = build_deployment("test-deploy", image="nginx:latest")
        client.create_deployment("default", body)
        result = client.get_deployment("default", "test-deploy")
        assert result["spec"]["template"]["spec"]["containers"][0]["image"] == "nginx:latest"
        assert "resourceVersion" in result["metadata"]

    def test_create_then_get_secret(client):
        encoded = b64encode(b"val").decode()  # Secret data values are base64 strings, not bytes
        body = {"metadata": {"name": "test-secret"}, "data": {"key": encoded}}
        client.create_secret("default", body)
        result = client.get_secret("default", "test-secret")
        assert result["data"]["key"] == encoded

    def test_create_then_list_returns_item(client):
        body = {"metadata": {"name": "cm1"}, "data": {"k": "v"}}
        client.create_configmap("default", body)
        results = client.list_configmaps("default")
        assert any(cm["metadata"]["name"] == "cm1" for cm in results)

    def test_create_then_delete_then_get_raises_404(client):
        body = {"metadata": {"name": "cm1"}, "data": {}}
        client.create_configmap("default", body)
        client.delete_configmap("default", "cm1")
        with pytest.raises((FakeNotFoundError, ApiException)):
            client.get_configmap("default", "cm1")

    def test_create_then_delete_then_list_is_empty(client):
        body = {"metadata": {"name": "cm1"}, "data": {}}
        client.create_configmap("default", body)
        client.delete_configmap("default", "cm1")
        results = client.list_configmaps("default")
        assert not any(cm["metadata"]["name"] == "cm1" for cm in results)

    # ── CONFLICT DETECTION (optimistic concurrency) ──────────

    def test_update_with_stale_version_raises_409(client):
        """409 Conflict when resourceVersion doesn't match."""
        ...  # (shown above)

    def test_update_with_current_version_succeeds(client):
        body = {"metadata": {"name": "cm1"}, "data": {"k": "v1"}}
        created = client.create_configmap("default", body)
        # Capture before updating: a client that returns the same dict object
        # would otherwise make the final comparison meaningless.
        original_rv = created["metadata"]["resourceVersion"]
        created["data"]["k"] = "v2"
        updated = client.update_configmap("default", "cm1", created)
        assert updated["data"]["k"] == "v2"
        assert updated["metadata"]["resourceVersion"] != original_rv

    def test_create_existing_raises_409(client):
        """409 AlreadyExists when creating a resource that already exists."""
        body = {"metadata": {"name": "cm1"}, "data": {}}
        client.create_configmap("default", body)
        with pytest.raises((FakeConflictError, ApiException)):
            client.create_configmap("default", body)

    def test_resource_version_increments_on_every_update(client):
        body = {"metadata": {"name": "cm1"}, "data": {"k": "v1"}}
        c = client.create_configmap("default", body)
        rv1 = c["metadata"]["resourceVersion"]
        c["data"]["k"] = "v2"
        c = client.update_configmap("default", "cm1", c)
        rv2 = c["metadata"]["resourceVersion"]
        c["data"]["k"] = "v3"
        c = client.update_configmap("default", "cm1", c)
        rv3 = c["metadata"]["resourceVersion"]
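        # All three must differ. resourceVersion is an opaque string in real K8s,
        # so compare for equality only and do not assume numeric ordering.
        assert len({rv1, rv2, rv3}) == 3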

    # ── 404 NOT FOUND ────────────────────────────────────────

    def test_get_nonexistent_raises_404(client):
        ...  # (shown above)

    def test_update_nonexistent_raises_404(client):
        body = {"metadata": {"name": "nope", "resourceVersion": "1"}, "data": {}}
        with pytest.raises((FakeNotFoundError, ApiException)):
            client.update_configmap("default", "nope", body)

    def test_delete_nonexistent_raises_404(client):
        with pytest.raises((FakeNotFoundError, ApiException)):
            client.delete_configmap("default", "nope")

    # ── LABEL SELECTORS ──────────────────────────────────────

    def test_label_selector_filters(client):
        """List with label selector returns only matching resources."""
        ...  # (shown above)

    def test_label_selector_multiple_labels_AND(client):
        """Multiple labels in selector = AND (all must match)."""
        cm1 = {"metadata": {"name": "cm1", "labels": {"app": "a", "env": "prod"}}, "data": {}}
        cm2 = {"metadata": {"name": "cm2", "labels": {"app": "a", "env": "staging"}}, "data": {}}
        cm3 = {"metadata": {"name": "cm3", "labels": {"app": "b", "env": "prod"}}, "data": {}}
        client.create_configmap("default", cm1)
        client.create_configmap("default", cm2)
        client.create_configmap("default", cm3)
        results = client.list_configmaps("default", labels={"app": "a", "env": "prod"})
        assert len(results) == 1
        assert results[0]["metadata"]["name"] == "cm1"

    def test_label_selector_no_match_returns_empty(client):
        cm = {"metadata": {"name": "cm1", "labels": {"app": "a"}}, "data": {}}
        client.create_configmap("default", cm)
        results = client.list_configmaps("default", labels={"app": "nonexistent"})
        assert len(results) == 0

    def test_label_selector_no_labels_on_resource(client):
        """Resource with no labels is never returned by label selector."""
        cm = {"metadata": {"name": "cm1"}, "data": {}}
        client.create_configmap("default", cm)
        results = client.list_configmaps("default", labels={"app": "a"})
        assert len(results) == 0

    def test_list_no_selector_returns_all(client):
        """List without label selector returns everything in namespace."""
        cm1 = {"metadata": {"name": "cm1"}, "data": {}}
        cm2 = {"metadata": {"name": "cm2"}, "data": {}}
        client.create_configmap("default", cm1)
        client.create_configmap("default", cm2)
        results = client.list_configmaps("default")
        names = [cm["metadata"]["name"] for cm in results]
        assert "cm1" in names and "cm2" in names

    # ── NAMESPACE SCOPING ────────────────────────────────────

    def test_namespace_isolation(client):
        """Resources in different namespaces are invisible to each other."""
        body_a = {"metadata": {"name": "cm1"}, "data": {"env": "a"}}
        body_b = {"metadata": {"name": "cm1"}, "data": {"env": "b"}}
        client.create_namespace("ns-a")
        client.create_namespace("ns-b")
        client.create_configmap("ns-a", body_a)
        client.create_configmap("ns-b", body_b)
        result_a = client.get_configmap("ns-a", "cm1")
        result_b = client.get_configmap("ns-b", "cm1")
        assert result_a["data"]["env"] == "a"
        assert result_b["data"]["env"] == "b"

    def test_list_in_wrong_namespace_returns_empty(client):
        body = {"metadata": {"name": "cm1"}, "data": {}}
        client.create_namespace("ns-a")
        client.create_namespace("ns-b")
        client.create_configmap("ns-a", body)
        results = client.list_configmaps("ns-b")
        assert not any(cm["metadata"]["name"] == "cm1" for cm in results)

    def test_delete_in_wrong_namespace_raises_404(client):
        body = {"metadata": {"name": "cm1"}, "data": {}}
        client.create_namespace("ns-a")
        client.create_namespace("ns-b")
        client.create_configmap("ns-a", body)
        with pytest.raises((FakeNotFoundError, ApiException)):
            client.delete_configmap("ns-b", "cm1")

    # ── MULTI-RESOURCE TYPE (parametrized) ───────────────────

    @pytest.mark.parametrize("resource", [
        ("configmap", "create_configmap", "get_configmap",
         "update_configmap", "delete_configmap"),
        ("deployment", "create_deployment", "get_deployment",
         "update_deployment", "delete_deployment"),
        ("secret", "create_secret", "get_secret",
         "update_secret", "delete_secret"),
        ("service", "create_service", "get_service",
         "update_service", "delete_service"),
        ("statefulset", "create_statefulset", "get_statefulset",
         "update_statefulset", "delete_statefulset"),
    ])
    def test_full_lifecycle_per_resource(client, resource):
        """Every resource type supports create → get → update → delete → 404."""
        kind, create_fn, get_fn, update_fn, delete_fn = resource
        body = build_test_resource(kind, name="test-res")
        # Create
        created = getattr(client, create_fn)("default", body)
        assert "resourceVersion" in created["metadata"]
        # Get
        fetched = getattr(client, get_fn)("default", "test-res")
        assert fetched["metadata"]["name"] == "test-res"
        # Update
        fetched["metadata"]["annotations"] = {"updated": "true"}
        updated = getattr(client, update_fn)("default", "test-res", fetched)
        assert updated["metadata"]["annotations"]["updated"] == "true"
        # Delete
        getattr(client, delete_fn)("default", "test-res")
        # Get after delete → 404
        with pytest.raises((FakeNotFoundError, ApiException)):
            getattr(client, get_fn)("default", "test-res")
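
    The parametrized test above relies on a `build_test_resource` helper
    that is not shown. A minimal sketch of what it could return, assuming
    the client accepts plain manifest dicts as in the other examples; the
    field choices (image, port, label key) are illustrative assumptions.

```python
import base64


def build_test_resource(kind: str, name: str) -> dict:
    """Return a minimal manifest dict for `kind` (hypothetical helper)."""
    labels = {"app": name}
    metadata = {"name": name, "labels": dict(labels)}
    # Pod template shared by the workload kinds below.
    template = {
        "metadata": {"labels": dict(labels)},
        "spec": {"containers": [{"name": "main", "image": "nginx:latest"}]},
    }
    workload_spec = {
        "replicas": 1,
        "selector": {"matchLabels": dict(labels)},  # must match template labels
        "template": template,
    }
    if kind == "configmap":
        return {"apiVersion": "v1", "kind": "ConfigMap",
                "metadata": metadata, "data": {"key": "val"}}
    if kind == "secret":
        value = base64.b64encode(b"val").decode()  # Secret data is base64 text
        return {"apiVersion": "v1", "kind": "Secret", "type": "Opaque",
                "metadata": metadata, "data": {"key": value}}
    if kind == "service":
        return {"apiVersion": "v1", "kind": "Service", "metadata": metadata,
                "spec": {"selector": dict(labels),
                         "ports": [{"port": 80, "targetPort": 80}]}}
    if kind == "deployment":
        return {"apiVersion": "apps/v1", "kind": "Deployment",
                "metadata": metadata, "spec": workload_spec}
    if kind == "statefulset":
        return {"apiVersion": "apps/v1", "kind": "StatefulSet",
                "metadata": metadata,
                "spec": {**workload_spec, "serviceName": name}}
    raise ValueError(f"unsupported kind: {kind}")
```

    Manifests need to be valid enough for a real API server to accept
    them, since the fake stores any dict and would not catch a malformed
    body.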

    TOTAL CONTRACT TESTS: ~25-30 test functions × 2 backends (fake + real).
    If ALL pass on both: the fake is honest. If ANY fails on one but passes
    on the other: the fake has a bug.

    CONTRACT TESTS WE EXPLICITLY DO NOT WRITE:
    ───────────────────────────────────────────
    - Pod scheduling (fake stores Pod dicts but never schedules or runs them)
    - Rolling update mechanics (fake stores manifests, doesn't execute them)
    - PVC binding to PV (fake stores PVC, doesn't provision storage)
    - Watch/stream semantics (fake doesn't model watch)
    - Delete cascade (fake doesn't model ownerReferences propagation)
    - Apply semantics (fake uses create/update, not server-side apply)

    These behaviors are tested in integration tests (k3d/kind) only.


FAKEK8SCLIENT — BEHAVIORAL BOUNDARIES
──────────────────────────────────────
The FakeK8sClient is a SIMULATED K8s API server. It must be precise about
what it models and what it doesn't. If a developer writes a test that
relies on a behavior the fake doesn't model, that test is meaningless.

    WHAT THE FAKE MODELS (contract-tested):
    ────────────────────────────────────────
    Behavior                          How                           Fidelity
    ────────────────────────────────  ────────────────────────────  ────────
    CRUD on all resource types        Dict[tuple[ns,name], dict]   Exact
    resourceVersion on create/update  Auto-incrementing counter     Exact
    409 Conflict on stale update      Compare resourceVersion       Exact
    409 AlreadyExists on duplicate    Check key exists before       Exact
                                      create
    404 NotFound on missing resource  Check key exists before       Exact
                                      get/update/delete
    Label selector filtering (AND)    Dict key/value match on       Exact
                                      metadata.labels
    Namespace scoping                 (ns, name) tuple key          Exact
    call_log for test assertions      Appends (method, args, kw)    N/A

    WHAT THE FAKE DOES NOT MODEL (integration-only):
    ─────────────────────────────────────────────────
    Behavior                    Why Not                         Where Tested
    ──────────────────────────  ──────────────────────────────  ────────────
    Pod scheduling              No scheduler, no kubelet        k3d/kind
    Container runtime           No Docker/containerd            k3d/kind
    Rolling update mechanics    Stores Deployment, doesn't      k3d/kind
                                roll pods
    Readiness/liveness probes   No health checking              k3d/kind
    RBAC enforcement            No policy engine                real (k3d/kind)
    Resource quotas             No admission controller         real (k3d/kind)
    PVC → PV binding            No storage provisioner          k3d/kind
    LoadBalancer IP allocation  No cloud controller             k3d/real
    Ingress controller behavior No Ingress controller process   k3d/kind
    cert-manager TLS            No cert-manager process         real
    Watch / stream              No event system                 k3d/kind
    Delete cascade              No ownerReference propagation   k3d/kind
    Server-side apply           Uses create/update only         k3d/kind
    Finalizers                  No finalizer processing         k3d/kind
    Admission webhooks          No webhook infra                k3d/kind
    Eventual consistency        Operations are instant          k3d/kind
    Field validation            Stores any dict, no schema      real (k3d/kind)

    THE INSTANT-OPERATION MODEL:
    ────────────────────────────
    In real K8s: create Deployment → wait → pods appear → wait → Ready.
    In fake K8s: create Deployment → done. No pods ever appear.

    This is CORRECT for unit tests. SDK methods that create Deployments
    only need to verify: (a) the manifest was created, (b) labels are
    correct, (c) the image is set. They do NOT need to verify pods start.

    Tests that need real pod behavior use @pytest.mark.needs("pods")
    and run on k3d/kind only. The fake is not broken — it's scoped.

    THE DELETE-CASCADE GAP:
    ───────────────────────
    In real K8s: delete Deployment → ReplicaSet deleted → Pods deleted.
    In fake K8s: delete Deployment → Deployment removed from dict. That's it.

    This means: apps:destroy tests on fake verify that the SDK calls
    delete on EVERY resource type. On k3d, the same test also verifies
    that K8s cascades work. Both are valid — they test different things.
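
    A minimal sketch of what "verify that the SDK calls delete on every
    resource type" can look like on the fake. `RecordingClient`,
    `APP_RESOURCE_TYPES`, and `destroy_app` are hypothetical stand-ins;
    only the call_log shape of (method, args, kwargs) tuples is taken
    from the FakeK8sClient description.

```python
class RecordingClient:
    """Records every delete_* call as (method, args, kwargs), like call_log."""

    def __init__(self):
        self.call_log = []

    def __getattr__(self, method):
        # Only reached for attributes not found normally, e.g. delete_service.
        if not method.startswith("delete_"):
            raise AttributeError(method)

        def record(*args, **kwargs):
            self.call_log.append((method, args, kwargs))

        return record


# Hypothetical list of resource types an app owns.
APP_RESOURCE_TYPES = ("deployment", "service", "ingress", "configmap", "secret")


def destroy_app(client, namespace, app):
    """Explicitly delete each resource type; do not rely on cascade."""
    for kind in APP_RESOURCE_TYPES:
        getattr(client, f"delete_{kind}")(namespace, f"kuberoku-{kind}-{app}")


client = RecordingClient()
destroy_app(client, "default", "myapi")
deleted = {method for method, _args, _kwargs in client.call_log}
assert deleted == {f"delete_{kind}" for kind in APP_RESOURCE_TYPES}
```

    Because the fake never cascades, a forgotten delete call shows up as
    a missing entry in the call log instead of being masked by K8s
    cleaning up on the SDK's behalf.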

    THE LABEL-MATCHING MODEL:
    ─────────────────────────
    The fake supports equality-based label selectors only:
        labels={"app": "myapi", "managed-by": "kuberoku"}
    This is sufficient because Kuberoku ONLY uses equality-based selectors.
    Set-based selectors (in, notin, exists) are NOT used and NOT modeled.
    If we ever need set-based selectors, add them to the fake AND add
    contract tests.
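
    Equality-based matching is small enough to sketch in full. The function
    names below (matches, select) are illustrative assumptions, not the
    fake's actual method names:

```python
def matches(labels, selector):
    """Return True if every selector key is present with an equal value."""
    return all(
        key in labels and labels[key] == value
        for key, value in selector.items()
    )


def select(resources, selector):
    """Filter manifests whose metadata.labels satisfy the selector."""
    return [
        r for r in resources
        if matches(r.get("metadata", {}).get("labels", {}), selector)
    ]


resources = [
    {"metadata": {"name": "kuberoku-myapi-web",
                  "labels": {"app": "myapi", "managed-by": "kuberoku"}}},
    {"metadata": {"name": "other", "labels": {"app": "other"}}},
]
selected = select(resources, {"app": "myapi", "managed-by": "kuberoku"})
```

    An empty selector matches every resource, which mirrors how an
    unfiltered list call behaves.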


WHAT TO TEST FOR EVERY NEW COMMAND — THE CHECKLIST
───────────────────────────────────────────────────
When adding a new command (e.g., `kuberoku foo:bar`), write these tests:

    SDK + CLI TESTS (tests/{feature}/test_{command}.py):
    ┌──────────────────────────────────────────────────────────────────┐
    │  1. Happy path          Does the right K8s resources get        │
    │                         created/updated/deleted?                │
    │  2. Idempotency         Running twice produces the same result  │
    │                         (no duplicates, no errors)              │
    │  3. Not found           App doesn't exist → AppNotFoundError    │
    │  4. Already exists      Resource already exists → correct error │
    │  5. Conflict retry      Stale resourceVersion → retry succeeds  │
    │  6. Invalid input       Bad app name, missing required args     │
    │  7. Labels correct      All K8s resources have correct labels   │
    │  8. Resource names      Names follow {PREFIX}-{type}-{app}      │
    │                         convention and handle 63-char limit     │
    └──────────────────────────────────────────────────────────────────┘

    CLI TESTS (same file, TestFooCLI class):
    ┌──────────────────────────────────────────────────────────────────┐
    │  9. CLI parsing         Both `foo:bar` and `foo bar` work       │
    │  10. Exit codes         0 on success, 1 on user error,          │
    │                         2 on infra error                        │
    │  11. Output format      Output matches expected text/format     │
    │  12. Error message      Error follows format conventions        │
    │                         (what/why/fix)                          │
    │  13. --app resolution   Works with --app, env var, .kuberoku    │
    │  14. --help             Help text is present and accurate       │
    └──────────────────────────────────────────────────────────────────┘

    INTEGRATION TESTS (tests/integration/ — only if K8s behavior matters):
    ┌──────────────────────────────────────────────────────────────────┐
    │  15. Real K8s           Same happy path against k3d             │
    │  16. Real error paths   Permission denied, quota exceeded       │
    └──────────────────────────────────────────────────────────────────┘
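
    For checklist item 8, a name check can be sketched as below. It assumes
    resource names are held to Kubernetes DNS-1123 label rules (lowercase
    alphanumerics and '-', alphanumeric at both ends, at most 63 characters).
    is_valid_resource_name is a hypothetical helper, not an existing
    Kuberoku function:

```python
import re

# DNS-1123 label: lowercase alphanumerics and '-', must start and end
# with an alphanumeric character.
_DNS_LABEL = re.compile(r"[a-z0-9]([-a-z0-9]*[a-z0-9])?")
MAX_LABEL_LEN = 63


def is_valid_resource_name(name):
    """Return True if name is a DNS-1123 label of at most 63 characters."""
    return len(name) <= MAX_LABEL_LEN and _DNS_LABEL.fullmatch(name) is not None


# A long app name pushes the generated resource name past the limit.
long_app = "a" * 60
too_long = f"kuberoku-app-{long_app}"
```

    A test for item 8 could feed the longest permitted app name through the
    command and assert that every created resource name passes this check.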


END-TO-END USER FLOWS (tests/flows/test_*_flow.py)
────────────────────────────────────────────────
These are full user journeys. Each flow exercises multiple commands in
sequence, simulating a real user session. They run against FakeK8sClient
(fast) AND against k3d (integration).

    FLOW 1: BASIC WEB APP (the "hello world" flow)
    ────────────────────────────────────────────────
    apps:create myapi
    apps:link:add myapi
    config:set DATABASE_URL=postgres://...
    deploy --image myapi:v1
    releases                                    # v1 exists
    deploy --image myapi:v2
    releases                                    # v1, v2 exist
    releases:rollback v1
    releases                                    # v1, v2, v3 (rollback)
    apps:destroy myapi --confirm myapi
    apps                                        # myapi gone

    Asserts:
    - Release history is append-only (rollback = new release)
    - Config vars survive across deploys
    - Destroy cleans up ALL resources (no orphans)
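
    The append-only assertion can be illustrated with a toy model. Release,
    deploy, and rollback here are simplified stand-ins (an assumption for
    illustration), not the SDK's release objects:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Release:
    version: int
    image: str
    description: str


def deploy(history, image):
    """Return a new history with one release appended for this image."""
    release = Release(len(history) + 1, image, f"Deploy {image}")
    return history + [release]


def rollback(history, to_version):
    """Append a NEW release that reuses the target version's image."""
    target = next(r for r in history if r.version == to_version)
    release = Release(
        len(history) + 1, target.image, f"Rollback to v{to_version}"
    )
    return history + [release]


history = deploy([], "myapi:v1")
history = deploy(history, "myapi:v2")
history = rollback(history, 1)
```

    After the rollback, history holds three releases: v1 and v2 are
    untouched and v3 carries the v1 image.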

    FLOW 2: NON-HTTP MULTI-PORT SERVICE (the SMTP flow)
    ────────────────────────────────────────────────────
    apps:create smtp-service
    deploy --image postfix:v1 --type smtp \
        --port 25/tcp --port 465/tcp --port 587/tcp
    services:expose:on smtp
    services:ports:add 2525/tcp --type smtp              # add port, rolling restart
    services:ports                                       # 4 ports listed
    services:ports:remove 2525/tcp --type smtp
    services:expose:off smtp
    apps:destroy smtp-service --confirm smtp-service

    Asserts:
    - Service created with correct ports
    - services:expose:on changes Service type to LoadBalancer
    - services:ports:add updates both Service and Deployment
    - services:expose:off reverts to ClusterIP
    - All resources cleaned up

    FLOW 3: MULTI-ADDON APP (the "3 postgres + 2 redis" flow)
    ──────────────────────────────────────────────────────────
    apps:create myapp
    addons:create postgres                      # instance: postgres
    addons:create postgres --as analytics       # instance: analytics
    addons:create postgres --as archival        # instance: archival
    addons:create redis                         # instance: redis
    addons:create redis --as sessions           # instance: sessions
    addons                                      # 5 addons listed
    config                                      # DATABASE_URL, ANALYTICS_URL,
                                                # ARCHIVAL_URL, REDIS_URL,
                                                # SESSIONS_URL all present
    addons:connect postgres                     # port-forward to postgres:5432
    addons:connect analytics                    # port-forward to analytics:5432
                                                # (different local port!)
    addons:expose:on analytics                  # LoadBalancer
    addons:expose:off analytics                 # back to ClusterIP
    addons:destroy archival --confirm archival
    addons                                      # 4 addons (archival gone)
    config                                      # ARCHIVAL_URL removed
    apps:destroy myapp --confirm myapp

    Asserts:
    - Each addon gets unique ClusterIP (no port conflicts)
    - Env var naming: first postgres = DATABASE_URL, named = {NAME}_URL
    - addons:connect auto-avoids local port conflicts
    - Destroy addon removes env var from app config
    - Destroy app cleans up ALL addon resources
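
    The env var naming rule can be sketched as a small function. The default
    names come from the flow above. The helper name addon_env_var and the
    hyphen-to-underscore normalization are assumptions for illustration:

```python
# Default env var for the first (unnamed) instance of each addon type,
# as listed in the flow above.
DEFAULT_ENV_VARS = {"postgres": "DATABASE_URL", "redis": "REDIS_URL"}


def addon_env_var(addon_type, as_name=None):
    """Return the env var an addon instance injects into app config."""
    if as_name is None:
        return DEFAULT_ENV_VARS[addon_type]
    # Assumption: hyphens are not valid in env var names, so map them to '_'.
    return as_name.upper().replace("-", "_") + "_URL"


names = [
    addon_env_var("postgres"),
    addon_env_var("postgres", as_name="analytics"),
    addon_env_var("postgres", as_name="archival"),
    addon_env_var("redis"),
    addon_env_var("redis", as_name="sessions"),
]
```

    The five names are distinct, which is what lets five addon instances
    coexist in one app's config.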

    FLOW 4: MIXED APP — HTTP + TCP + ADDONS (the "real platform" flow)
    ──────────────────────────────────────────────────────────────────
    apps:create platform
    addons:create postgres
    config:set SECRET_KEY=abc123
    deploy --image platform:v1                  # Procfile: web + grpc
    ps                                          # web.1, grpc.1
    domains:add api.platform.com --type web
    services:expose:on grpc                     # LoadBalancer for gRPC
    ps:scale web=3 grpc=2
    ps                                          # web.1-3, grpc.1-2
    logs --tail --type web
    run bash                                    # exec into web.1
    run --detach rake db:migrate                # Job
    apps:maintenance:on                         # scale all to 0
    apps:maintenance:off                        # restore replicas
    deploy --image platform:v2
    releases                                    # v1, v2
    apps:status                                 # full overview
    apps:destroy platform --confirm platform

    Asserts:
    - Multiple process types from single deploy
    - Domains (Ingress) and services:expose:on (LoadBalancer) on different processes
    - Scale, logs, run all work
    - Maintenance mode toggles (app-wide + per-service)
    - Status shows complete picture
    - Full cleanup

    FLOW 5: DEPLOY FAILURE + RECOVERY
    ──────────────────────────────────
    apps:create myapi
    deploy --image myapi:v1                     # success
    deploy --image myapi:v2-bad                 # simulate rollout timeout
    releases                                    # v1, v2 (v2 = failed)
    ps                                          # pods in CrashLoopBackOff
    releases:rollback v1                        # v3 = rollback to v1
    ps                                          # pods healthy again
    apps:destroy myapi --confirm myapi

    Asserts:
    - Failed deploy still creates a release (intent recorded)
    - Rollback to known-good version works
    - Re-running same deploy after failure is idempotent

    FLOW 6: CONFIG CHANGE + RESTART
    ────────────────────────────────
    apps:create myapi
    deploy --image myapi:v1
    config:set KEY1=val1 KEY2=val2              # bulk set
    releases                                    # config change = new release
    config:get KEY1                             # val1
    config:unset KEY2
    config                                      # KEY1 only
    config:set SECRET=s3cret --secret           # stored in K8s Secret
    apps:destroy myapi --confirm myapi

    Asserts:
    - Config changes create releases
    - Bulk set is atomic (one release, not two)
    - --secret stores in Secret, not ConfigMap
    - Unset removes the key


K8S VERSION COMPATIBILITY
─────────────────────────
Kuberoku targets the 3 most recent K8s minor versions (the same window
that K8s itself supports). Currently that's 1.33, 1.34, 1.35.

    SUPPORTED K8S VERSIONS:
    ───────────────────────
    K8s Version   Status         API Differences That Matter
    ───────────   ───────────    ──────────────────────────────────────
    1.33          Supported      Baseline. All APIs stable.
    1.34          Supported      No breaking changes for our API surface.
    1.35          Supported      Latest. No breaking changes for our API surface.
    1.32-         Best-effort    Not tested in CI, may work.
    1.36+         Add on release When K8s releases a new minor, add to matrix.

    APIs we use and their stability:
    ────────────────────────────────
    API Group              Resource         API Version    Stable Since
    ──────────────────     ────────────     ───────────    ────────────
    core/v1                ConfigMap        v1             always
    core/v1                Secret           v1             always
    core/v1                Service          v1             always
    core/v1                Pod              v1             always
    core/v1                PVC              v1             always
    core/v1                Namespace        v1             always
    apps/v1                Deployment       v1             1.9
    apps/v1                StatefulSet      v1             1.9
    batch/v1               Job              v1             1.2
    networking.k8s.io/v1   Ingress          v1             1.19
    autoscaling/v2         HPA              v2             1.23

    All APIs we use have been stable (GA) for years. No beta APIs.
    This means K8s version upgrades should NOT break Kuberoku.
    We test anyway because behavior can change even when APIs don't.

    CI VERSION MATRIX:
    ──────────────────
    The tests.yml workflow runs against multiple K8s versions via k3d:

        strategy:
          matrix:
            k8s_version: ["1.33", "1.34", "1.35"]
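        steps:
          # rancher/k3s image tags carry a patch version and k3s suffix
          # (e.g. v1.35.0-k3s1); resolve each minor to a pinned full tag.
          - run: k3d cluster create test --image rancher/k3s:v${{ matrix.k8s_version }}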

    Every integration and contract test runs against all supported versions.
    If a test fails on one version but passes on others, that's a
    compatibility bug — investigate and fix, don't skip.

    WHEN K8S RELEASES A NEW VERSION:
    ────────────────────────────────
    1. Add new version to CI matrix
    2. Drop oldest version (keep 3)
    3. Run full test suite
    4. Check K8s changelog for deprecated APIs we use (unlikely — all GA)
    5. Update this section
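
    Steps 1 and 2 amount to sliding a three-version window. A minimal
    sketch, where roll_matrix is a hypothetical helper rather than part of
    the CI tooling:

```python
def _version_key(version):
    """Sort "1.9" before "1.10" by comparing numeric components."""
    return tuple(int(part) for part in version.split("."))


def roll_matrix(supported, new_version, window=3):
    """Add new_version and keep only the `window` most recent minors."""
    versions = sorted(set(supported) | {new_version}, key=_version_key)
    return versions[-window:]


matrix = roll_matrix(["1.33", "1.34", "1.35"], "1.36")
```

    Sorting on numeric components rather than strings matters once minor
    numbers cross a digit boundary.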


BACKWARD COMPATIBILITY — OLD RESOURCES STILL WORK
──────────────────────────────────────────────────
When Kuberoku is upgraded, resources created by older versions must still
work. This is critical for trust — users should never fear upgrading.

    WHAT CAN CHANGE BETWEEN KUBEROKU VERSIONS:
    ────────────────────────────────────────────
    - New labels/annotations added to resources (additive, safe)
    - New fields in release ConfigMap JSON (additive, safe)
    - New process type behaviors (additive, safe)

    WHAT MUST NEVER CHANGE (BREAKING):
    ───────────────────────────────────
    - Resource naming convention ({PREFIX}-{type}-{app}-{process})
    - Label key names (app.kuberoku.com/app, etc.)
    - ConfigMap data key names ("config", "formation", etc.)
    - Release ConfigMap JSON schema (can add keys, cannot remove/rename)
    - Env var injection format (DATABASE_URL, {NAME}_URL)

    MIGRATION TESTS:
    ────────────────
    For each Kuberoku minor release, we keep a "fixture snapshot" — a set
    of K8s resource manifests (JSON) representing resources created by that
    version. Backward compatibility tests load these fixtures into the
    FakeK8sClient and verify that the current version can:

    1. List apps (reads old-format app ConfigMaps)
    2. Deploy (updates old-format Deployments)
    3. Read config (parses old-format env ConfigMaps)
    4. Read releases (parses old-format release ConfigMaps)
    5. Manage addons (finds old-format StatefulSets)

    tests/fixtures/
    ├── v0.1.0/                  # resources created by v0.1.0
    │   ├── app_configmap.json
    │   ├── env_configmap.json
    │   ├── formation_configmap.json
    │   ├── release_configmap.json
    │   └── deployment.json
    ├── v0.2.0/                  # resources created by v0.2.0 (if format changed)
    │   └── ...
    └── ...

    def test_current_reads_v010_app(fake_client):
        """Current Kuberoku can read apps created by v0.1.0."""
        cm = load_fixture("v0.1.0/app_configmap.json")
        fake_client.configmaps[("kuberoku", cm["metadata"]["name"])] = cm
        apps = AppsService(fake_client)
        app = apps.get("myapi")
        assert app.name == "myapi"

    If a format change is needed (rare), Kuberoku must:
    1. Read both old and new format (detect by presence of a key)
    2. Write new format only
    3. Document the migration in release notes
    4. NEVER require users to run a migration command
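
    Rules 1 and 2 can be sketched as a reader that accepts both shapes and a
    writer that emits only the new one. The key names ("image" for the old
    format, "images" for the new one) are invented for this example and do
    not describe the real release schema:

```python
import json


def read_release(raw):
    """Parse a release document in either the old or the new format."""
    data = json.loads(raw)
    if "images" in data:
        # New format (hypothetical): one image per process type.
        images = dict(data["images"])
    else:
        # Old format (hypothetical): a single image, implied process "web".
        images = {"web": data["image"]}
    return {"version": data["version"], "images": images}


def write_release(release):
    """Serialize a release. Only the new format is ever written."""
    payload = {"version": release["version"], "images": release["images"]}
    return json.dumps(payload, sort_keys=True)


old_doc = '{"version": 1, "image": "myapi:v1"}'
new_doc = '{"version": 2, "images": {"web": "myapi:v2"}}'

upgraded = write_release(read_release(old_doc))
```

    Because the reader upgrades in memory, an old resource is rewritten in
    the new format the next time it is saved, with no migration command.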


TEST ISOLATION + CLEANUP
────────────────────────
    UNIT TESTS: Each test gets a fresh FakeK8sClient via fixture.
    No shared state between tests. No cleanup needed.

    INTEGRATION TESTS: Each test creates resources in a unique namespace
    (test-{uuid4[:8]}) and deletes the namespace in teardown. This
    prevents test pollution even if a test crashes mid-run.

        from uuid import uuid4

        import pytest

        @pytest.fixture
        def test_namespace(real_client):
            ns = f"test-{uuid4().hex[:8]}"
            real_client.create_namespace(ns)
            yield ns
            real_client.delete_namespace(ns)  # cleanup

    E2E TESTS: CliRunner tests use FakeK8sClient (injected via
    dependency override). No real cluster needed. Same fresh-per-test
    isolation as unit tests.


REAL K8S TESTING RULES — HARD-WON, NON-NEGOTIABLE
──────────────────────────────────────────────────
These rules prevent flaky CI failures on real K8s (--backend real).
Every new test touching real K8s MUST follow these.

    1. NEVER use raw k8s.update_deployment/service/statefulset() in tests
       that run on real K8s. Deployments have controllers that race to
       mutate resourceVersion → 409 Conflict. Use SDK methods (ps.scale,
       deploy.deploy) which have built-in retry logic.

    2. NEVER assert immediately after delete. K8s deletion is async.
       Use poll_until(fn=_get_or_none, predicate=lambda r: r is None).

    3. NEVER use time.sleep() for stabilization. Use poll_until() or
       wait_for_port_open(). Sleeps waste CI time and still flake.

    4. NEVER create Deployments/Services with raw k8s.create_*() in tests
       that run on real K8s. Use factory.deploy.deploy() which sets
       correct labels for cleanup.

    5. Contract test updates MUST use read-modify-write + retry loop:
           for _attempt in range(5):
               current = k8s.get_deployment(ns, name)
               current["spec"]["replicas"] = 3
               try:
                   k8s.update_deployment(ns, name, current)
                   break
               except FakeConflictError:
                   continue  # stale resourceVersion: re-read and retry
           else:
               pytest.fail("update_deployment still conflicting after 5 attempts")

    6. Namespace tests MUST clean up — use request.addfinalizer() and
       k8ut- prefix so session cleanup catches strays.

    7. _wipe_namespace_resources() MUST wait for ALL resource types
       (Deployment, StatefulSet, Service, Ingress, NetworkPolicy, PVC,
       ConfigMap, Secret) — not just a subset.

    8. Know which fixtures route to real K8s:
       - factory, k8s_client → real with --backend real
       - app_factory, deployed_factory, fake_k8s → always FakeK8sClient

    Shared E2E helpers (tests/e2e/helpers.py):
    poll_until, wait_for_pods_ready, wait_for_port_open,
    wait_for_sts_ready, wait_for_service_deleted,
    wait_for_lb_address, wait_for_no_pods, find_running_pod
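
    A minimal sketch of how a poll_until helper could look. The fn and
    predicate parameters follow the call shown in rule 2. The timeout and
    interval parameters and their defaults are assumptions, not the actual
    signature in tests/e2e/helpers.py:

```python
import time


def poll_until(fn, predicate, timeout=30.0, interval=0.5):
    """Call fn() until predicate(result) is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = fn()
        if predicate(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"condition not met within {timeout}s (last result: {result!r})"
            )
        # Short pause between polls; the loop exits as soon as the
        # condition holds instead of waiting a fixed stabilization delay.
        time.sleep(interval)


# Simulate an async delete: the resource is visible twice, then gone.
_states = iter([{"name": "svc"}, {"name": "svc"}, None])
gone = poll_until(lambda: next(_states), lambda r: r is None,
                  timeout=5.0, interval=0.01)
```

    Unlike a bare time.sleep(), this returns on the first poll that meets
    the condition and fails with a clear error when it never does.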


UNIT TEST EXAMPLES
──────────────────
    def test_apps_create(fake_client):
        apps = AppsService(fake_client)
        app = apps.create("myapi")

        assert app.name == "myapi"
        assert ("default", "kuberoku-app-myapi") in fake_client.configmaps
        assert ("default", "kuberoku-env-myapi") in fake_client.configmaps

    def test_apps_create_duplicate(fake_client):
        apps = AppsService(fake_client)
        apps.create("myapi")

        with pytest.raises(AppAlreadyExistsError):
            apps.create("myapi")

    def test_expose_changes_service_type(fake_client):
        # Setup: create app with deployed service
        setup_deployed_app(fake_client, "myapi", process="smtp",
                          ports=["25/tcp", "587/tcp"])

        services = ServicesService(fake_client)
        result = services.expose_on("myapi", "smtp")

        svc = fake_client.services[("default", "kuberoku-myapi-smtp")]
        assert svc["spec"]["type"] == "LoadBalancer"
        assert result.method == "loadbalancer"

    def test_expose_addon(fake_client):
        setup_deployed_app(fake_client, "myapi")
        setup_addon(fake_client, "myapi", "postgres")

        addons = AddonsService(fake_client)
        addons.expose_on("myapi", "postgres")

        svc = fake_client.services[("default", "kuberoku-addon-myapi-postgres")]
        assert svc["spec"]["type"] == "LoadBalancer"

    def test_multi_instance_addon_env_vars(fake_client):
        setup_deployed_app(fake_client, "myapi")  # addons attach to an existing app
        addons = AddonsService(fake_client)
        addons.create("myapi", "postgres")
        addons.create("myapi", "postgres", as_name="analytics")

        env = fake_client.secrets[("default", "kuberoku-env-myapi")]
        assert "DATABASE_URL" in env["data"]
        assert "ANALYTICS_URL" in env["data"]

E2E TEST EXAMPLES
─────────────────
    from click.testing import CliRunner
    from kuberoku.cli.main import app

    def test_apps_create_e2e(runner):
        result = runner.invoke(app, ["apps:create", "mytest"])
        assert result.exit_code == 0
        assert "Created mytest" in result.output

        result = runner.invoke(app, ["apps"])
        assert "mytest" in result.output

    def test_colon_and_space_both_work(runner):
        r1 = runner.invoke(app, ["apps:create", "test1"])
        r2 = runner.invoke(app, ["apps", "create", "test2"])
        assert r1.exit_code == 0
        assert r2.exit_code == 0

    def test_error_message_format(runner):
        result = runner.invoke(app, ["config:set", "KEY=val"])
        # No app linked, no --app flag
        assert result.exit_code == 1
        assert "Error:" in result.output
        assert "--app" in result.output  # suggests fix


COVERAGE
────────
    - High coverage enforced in CI (see pyproject.toml for threshold)
    - EVERY PHASE must ship at the enforced threshold — coverage debt is not deferred
    - Coverage on error paths is mandatory — every except clause must be tested
    - EXCEPTION COVERAGE: Every exception class must be triggered by at least
      one test (see EXCEPTION COVERAGE MANDATE section below for enforcement)

BACKEND CAPABILITY MATRIX — THE CORE TESTING ARCHITECTURE
──────────────────────────────────────────────────────────
Tests do NOT hardcode which backend they run on. Instead:

    1. Tests DECLARE what capabilities they need (via pytest markers)
    2. Backends ADVERTISE what capabilities they provide
    3. --backend flag selects fake or real per pytest run
    4. pytest auto-skips tests whose needs exceed the backend capabilities
    5. Same tests run on both backends — when a test passes on fake AND
       real, you know the business logic AND the K8s integration work

This means: adding a new cluster type (EKS, GKE, AKS) requires zero
test changes — just point KUBECONFIG and run with --backend real.

    CAPABILITY MATRIX:
    ──────────────────
    Capability         Fake    Real K8s (k3d/kind/Colima/EKS/etc.)
    ─────────────────  ──────  ─────────────────────────────────────
    api_crud           ✓       ✓
    label_selectors    ✓       ✓
    resource_version   ✓       ✓
    network_policy     ✓       ✓ (CRUD only — enforcement needs Calico/Cilium CNI)
    rbac               ✗       ✓
    pods               ✗       ✓
    exec               ✗       ✓
    logs               ✗       ✓
    port_forward       ✗       ✓
    load_balancer      ✗       ✓ (depends on cluster type)
    ingress            ✗       ✓
    pvc_binding        ✗       ✓
    registry           ✗       ✓ (k3d/kind with local registry)
    cloud_lb           ✗       ✓ (cloud clusters only)

    TEST MARKERS — HOW TESTS DECLARE NEEDS:
    ────────────────────────────────────────
    # Test needs only API CRUD (runs on fake and real)
    def test_apps_create(factory):
        ...

    # Test needs running pods (runs on real only — skipped on fake)
    @pytest.mark.needs("pods")
    def test_logs_streaming(factory):
        ...

    # Test needs exec + pods (runs on real only)
    @pytest.mark.needs("pods", "exec")
    def test_run_bash(factory):
        ...

    # Test needs port-forward (runs on real only)
    @pytest.mark.needs("port_forward")
    def test_connect_portforward(factory):
        ...

    Tests with NO marker need only api_crud — they run on both backends.
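
    The auto-skip decision reduces to a set difference between what a test
    needs and what the backend provides. The sketch below copies the
    capability sets from the matrix above. BACKEND_CAPABILITIES and
    skip_reason are hypothetical names, not the actual conftest.py contents:

```python
FAKE_CAPABILITIES = {
    "api_crud", "label_selectors", "resource_version", "network_policy",
}

BACKEND_CAPABILITIES = {
    "fake": FAKE_CAPABILITIES,
    # Some real capabilities (load_balancer, registry, cloud_lb) depend on
    # the cluster type; this sketch treats them all as available.
    "real": FAKE_CAPABILITIES | {
        "rbac", "pods", "exec", "logs", "port_forward", "load_balancer",
        "ingress", "pvc_binding", "registry", "cloud_lb",
    },
}


def skip_reason(needs, backend):
    """Return a skip message if the backend lacks a needed capability."""
    missing = sorted(set(needs) - BACKEND_CAPABILITIES[backend])
    if missing:
        return f"{backend} backend lacks: {', '.join(missing)}"
    return None


reason_fake = skip_reason(["pods", "exec"], "fake")
reason_real = skip_reason(["pods", "exec"], "real")
reason_unmarked = skip_reason([], "fake")
```

    One way to wire this in would be a pytest_collection_modifyitems hook
    that collects the arguments of each item's "needs" markers, calls
    skip_reason, and adds pytest.mark.skip(reason=...) when it returns a
    message.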

    THE factory FIXTURE — BACKEND-AWARE:
    ─────────────────────────────────────
    # conftest.py

    @pytest.fixture()
    def factory(request, tmp_namespace):
        """Factory wired to K8s backend based on --backend flag."""
        backend = request.config.getoption("--backend")
        if backend == "real":
            return KuberokuFactory(
                k8s_client=K8sClient.from_context(),
                namespace=tmp_namespace,
            )
        return KuberokuFactory(
            k8s_client=FakeK8sClient(),
            namespace=tmp_namespace,
        )

    HOW THIS PLAYS OUT:
    ───────────────────

    # --backend fake (default, fast)
    #   test_apps_create         → runs on fake           ✓
    #   test_logs_streaming      → skipped (fake lacks pods)
    #   test_connect_portforward → skipped (fake lacks port_forward)

    # --backend real (k3d in CI, Colima locally, EKS for pre-prod)
    #   test_apps_create         → runs on real K8s       ✓
    #   test_logs_streaming      → runs on real K8s       ✓
    #   test_connect_portforward → runs on real K8s       ✓

    THE POWER: When test_apps_create passes on both fake AND real, you
    know the business logic works AND the K8s integration works. If it
    passes on fake but fails on real → either FakeK8sClient has a bug
    or the K8s resource body is invalid. Contract tests catch this.


THE TWO BACKENDS
────────────────

    1. FAKE (FakeK8sClient — in-memory, instant)
    ──────────────────────────────────────────────
    Startup:      0s (in-process)
    Memory:       ~0
    Docker:       No
    Fidelity:     Simulated — models API behavior, not real K8s

    The FakeK8sClient (see above) stores resources in dicts. It models
    resourceVersion, 404/409 errors, label selectors, and namespace
    scoping. It does NOT model RBAC, pod scheduling, or anything that
    requires a real API server.

    Used by: all tests (default backend)
    Select:  pytest tests/ --backend fake   (or omit — fake is default)

    2. REAL (any K8s cluster — k3d, kind, Colima, EKS, GKE, AKS)
    ──────────────────────────────────────────────────────────────
    Startup:      depends on cluster provider (CI provisions it)
    Docker:       depends on cluster provider
    Fidelity:     100% — this is real K8s

    The "real" backend connects to whatever K8s cluster is in KUBECONFIG.
    It works with any cluster type. CI provisions the cluster — code
    doesn't care which provider.

    Select:  pytest tests/ --backend real

    RealK8sClient (tests/backends/real.py) wraps K8sClient and normalizes
    ApiException (404→FakeNotFoundError, 409→FakeConflictError) so
    contract tests use the same exception types on both backends.
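
    The normalization can be sketched as a decorator. To keep the example
    self-contained, ApiException below is a local stand-in for the
    kubernetes client's exception (which exposes the HTTP code as .status),
    and normalize_errors is a hypothetical name:

```python
import functools


class ApiException(Exception):
    """Local stand-in for the kubernetes client's ApiException."""

    def __init__(self, status):
        super().__init__(f"HTTP {status}")
        self.status = status


class FakeNotFoundError(Exception):
    pass


class FakeConflictError(Exception):
    pass


_STATUS_MAP = {404: FakeNotFoundError, 409: FakeConflictError}


def normalize_errors(method):
    """Translate 404/409 ApiException into the fake's exception types."""
    @functools.wraps(method)
    def wrapper(*args, **kwargs):
        try:
            return method(*args, **kwargs)
        except ApiException as exc:
            mapped = _STATUS_MAP.get(exc.status)
            if mapped is None:
                raise  # other statuses (e.g. 403) propagate unchanged
            raise mapped(str(exc)) from exc
    return wrapper


@normalize_errors
def get_missing_deployment():
    raise ApiException(404)


try:
    get_missing_deployment()
    caught = None
except FakeNotFoundError as err:
    caught = err
```

    Chaining with "from exc" keeps the original ApiException available on
    __cause__ for debugging real-cluster failures.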

    CI CLUSTER PROVIDERS (not backends — just how CI gets a cluster):

    K3D (k3s-in-Docker — fast, every PR):
        k3d cluster create kuberoku-test \
            --image rancher/k3s:v1.35.0-k3s1 \
            --wait --timeout 60s
        ~11s startup, ~50MB RAM, k3s (CNCF-certified conformant K8s distribution)

    KIND (upstream K8s-in-Docker — weekly compat):
        kind create cluster --name kuberoku-compat \
            --image kindest/node:v1.35.0
        ~90s startup, ~300MB RAM, full upstream K8s (the codebase EKS/GKE/AKS build on)

    Why both: k3d is fast for every-PR testing. kind proves we work on
    real upstream K8s (not just k3s). k3s has minor differences from
    upstream (CRD validation strictness, API priority/fairness).

    COLIMA / DOCKER DESKTOP (developer laptop):
        colima start --kubernetes
        Already running — zero startup. Developers test locally.


PRESETS — COMMON CONFIGURATIONS:
────────────────────────────────
    # Developer laptop — fast feedback loop
    ./scripts/test.sh --backend fake             # <5s

    # Developer laptop — full validation (needs Colima/Docker Desktop)
    ./scripts/test.sh                            # fake + real (~60s)

    # CI — every PR (tests.yml)
    pytest tests/ --backend fake                 # <5s
    pytest tests/ --backend real                 # k3d, ~55s

    # CI — weekly compat (compat.yml)
    pytest tests/ --backend real                 # kind × K8s versions

    # CI — release gate (release.yml)
    # All of the above must pass. Coverage enforced.


CLUSTER LIFECYCLE — EPHEMERAL AND ISOLATED
──────────────────────────────────────────
Every CI cluster (k3d, kind) is spun up before the test suite and
torn down after. No shared clusters. No leftover resources.

    WHY EPHEMERAL:
    - Tests must not depend on pre-existing cluster state
    - Parallel CI runs cannot interfere with each other
    - Developers can run integration tests locally with one command
    - No "works on my cluster but fails in CI" problems

    NAMESPACE ISOLATION:
    Each test creates resources in a unique namespace (test-{uuid4[:8]})
    and deletes the namespace in teardown. Even within a shared cluster,
    tests never see each other's resources.

        from uuid import uuid4

        import pytest

        @pytest.fixture
        def test_namespace(k8s):
            ns = f"test-{uuid4().hex[:8]}"
            k8s.create_namespace(ns)
            yield ns
            k8s.delete_namespace(ns)  # always cleanup

    ADDON TESTS:
    Addon tests (postgres, redis) need real pods → requires k3d/kind.
    The test creates a StatefulSet, waits for Ready, runs operations,
    then cleans up. PVCs use the default StorageClass
    (k3d: local-path-provisioner, kind: standard).

    BUILD-FROM-GIT TESTS:
    Use a local Docker registry (k3d creates one with --registry-create).
    Tests push to localhost:5555 and deploy from there. No cloud
    credentials needed.

    NETWORKING TESTS:
    k3d supports LoadBalancer via k3s's built-in ServiceLB. Expose tests
    verify that Service.spec.type changes to LoadBalancer and an external
    IP is eventually assigned. Port-forward tests create a TCP echo server
    pod, port-forward to it, send data, and verify the response.


LOCAL DEVELOPMENT — ONE COMMAND:
────────────────────────────────
    ./scripts/test.sh                     # fake + real (~60s)
    ./scripts/test.sh --backend fake      # fake only (<5s)
    ./scripts/test.sh --backend real      # real only (~55s, needs cluster)

    The test script handles backend detection:
    - Default: runs fake (all tests) + real (all tests, skipped if no cluster)
    - --backend fake: runs all tests on fake only
    - --backend real: runs all tests on real only (needs running cluster)
    - Gracefully skips real if no K8s cluster available

    Typical dev loop: `./scripts/test.sh --backend fake` on every save,
    `./scripts/test.sh` before pushing (runs both backends).


K8S VERSION MATRIX:
───────────────────
The version matrix applies to k3d and kind cluster images:

    k3d:     rancher/k3s images for v1.33, v1.34, v1.35
    kind:    kindest/node images for v1.33, v1.34, v1.35

In CI, k3d tests matrix over K8s versions on every PR. Kind tests
matrix over K8s versions weekly.


CI PIPELINE — THREE WORKFLOWS:
──────────────────────────────
    tests.yml (every push, every PR):
        jobs:
        1. lint — ruff check, ruff format --check, mypy --strict
        2. fake — pytest --backend fake (Python 3.11/3.12/3.13 matrix)
        3. integration — k3d + pytest --backend real (K8s v1.33/v1.34/v1.35)
        speed: ~2min (lint + fake parallel with integration)
        gate: MUST pass to merge

    compat.yml (weekly + manual trigger):
        jobs:
        1. compat — kind + pytest --backend real (K8s v1.33/v1.34/v1.35)
        speed: ~5min (kind is slower than k3d)
        purpose: Proves we work on upstream K8s, not just k3s

    release.yml (git tag v*):
        jobs:
        1. gate — lint + type check + fake tests + coverage enforced
        2. integration — k3d + real tests
        3. compat — kind × K8s versions (v1.33/v1.34/v1.35)
        gate: ALL jobs must pass before publish
        future: + PyInstaller binaries + PyPI publish + GitHub Release

    PR MERGE: tests.yml MUST pass.
    RELEASE: release.yml MUST pass (includes lint, fake, k3d, kind). Coverage enforced.


WHAT WE DO NOT TEST (AND WHY)
──────────────────────────────
    - Cloud provider LB IP allocation (needs real cloud — use "real" backend
      in pre-production validation only. Normal CI tests that Service.spec.type
      is set correctly, not that an IP is actually provisioned.)
    - cert-manager certificate issuance (external controller — test Ingress
      creation only. Installing cert-manager in k3d is possible but slow.)
    - Docker build/push to cloud registries (tested with local k3d registry.
      Real ECR/GCR push requires cloud credentials in CI.)
    - Multi-node scheduling (k3d/kind run single-node by default. Pod
      scheduling across nodes is K8s's job, not ours.)
    - Performance under load (not a goal for v1 — Kuberoku is a CLI.)
    - Container startup latency (non-deterministic — tests poll for readiness
      with generous timeouts)


BUILD-FROM-GIT TEST STRATEGY
─────────────────────────────
Deploy from git (Phase 4) invokes external tools: git archive, docker build,
docker push. Testing this without Docker requires careful layering.

    THE THREE LAYERS OF BUILD-FROM-GIT:
    ────────────────────────────────────
    Layer 1: Git operations     git archive HEAD → tar stream
    Layer 2: Docker operations  docker build -t {tag} - < tar && docker push {tag}
    Layer 3: Orchestration      Git → Docker → Deploy (calls layer 1 + 2 + SDK)

    Each layer is tested independently:

    LAYER 1 — GIT (tests/deploy/test_git_archive.py):
    ──────────────────────────────────────────────────
    Uses a real temporary git repo (no mocking):

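        @pytest.fixture
        def git_repo(tmp_path):
            """Create a real git repo with a Dockerfile."""
            subprocess.run(["git", "init", str(tmp_path)], check=True)
            # Set a repo-local identity so commits work on CI runners
            # that have no global git user configured.
            subprocess.run(["git", "config", "user.name", "Test"], cwd=tmp_path, check=True)
            subprocess.run(["git", "config", "user.email", "test@example.com"], cwd=tmp_path, check=True)
            (tmp_path / "Dockerfile").write_text("FROM alpine\n")
            (tmp_path / "app.py").write_text("print('hello')\n")
            subprocess.run(["git", "add", "."], cwd=tmp_path, check=True)
            subprocess.run(["git", "commit", "-m", "init"], cwd=tmp_path, check=True)
            return tmp_path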

        def test_git_archive_produces_tar(git_repo):
            tar_bytes = git_archive(git_repo, ref="HEAD")
            assert len(tar_bytes) > 0
            # Verify tar contains Dockerfile
            import tarfile, io
            with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tf:
                names = tf.getnames()
                assert "Dockerfile" in names

        def test_git_archive_specific_ref(git_repo):
            # Create a tag, modify, then archive the tag
            subprocess.run(["git", "tag", "v1.0"], cwd=git_repo, check=True)
            (git_repo / "app.py").write_text("print('v2')\n")
            subprocess.run(["git", "add", "."], cwd=git_repo, check=True)
            subprocess.run(["git", "commit", "-m", "v2"], cwd=git_repo, check=True)
            tar_bytes = git_archive(git_repo, ref="v1.0")
            # Should contain v1 content
            ...

        def test_git_archive_dirty_tree_uses_committed(git_repo):
            (git_repo / "dirty.txt").write_text("uncommitted\n")
            tar_bytes = git_archive(git_repo, ref="HEAD")
            import tarfile, io
            with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tf:
                assert "dirty.txt" not in tf.getnames()

    LAYER 2 — DOCKER (tests/deploy/test_docker_build.py):
    ──────────────────────────────────────────────────────
    Mocks subprocess calls. Never invokes real Docker:

        from subprocess import CompletedProcess

        def test_docker_build_calls_correct_command(mocker):
            mock_run = mocker.patch("subprocess.run")
            mock_run.return_value = CompletedProcess(args=[], returncode=0)

            docker_build(tar_bytes=b"...", tag="myapp:abc123")

            mock_run.assert_called_once()
            args = mock_run.call_args[0][0]
            assert args[:3] == ["docker", "build", "-t"]
            assert "myapp:abc123" in args

        def test_docker_push_calls_correct_command(mocker):
            mock_run = mocker.patch("subprocess.run")
            mock_run.return_value = CompletedProcess(args=[], returncode=0)

            docker_push("myapp:abc123")

            mock_run.assert_called_once()
            args = mock_run.call_args[0][0]
            assert args == ["docker", "push", "myapp:abc123"]

        def test_docker_build_failure_raises(mocker):
            mock_run = mocker.patch("subprocess.run")
            mock_run.return_value = CompletedProcess(args=[], returncode=1,
                                                      stderr="build failed")

            with pytest.raises(BuildError, match="build failed"):
                docker_build(tar_bytes=b"...", tag="myapp:abc123")
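
    For orientation, a docker_build that would satisfy the three tests above
    might look like the sketch below. It is an assumption about the shape of
    the function, not the actual implementation: the exact argument list
    after the tag, the error message wording, and the local BuildError
    stand-in are all illustrative.

```python
import subprocess
from unittest import mock


class BuildError(Exception):
    """Stand-in for the project's BuildError (assumed to exist elsewhere)."""


def docker_build(tar_bytes: bytes, tag: str) -> None:
    """Build an image from a tar build context passed on stdin."""
    result = subprocess.run(
        ["docker", "build", "-t", tag, "-"],
        input=tar_bytes,
        capture_output=True,
    )
    # No check=True: the return code is inspected so stderr can be surfaced.
    if result.returncode != 0:
        stderr = result.stderr or b""
        if isinstance(stderr, bytes):
            stderr = stderr.decode(errors="replace")
        raise BuildError(f"docker build failed for {tag}: {stderr.strip()}")


# Usage without Docker: patch subprocess.run the same way the tests do.
with mock.patch("subprocess.run") as fake_run:
    fake_run.return_value = subprocess.CompletedProcess(
        args=[], returncode=1, stderr="build failed")
    try:
        docker_build(b"...", "myapp:abc123")
        error_message = None
    except BuildError as exc:
        error_message = str(exc)
    called_with = fake_run.call_args[0][0]
```

    Looking up subprocess.run at call time (rather than binding it as a
    default argument) is what lets mocker.patch("subprocess.run") take effect.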

    LAYER 3 — ORCHESTRATION (tests/deploy/test_git.py):
    ────────────────────────────────────────────────────
    Mocks layers 1 + 2, tests the orchestration logic:

        def test_deploy_from_git_full_flow(factory, mocker):
            factory.apps.create("myapi")
            mocker.patch("kuberoku.sdk.deploy.git_archive", return_value=b"tar...")
            mocker.patch("kuberoku.sdk.deploy.docker_build")
            mocker.patch("kuberoku.sdk.deploy.docker_push")

            release = factory.deploy.deploy("myapi", from_git="/path/to/repo")

            assert release.version == 1
            assert release.commit is not None  # SHA extracted from git

        def test_deploy_from_git_reads_procfile(factory, mocker):
            """If Procfile exists in git archive, auto-detect process types."""
            tar_with_procfile = build_tar_with_procfile(
                "web: gunicorn app:app\nworker: celery -A tasks\n"
            )
            mocker.patch("kuberoku.sdk.deploy.git_archive",
                         return_value=tar_with_procfile)
            mocker.patch("kuberoku.sdk.deploy.docker_build")
            mocker.patch("kuberoku.sdk.deploy.docker_push")

            factory.apps.create("myapi")
            release = factory.deploy.deploy("myapi", from_git="/path/to/repo")

            # Formation should have both web and worker
            assert "web" in release.formation
            assert "worker" in release.formation
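
    The build_tar_with_procfile helper used above is not defined in this
    document. One possible shape, together with a hypothetical
    parse_procfile that shows how process types could be read back out of
    the archive, is sketched here using only the standard library. Both
    function bodies are assumptions made for illustration.

```python
import io
import tarfile


def build_tar_with_procfile(procfile_text: str) -> bytes:
    """Return an in-memory tar archive containing a single Procfile."""
    data = procfile_text.encode()
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tf:
        info = tarfile.TarInfo(name="Procfile")
        info.size = len(data)
        tf.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def parse_procfile(tar_bytes: bytes) -> dict[str, str]:
    """Map process type to command, or {} if the archive has no Procfile."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes)) as tf:
        try:
            member = tf.extractfile("Procfile")
        except KeyError:
            return {}
        text = member.read().decode() if member else ""
    processes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        # Split on the first colon only: commands may contain colons.
        name, command = line.split(":", 1)
        processes[name.strip()] = command.strip()
    return processes


tar_bytes = build_tar_with_procfile(
    "web: gunicorn app:app\nworker: celery -A tasks\n"
)
processes = parse_procfile(tar_bytes)
```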

    INTEGRATION (k3d with local registry):
    ───────────────────────────────────────
    @pytest.mark.needs("registry")
    def test_deploy_from_git_real_build(factory, git_repo):
        """Real git archive → real docker build → push to k3d registry."""
        factory.apps.create("myapi")
        release = factory.deploy.deploy("myapi", from_git=str(git_repo))
        assert release.version == 1
        # Verify pod is actually running the built image
        pods = factory.k8s.list_pods(factory.namespace,
            labels={"app.kuberoku.com/app": "myapi"})
        assert len(pods) > 0

    KEY INSIGHT: Unit tests (layer 3) run in <1s with no Docker.
    Integration tests (k3d) run only when registry is available.
    The orchestration logic is fully testable without Docker.


FIXTURE SCOPING — FUNCTION-SCOPED ISOLATION
─────────────────────────────────────────────

    Backend      Fixture Scope    Why
    ──────────   ──────────────   ──────────────────────────────────────
    fake         function         Instant. Fresh FakeK8sClient per test.
    real         function         Cluster already exists (CI or local).
                                  Fresh namespace per test for isolation.

    IMPLEMENTATION:
    ──────────────

    # conftest.py
    from uuid import uuid4

    import pytest

    @pytest.fixture()
    def tmp_namespace(request):
        """Unique namespace per test. Real backend creates/deletes it."""
        ns_name = f"test-{uuid4().hex[:12]}"
        backend = request.config.getoption("--backend")
        if backend == "real":
            _create_real_namespace(ns_name)
            yield ns_name
            _delete_real_namespace(ns_name)
        else:
            yield ns_name

    @pytest.fixture()
    def factory(request, tmp_namespace):
        """Factory wired to backend based on --backend flag."""
        backend = request.config.getoption("--backend")
        if backend == "real":
            return KuberokuFactory(
                k8s_client=K8sClient.from_context(),
                namespace=tmp_namespace,
            )
        return KuberokuFactory(
            k8s_client=FakeK8sClient(),
            namespace=tmp_namespace,
        )

    THE RULE: Cluster lifecycle is external (CI provisions k3d/kind,
    developer runs Colima). Namespace = function scope. Tests get a
    fresh namespace every time.

    WHY FAKE IS FUNCTION-SCOPED:
    FakeK8sClient is a dict. Creating one takes microseconds. Making it
    session-scoped would mean tests share state — a debugging nightmare.
    Always create a fresh fake per test.
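
    To make the isolation argument concrete, here is a toy dict-backed fake.
    The method names (create_configmap, list_configmaps) are hypothetical and
    chosen only for this illustration; the real FakeK8sClient interface is
    defined elsewhere in this document.

```python
class FakeK8sClient:
    """Toy dict-backed stand-in; not the real fake's API."""

    def __init__(self) -> None:
        # All state lives in this one dict, keyed by (namespace, name).
        self._configmaps: dict[tuple[str, str], dict] = {}

    def create_configmap(self, namespace: str, name: str, data: dict) -> None:
        key = (namespace, name)
        if key in self._configmaps:
            raise ValueError(f"configmap {namespace}/{name} already exists")
        self._configmaps[key] = dict(data)

    def list_configmaps(self, namespace: str) -> list[str]:
        return sorted(n for (ns, n) in self._configmaps if ns == namespace)


# Function scope: each "test" builds its own client, so nothing leaks.
first = FakeK8sClient()
first.create_configmap("test-a", "myapi", {"WEB": "1"})

second = FakeK8sClient()
leaked = second.list_configmaps("test-a")
```

    With a session-scoped instance, the second lookup would see "myapi" left
    behind by the first test, and a duplicate create would raise.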

    FIXTURE DEPENDENCY CHAIN:
    ─────────────────────────
    test function
      └── factory (function) ← KuberokuFactory, backend-aware
            └── tmp_namespace (function) ← unique ns, create/delete on real


PERFORMANCE EXPECTATIONS PER COMMAND TYPE
──────────────────────────────────────────
Beyond startup time (<100ms for --help, <50ms for --version), every
command category has a latency budget measured from CLI invocation to
output. These are WALL-CLOCK targets on a local cluster with <10ms
network latency.

    Command Category              Target     K8s API Calls    Notes
    ──────────────────────────    ──────     ─────────────    ─────────────────
    --help, --version             <100ms     0                No K8s connection
    apps:create                   <500ms     2-3 (create CM)  ConfigMap creates
    apps:destroy                  <1s        5-10 (deletes)   Multiple resources
    apps (list)                   <500ms     1 (list CMs)     Single list call
    apps:info                     <500ms     2-3 (get CMs)    Parallel gets
    config:set                    <1s        2-3 (get+update) Update + restart
    config (list)                 <500ms     1 (get CM)       Single get
    deploy --image                <2s        3-5 (updates)    Excluding rollout
                                                              wait time
    deploy (from git)             varies     3-5 + build      Build time dominates
    releases (list)               <500ms     1 (list CMs)     Single list call
    releases:rollback             <2s        3-5 (updates)    Same as deploy
    ps (list)                     <1s        2 (list pods +   Pods can be slow
                                              list deploys)
    ps:scale                      <1s        1-2 (patch)      Patch deployment
    logs                          <500ms     1 (pod logs)     Initial fetch only
    services:expose:on            <500ms     1-2 (update svc) Patch service type
    addons:create                 <2s        3-5 (creates)    StatefulSet + Svc
    addons:connect                <500ms     1 (port-forward  Connection setup
                                              setup)
    domains:add                   <1s        2-3 (Ingress)    Create/update
    apps:maintenance:on           <500ms     1-2 (update CM)  Single update
    run (interactive)             <1s        1 (exec)         Exec setup only
    run --detach                  <1s        1 (create Job)   Job creation only
    apps:status                   <1s        3-5 (parallel)   Aggregate view

    TESTING PERFORMANCE:
    ────────────────────
    # tests/infrastructure/test_startup.py — already exists

    # tests/infrastructure/test_latency.py — NEW
    import time

    import pytest

    @pytest.mark.needs("api_crud")
    def test_apps_create_latency(factory):
        start = time.monotonic()
        factory.apps.create("bench-app")
        elapsed = time.monotonic() - start
        # SDK call should complete in <500ms on fake, <1s on real
        assert elapsed < 2.0  # generous: 2x the 1s real-backend figure

    @pytest.mark.needs("api_crud")
    def test_apps_list_latency(factory):
        for i in range(10):
            factory.apps.create(f"bench-{i}")
        start = time.monotonic()
        apps = factory.apps.list()
        elapsed = time.monotonic() - start
        assert elapsed < 2.0
        assert len(apps) == 10

    NOTE: Performance tests use generous 2x multipliers because CI
    environments are noisy. The targets above are for humans to verify
    locally. CI tests catch regressions (e.g., accidental O(n) → O(n²)).
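
    The repeated start/elapsed/assert pattern could be factored into a
    context manager that applies the multiplier explicitly. latency_budget
    is a hypothetical helper sketched for illustration; it is not part of
    the specified test suite.

```python
import time
from contextlib import contextmanager


@contextmanager
def latency_budget(target_seconds: float, multiplier: float = 2.0):
    """Fail if the wrapped block takes longer than target * multiplier."""
    limit = target_seconds * multiplier
    start = time.monotonic()
    yield
    elapsed = time.monotonic() - start
    assert elapsed < limit, (
        f"took {elapsed:.3f}s, budget {limit:.3f}s "
        f"({target_seconds}s target x {multiplier})"
    )


# Usage: a fast operation stays inside a 0.5s target with the 2x allowance.
with latency_budget(0.5):
    total = sum(range(1000))
```

    Keeping the target and the multiplier as separate arguments makes the
    relationship to the table above visible in each test.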


EXCEPTION COVERAGE MANDATE
──────────────────────────
Coverage percentage alone does NOT guarantee that every exception is
tested. You can have high coverage and still miss entire error classes
if no test triggers them.

    RULE: Every exception class in exceptions.py MUST have at least
    one test that triggers it via a realistic SDK call.

    ENFORCEMENT:
    ────────────
    # tests/infrastructure/test_exceptions.py

    import inspect

    import pytest

    from kuberoku import exceptions

    ALL_EXCEPTIONS = [
        cls for name, cls in inspect.getmembers(exceptions, inspect.isclass)
        if issubclass(cls, exceptions.KuberokuError) and cls is not exceptions.KuberokuError
    ]

    # This test doesn't test the exceptions themselves — it verifies
    # that other tests exercise them. Run after the full test suite.

    @pytest.mark.parametrize("exc_class", ALL_EXCEPTIONS,
                             ids=lambda c: c.__name__)
    def test_exception_is_triggered_somewhere(exc_class, exception_tracker):
        """Verify that at least one test in the suite raises this exception.

        Uses a conftest fixture that tracks all raised exceptions. Note that
        the pytest_exception_interact hook only fires for exceptions that
        escape a test, so exceptions caught by pytest.raises need a custom
        tracker (or a marker-based approach).
        """
        assert exc_class in exception_tracker.seen, (
            f"{exc_class.__name__} is never raised in any test. "
            f"Add a test that triggers this exception via a realistic SDK call."
        )
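
    One way such a tracker could work is to record every instantiation of a
    KuberokuError subclass by wrapping the base class __init__. This is a
    sketch under assumptions, not the specified design: it counts
    instantiation rather than an actual raise, it misses subclasses that
    define their own __init__ without calling super().__init__(), and the
    exception classes below are local stand-ins for kuberoku.exceptions.

```python
class KuberokuError(Exception):
    """Stand-in for kuberoku.exceptions.KuberokuError."""


class AppNotFoundError(KuberokuError):
    pass


class PluginError(KuberokuError):
    pass


class ExceptionTracker:
    """Records every subclass of `base` that gets instantiated."""

    def __init__(self, base: type) -> None:
        self.base = base
        self.seen: set[type] = set()
        self._had_own_init = "__init__" in vars(base)
        self._original_init = base.__init__

    def install(self) -> None:
        tracker, original = self, self._original_init

        def tracking_init(exc, *args, **kwargs):
            tracker.seen.add(type(exc))
            original(exc, *args, **kwargs)

        self.base.__init__ = tracking_init

    def uninstall(self) -> None:
        # Restore the class exactly as it was before install().
        if self._had_own_init:
            self.base.__init__ = self._original_init
        else:
            del self.base.__init__


tracker = ExceptionTracker(KuberokuError)
tracker.install()
try:
    raise AppNotFoundError("myapi")
except AppNotFoundError:
    pass
tracker.uninstall()
```

    In a pytest suite this would presumably be wired as a session-scoped
    fixture, with the parametrized check collected last so that tracker.seen
    is complete when it runs.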

    SIMPLER ALTERNATIVE (if the tracker is too complex):
    ────────────────────────────────────────────────────
    Maintain a manual mapping. Every exception must have a test ID:

    EXCEPTION_TEST_MAP = {
        "AppNotFoundError":         "tests/apps/test_info.py::TestInfoSDK::test_nonexistent_app",
        "AppAlreadyExistsError":    "tests/apps/test_create.py::TestCreateSDK::test_create_duplicate",
        "DeployError":              "tests/deploy/test_image.py::TestDeploySDK::test_deploy_failure",
        "RolloutTimeoutError":      "tests/deploy/test_image.py::TestDeploySDK::test_rollout_timeout",
        "ImageNotFoundError":       "tests/deploy/test_image.py::TestDeploySDK::test_bad_image",
        "ConfigVarNotFoundError":   "tests/config/test_get.py::TestGetSDK::test_get_nonexistent",
        "ReleaseNotFoundError":     "tests/deploy/test_releases_info.py::test_nonexistent_release",
        "ClusterNotFoundError":     "tests/clusters/test_add.py::test_nonexistent_cluster",
        "ClusterUnreachableError":  "tests/clusters/test_add.py::test_unreachable_cluster",
        "PermissionDeniedError":    "tests/infrastructure/test_factory.py::test_permission_denied",
        "AddonNotFoundError":       "tests/addons/test_info.py::test_nonexistent_addon",
        "AddonTypeNotSupportedError":"tests/addons/test_create.py::test_unsupported_type",
        "DomainAlreadyExistsError": "tests/domains/test_add.py::test_duplicate_domain",
        "PluginError":              "tests/infrastructure/test_plugins.py::test_bad_plugin",
    }

    def test_all_exceptions_have_tests():
        """Every exception in exceptions.py has a mapped test."""
        import inspect
        from kuberoku import exceptions
        all_exc = {
            name for name, cls in inspect.getmembers(exceptions, inspect.isclass)
            if issubclass(cls, exceptions.KuberokuError)
            and cls is not exceptions.KuberokuError
        }
        mapped = set(EXCEPTION_TEST_MAP.keys())
        unmapped = all_exc - mapped
        assert not unmapped, f"Exceptions without tests: {unmapped}"

    When you add a new exception class, CI fails until you add its test
    mapping. This makes exception coverage non-optional.


DEVELOPMENT WORKFLOW — TDD PER COMMAND
──────────────────────────────────────
When adding a new command, follow this exact sequence:

    STEP 1: WRITE THE TEST FILE FIRST
    ──────────────────────────────────
    Create tests/{feature}/test_{command}.py with both SDK and CLI test
    classes. Write at minimum the 14-point checklist (see above). All
    tests will fail — that's the point.

        # Example: adding addons:migrate
        # Create: tests/addons/test_versioning.py

        class TestMigrateSDK:
            def test_migrate_changes_version(self, factory):
                factory.apps.create("myapi")
                factory.addons.create("myapi", "postgres", version="16")
                factory.addons.migrate("myapi", "postgres", target_version="17")
                addon = factory.addons.info("myapi", "postgres")
                assert addon.image == "postgres:17"

            def test_migrate_nonexistent_raises(self, factory):
                factory.apps.create("myapi")
                with pytest.raises(AddonNotFoundError):
                    factory.addons.migrate("myapi", "nope", target_version="17")

            def test_migrate_deletes_pvc_for_postgres(self, factory):
                factory.apps.create("myapi")
                factory.addons.create("myapi", "postgres", version="16")
                factory.addons.migrate("myapi", "postgres", target_version="17")
                # PVC should be deleted (backup+restore handles data)

        class TestMigrateCLI:
            def test_migrate_output(self, cli_runner):
                cli_runner.invoke(cli, ["apps:create", "myapi"])
                cli_runner.invoke(cli, ["addons:create", "postgres", "-a", "myapi"])
                result = cli_runner.invoke(cli, [
                    "addons:migrate", "postgres", "--version", "17", "-a", "myapi"
                ])
                assert result.exit_code == 0
                assert "migrated" in result.output.lower()

    STEP 2: IMPLEMENT THE SDK METHOD
    ─────────────────────────────────
    Add the method to the appropriate service class. Write ONLY enough
    code to make the SDK tests pass. Do not touch CLI code yet.

        # src/kuberoku/sdk/addons.py
        class AddonsService:
            def migrate(self, app: str, instance: str, *, target_version: str) -> Addon:
                ...

    Run: pytest tests/addons/test_versioning.py::TestMigrateSDK -v

    STEP 3: WIRE THE CLI COMMAND
    ────────────────────────────
    Add the Click command. It should be ~10 lines: parse args, call SDK,
    format output. Write ONLY enough to make the CLI tests pass.

        # src/kuberoku/cli/addons.py
        @addons.command("migrate")
        @click.argument("instance")
        @click.option("--version", required=True)
        @click.pass_context
        def migrate(ctx, instance, version):
            factory = ctx.obj["factory"]
            app = ctx.obj["app"]
            factory.addons.migrate(app, instance, target_version=version)
            click.echo(f"Migrated {instance} to version {version}")

    Run: pytest tests/addons/test_versioning.py -v  (ALL tests)

    STEP 4: VERIFY
    ───────────────
    - All 14 checklist items pass
    - Coverage on the new code meets the enforced threshold
    - mypy and ruff pass
    - The exception mapping includes any new exceptions

    STEP 5: ADD TO A FLOW TEST (if the command fits an existing flow)
    ─────────────────────────────────────────────────────────────────
    If the command is part of a common user journey, add it to the
    relevant flow test. If not, skip this step — not every command
    needs a flow test.

    THE GOLDEN RULE: Tests exist BEFORE the implementation. If you
    write the SDK method first and tests after, you're testing that
    your code does what your code does — not that it does what it
    SHOULD do. TDD forces you to think about behavior before code.


PLUGIN INTERACTION TESTS
────────────────────────
Plugins extend Kuberoku with new commands. The core must be robust
against misbehaving plugins — a bad plugin should NEVER crash the CLI
or corrupt state.

    TEST FILE: tests/infrastructure/test_plugins.py

    # ── LOADING ──────────────────────────────────────────────

    def test_valid_plugin_loads(cli_runner, tmp_plugin):
        """A well-formed plugin's commands appear in --help."""
        result = cli_runner.invoke(cli, ["--help"])
        assert "myplugin" in result.output

    def test_plugin_missing_entry_point_ignored(cli_runner, bad_plugin_no_entry):
        """Package named kuberoku-* but no entry point → silently ignored."""
        result = cli_runner.invoke(cli, ["--help"])
        assert result.exit_code == 0  # didn't crash

    def test_plugin_import_error_logged_not_crashed(cli_runner, bad_plugin_import_error):
        """Plugin raises ImportError → warning logged, CLI continues."""
        result = cli_runner.invoke(cli, ["apps:create", "myapi"])
        assert result.exit_code == 0  # CLI works fine
        # Warning should appear in debug mode
        result = cli_runner.invoke(cli, ["--debug", "apps:create", "myapi2"])
        assert "plugin" in result.output.lower() and "error" in result.output.lower()

    def test_plugin_exception_in_command_doesnt_crash_cli(cli_runner, bad_plugin_raises):
        """Plugin command raises RuntimeError → error shown, exit 1."""
        result = cli_runner.invoke(cli, ["badplugin:explode"])
        assert result.exit_code == 1
        assert "Error:" in result.output
        # Other commands still work
        result = cli_runner.invoke(cli, ["apps:create", "myapi"])
        assert result.exit_code == 0

    # ── SANDBOXING ───────────────────────────────────────────

    def test_plugin_cannot_override_core_command(cli_runner, plugin_overrides_apps):
        """Plugin that tries to register 'apps:create' → ignored, core wins."""
        result = cli_runner.invoke(cli, ["apps:create", "myapi"])
        assert result.exit_code == 0
        assert "Created myapi" in result.output  # core behavior, not plugin

    def test_plugin_gets_factory_via_context(cli_runner, plugin_uses_factory):
        """Plugin accesses factory via ctx.obj['factory'] — same wiring."""
        result = cli_runner.invoke(cli, ["myplugin:status", "-a", "myapi"])
        # Plugin was able to call factory.apps.get() successfully
        assert result.exit_code == 0

    def test_plugin_cannot_modify_factory(cli_runner, plugin_mutates_factory):
        """Plugin that tries to replace factory.k8s → AttributeError or no effect."""
        cli_runner.invoke(cli, ["badplugin:hijack"])
        # Core commands still use original factory
        result = cli_runner.invoke(cli, ["apps:create", "myapi"])
        assert result.exit_code == 0

    # ── FIXTURES ─────────────────────────────────────────────

    @pytest.fixture
    def tmp_plugin(tmp_path, monkeypatch):
        """Create a temporary plugin package with entry_points."""
        plugin_dir = tmp_path / "kuberoku_myplugin"
        plugin_dir.mkdir()
        (plugin_dir / "__init__.py").write_text("""
import click

@click.group("myplugin")
def myplugin():
    pass

@myplugin.command("status")
@click.pass_context
def status(ctx):
    click.echo("plugin status ok")
""")
        # Register via entry_points mock
        monkeypatch.setattr(
            "importlib.metadata.entry_points",
            lambda group=None: [FakeEntryPoint("myplugin", plugin_dir)]
            if group == "kuberoku.plugins" else []
        )

    NOTE: Plugin tests do NOT need a real K8s cluster. They test the
    plugin loading, sandboxing, and error handling — all of which use
    FakeK8sClient.
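
    The FakeEntryPoint used by the tmp_plugin fixture is not defined in this
    document. A minimal sketch follows, assuming it only needs the two
    things load_plugins touches: a name attribute and a load() method. The
    attr parameter and the module naming scheme are illustrative choices.

```python
import importlib.util
import tempfile
from pathlib import Path


class FakeEntryPoint:
    """Minimal stand-in for importlib.metadata.EntryPoint (name + load())."""

    def __init__(self, name: str, package_dir: Path, attr: str = ""):
        self.name = name
        self.package_dir = Path(package_dir)
        # Assumption: the exported object is named after the plugin.
        self.attr = attr or name

    def load(self):
        init_file = self.package_dir / "__init__.py"
        spec = importlib.util.spec_from_file_location(
            f"_fake_plugin_{self.name}", init_file
        )
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # raises if the plugin fails to import
        return getattr(module, self.attr)


# Usage with a throwaway package; a plain string stands in for a click.Group.
with tempfile.TemporaryDirectory() as tmp:
    plugin_dir = Path(tmp) / "kuberoku_myplugin"
    plugin_dir.mkdir()
    (plugin_dir / "__init__.py").write_text("myplugin = 'plugin object'\n")
    loaded = FakeEntryPoint("myplugin", plugin_dir).load()
```

    Because load() executes the module, the same class can back the
    bad_plugin_import_error fixture by writing an __init__.py that raises.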


HEROKU-STYLE GROUP EDGE CASE TESTS
───────────────────────────────────
The ColonCommandGroup.resolve_command() override is the foundation of
the colon-command UX. It MUST handle every edge case gracefully.

    TEST FILE: tests/infrastructure/test_colon_commands.py

    # ── BASIC EQUIVALENCE ────────────────────────────────────

    def test_colon_and_space_equivalent(runner):
        """apps:create and apps create invoke the same command."""
        r1 = runner.invoke(cli, ["apps:create", "test1"])
        r2 = runner.invoke(cli, ["apps", "create", "test2"])
        assert r1.exit_code == r2.exit_code == 0

    def test_deeply_nested_colon(runner):
        """releases:info works (two levels: releases → info)."""
        runner.invoke(cli, ["apps:create", "myapi"])
        runner.invoke(cli, ["deploy", "--image", "nginx", "-a", "myapi"])
        result = runner.invoke(cli, ["releases:info", "1", "-a", "myapi"])
        assert result.exit_code == 0

    # ── ERROR CASES ──────────────────────────────────────────

    def test_unknown_group_shows_error(runner):
        """foo:bar where 'foo' is not a group → helpful error."""
        result = runner.invoke(cli, ["foo:bar"])
        assert result.exit_code == 2  # usage error
        assert "No such command" in result.output or "Error" in result.output

    def test_unknown_subcommand_shows_error(runner):
        """apps:banana where 'banana' is not a subcommand → helpful error."""
        result = runner.invoke(cli, ["apps:banana"])
        assert result.exit_code == 2
        assert "No such command" in result.output or "Error" in result.output

    def test_double_colon_shows_error(runner):
        """apps::create (double colon) → error, not crash."""
        result = runner.invoke(cli, ["apps::create", "myapi"])
        assert result.exit_code != 0
        # Must not raise an unhandled exception
        assert "Traceback" not in result.output

    def test_trailing_colon_shows_error(runner):
        """apps: (trailing colon, no subcommand) → group help or error, not crash."""
        result = runner.invoke(cli, ["apps:"])
        # Should either show help or an error — not crash
        assert "Traceback" not in result.output

    def test_leading_colon_shows_error(runner):
        """:create (leading colon) → error, not crash."""
        result = runner.invoke(cli, [":create"])
        assert result.exit_code != 0
        assert "Traceback" not in result.output

    def test_colon_in_argument_not_split(runner):
        """config:set KEY=postgres://host:5432/db — colon in VALUE not split."""
        runner.invoke(cli, ["apps:create", "myapi"])
        result = runner.invoke(cli, [
            "config:set", "DATABASE_URL=postgres://host:5432/db", "-a", "myapi"
        ])
        assert result.exit_code == 0
        # Verify the value was stored correctly (colon preserved)
        result = runner.invoke(cli, ["config:get", "DATABASE_URL", "-a", "myapi"])
        assert "postgres://host:5432/db" in result.output

    # ── HELP TEXT ────────────────────────────────────────────

    def test_root_help_shows_all_groups(runner):
        """kuberoku --help lists all command groups."""
        result = runner.invoke(cli, ["--help"])
        assert result.exit_code == 0
        for group in ["apps", "config", "deploy", "ps", "addons",
                       "domains", "logs", "run", "releases"]:
            assert group in result.output

    def test_group_help_shows_subcommands(runner):
        """kuberoku apps --help lists create, destroy, info, etc."""
        result = runner.invoke(cli, ["apps", "--help"])
        assert result.exit_code == 0
        assert "create" in result.output
        assert "destroy" in result.output

    def test_colon_help_works(runner):
        """kuberoku apps:create --help shows command help."""
        result = runner.invoke(cli, ["apps:create", "--help"])
        assert result.exit_code == 0
        assert "APP_NAME" in result.output or "app" in result.output.lower()

    # ── TAB COMPLETION ───────────────────────────────────────

    def test_completion_includes_groups(runner):
        """Shell completion at root level includes command groups."""
        # Test via Click's get_completions or similar mechanism
        completions = get_completions(cli, [], "")
        group_names = [c.value for c in completions]
        assert "apps" in group_names
        assert "config" in group_names

    def test_completion_includes_subcommands(runner):
        """Shell completion after 'apps' includes subcommands."""
        completions = get_completions(cli, ["apps"], "")
        sub_names = [c.value for c in completions]
        assert "create" in sub_names
        assert "destroy" in sub_names
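
    Taken together, the edge cases above pin down a splitting rule that can
    be expressed as a pure function. The sketch below is not the actual
    ColonCommandGroup.resolve_command() override; split_colon_command and
    normalize_argv are hypothetical names, and the sketch deliberately
    ignores global options placed before the command (such as --debug).

```python
def split_colon_command(token: str) -> list[str]:
    """Split 'apps:create' into ['apps', 'create']; reject empty segments."""
    parts = token.split(":")
    if any(part == "" for part in parts):
        # Covers 'apps::create', 'apps:' and ':create'.
        raise ValueError(f"No such command '{token}'")
    return parts


def normalize_argv(argv: list[str]) -> list[str]:
    """Expand only the leading command token; later arguments are untouched."""
    if not argv or argv[0].startswith("-"):
        return list(argv)
    return split_colon_command(argv[0]) + list(argv[1:])


expanded = normalize_argv(
    ["config:set", "DATABASE_URL=postgres://host:5432/db", "-a", "myapi"]
)
```

    Splitting only the first token is what keeps colons inside values such
    as postgres://host:5432/db intact.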


================================================================================
 13. DEPENDENCIES
================================================================================

BUILD SYSTEM
────────────
    [build-system]
    requires = ["hatchling"]
    build-backend = "hatchling.build"

PYTHON VERSION
──────────────
    requires-python = ">=3.11"

RUNTIME DEPENDENCIES (4 total, effectively 2 new)
──────────────────────────────────────────────────
    Dependency          Version     Why                         Note
    ──────────────────  ──────────  ──────────────────────────  ────────────────
    kubernetes          >=35        Official K8s Python client  Core
    click               >=8.1       CLI framework               Core
    rich                >=13        Pretty output, tables       Standalone (no typer)
    pyyaml              >=6         YAML parsing (config)       Already a kubernetes dep

    WHY CLICK, NOT TYPER:
    Kuberoku's CLI is not a simple flag-parsing wrapper. The core
    experience — colon commands, plugin mounting, custom group
    resolution — all require extending click.Group directly:

    - ColonCommandGroup extends click.Group with resolve_command() override
    - Plugins provide click.Group instances mounted as subcommands
    - Click's CliRunner is used for E2E tests
    - Tab completion relies on real Click group hierarchy

    Typer wraps Click but would be a leaky abstraction here. We'd
    constantly reach through to Click internals, adding complexity
    for zero benefit. Using Click directly is cleaner, simpler,
    and gives us full control over command resolution.

DEV DEPENDENCIES
────────────────
    pytest              Testing framework
    pytest-cov          Coverage reporting
    pytest-mock         Mocking helpers
    mypy                Static type checking (--strict)
    ruff                Linting + formatting

ENTRY POINT
───────────
    [project.scripts]
    kuberoku = "kuberoku.cli.main:run"


================================================================================
 14. DISTRIBUTION
================================================================================

INSTALL METHODS
───────────────
    # Python (recommended)
    pipx install kuberoku

    # Python (standard)
    pip install kuberoku

    # One-liner (detects OS + arch)
    curl -fsSL https://github.com/amanjain/kuberoku/releases/latest/download/install.sh | sh

    # macOS (Homebrew) — planned
    brew install kuberoku/tap/kuberoku

    # Standalone binary (no Python required)
    # Download from GitHub Releases:
    #   kuberoku-linux-amd64
    #   kuberoku-linux-arm64
    #   kuberoku-darwin-amd64
    #   kuberoku-darwin-arm64
    #   kuberoku-windows-amd64.exe

PYINSTALLER BINARY BUILD
─────────────────────────
    - Single-file executables for each platform
    - Built in CI on release tags (v*)
    - Uploaded to GitHub Releases
    - Homebrew formula points to GitHub Release binary (planned)

PYPI PUBLISHING
───────────────
    - Package name: kuberoku
    - Published via GitHub Actions on tag push
    - Uses trusted publisher (OIDC) — no API tokens


================================================================================
 15. PLUGIN SYSTEM
================================================================================

CONVENTION
──────────
Plugins are PyPI packages named `kuberoku-*`. Anyone can create one.

    Package name:      kuberoku-postgres
    Provides:          kuberoku postgres:backup, kuberoku postgres:restore, etc.
    Mechanism:         Python entry_points

PLUGIN AUTHOR EXPERIENCE
─────────────────────────
    # kuberoku-postgres/pyproject.toml
    [project]
    name = "kuberoku-postgres"

    [project.entry-points."kuberoku.plugins"]
    postgres = "kuberoku_postgres:plugin"

    # kuberoku_postgres/__init__.py
    import click

    @click.group()
    def plugin():
        """PostgreSQL management for Kuberoku apps."""
        pass

    @plugin.command()
    @click.argument("app")
    @click.pass_context
    def backup(ctx, app):
        """Create a database backup."""
        factory = ctx.obj["factory"]       # <-- access to K8s, namespace, etc.
        k8s = factory.k8s
        ns = factory.namespace
        # Plugin has full access to K8s client and all factory services
        pods = k8s.list_pods(ns, labels={
            f"{factory.domain}/app": app,
            f"{factory.domain}/addon-type": "postgres",
        })
        ...

    @plugin.command()
    @click.argument("app")
    @click.pass_context
    def restore(ctx, app):
        """Restore from a backup."""
        factory = ctx.obj["factory"]
        ...

    KEY: Plugins get the factory via Click context (ctx.obj["factory"]).
    This gives them the same K8s client, namespace, prefix, and domain
    that core commands use. No special plugin API needed — just Click.

PLUGIN DISCOVERY (in Kuberoku core)
────────────────────────────────────
    import importlib.metadata
    import warnings

    def load_plugins(root_group):
        """Discover and mount all installed kuberoku-* plugins."""
        eps = importlib.metadata.entry_points(group="kuberoku.plugins")
        for ep in eps:
            try:
                plugin_group = ep.load()
                root_group.add_command(plugin_group, ep.name)
            except Exception as e:
                warnings.warn(f"Failed to load plugin {ep.name}: {e}")

USER EXPERIENCE
───────────────
    $ pip install kuberoku-postgres
    $ kuberoku postgres:backup myapp       # Just works
    $ kuberoku plugins
    Installed plugins:
      postgres    kuberoku-postgres 0.2.1
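
    A minimal sketch of how the `kuberoku plugins` listing above could be
    derived from entry point metadata. The helper names `describe_plugins`
    and `format_plugins` are assumptions for illustration, not the core
    implementation. The entry point's `dist` attribute is read defensively
    because it may be absent.

```python
import importlib.metadata
from types import SimpleNamespace


def describe_plugins(entry_points=None):
    """Return sorted (command, package, version) tuples for installed plugins."""
    if entry_points is None:
        # The group= keyword requires Python 3.10+.
        entry_points = importlib.metadata.entry_points(group="kuberoku.plugins")
    rows = []
    for ep in entry_points:
        dist = getattr(ep, "dist", None)
        package = dist.name if dist is not None else "(unknown)"
        version = dist.version if dist is not None else "?"
        rows.append((ep.name, package, version))
    return sorted(rows)


def format_plugins(rows):
    """Render rows in the same shape as the listing shown above."""
    lines = ["Installed plugins:"]
    for name, package, version in rows:
        lines.append(f"  {name:<12}{package} {version}")
    return "\n".join(lines)


# A fake entry point stands in for a real installed plugin.
fake = SimpleNamespace(
    name="postgres",
    dist=SimpleNamespace(name="kuberoku-postgres", version="0.2.1"),
)
print(format_plugins(describe_plugins([fake])))
```

    Passing the entry points in as an argument keeps the function testable
    without installing a real plugin package.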

PLUGIN MANAGEMENT COMMANDS
──────────────────────────
    kuberoku plugins                               List installed plugins
    kuberoku plugins:install kuberoku-postgres     Install (wraps pip install)
    kuberoku plugins:uninstall kuberoku-postgres   Remove plugin
    kuberoku plugins:search postgres               Search PyPI for kuberoku-* packages

PLUGIN LIMITATIONS BY INSTALL METHOD
─────────────────────────────────────
    Install Method          Plugins Supported?    Why
    ─────────────────────   ────────────────────  ────────────────────────────────
    pip install kuberoku    Yes (full support)    entry_points work natively
    pipx install kuberoku   Yes (with inject)     `pipx inject kuberoku kuberoku-postgres`
    PyInstaller binary      No                    Frozen binary, no pip install
    brew install kuberoku   Yes (pip-backed)      Homebrew Python formula (planned)

    For PyInstaller/standalone binary users: plugins are not supported.
    Use the Python package install (pip/pipx) if you need plugins.

PLUGIN SANDBOXING — WHAT PLUGINS CAN AND CANNOT DO
───────────────────────────────────────────────────
    Plugins CAN:
    - Add new CLI command groups (postgres:backup, redis:snapshot, etc.)
    - Access the factory via ctx.obj["factory"] (K8s, namespace, prefix)
    - Use any core SDK service (factory.apps, factory.config, etc.)
    - Ship their own internal logic, models, and helpers
    - Read/write their own K8s resources (labeled with their own identifiers)

    Plugins CANNOT:
    - Modify core commands (no "hook into deploy" or "intercept create")
    - Replace or wrap existing SDK services
    - Intercept or proxy K8s calls (no middleware on K8sClient)
    - Monkey-patch core modules
    - Register event hooks or lifecycle callbacks (no event system exists)

    This is intentional. Plugins EXTEND Kuberoku; they don't MODIFY it.
    If a plugin needs to run logic during deploy, the user composes it:

        kuberoku deploy --image myapi:v3 && kuberoku postgres:backup myapi

    NOT: "register a post-deploy hook that runs postgres:backup."
    Composition over hooks. Explicit over magic.


================================================================================
 16. IMPLEMENTATION PHASES
================================================================================

DESIGN PRINCIPLES FOR PHASING
─────────────────────────────
Each phase follows these rules:

    1. DEMO AFTER EVERY PHASE — After completing a phase, you can demo
       something end-to-end that feels like real progress. No invisible
       infrastructure-only phases (except Phases 0 and 1).

    2. DEPENDENCY ORDER — Each phase only depends on things built in
       prior phases. Never build a consumer before its producer.

    3. ONE SUBSYSTEM PER PHASE — Each phase introduces one new concept
       or capability. No phase bundles two unrelated subsystems.

    4. --image BEFORE build-from-git — The pre-built image deploy path
       (kuberoku deploy --image) is strictly simpler than build-from-git
       (git archive + docker build + push). Ship the simple path first,
       add git/docker later.

ERROR MESSAGE CONVENTIONS (APPLIES FROM PHASE 2 ONWARD)
────────────────────────────────────────────────────────
Every error message in the CLI follows this format:

    Error: <what went wrong, one line>

    <why it happened, 1-2 lines if non-obvious>

    <what to do about it, always present>

Rules:
    - NEVER just say "failed" or "error occurred". Always say WHAT failed.
    - ALWAYS suggest a fix or next step. Every error ends with an action.
    - Use the user's language, not K8s internals. Say "App 'myapi' not
      found" not "ConfigMap kuberoku-app-myapi not found in namespace".
    - Include the exact command to fix it when possible.
    - Exit codes: 0 = success, 1 = user error (bad input, not found),
      2 = infrastructure error (K8s unreachable, permission denied).

Examples:

    $ kuberoku config:set DATABASE_URL=postgres://...
    Error: No app specified.

    Run this inside a linked project, or specify the app:
        kuberoku config:set DATABASE_URL=... --app myapi
        kuberoku config:set DATABASE_URL=... --app staging

    $ kuberoku deploy --image myapi:v3
    Error: App 'myapi' has no process types yet.

    This is the first deploy. Kuberoku needs to know what to run.
    Add a Procfile to your project, or set commands manually:
        kuberoku ps:set web="gunicorn app:app"

    $ kuberoku services:expose:on smtp
    Error: Process 'smtp' has no ports defined.

    Add ports first:
        kuberoku services:ports:add 25/tcp --type smtp

    $ kuberoku addons:create postgres
    Error: Permission denied — cannot create statefulsets.

    Your kubeconfig user "dev-user" lacks the "create" verb on "statefulsets".
    Ask your cluster admin, or run: kuberoku doctor --fix

These conventions apply to ALL commands from Phase 2 onward. The CLI
catches KuberokuError at the top level and formats it per these rules.
Non-KuberokuError exceptions are unexpected bugs and show a full traceback
plus "This is a bug. Please report it at <repo>/issues".
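
    The format and exit codes above can be sketched as a small error type
    plus a top-level wrapper. This is an illustration under assumptions: the
    `what`/`why`/`fix` attribute names, the `InfrastructureError` subclass,
    the `run_cli` helper, and printing to stderr are all hypothetical and may
    differ from the real exceptions.py hierarchy.

```python
import sys

EXIT_OK = 0
EXIT_USER_ERROR = 1    # bad input, not found
EXIT_INFRA_ERROR = 2   # K8s unreachable, permission denied


class KuberokuError(Exception):
    """Base error carrying the three parts of the message format."""

    exit_code = EXIT_USER_ERROR

    def __init__(self, what, fix, why=None):
        super().__init__(what)
        self.what = what   # one line: what went wrong
        self.why = why     # optional: why it happened
        self.fix = fix     # required: what to do about it


class InfrastructureError(KuberokuError):
    """Hypothetical subclass for cluster-side failures."""

    exit_code = EXIT_INFRA_ERROR


def format_error(err):
    """Join the parts with blank lines, skipping the optional 'why'."""
    parts = [f"Error: {err.what}"]
    if err.why:
        parts.append(err.why)
    parts.append(err.fix)
    return "\n\n".join(parts)


def run_cli(command, stderr=sys.stderr):
    """Run a callable; turn KuberokuError into a message and an exit code.

    Any other exception propagates unchanged so the traceback stays visible.
    """
    try:
        command()
    except KuberokuError as err:
        print(format_error(err), file=stderr)
        return err.exit_code
    return EXIT_OK
```

    Making `fix` a required argument means an error without a next step
    cannot be constructed, which enforces the "always present" rule.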


PHASE 0 — PROJECT SKELETON
───────────────────────────
Goal: `pip install -e .` works, `kuberoku --version` prints version.
Demo: "I installed it and it runs."

    Files:
    - pyproject.toml (hatchling, deps, entry point)
    - src/kuberoku/__init__.py (exports Kuberoku, __version__)
    - src/kuberoku/__main__.py (python -m kuberoku)
    - src/kuberoku/_version.py (__version__ = "0.1.0")
    - src/kuberoku/branding.py (TOOL_NAME, PREFIX, DOMAIN, all derived constants)
    - src/kuberoku/cli/main.py (root group + --version flag)
    - src/kuberoku/models.py (all dataclasses)
    - src/kuberoku/exceptions.py (full hierarchy)
    - tests/conftest.py

    Verification:
    $ pip install -e ".[dev]"
    $ kuberoku --version
    kuberoku 0.1.0
    $ pytest tests/ -v                       # (no tests yet, but imports work)
    $ mypy src/kuberoku/ --strict            # passes

PHASE 1 — K8S ABSTRACTION + FAKE CLIENT
────────────────────────────────────────
Goal: K8sClientProtocol defined, FakeK8sClient complete, label helpers tested.
Demo: "Tests pass against a fake K8s — we can develop offline."

    Files:
    - src/kuberoku/k8s/protocols.py (K8sClientProtocol)
    - src/kuberoku/k8s/labels.py (PREFIX, label builders, selector helpers)
    - src/kuberoku/k8s/resources.py (resource dict builders)
    - src/kuberoku/k8s/client.py (K8sClient — real, wraps kubernetes library)
    - tests/backends/fake.py (FakeK8sClient)
    - tests/infrastructure/test_labels.py
    - tests/infrastructure/test_models.py

    Verification:
    $ pytest tests/ -v --cov=src/kuberoku --cov-fail-under=90
    $ mypy src/kuberoku/ --strict

PHASE 2 — APPS + LINK + APP RESOLUTION (END-TO-END PROOF)
──────────────────────────────────────────────────────────
Goal: Full app lifecycle + project linking. Proves the entire architecture.
Demo: "I created an app, linked my project directory, and every command
       resolves the app name automatically."

    Why apps:link:add/remove is here (not later): App resolution (--app flag →
    KUBEROKU_APP env → .kuberoku file) is used by EVERY subsequent command.
    Building it now means Phase 3+ commands get ergonomic app resolution
    for free. Without it, every verification step needs --app.
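
    The resolution chain described above might look like the following
    sketch. It assumes the parsed .kuberoku file is available as an
    {alias: app_name} dict with the key "default" for the unnamed link. That
    layout, the function name, and the error class name are hypothetical.
    The interactive fallback is left out.

```python
import os


class NoAppSpecifiedError(Exception):
    """Raised when no source yields an app name (hypothetical name)."""


def resolve_app(flag=None, env=None, links=None):
    """Resolve the target app: --app flag, then KUBEROKU_APP, then .kuberoku."""
    env = os.environ if env is None else env
    links = links or {}
    if flag:
        # A flag value may be a link alias ("staging") or a literal app name.
        return links.get(flag, flag)
    if env.get("KUBEROKU_APP"):
        return env["KUBEROKU_APP"]
    if "default" in links:
        return links["default"]
    raise NoAppSpecifiedError("No app specified.")


links = {"default": "testapp", "staging": "testapp-staging"}
print(resolve_app(flag="staging", env={}, links=links))
print(resolve_app(env={}, links=links))
```

    Each source is consulted only when the earlier ones are empty, so an
    explicit --app always wins over the environment and the project file.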

    Files:
    - src/kuberoku/sdk/apps.py (AppsService)
    - src/kuberoku/cli/apps.py (apps:* commands)
    - src/kuberoku/cli/main.py (ColonCommandGroup, mount apps group)
    - src/kuberoku/factory.py (KuberokuFactory)
    - src/kuberoku/client.py (Kuberoku facade with .apps)
    - src/kuberoku/config/project.py (.kuberoku read/write)
    - src/kuberoku/cli/context.py (app resolution: flag → env → .kuberoku)
    - src/kuberoku/cli/apps.py (includes apps:link:add, apps:link:remove subgroup)
    - tests/apps/test_create.py, test_destroy.py, test_info.py, test_list.py
    - tests/apps/test_link.py
    - tests/infrastructure/test_context.py
    - tests/infrastructure/test_colon_commands.py
    - tests/infrastructure/test_factory.py

    NetworkPolicy (pulled forward from Phase 5 — app-level isolation):
    - k8s/protocols.py: NetworkPolicy CRUD methods
    - k8s/client.py: real NetworkPolicy CRUD (networking.k8s.io/v1)
    - k8s/resources.py: build_app_deny_policy(), build_process_allow_policy(),
      build_addon_allow_policy(), build_external_allow_policy()
    - tests/backends/fake.py: NetworkPolicy CRUD in FakeK8sClient
    - tests/backends/real.py: NetworkPolicy error normalization
    - sdk/apps.py: deny policy on apps:create, cleanup on destroy, update on rename
    - tests/apps/test_create.py: deny policy creation tests
    - tests/apps/test_destroy.py: policy cleanup tests
    - tests/apps/test_rename.py: policy label update tests
    - tests/infrastructure/test_resources.py: 10 resource builder tests
    - tests/contract/test_network_policy_crud.py: 8 contract tests

    Doctor (pulled forward from original Phase 12, now Phase 11):
    - src/kuberoku/sdk/doctor.py (DoctorService — 6 diagnostic checks)
    - src/kuberoku/cli/clusters.py (doctor/setup commands live here, not cli/doctor.py)
    - tests/infrastructure/test_doctor.py (17 tests: SDK + CLI)

    Real K8s testing (pulled forward from original Phase 13, now in Phase 4):
    - tests/backends/real.py (RealK8sClient: wraps K8sClient with error normalization)
    - tests/conftest.py (--backend flag, tmp_namespace fixture)
    - tests/contract/ (32 contract tests across 6 resource types)
    - .github/workflows/tests.yml (lint + fake + k3d integration)
    - .github/workflows/compat.yml (kind weekly)
    - .github/workflows/release.yml (gate on tag v*)

    Proves:
    - CLI → SDK → Protocol → K8s pipeline works
    - Colon commands work (apps:create AND apps create)
    - FakeK8sClient is sufficient for unit tests
    - FakeK8sClient matches real K8s (32 contract tests pass on both)
    - CliRunner E2E tests work
    - App resolution chain works (flag → env → .kuberoku → interactive)
    - Network isolation: deny-all policy created on apps:create
    - Doctor: cluster health diagnostics pass on real K8s

    Verification:
    $ kuberoku apps:create testapp
    $ kuberoku apps:link:add testapp
    Linked to testapp (wrote .kuberoku)
    $ kuberoku apps                          # no --app needed
    $ kuberoku apps:info                     # resolves from .kuberoku
    $ kuberoku apps:link:add testapp-staging --as staging
    $ kuberoku apps:info --app staging       # resolves "staging" alias
    $ kuberoku apps:info                     # still resolves default
    $ kuberoku apps:link:remove --as staging
    $ kuberoku apps:destroy testapp --confirm testapp
    $ kuberoku apps:link:remove
    $ kuberoku doctor                        # all checks pass
    $ pytest tests/ -v --cov=kuberoku
    $ pytest tests/contract/ -v --backend real  # 32 pass on real K8s

PHASE 3 — CONFIG
─────────────────
Goal: Set, get, unset environment variables. Config vars stored in
      ConfigMap (or Secret with --secret). Rolling restart on change.
Demo: "I set DATABASE_URL and it's ready for my app to read at deploy time."

    Why separate from deploy: Config is a prerequisite (env vars must exist
    before deploy reads them), and it's a self-contained ConfigMap CRUD
    subsystem. Shipping it alone means you can verify ConfigMap handling
    without the complexity of Deployments.

    Files:
    - src/kuberoku/sdk/config.py (ConfigService)
    - src/kuberoku/cli/config.py (config, config:set, config:get, config:unset)
    - tests/config/test_set.py, test_get.py, test_unset.py

    Verification:
    $ kuberoku apps:create testapp && kuberoku apps:link:add testapp
    $ kuberoku config:set DATABASE_URL=postgres://localhost/test SECRET_KEY=hunter2
    $ kuberoku config
    $ kuberoku config:get DATABASE_URL
    $ kuberoku config:unset SECRET_KEY
    $ kuberoku config:set API_TOKEN=secret --secret
    $ pytest tests/ -v --cov=kuberoku

PHASE 4 — DEPLOY + RELEASES + BUILD-FROM-GIT + PS (MERGED)
───────────────────────────────────────────────────────────
NOTE: This phase merged original Phases 4, 6, and 7 (minus logs/run).
    The split between --image deploy, build-from-git, and PS operations
    proved unnecessary — they share the same Deployment/Formation data
    model and are better tested together. Logs and run remain for Phase 6.

Deferred from Phase 3 (Config):
    - Wire rolling restart on config:set/unset (annotation-based restart
      of Deployments when config changes). Requires Deployment to exist.
    - Wire release creation on config change (env_diff tracking in Release model).
    - --no-restart flag on config:set/unset (no-op until restart is wired).
    ALL NOW IMPLEMENTED in this phase.

Destroy cleanup contract:
    Every resource type created by any command MUST be cleaned up in
    apps:destroy, in reverse order of creation. Current cleanup order:
      1. Deployments    (created by deploy)
      2. Services       (created by deploy/services:expose:on)
      3. Secrets        (created by config:set --secret)
      4. NetworkPolicies(created by apps:create)
      5. ConfigMaps     (created by apps:create — manifest, env, formation)
    When adding new resource types (e.g., Ingress, PVC, CronJob), add
    cleanup to destroy() above ConfigMaps and update this list.
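
    One way to keep the cleanup list and the code from drifting apart is to
    drive destroy() from a single ordered constant. The sketch below is
    illustrative only: `client.delete_all(kind, app)` is a hypothetical
    method standing in for the real protocol's per-kind list and delete
    calls, and `RecordingClient` exists purely for the example.

```python
# Cleanup order from the contract above. New kinds go above "ConfigMap".
CLEANUP_ORDER = [
    "Deployment",
    "Service",
    "Secret",
    "NetworkPolicy",
    "ConfigMap",
]


def destroy_app(client, app):
    """Delete every resource kind for `app` in contract order.

    Returns a list of (kind, deleted_count) pairs for reporting.
    """
    deleted = []
    for kind in CLEANUP_ORDER:
        count = client.delete_all(kind, app)
        deleted.append((kind, count))
    return deleted


class RecordingClient:
    """Tiny in-memory stand-in that records the order of delete calls."""

    def __init__(self, resources):
        self.resources = resources   # {kind: {app: [resource names]}}
        self.calls = []

    def delete_all(self, kind, app):
        self.calls.append(kind)
        return len(self.resources.get(kind, {}).pop(app, []))
```

    A test can then assert that the recorded call order equals
    CLEANUP_ORDER, which catches a new resource type added in the wrong
    position.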

Goal: Full deploy lifecycle — pre-built images, build-from-git, releases,
      rollback, process management (scale, restart, stop, type, list).
Demo: "I deployed from git, scaled to 3, rolled back, and verified
       network isolation between two apps — all on real K8s."

    What this phase proves:
    - Deploy --image (Deployment create/update, Service, NetworkPolicy)
    - Build-from-git (git archive → docker build → push to registry)
    - Registry auto-detection (EKS→ECR, GKE→AR, AKS→ACR, local)
    - Release tracking (ConfigMap per release, optimistic concurrency)
    - Rollback (creates new release with old images)
    - Multi-process-type deploys (web + worker from same image)
    - Procfile reading (TYPE: COMMAND format)
    - Formation management (replicas + commands per process type)
    - PS operations (scale, restart, stop, type, list_dynos)
    - Config restart (config:set/unset triggers rolling restart + release)
    - Rollout wait (poll pods, detect CrashLoopBackOff/ImagePull/OOM)
    - Process-allow NetworkPolicy (same-app ingress on declared ports)
    - Cross-app isolation (deny-all blocks cross-app traffic)

    Files:
    - src/kuberoku/sdk/deploy.py (DeployService — --image + build-from-git)
    - src/kuberoku/sdk/releases.py (ReleasesService)
    - src/kuberoku/sdk/ps.py (PsService — full: set, commands, scale, restart,
      stop, list, type)
    - src/kuberoku/sdk/build.py (BuildService: git archive, docker build, push)
    - src/kuberoku/sdk/registry.py (auto-detect registry from cluster endpoint)
    - src/kuberoku/sdk/procfile.py (Procfile parser)
    - src/kuberoku/cli/deploy.py
    - src/kuberoku/cli/releases.py
    - src/kuberoku/cli/ps.py (all ps:* commands)
    - tests/deploy/ (test_image, test_release_create, test_releases_list,
      test_rollback, test_releases_prune, test_procfile_deploy,
      test_wait, test_build, test_multi_app)
    - tests/ps/ (test_set, test_commands, test_scale, test_restart,
      test_stop, test_list, test_type)
    - tests/infrastructure/ (test_procfile, test_port_parsing, test_registry,
      test_cli_phase4)
    - tests/e2e/test_hello_world.py (21 tests: full lifecycle on real K8s)
    - tests/e2e/test_git_deploy.py (11 tests: HEAD/tag/SHA deploy via registry)
    - tests/e2e/test_network_isolation.py (9 tests: cross-app isolation)

    Bug fixes discovered during E2E testing:
    - cluster_endpoint was a stub returning "" — now reads from K8s client
    - build.py push/load ordering was backwards for colima + registry
    - ps.restart() 404 race on already-terminated pods during rolling updates

    Verification (638 tests total):
    $ pytest tests/ --ignore=tests/e2e -v   # 597 unit tests (fake K8s)
    $ pytest tests/e2e/ -v -s               # 41 E2E tests (real K8s)
    $ mypy src/kuberoku/ --strict            # 38 source files, clean
    $ ruff check src/ tests/                 # clean

PHASE 5 — MULTI-CLUSTER + DOCTOR/SETUP + TCP PROBES
─────────────────────────────────────────────────────
Goal: Manage multiple K8s clusters, switch between them, per-cluster
      config (namespace, base_domain, resource_prefix). Shared check
      registry for clusters:doctor and clusters:setup. Proper TCP probes
      in E2E tests (nc instead of wget).
Demo: "I added my staging and production clusters and switched between them."
      "clusters:setup auto-created my namespace and told me to install Calico."

    Why early: Multi-cluster is pure config management (~/.kuberoku/config.yaml).
    No K8s resource dependencies beyond what Phase 4 already provides. Having
    cluster switching available early means all subsequent phases can be tested
    across multiple clusters. Doctor/setup share the same check registry.

    Commands:
    - clusters (list), clusters:add, clusters:remove, clusters:switch
    - clusters:current, clusters:info
    - clusters:doctor (check cluster health)
    - clusters:setup (check + auto-fix)

    Check registry (sdk/checks.py):
    - 13 checks: api_reachable, namespace_access, configmap_crud, secret_crud,
      deployment_crud, service_crud, network_policy_crud, storage_class,
      cni_enforcement, ingress_controller, ingress_crud, cert_manager,
      lb_support
    - 3 fixable: namespace_access (creates namespace), cni_enforcement (installs
      CNI on non-k3s), cert_manager (installs cert-manager)
    - Each check returns CheckResult (name, category, passed, message, detail,
      fixable, fix_hint)
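
    A sketch of what a shared check registry could look like, using the
    CheckResult fields listed above. The `@check` decorator, the `CHECKS`
    dict, the "core" category value, and the `namespace_exists()` client
    method are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional


@dataclass(frozen=True)
class CheckResult:
    name: str
    category: str
    passed: bool
    message: str
    detail: Optional[str] = None
    fixable: bool = False
    fix_hint: Optional[str] = None


CHECKS = {}  # name -> check function, in registration order


def check(name):
    """Register a check function under `name`."""
    def decorator(fn):
        CHECKS[name] = fn
        return fn
    return decorator


@check("namespace_access")
def namespace_access(client):
    ok = client.namespace_exists()  # hypothetical client method
    return CheckResult(
        name="namespace_access",
        category="core",  # category values are assumed
        passed=ok,
        message="Namespace is accessible" if ok else "Namespace not found",
        fixable=not ok,
        fix_hint=None if ok else "Run: kuberoku clusters:setup",
    )


def run_checks(client):
    """Run every registered check; doctor and setup can share this call."""
    return [fn(client) for fn in CHECKS.values()]


results = run_checks(SimpleNamespace(namespace_exists=lambda: False))
print(results[0].message, "|", results[0].fix_hint)
```

    With one registry, clusters:doctor can report each result and
    clusters:setup can act only on those where `fixable` is true.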

    K8s protocol additions: list_daemonsets, list_nodes, server_version

    Files:
    - src/kuberoku/sdk/clusters.py (ClustersService)
    - src/kuberoku/sdk/checks.py (check registry)
    - src/kuberoku/sdk/doctor.py (refactored to use check registry)
    - src/kuberoku/cli/clusters.py (full group + doctor/setup)
    - src/kuberoku/config/user.py (~/.kuberoku/config.yaml)
    - tests/clusters/test_checks.py, test_clusters.py, test_cli_clusters.py
    - tests/infrastructure/test_user_config.py, test_doctor.py (refactored)

    Verification:
    $ kuberoku clusters:add production --context prod-eks --namespace prod
    $ kuberoku clusters:switch staging
    $ kuberoku clusters:doctor
    $ kuberoku clusters:setup
    $ pytest tests/ -v --cov=kuberoku

PHASE 6 — LOGS + EXEC (services:logs + services:exec)
──────────────────────────────────────────────────────
Goal: Log streaming and one-off commands (exec into pod, detached Jobs).
Demo: "I tailed my app's logs and ran a database migration."

    NOTE: PS operations (scale, restart, stop, type, list) were merged
    into Phase 4. This phase adds the remaining runtime operations.
    Logs and exec are subcommands of `services` for consistency with the
    restructured command hierarchy (services:expose, services:ports, etc.).

    Implementation:
    - get_logs(), stream_logs(), exec_interactive(), exec_detached() added
      to ServicesService (sdk/services.py) — no new service class needed
    - _parse_duration() helper for --since flag
    - Dyno naming: sorted pods per type → web.1, web.2, worker.1
    - FakeK8sClient enhanced with inject_pod_logs() / inject_exec_output()
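
    The dyno naming rule above (sorted pods per type, numbered from 1) can be
    sketched as follows. Sorting by pod name is an assumption, since the text
    only says "sorted", and the function name and input shape are
    hypothetical.

```python
from collections import defaultdict


def name_dynos(pods):
    """Map dyno names (web.1, web.2, ...) to pod names.

    `pods` is an iterable of (process_type, pod_name) pairs.
    """
    by_type = defaultdict(list)
    for process_type, pod_name in pods:
        by_type[process_type].append(pod_name)
    dynos = {}
    for process_type in sorted(by_type):
        # Number pods within each type, starting at 1.
        for index, pod_name in enumerate(sorted(by_type[process_type]), start=1):
            dynos[f"{process_type}.{index}"] = pod_name
    return dynos


pods = [
    ("web", "myapi-web-zz9"),
    ("worker", "myapi-worker-k2"),
    ("web", "myapi-web-ab1"),
]
for dyno, pod in name_dynos(pods).items():
    print(dyno, "->", pod)
```

    Because numbering depends on the current pod set, a dyno name such as
    web.2 can point at a different pod after a restart.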

    New models: ExecResult, DetachedJob (models.py)
    New exceptions: NoDynosError, DynoNotFoundError, InvalidDurationError
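
    The --since helper and InvalidDurationError listed above might fit
    together as in this sketch. The accepted grammar (an integer followed by
    s, m, h or d) is an assumption, as the spec text here does not define
    it. The exception derives from ValueError only to keep the example
    self-contained.

```python
import re
from datetime import timedelta


class InvalidDurationError(ValueError):
    """Raised for values that do not match the assumed duration grammar."""


_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}
_DURATION = re.compile(r"^(\d+)([smhd])$")


def parse_duration(value):
    """Convert '30s', '5m', '2h' or '1d' into a timedelta."""
    match = _DURATION.match(value.strip())
    if match is None:
        raise InvalidDurationError(
            f"Invalid duration '{value}'. Use a number plus s, m, h or d, e.g. 5m."
        )
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


print(parse_duration("5m").total_seconds())
```

    The error text names the accepted forms, in line with the rule that
    every error ends with an action.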

    Modified files:
    - src/kuberoku/sdk/services.py (get_logs, stream_logs, exec_*, helpers)
    - src/kuberoku/cli/services.py (logs + exec CLI commands)
    - src/kuberoku/k8s/resources.py (build_job: extra_env + timeout)
    - src/kuberoku/models.py (ExecResult, DetachedJob)
    - src/kuberoku/exceptions.py (NoDynosError, DynoNotFoundError, InvalidDurationError)
    - tests/backends/fake.py (inject_pod_logs, inject_exec_output)

    New test files:
    - tests/services/test_logs.py, test_logs_follow.py, test_exec.py
    - tests/infrastructure/test_duration.py, test_cli_logs_exec.py

    Verification:
    $ kuberoku services:logs --tail
    web.1 | Listening on :8080
    $ kuberoku services:exec bash                      # exec into web.1
    $ kuberoku services:exec --detach rake db:backup   # background Job
    $ pytest tests/ -v && mypy src/kuberoku/ --strict && ruff check src/ tests/

PHASE 7 — ADDONS
─────────────────
Goal: Attach stateful services (postgres, redis). Direct CPU/memory/
      storage (no plans). Guaranteed QoS. Multi-instance. Backup.
      Credential rotation. Operator integration for HA. External addon support.
Demo: "I attached postgres and 2 redis instances, backed up my database,
       rotated credentials, and connected with psql — all zero-config."

    This is the biggest phase. It turns kuberoku from a deployment tool
    into a platform. See Section 4.8 for the full addon specification.

    Key architecture:
    - AddonDef dataclass (one file per addon type, pure data)
    - ADDON_REGISTRY dict (built-in + plugin entry_points)
    - CLI is generic (one file, works for any addon type)
    - SDK is generic (AddonsService dispatches via AddonDef)
    - Adding new addon = one new file, zero CLI/SDK changes
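
    To make the "pure data plus generic dispatch" idea above concrete, here
    is a small sketch. Every AddonDef field name, the image tags, and the
    `describe` helper are placeholders chosen for illustration; Section 4.8
    holds the actual specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AddonDef:
    # Illustrative fields only; the real AddonDef is defined in Section 4.8.
    name: str
    image: str
    port: int
    env_var: str


ADDON_REGISTRY = {}


def register(addon):
    """Add an addon definition to the registry, rejecting duplicates."""
    if addon.name in ADDON_REGISTRY:
        raise ValueError(f"Addon type '{addon.name}' is already registered")
    ADDON_REGISTRY[addon.name] = addon
    return addon


# Each addon type is one data declaration; no CLI or SDK code changes.
register(AddonDef(name="postgres", image="postgres:16", port=5432, env_var="DATABASE_URL"))
register(AddonDef(name="redis", image="redis:7", port=6379, env_var="REDIS_URL"))


def describe(addon_type):
    """Generic lookup: the same code path serves every registered type."""
    addon = ADDON_REGISTRY[addon_type]
    return f"{addon.name}: {addon.image} on port {addon.port} -> {addon.env_var}"


print(describe("postgres"))
print(describe("redis"))
```

    A plugin could call the same register() function from its entry point,
    which is how built-in and third-party addon types end up in one dict.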

    New K8s protocol methods:
    - StatefulSet CRUD (create, get, update, delete, list)
    - PVC CRUD (create, get, update, delete)
    - exec_in_pod (interactive + streaming, also used by Phase 6 services:exec)

    Files:
    - src/kuberoku/sdk/addons.py (AddonsService)
    - src/kuberoku/cli/addons.py (all addons:* commands, generic)
    - src/kuberoku/addons/__init__.py (ADDON_REGISTRY, register())
    - src/kuberoku/addons/_types.py (AddonDef dataclass)
    - src/kuberoku/addons/postgres.py, redis.py
    - tests/addons/test_create.py, test_destroy.py, test_info.py,
      test_list.py, test_upgrade.py, test_backup.py,
      test_credentials.py, test_cli_addon.py, test_multi_instance.py,
      test_external.py, test_ephemeral.py
    - tests/contract/test_statefulset_crud.py, test_pvc_crud.py

    Verification:
    $ kuberoku addons:create postgres --app myapp
    $ kuberoku addons:create postgres --as analytics --cpu 500m --memory 1Gi
    $ kuberoku addons:create redis --ephemeral --app myapp
    $ kuberoku addons:exec postgres
    $ kuberoku addons:backup postgres --to ./dump.sql
    $ kuberoku addons:credentials:rotate postgres
    $ kuberoku addons:scale postgres --memory 512Mi
    $ kuberoku addons:destroy analytics --confirm analytics
    $ pytest tests/ -v --cov=kuberoku

PHASE 8 — NETWORKING (SERVICES / ADDONS NETWORKING / PORTS)
───────────────────────────────────────────────────────────
Goal: Make deployed services reachable. Expose via LoadBalancer/NodePort,
      port-forward for debugging, add/remove ports without redeploying.
Demo: "I deployed my SMTP service, exposed it, and it has a public IP.
       I can port-forward to my addon to debug it locally."

    Files:
    - src/kuberoku/sdk/services.py (ServicesService — expose:on/off, open, ports)
    - src/kuberoku/sdk/addons.py (AddonsService — gains expose:on/off, connect)
    - src/kuberoku/cli/services.py (services:expose:on/off, services:open, services:ports:*)
    - src/kuberoku/cli/addons.py (addons:expose:on/off, addons:connect)
    - tests/services/test_expose.py, test_unexpose.py
    - tests/ports/test_ports_add.py, test_ports_remove.py
    - tests/networking/test_network_policy.py (policy lifecycle tests)

    NetworkPolicy integration:
    - DONE (Phase 2): deny-all on apps:create, cleanup on destroy
    - DONE (Phase 4): process-allow on deploy (same-app ingress on declared ports)
    - DONE (Phase 4): E2E cross-app isolation tests (9 tests on real K8s)
    - REMAINING: external-allow on services:expose:on/off and addons:expose:on/off (this phase)

    Verification:
    $ kuberoku deploy --image myapp:v3 --type smtp \
        --port 25/tcp --port 587/tcp
    $ kuberoku services:expose:on smtp
    External IP: 34.123.45.67
      :25  :587
    $ kuberoku services:ports:add 2525/tcp --type smtp
    Added 2525/tcp. Rolling restart...
    $ kuberoku services:ports
    25/tcp  587/tcp  2525/tcp
    $ kuberoku services:expose:off smtp
    Internal only.
    $ pytest tests/ -v --cov=kuberoku

PHASE 9 — DOMAINS + INGRESS + GATEWAY
──────────────────────────────────────
Goal: Domain routing via Ingress, gateway service for non-HTTP traffic,
      4 new doctor checks, auto-domain on first web deploy.
Demo: "I added a custom domain, got auto-SSL, and exposed my addon
       via the shared gateway."

    Three-tier exposure model:
    1. HTTP traffic  → Ingress (shared controller LB, hostname routing)
    2. Non-HTTP      → Gateway Service (shared LB, allocated ports 10000-10999)
    3. Non-HTTP fixed → Dedicated LoadBalancer (user chooses, own IP)
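
    For the second tier, port allocation within the 10000-10999 range could
    be sketched like this. Picking the lowest free port is an assumed
    policy, and the function signature is hypothetical; only the range and
    the GatewayPortExhaustedError name come from this document.

```python
GATEWAY_PORT_MIN = 10000
GATEWAY_PORT_MAX = 10999


class GatewayPortExhaustedError(Exception):
    """Raised when every port in the gateway range is already allocated."""


def allocate_port(allocated):
    """Return the lowest free gateway port.

    `allocated` is the set of ports already in use on the shared gateway.
    """
    for port in range(GATEWAY_PORT_MIN, GATEWAY_PORT_MAX + 1):
        if port not in allocated:
            return port
    raise GatewayPortExhaustedError(
        f"All gateway ports {GATEWAY_PORT_MIN}-{GATEWAY_PORT_MAX} are in use."
    )


print(allocate_port({10000, 10001, 10003}))
```

    The range holds 1000 ports, so a cluster can expose at most 1000
    non-HTTP endpoints through the shared gateway before the error is
    raised.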

    Files:
    - src/kuberoku/sdk/domains.py (DomainsService: add/remove/list/clear/auto_domain)
    - src/kuberoku/sdk/ingress.py (controller detection, cert-manager detection, LB support)
    - src/kuberoku/sdk/gateway.py (GatewayService: allocate/deallocate/list/get_ip)
    - src/kuberoku/cli/domains.py (domains, domains:add, domains:remove, domains:clear)
    - tests/domains/ (test_add, test_remove, test_list, test_clear, test_auto_domain, test_cli_domains)
    - tests/infrastructure/test_ingress_detection.py, test_gateway.py
    - tests/clusters/test_checks_phase9.py
    - tests/addons/test_expose_gateway.py, tests/services/test_expose_gateway.py
    - tests/contract/test_ingress_crud.py

    Modified:
    - factory.py (domains, gateway, base_domain properties)
    - k8s/protocols.py + k8s/client.py (list_ingresses, list_ingress_classes)
    - tests/backends/fake.py (create_ingress_class, inject_nodes, list_ingresses)
    - models.py (Domain updated, IngressControllerInfo, GatewayAllocation added)
    - exceptions.py (DomainNotFoundError, AutoDomainRemoveError, IngressControllerNotFoundError,
                     CertManagerNotFoundError, GatewayPortExhaustedError)
    - sdk/checks.py (4 new checks: ingress_controller, ingress_crud, cert_manager, lb_support)
    - sdk/deploy.py (auto-domain hook after release creation)
    - sdk/apps.py (Ingress cleanup on destroy)
    - k8s/resources.py (build_ingress rewritten for multi-rule + TLS + className)

    Verification:
    $ kuberoku domains:add myapp.example.com
    $ kuberoku addons:expose:on redis
    $ pytest tests/ -v --cov=kuberoku

PHASE 9.7 — STATUS + MAINTENANCE + E2E GAPS + NORTHSTAR CLEANUP (DONE)
───────────────────────────────────────────────────────────────────────
Goal: apps:status aggregator, maintenance mode, fill E2E test gaps, spec cleanup.
Demo: "I can see my full app status at a glance and put services into maintenance."

    Implemented:
    - apps:status aggregator command (single-glance overview)
    - apps:maintenance:on/off (all processes → save replicas, scale to 0, restore)
    - services:maintenance:on/off TYPE (per-process maintenance)
    - Storage: annotations on app manifest ConfigMap
    - Mechanism: scale to 0 (HTTP → 503, Non-HTTP → connection refused)
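
    The save, scale-to-zero, and restore cycle above can be sketched with
    plain dicts standing in for the formation and the manifest ConfigMap
    annotations. The annotation key and both function names are
    hypothetical.

```python
import json

# Hypothetical annotation key; the real key is not given in this section.
SAVED_KEY = "kuberoku.example/maintenance-replicas"


def maintenance_on(formation, annotations):
    """Save current replica counts in annotations, then scale all to 0."""
    if SAVED_KEY in annotations:
        return  # already in maintenance: keep the originally saved counts
    annotations[SAVED_KEY] = json.dumps(formation, sort_keys=True)
    for process_type in formation:
        formation[process_type] = 0


def maintenance_off(formation, annotations):
    """Restore the saved replica counts and clear the annotation."""
    saved = annotations.pop(SAVED_KEY, None)
    if saved is None:
        return  # not in maintenance: nothing to restore
    formation.update(json.loads(saved))


formation = {"web": 3, "worker": 1}
annotations = {}
maintenance_on(formation, annotations)
print(formation)
maintenance_off(formation, annotations)
print(formation)
```

    The guard in maintenance_on matters: without it, a second "on" call
    would overwrite the saved counts with zeros and the original scale would
    be lost.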

    E2E test gaps filled:
    - apps:rename E2E
    - releases:prune E2E
    - clusters:config E2E (deferred: clusters:config is not yet implemented)

    NORTHSTAR cleanup:
    - Removed memcached addon type (not implemented, use plugin for caches)
    - Removed addons:restore command (use addons:exec + manual restore)
    - Removed addons:logs command (use services:logs pattern or addons:exec)

PHASE 10 — PLUGIN SYSTEM
─────────────────────────
Goal: Third-party extensions via entry_points. Plugin discovery,
      install, uninstall.
Demo: "I installed a community plugin and its commands appeared in the CLI."

    Files:
    - src/kuberoku/plugins/loader.py
    - src/kuberoku/cli/plugins.py
    - src/kuberoku/cli/main.py (call load_plugins on startup)
    - tests/infrastructure/test_plugins.py

    Verification:
    $ kuberoku plugins
    $ kuberoku plugins:install kuberoku-postgres
    $ pytest tests/ -v --cov=kuberoku

PHASE 11 — RBAC CHECKS + DOCTOR ENHANCEMENTS
──────────────────────────────────────────────
Goal: Enhance doctor with RBAC permission checks and --fix YAML output.
Demo: "Doctor told me which permissions I was missing and gave me the YAML."

    NOTE: Core doctor command was implemented in Phase 2. Maintenance/status
    moved to Phase 9.7. This phase adds RBAC-specific enhancements on top.

    Files:
    - src/kuberoku/sdk/doctor.py (extend with RBAC checks per resource type)
    - src/kuberoku/cli/clusters.py (add --fix flag for RBAC YAML output; doctor lives here)
    - tests/infrastructure/test_doctor.py (extend with RBAC tests)

    Verification:
    $ kuberoku doctor --fix
    $ pytest tests/ -v --cov=kuberoku

PHASE 12 — DISTRIBUTION
────────────────────────
Goal: Install Kuberoku anywhere: pip, pipx, curl, binaries. Homebrew planned.
Demo: "Anyone can install it in one command on any platform."

    Files:
    - .github/workflows/release.yml (PyPI publish + PyInstaller binaries)
    - Homebrew formula (kuberoku/tap/kuberoku) — planned
    - scripts/install.sh (curl one-liner)

    Verification:
    $ pip install kuberoku                   # from PyPI
    $ pipx install kuberoku                  # isolated
    $ brew install kuberoku/tap/kuberoku     # macOS (planned)
    $ kuberoku --version


PHASE SUMMARY
─────────────
NOTE: Original 15 phases (0-14) were consolidated to 13 phases (0-12).
    Phase 4 merged original Phases 4 (deploy), 6 (build-from-git), and
    7 (PS operations). Phase 13 (integration tests) was absorbed into
    Phase 4 E2E tests. Multi-cluster moved to Phase 5 (was 9) to enable
    multi-cluster testing early. Remaining phases reordered accordingly.

    Phase   Name                          Commands Added             Demo Milestone                   Status
    ─────   ────────────────────────────  ─────────────────────────  ──────────────────────────────   ──────
    0       Project skeleton              --version                  "It installs and runs"           DONE
    1       K8s abstraction + fake        (none — infrastructure)    "Tests pass offline"             DONE
    2       Apps + links + resolution     apps:*, apps:link:*,       "I created an app"               DONE
                                          doctor
    3       Config                        config:*                   "I set env vars"                 DONE
    4       Deploy + build + PS (merged)  deploy, releases:*,        "Deployed from git, scaled,      DONE
                                          ps:*, build-from-git       rolled back, network isolated"
    5       Multi-cluster                 clusters:*                 "I switch between clusters"      DONE
    6       Logs + exec                   services:logs,             "I can tail logs and exec"       DONE
                                          services:exec
    7       Addons                        addons:*                   "I attached postgres"            DONE
    8       Networking                    services:*,                "My service is live"             DONE
                                          addons:expose:*,
                                          addons:connect
    9       Domains + ingress + gateway   domains:*, gateway         "Custom domain with SSL,         DONE
                                                                      gateway for non-HTTP"
    9.7     Status + maintenance +        apps:status,               "Status, maintenance, cleanup"   DONE
            E2E gaps + cleanup            apps:maintenance:*,
                                          services:maintenance:*
    10      Plugins                       plugins:*                  "Community extensions work"      DONE
    11      Doctor + RBAC                 doctor --fix               "Permission diagnostics"         DONE
    12      Distribution                  (none — packaging)         "Anyone can install it"          DONE


================================================================================
 17. COMPLETE COMMAND SUMMARY
================================================================================

All commands at a glance, with their Heroku equivalents:

    Kuberoku Command                Heroku Equivalent               Phase
    ──────────────────────────────  ──────────────────────────────  ─────
    kuberoku apps                   heroku apps                     2
    kuberoku apps:create            heroku apps:create              2
    kuberoku apps:info              heroku apps:info                2
    kuberoku apps:destroy           heroku apps:destroy             2
    kuberoku apps:rename            heroku apps:rename              2
    kuberoku apps:link              heroku git:remote               2
    kuberoku apps:link:add          heroku git:remote               2
    kuberoku apps:link:remove       (none)                          2
    kuberoku config                 heroku config                   3
    kuberoku config:set             heroku config:set               3
    kuberoku config:unset           heroku config:unset             3
    kuberoku config:get             heroku config:get               3
    kuberoku deploy                 heroku container:release        4
    kuberoku releases               heroku releases                 4
    kuberoku releases:info          heroku releases:info            4
    kuberoku releases:rollback      heroku releases:rollback        4
    kuberoku releases:prune         (none)                          4
    kuberoku ps                     heroku ps                       4
    kuberoku ps:scale               heroku ps:scale                 4
    kuberoku ps:restart             heroku ps:restart               4
    kuberoku ps:stop                heroku ps:stop                  4
    kuberoku ps:type                heroku ps:type                  4
    kuberoku ps:set                 Procfile (file-based)           4
    kuberoku ps:commands            (none)                          4
    kuberoku services:logs          heroku logs                     6
    kuberoku services:exec          heroku run                      6
    kuberoku clusters               (none — Heroku is managed)      5
    kuberoku clusters:add           (none)                          5
    kuberoku clusters:remove        (none)                          5
    kuberoku clusters:switch        (none)                          5
    kuberoku clusters:current       (none)                          5
    kuberoku clusters:info          (none)                          5
    kuberoku clusters:doctor        (none)                          2/11
    kuberoku clusters:setup         (none — auto-fix cluster)       5
    kuberoku addons                 heroku addons                   7
    kuberoku addons:create          heroku addons:create            7
    kuberoku addons:destroy         heroku addons:destroy           7
    kuberoku addons:info            heroku addons:info              7
    kuberoku addons:backup          (none)                          7
    kuberoku addons:scale           (none)                          7
    kuberoku addons:migrate         (none — automated backup+restore) 10
    kuberoku addons:migrate-rollback (none — rollback to prev version) 10
    kuberoku addons:exec            (none — exec into addon pod)    7
    kuberoku addons:credentials     (none)                          7
    kuberoku addons:credentials:rotate (none)                       7
    kuberoku services               (none — Heroku auto-exposes)    8
    kuberoku services:expose:on     (none)                          8
    kuberoku services:expose:off    (none)                          8
    kuberoku addons:expose:on       (none)                          8
    kuberoku addons:expose:off      (none)                          8
    kuberoku addons:connect         (none — port-forward)           8
    kuberoku services:connect       (none — port-forward to process) 8
    kuberoku services:ports         (none)                          8
    kuberoku services:ports:add     (none)                          8
    kuberoku services:ports:remove  (none)                          8
    kuberoku domains                heroku domains                  9
    kuberoku domains:add            heroku domains:add              9
    kuberoku domains:remove         heroku domains:remove           9
    kuberoku domains:clear          heroku domains:clear            9
    kuberoku apps:maintenance       heroku maintenance              9.7
    kuberoku apps:maintenance:on    heroku maintenance:on           9.7
    kuberoku apps:maintenance:off   heroku maintenance:off          9.7
    kuberoku services:maintenance:on  (none)                        9.7
    kuberoku services:maintenance:off (none)                        9.7
    kuberoku apps:status            heroku apps:info (richer)       9.7
    kuberoku services:open          heroku open                     8
    kuberoku plugins                heroku plugins                  10
    kuberoku plugins:install        heroku plugins:install          10
    kuberoku plugins:uninstall      heroku plugins:uninstall        10
    kuberoku plugins:search         (none)                          10

    (no standalone commands — every command belongs to a group)

COMMANDS KUBEROKU HAS THAT HEROKU DOESN'T
──────────────────────────────────────────
    - services:expose:on / services:expose:off (toggle public access for process types)
    - services:open (open HTTP process in browser)
    - services:connect (port-forward to process for local debugging)
    - addons:expose:on / addons:expose:off / addons:connect (addon networking)
    - services:ports:add / services:ports:remove (change ports without redeploying)
    - apps:link:add / apps:link:remove (namespace linking)
    - clusters:* (multi-cluster management)
    - clusters:doctor (permission preflight check)
    - apps:status (single-glance overview)
    - apps:maintenance:on/off (app-wide maintenance mode)
    - services:maintenance:on/off (per-process maintenance mode)
    - deploy --port X/udp (UDP support)
    - deploy --port X/tcp --port Y/tcp (multi-port)
    - plugins:search (PyPI search)
    - releases:prune (release history cleanup)
    - addons:backup (data safety)
    - addons:scale (resource management)
    - addons:migrate / addons:migrate-rollback (version migration with backup+restore)
    - ps:set (Procfile-equivalent via CLI, per-process commands)
    - ps:commands (show commands per process type)
    - deploy --no-procfile (skip Procfile reading)
    - services:logs --previous (crash debugging)


================================================================================
 18. APP NAME RESOLUTION
================================================================================

When a command needs an app name, Kuberoku resolves it in this order:

    Priority    Source                  Example
    ────────    ──────────────────────  ───────────────────────────
    1 (high)    --app / -a flag         kuberoku services:logs --app myapi
    2           KUBEROKU_APP env var    KUBEROKU_APP=myapi kuberoku services:logs
    3           .kuberoku project file  Walk up dirs to find .kuberoku
    4 (low)     Interactive picker      Fuzzy-searchable list of all apps
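
The priority order above can be sketched as a simple fall-through. This is a
minimal illustration under stated assumptions, not Kuberoku's actual
implementation: the function name `resolve_app_name` and all of its
parameters are hypothetical, the .kuberoku lookup is assumed to have already
produced `project_default`, and the interactive picker is passed in as a
callable so the sketch stays free of terminal I/O.

```python
from typing import Callable, Mapping, Optional


def resolve_app_name(
    flag_value: Optional[str],
    env: Mapping[str, str],
    project_default: Optional[str],
    pick_interactively: Callable[[], str],
) -> str:
    """Return the app name from the highest-priority source that is set.

    Hypothetical helper: names and signature are assumptions for this sketch.
    """
    # 1. The --app / -a flag wins over everything else.
    if flag_value:
        return flag_value
    # 2. The KUBEROKU_APP environment variable.
    env_value = env.get("KUBEROKU_APP")
    if env_value:
        return env_value
    # 3. The app named by the nearest .kuberoku project file (already loaded).
    if project_default:
        return project_default
    # 4. Last resort: ask the user to choose from a list.
    return pick_interactively()
```

In real use, `env` would be `os.environ`. The picker is only called when the
first three sources are all empty, so non-interactive runs (CI, scripts) never
block as long as one of them is set.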

    Smart --app resolution:
        --app VALUE is resolved in two steps:
        1. Check .kuberoku aliases — if VALUE matches an alias, use that
           app + cluster.
        2. Otherwise, treat VALUE as a literal K8s app name.

        This means:
        - `kuberoku services:logs --app prod`    → resolves "prod" alias from .kuberoku
        - `kuberoku services:logs --app myapi`   → literal K8s app name (no alias match)
        - `kuberoku services:logs`               → uses "default" alias from .kuberoku

    .kuberoku file format (YAML):

        # Simple — single app (backward compatible):
        app: myapi

        # Multi-environment — same project, multiple apps/clusters:
        default: dev
        environments:
          dev:
            app: myapi-dev
            cluster: local
          staging:
            app: myapi-staging
            cluster: staging
          prod:
            app: myapi-prod
            cluster: production

        # Multi-app — project deploys several apps (API + worker, etc.):
        default: api-dev
        environments:
          api-dev:
            app: myapi-dev
            cluster: local
          worker-dev:
            app: myworker-dev
            cluster: local
          api-prod:
            app: myapi-prod
            cluster: production
          worker-prod:
            app: myworker-prod
            cluster: production

    Resolution with aliases:
        - No --app flag  → use the "default" alias
        - --app staging  → look up "staging" alias, use that app + cluster
        - --app myapi    → no alias match, use "myapi" as literal app name
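
The alias rules can be sketched as a lookup over the parsed .kuberoku
contents. This is an illustrative sketch only: `resolve_target` is a
hypothetical name, the file is assumed to be already parsed into a plain
dict (YAML parsing is left out), and returning `None` for the cluster is an
assumed way of saying "use the currently active cluster".

```python
from typing import Any, Mapping, Optional, Tuple


def resolve_target(
    value: Optional[str], project: Mapping[str, Any]
) -> Tuple[str, Optional[str]]:
    """Map an --app value (or None when the flag is absent) to (app, cluster).

    Hypothetical helper operating on already-parsed .kuberoku data.
    """
    environments = project.get("environments") or {}
    if value is None:
        if "default" in project:
            # No --app flag: fall back to the "default" alias.
            value = project["default"]
        elif "app" in project:
            # Simple single-app file: `app: myapi` with no environments block.
            return project["app"], None
        else:
            raise LookupError("no --app value and no default in .kuberoku")
    entry = environments.get(value)
    if entry is not None:
        # Step 1: the value matches an alias, so use its app + cluster.
        return entry["app"], entry.get("cluster")
    # Step 2: no alias match, so treat the value as a literal app name.
    # (Assumption: an undefined "default" alias is also treated literally.)
    return value, None
```

With the multi-environment example above, `resolve_target("staging", project)`
yields the staging app and cluster, while `resolve_target("myapi", project)`
falls through to the literal name.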

    Created by:  kuberoku apps:link:add myapi
                 kuberoku apps:link:add myapi-staging --as staging --cluster staging
                 kuberoku apps:link:add myworker-dev --as worker-dev
    Removed by:  kuberoku apps:link:remove                (removes entire .kuberoku file)
                 kuberoku apps:link:remove --as staging   (removes only that alias)

    BACKWARD COMPATIBILITY: If .kuberoku has only `app: myapi` (no
    environments block), it works exactly as before. The aliases
    feature is additive — old .kuberoku files never break.

The CLI walks up the directory tree to find .kuberoku (like git walks up
to find .git). This means `kuberoku services:logs` works from any
subdirectory of your project without an --app flag.
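
A minimal sketch of that upward search, using only the standard library. The
function name `find_project_file` is hypothetical and the real lookup may
differ (for example in how it treats symlinks or filesystem boundaries).

```python
from pathlib import Path
from typing import Optional


def find_project_file(start: Path, filename: str = ".kuberoku") -> Optional[Path]:
    """Return the nearest `filename` at or above `start`, or None if absent."""
    current = start.resolve()
    # Check the starting directory first, then each ancestor up to the root.
    for directory in (current, *current.parents):
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None
```

Calling `find_project_file(Path.cwd())` from a nested source directory
returns the project's .kuberoku if one exists in any ancestor. The nearest
file wins, so a nested project shadows an outer one.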


================================================================================
 19. OPEN QUESTIONS & FUTURE WORK
================================================================================

These are explicitly deferred. They are NOT in scope for v0.1.0.

    - Buildpacks / Dockerfile auto-detection (for now: bring your own image)
    - Autoscaling (HPA integration)
    - Pipeline promotions (staging → production)
    - Review apps (PR-based ephemeral environments)
    - Metrics / monitoring integration
    - Web dashboard
    - Git push deploy (`git push heroku main` equivalent)
    - Multi-tenancy / team management
    - Billing integration
    - Managed database provisioning (beyond simple container addons)


================================================================================
                              END OF SPECIFICATION
================================================================================
