Metadata-Version: 2.4
Name: shenron
Version: 0.12.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Rust
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
License-File: LICENSE
Summary: Generate Shenron docker-compose deployments from model config files
Author: doubleword.ai
License: MIT
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/doublewordai/shenron
Project-URL: Repository, https://github.com/doublewordai/shenron

# Shenron

Shenron now ships as a config-driven generator for production LLM docker-compose deployments.

`shenron` reads a model config YAML and generates:
- `docker-compose.yml`
- `.generated/onwards_config.json`
- `.generated/prometheus.yml`
- `.generated/scouter_reporter.env`
- `.generated/engine_start.sh`
- `.generated/engine_start_N.sh` (one per model) and `.generated/sglangmux_start.sh` when using `models:`

## Quick Start

```bash
uv pip install shenron
shenron get
docker compose up -d
```

`shenron get` reads a per-release config index asset, shows available configs with arrow-key selection, downloads the chosen config, and generates deployment artifacts in the current directory. Using `--release latest` also rewrites `shenron_version` in the downloaded config to `latest`. You can also override config values on download with:
- `--api-key` (writes `api_key`)
- `--scouter-api-key` (writes `scouter_ingest_api_key`)
- `--scouter-collector-instance` (writes `scouter_collector_instance`; alias: `--scouter-colector-instance`)

By default, `shenron get` pulls release configs from `doublewordai/shenron-configs`.

`shenron .` still works and expects exactly one config YAML (`*.yml` or `*.yaml`) in the current directory, unless you pass a config file path directly.
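
The "exactly one config YAML" rule can be pictured with a small Python sketch. This is an illustration only, not Shenron's actual implementation (the CLI is built in Rust); `find_single_config` is a hypothetical helper name. The sketch does not special-case any file, so how the real CLI treats a previously generated `docker-compose.yml` in the same directory is not covered here.

```python
from pathlib import Path


def find_single_config(directory: str) -> Path:
    """Return the only *.yml / *.yaml file in `directory`, or raise ValueError.

    Hypothetical helper: mirrors the documented rule, not Shenron's real code.
    """
    root = Path(directory)
    candidates = sorted(
        p for p in root.iterdir()
        if p.is_file() and p.suffix in {".yml", ".yaml"}
    )
    if len(candidates) != 1:
        names = ", ".join(p.name for p in candidates) or "none"
        raise ValueError(
            f"expected exactly one config YAML in {root}, found: {names}"
        )
    return candidates[0]
```

Passing a config file path directly, as in `shenron path/to/config.yml`, skips this discovery step.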

## Configs

Repo configs are stored in `configs/`.

Available starter configs:
- `configs/Qwen06B-cu126-TP1.yml`
- `configs/Qwen06B-cu129-TP1.yml`
- `configs/Qwen06B-cu130-TP1.yml`
- `configs/Qwen30B-A3B-cu126-TP1.yml`
- `configs/Qwen30B-A3B-cu129-TP1.yml`
- `configs/Qwen30B-A3B-cu129-TP2.yml`
- `configs/Qwen30B-A3B-cu130-TP2.yml`
- `configs/Qwen235-A22B-cu129-TP2.yml`
- `configs/Qwen235-A22B-cu129-TP4.yml`
- `configs/Qwen235-A22B-cu130-TP2.yml`

These configs use the same defaults that were previously hardcoded in `docker/run_docker_compose.sh`.

Engine selection and args:
- `engine`: `vllm` or `sglang` (default: `vllm`)
- `vllm_args`: vLLM CLI args appended after core settings (use for `--gpu-memory-utilization`, `--scheduling-policy`, `--tool-call-parser`, `--override-generation-config`, etc.)
- `sglang_args`: SGLang CLI args appended after core settings (use for `--tp`, `--dp`, `--ep`, `--enable-dp-attention`, etc.)
- `sglang_use_cuda_ipc_transport`: when `true`, exports `SGLANG_USE_CUDA_IPC_TRANSPORT=1` before launching SGLang.
- `models`: optional per-model overrides for multi-model SGLang mux mode.
- `sglangmux_listen_port`, `sglangmux_host`, `sglangmux_upstream_timeout_secs`, `sglangmux_model_ready_timeout_secs`, `sglangmux_model_switch_timeout_secs`, `sglangmux_log_dir`: optional `sglangmux` settings (hyphenated aliases like `sglangmux-listen-port` are also accepted).
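
As a rough illustration of the hyphenated aliases, the Python sketch below folds `sglangmux-*` keys into their underscore form. `normalize_keys` is a hypothetical name, and rejecting a setting given in both spellings is an assumption of this sketch, not documented Shenron behavior.

```python
def normalize_keys(config: dict) -> dict:
    """Map hyphenated sglangmux keys (e.g. sglangmux-listen-port) to underscore form.

    Hypothetical helper; other keys are passed through unchanged.
    """
    normalized = {}
    for key, value in config.items():
        canonical = key.replace("-", "_") if key.startswith("sglangmux") else key
        if canonical in normalized:
            # Assumption: the same setting in both spellings is treated as an error.
            raise ValueError(f"duplicate setting: {canonical}")
        normalized[canonical] = value
    return normalized
```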

Each entry in `vllm_args` and `sglang_args` is a YAML scalar (string/number/bool). To pass a structured value (such as the argument to `--override-generation-config`), provide a YAML mapping and it will be JSON-encoded.
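
To make the JSON-encoding rule concrete, here is a minimal Python sketch of how an args list might be turned into CLI tokens. `render_args` is a hypothetical helper, and the lowercase `true`/`false` rendering of booleans and the compact JSON separators are assumptions of this sketch; the generator's exact output may differ.

```python
import json


def render_args(args: list) -> list[str]:
    """Render a parsed YAML args list as CLI tokens (hypothetical helper)."""
    rendered = []
    for item in args:
        if isinstance(item, dict):
            # Structured value: JSON-encode the mapping into a single token.
            rendered.append(json.dumps(item, separators=(",", ":")))
        elif isinstance(item, bool):
            # Assumption: booleans are written in lowercase.
            rendered.append("true" if item else "false")
        else:
            rendered.append(str(item))
    return rendered


tokens = render_args(
    ["--tp", 2, "--override-generation-config", {"temperature": 0.7}]
)
# tokens == ["--tp", "2", "--override-generation-config", '{"temperature":0.7}']
```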

### Single Config `models:` Schema (SGLang + sglangmux)

When `models:` is set, Shenron generates one engine launch script per model plus a mux launcher:

```yaml
engine: sglang
sglangmux_listen_port: 8100
sglangmux_host: 0.0.0.0
sglangmux_upstream_timeout_secs: 120
sglangmux_model_ready_timeout_secs: 600
sglangmux_model_switch_timeout_secs: 120
sglangmux_log_dir: /tmp/sglangmux

models:
- model_name: Qwen/Qwen3-0.6B
  vllm_port: 8001
  api_key: sk-model-a
  sglang_args: [--tp, 1]
- model_name: Qwen/Qwen3-30B-A3B
  vllm_port: 8002
  api_key: sk-model-b
  sglang_use_cuda_ipc_transport: true
  sglang_args: [--tp, 2]
```

Rules in `models:` mode:
- `engine` must be `sglang`
- each `models[*].model_name` must be unique
- each `models[*].vllm_port` must be set and unique
- `sglangmux_listen_port` must be different from all model ports

In this mode, `.generated/onwards_config.json` contains one target per model and all target URLs point to `http://vllm:<sglangmux_listen_port>/v1`.
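
The URL rule can be sketched in a few lines of Python. This only illustrates that every model shares the mux URL; the real layout of `onwards_config.json` is not shown in this README, so the returned dictionary shape and the name `target_urls` are assumptions.

```python
def target_urls(config: dict) -> dict[str, str]:
    """Map each model name to the shared sglangmux URL (hypothetical helper)."""
    port = config["sglangmux_listen_port"]
    url = f"http://vllm:{port}/v1"
    # Every target points at the mux, not at the model's own port.
    return {m["model_name"]: url for m in config["models"]}


urls = target_urls(
    {
        "sglangmux_listen_port": 8100,
        "models": [
            {"model_name": "Qwen/Qwen3-0.6B", "vllm_port": 8001},
            {"model_name": "Qwen/Qwen3-30B-A3B", "vllm_port": 8002},
        ],
    }
)
# urls["Qwen/Qwen3-0.6B"] == urls["Qwen/Qwen3-30B-A3B"] == "http://vllm:8100/v1"
```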

## Generated Compose Behavior

`docker-compose.yml` is fully rendered from config values:
- model image tag from `shenron_version` + `cuda_version`
- `onwards` image tag from `onwards_version`
- service ports from config
- no `${SHENRON_VERSION}` placeholders

## Development

```bash
# Run tests (Rust + CLI + compose checks)
./scripts/ci.sh

# Install local package for manual testing
python3 -m pip install -e .

# Generate from repo config
shenron configs/Qwen06B-cu126-TP1.yml --output-dir /tmp/shenron-test
```

## Release Automation

- `release-assets.yaml` publishes stamped config files (`*.yml`) as release assets.
- `release-assets.yaml` also publishes `configs-index.txt`, which powers `shenron get`.
- `release-assets.yaml` mirrors `*.yml` + `configs-index.txt` into `${OWNER}/shenron-configs` under the same tag as the main `shenron` release.
- Set `CONFIGS_REPO_TOKEN` (or reuse `RELEASE_PLEASE_TOKEN`) with write access to the configs repo release assets; optional repo variable `CONFIGS_REPO` overrides the default target (`${OWNER}/shenron-configs`).
- `python-release.yaml` builds/publishes the `shenron` package to PyPI on release tags.
- Docker image build/push via Depot remains in `ci.yaml` and still triggers when `docker/Dockerfile.cu*` or `VERSION` changes.

## License

MIT, see `LICENSE`.

