Metadata-Version: 2.4
Name: sglangmux
Version: 0.1.1
Summary: SGLang multiplexer with an OpenAI-compatible frontend
Author: sglangmux contributors
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Rust
Classifier: Operating System :: OS Independent
Classifier: Environment :: Console
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.9
Description-Content-Type: text/markdown

# sglangmux

`sglangmux` is a lightweight Rust multiplexer for running multiple SGLang model servers behind one OpenAI-compatible frontend.

It provides:
- one frontend endpoint for chat/completions
- automatic model activation/switching based on the request `model`
- OpenAI-style `/models` and `/v1/models` listing
- per-model process management with per-model stdout/stderr logs

## Repository Layout

- `src/lib.rs`: core multiplexer library (`SgLangMux`)
- `src/bin/sglangmuxd.rs`: HTTP daemon frontend
- `examples/sglangmux-manual/`: manual verification scripts for two models
- `tests/`: integration tests

## How It Works

1. You provide one launch script per model.
2. Each script must include:
   - model identifier via either `MODEL_NAME=<openai-model-id>` or launch arg `--model <openai-model-id>` (or `--model-path <openai-model-id>`)
   - local port via either `PORT=<local-port>` or launch arg `--port <local-port>`
3. `sglangmuxd` starts models (bootstrap), tracks active model state, and forwards requests to the correct upstream model server.
4. When the requested model differs from active model, the mux switches by pausing/sleeping current model and waking target model.

## Requirements

- Rust toolchain (for `cargo run`)
- Python environment with `sglang` installed for your model launch scripts
- GPU/runtime support required by your chosen SGLang models

## Python Install (uv / pip)

The project ships a Python CLI wrapper that executes the Rust daemon binary.

After publishing to PyPI, usage is:

```bash
uv pip install sglangmux
sglangmux --help
```

For local install from this repository:

```bash
uv pip install .
sglangmux --help
```

Notes:
- The wheel build runs `cargo build --release --bin sglangmuxd`.
- Installing from source requires a working Rust toolchain.
- The installed command is `sglangmux`, which forwards all args to `sglangmuxd`.

## Quick Start

### 1. Prepare Python env for model scripts

```bash
uv venv --python /usr/bin/python3.10 .venv
uv pip install --python .venv/bin/python sglang
```

### 2. Start mux with example scripts

```bash
./examples/sglangmux-manual/start_sglangmux.sh
```

### 3. Send requests

```bash
./examples/sglangmux-manual/request_models.sh
./examples/sglangmux-manual/request_qwen.sh
./examples/sglangmux-manual/request_hf.sh
```

See `examples/sglangmux-manual/README.md` for detailed manual workflow.

## Running `sglangmuxd` Directly

```bash
cargo run --bin sglangmuxd -- \
  --host 127.0.0.1 \
  --listen-port 30100 \
  --upstream-timeout-secs 120 \
  --model-ready-timeout-secs 120 \
  --model-switch-timeout-secs 60 \
  --log-dir sglangmux-logs \
  /path/to/model1.sh /path/to/model2.sh
```

### CLI Options

- `--host`: bind host for frontend daemon (default `127.0.0.1`)
- `--listen-port`: bind port for frontend daemon (default `30100`)
- `--upstream-timeout-secs`: timeout waiting for upstream model response (default `120`)
- `--model-ready-timeout-secs`: timeout while waiting for model process to become healthy (default `120`)
- `--model-switch-timeout-secs`: timeout waiting for model activation/switch for a pending request (default `60`)
- `--log-dir`: directory for per-model logs (default `sglangmux-logs`)

To expose externally:

```bash
--host 0.0.0.0
```

## Frontend API

Implemented routes:

- `GET /health`
- `GET /models`
- `GET /v1/models`
- `POST /v1/chat/completions`
- `POST /v1/completions`

Notes:
- Requests must include a string `model` field.
- For streaming (`stream: true` / SSE), `sglangmuxd` forwards the streaming payload through.

## Model Launch Script Contract

Each script passed to `sglangmuxd` must define a model id and local port. The model id can come from `MODEL_NAME` or launch flags `--model` / `--model-path`, and the local port can come from `PORT` or launch flag `--port`:

```bash
MODEL_NAME="Qwen/Qwen3-0.6B"
PORT=30001
```

The daemon parses these values from script text and uses them to build model registry and routing map.

## Timeouts and Failure Modes

- `upstream-timeout-secs`: model server did not respond in time for completion request (returns `504`)
- `model-ready-timeout-secs`: model process did not become healthy during startup/bring-up
- `model-switch-timeout-secs`: request waited too long for requested model to become active

Common frontend errors:
- `model not ready: ...`: switch/startup issue
- `upstream request timed out`: generation took longer than upstream timeout
- `invalid upstream response`: upstream returned non-JSON where JSON expected (non-stream path)

## Logging

Rust log filter is controlled by `RUST_LOG`.

Examples:

```bash
RUST_LOG=info ./examples/sglangmux-manual/start_sglangmux.sh
RUST_LOG=sglangmux=info,sglangmuxd=info,warn ./examples/sglangmux-manual/start_sglangmux.sh
```

Per-model stdout/stderr log files are written under `--log-dir`.

## Graceful Shutdown

`sglangmuxd` listens for `Ctrl+C` and triggers model shutdown via mux cleanup logic before exit.

## Development

Build:

```bash
cargo check --bin sglangmuxd
```

Test:

```bash
cargo test
```

## Publishing to PyPI

Use the helper script:

```bash
scripts/publish_pypi.sh
```

Upload to TestPyPI:

```bash
scripts/publish_pypi.sh --testpypi
```

The script:
- builds sdist + wheel (`python -m build`)
- runs `twine check` on artifacts
- uploads via `twine upload`
