Metadata-Version: 2.4
Name: switch-sdk
Version: 0.3.0
Summary: Python SDK for Shift managed AI routing, telemetry, and local-first execution
Author: Shift
License-Expression: LicenseRef-Proprietary
Keywords: ai,llm,routing,telemetry,sustainability,executorch
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Operating System :: OS Independent
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: httpx>=0.27.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23.0; extra == "dev"
Requires-Dist: respx>=0.21.0; extra == "dev"
Provides-Extra: local
Requires-Dist: torch>=2.2.0; python_version < "3.14" and extra == "local"
Requires-Dist: transformers>=4.41.0; python_version < "3.14" and extra == "local"
Requires-Dist: optimum-executorch>=1.1.0; python_version < "3.14" and extra == "local"
Provides-Extra: publish
Requires-Dist: build>=1.2.2; extra == "publish"
Requires-Dist: twine>=5.1.1; extra == "publish"

# switch-sdk

Python SDK for the Shift (Switch gateway) managed API.

## Install

```bash
pip install switch-sdk
```

For local development:

```bash
pip install -e ".[dev]"
```

For local ExecuTorch runtime work:

```bash
pip install -e ".[dev,local]"
```

For packaging and publishing:

```bash
pip install -e ".[publish]"
```

Note: ExecuTorch wheels are not yet available for Python 3.14, so the `local` extra installs no runtime dependencies there. Use Python 3.10-3.13 for local execution (3.11 works well).

## Required values

- `base_url`: your gateway URL, for example `http://localhost:8000`
- `api_key`: your plain project key (for example `aura_...`), not the SHA256 hash

Environment shortcuts are supported:

- `SHIFT_BASE_URL` (fallback: `SWITCH_BASE_URL`)
- `SHIFT_API_KEY` (fallbacks: `SWITCH_API_KEY`, `API_KEY`)

## Quick start

```python
import asyncio
from switch_sdk import SwitchClient, ChatMessage


async def main() -> None:
    async with SwitchClient.from_env() as client:
        completion = await client.chat(
            model="gpt-5",
            messages=[ChatMessage(role="user", content="Reply with: SDK_OK")],
            residency="US",
            sla="realtime",
            capability_flags={"force_cloud": True, "preferred_region": "eastus"},
        )

        print(completion.choices[0].message.content)
        print(completion.switch_meta["route"]["target"]["region"])


asyncio.run(main())
```

Set env vars first:

```bash
export SHIFT_BASE_URL=http://localhost:8000
export SHIFT_API_KEY=aura_your_plain_project_key
```

## Hybrid local-first mode (ExecuTorch-ready)

`chat_hybrid()` tries local execution first, then falls back to cloud when needed.
Local models are cached on disk and downloaded only once per model version.

```python
import asyncio
from switch_sdk import ChatMessage, LocalModelManager, SwitchClient


manifest = [
    {
        "model_id": "smollm2-135m",
        "task": "chat",
        "download_url": "https://your-model-host/smollm2-135m.pte",
        "sha256": "replace_with_sha256",
        "size_mb": 550,
        "min_ram_gb": 4,
        "max_prompt_chars": 280,
        "rank": 10,
    },
]


async def main() -> None:
    manager = LocalModelManager(cache_dir="~/.shift/models", manifest=manifest)
    # Optional: real ExecuTorch adapter. Requires the `local` extra
    # (see "Install local runtime dependencies" below).
    from switch_sdk import build_executorch_text_runtime
    local_runtime = build_executorch_text_runtime(
        tokenizer_source="HuggingFaceTB/SmolLM2-135M-Instruct",
        max_new_tokens=96,
        prefer_optimum=True,
    )

    async with SwitchClient(
        base_url="http://localhost:8000",
        api_key="aura_your_plain_project_key",
        local_model_manager=manager,
        local_runtime=local_runtime,
    ) as client:
        completion = await client.chat_hybrid(
            model="auto",
            messages=[ChatMessage(role="user", content="Reply exactly: LOCAL_OK")],
            capability_flags={"auto_model": True},
        )
        print(completion.model)
        print(completion.choices[0].message.content)
        print(completion.switch_meta)


asyncio.run(main())
```

Notes:
- The default local runtime is a stub intended for wiring and tests.
- `build_executorch_text_runtime(...)` provides a real adapter that prefers Optimum ExecuTorch and falls back to raw `executorch.runtime`.
- Cache path format: `~/.shift/models/<model_id>/<version>/model.pte`
- LRU eviction is applied when the cache exceeds `max_cache_gb`.
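
To make the cache layout and eviction notes concrete, here is a minimal stdlib-only sketch. It is not the SDK's implementation: `cache_path` and `evict_lru` are hypothetical helpers, the version string `"v1"` is made up, and file modification time is used as a stand-in for "last used".

```python
import os
import tempfile
from pathlib import Path


def cache_path(cache_dir: str, model_id: str, version: str) -> Path:
    """Build <cache_dir>/<model_id>/<version>/model.pte, expanding '~'."""
    return Path(cache_dir).expanduser() / model_id / version / "model.pte"


def evict_lru(cache_dir: Path, max_cache_bytes: int) -> list[Path]:
    """Delete the least recently used model files until the cache fits the limit."""
    files = sorted(cache_dir.rglob("model.pte"), key=lambda p: p.stat().st_mtime)
    total = sum(p.stat().st_size for p in files)
    removed: list[Path] = []
    for path in files:  # oldest timestamp first
        if total <= max_cache_bytes:
            break
        total -= path.stat().st_size
        path.unlink()
        removed.append(path)
    return removed


with tempfile.TemporaryDirectory() as tmp:
    for i, name in enumerate(["old-model", "new-model"]):
        path = cache_path(tmp, name, "v1")
        path.parent.mkdir(parents=True)
        path.write_bytes(b"x" * 1024)
        os.utime(path, (1_000 + i, 1_000 + i))  # older timestamp = less recently used
    # Two 1 KiB files with a 1 KiB limit: only the older one has to go.
    removed = evict_lru(Path(tmp), max_cache_bytes=1024)
    removed_names = [p.parent.parent.name for p in removed]
    print(removed_names)
```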

Install local runtime dependencies:

```bash
pip install -e ".[local]"
```

Ready-made demo manifest:

- `examples/local_manifest_smollm2_135m.json` (relative to the `switch-sdk` repository root)
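
Each manifest entry carries a `sha256` field. One way to compute that value for your own `.pte` file, and to check a downloaded file against it, is sketched below with the standard library only. `file_sha256` and `matches_manifest` are hypothetical helpers; whether and how the SDK verifies checksums internally is not shown here.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Hash the file in chunks so a large model never has to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_manifest(path: Path, entry: dict) -> bool:
    """Compare the file's digest with the entry's `sha256` field, ignoring case."""
    return file_sha256(path) == entry["sha256"].lower()


with tempfile.TemporaryDirectory() as tmp:
    model_file = Path(tmp) / "model.pte"
    model_file.write_bytes(b"abc")  # tiny placeholder, not a real model
    checksum = file_sha256(model_file)
    entry = {"model_id": "demo", "sha256": checksum.upper()}
    ok = matches_manifest(model_file, entry)
    print(checksum, ok)
```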

Runtime callable contract:

```python
from switch_sdk import ChatMessage, LocalModelHandle

async def my_executorch_runtime(messages: list[ChatMessage], handle: LocalModelHandle) -> str:
    # Load/use handle.path (.pte) with your ExecuTorch integration.
    # Return assistant text.
    return "LOCAL_EXECUTORCH_OK"
```

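## Routing-only call

The snippets from here through Error handling assume `client` is an open `SwitchClient` inside an async function, as in Quick start.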

```python
decision = await client.route(
    model="gpt-5",
    residency="US",
    sla="realtime",
    capability_flags={"force_cloud": True},
)

print(decision.target.region)
print(decision.scores)
print(decision.candidate_breakdown)
```

## Dashboard + carbon endpoints

```python
summary = await client.get_dashboard_summary()
feed = await client.get_dashboard_feed(limit=20)
carbon = await client.get_live_carbon()

print(summary.summary.total_requests)
print(len(feed.items))
print(carbon.provider, carbon.regions.get("eastus"))
```

## Custom telemetry event

```python
from switch_sdk import TelemetryEvent

await client.track_event(
    TelemetryEvent(
        event_type="sdk_custom",
        request_id="custom-123",
        model="gpt-5",
        metadata={"feature": "my_feature"},
    )
)

await client.flush_telemetry()
```

## Error handling

```python
from switch_sdk import SwitchAPIError, SwitchNetworkError, SwitchTimeoutError

try:
    await client.route(model="gpt-5")
except SwitchAPIError as exc:
    print(exc.status_code, exc.detail)
except SwitchTimeoutError:
    print("Request timed out")
except SwitchNetworkError as exc:
    print(f"Network issue: {exc}")
```

## Notes

- The SDK is async-first.
- Use `async with SwitchClient(...)` so telemetry flushes cleanly on exit.
- Retries/backoff are built in for transient failures.
- Telemetry is best-effort and never blocks successful chat/route calls.
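
For readers unfamiliar with the retry/backoff pattern mentioned above, the following is a generic sketch of exponential backoff with jitter. It is not the SDK's retry logic: `TransientError`, `with_retries`, the attempt count, and the delays are all illustrative assumptions.

```python
import asyncio
import random


class TransientError(Exception):
    """Stand-in for a retryable failure (timeout, connection reset, HTTP 503)."""


async def with_retries(call, *, attempts: int = 3, base_delay: float = 0.01):
    """Await call(); on TransientError, back off exponentially and try again."""
    for attempt in range(attempts):
        try:
            return await call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * (2 ** attempt)
            # Jitter spreads out retries from many clients hitting the same gateway.
            await asyncio.sleep(delay + random.uniform(0, delay))


calls = 0


async def flaky() -> str:
    """Fail twice, then succeed, to exercise the retry loop."""
    global calls
    calls += 1
    if calls < 3:
        raise TransientError("temporary failure")
    return "ok"


result = asyncio.run(with_retries(flaky))
print(result, calls)
```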

## Live switching checks

Automatic east/west region-switch verification script:

```bash
cd switch-sdk
.venv/bin/python examples/test_region_switching.py \
  --base-url http://localhost:8000 \
  --api-key aura_your_plain_project_key \
  --east-region eastus \
  --west-region westus \
  --central-region centralus \
  --check-chat
```

## From-env example script

```bash
export SHIFT_BASE_URL=http://localhost:8000
export SHIFT_API_KEY=aura_your_plain_project_key
cd switch-sdk
python examples/test_from_env.py
```

## Local ExecuTorch sanity check

Force local execution and fail if the local runtime does not work:

```bash
cd switch-sdk
.venv311/bin/python examples/test_hybrid_local.py \
  --base-url http://localhost:8000 \
  --api-key dummy_local_only \
  --manifest-path examples/local_manifest_smollm2_135m.json \
  --executorch \
  --prefer-runtime \
  --tokenizer-source HuggingFaceTB/SmolLM2-135M-Instruct \
  --no-download \
  --no-cloud-fallback
```

Expected: output JSON includes `"source": "sdk-local"` in `switch_meta`.
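
If you want to automate that check, a small script along these lines could parse the output. This is a sketch: the exact shape of the script's JSON is an assumption, so `find_switch_meta` (a hypothetical helper) searches for a `switch_meta` object at any depth, and the sample string below is made up.

```python
import json


def find_switch_meta(node):
    """Depth-first search for the first 'switch_meta' dict in parsed JSON."""
    if isinstance(node, dict):
        meta = node.get("switch_meta")
        if isinstance(meta, dict):
            return meta
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_switch_meta(child)
        if found is not None:
            return found
    return None


def ran_locally(raw_output: str) -> bool:
    """True when the output's switch_meta reports the local SDK as the source."""
    meta = find_switch_meta(json.loads(raw_output))
    return meta is not None and meta.get("source") == "sdk-local"


# Made-up sample; real output will contain more fields.
sample = '{"model": "smollm2-135m", "switch_meta": {"source": "sdk-local"}}'
local_ok = ran_locally(sample)
print(local_ok)
```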

## Release

See `RELEASING.md` in the repository root for TestPyPI and PyPI release steps.
