Providers

Swival supports two providers: LM Studio (local) and HuggingFace Inference API (hosted). Under the hood, all LLM calls go through LiteLLM, which normalizes the API differences.

LM Studio

This is the default and requires no configuration beyond having LM Studio running with a model loaded.

Auto-discovery

Swival queries http://127.0.0.1:1234/api/v1/models at startup to find the loaded model. It looks for the first entry with type: "llm" and a non-empty loaded_instances array. The model identifier and context length are extracted automatically.

If no model is loaded, Swival exits with an error telling you to load one.

Custom base URL

If LM Studio is running on a different host or port:

swival --base-url http://192.168.1.100:1234 "task"

Manual model selection

If auto-discovery doesn't find the right model (e.g., multiple models loaded), you can specify it:

swival --model "qwen3-coder-next" "task"

Context size configuration

You can request a specific context length, which may trigger LM Studio to reload the model:

swival --max-context-tokens 131072 "task"

Swival calls LM Studio's /api/v1/models/load endpoint with the new context size. If the requested size matches what's already loaded, no reload happens. Reloads can be slow depending on the model and hardware.

How the LiteLLM call works

For LM Studio, Swival prefixes the model identifier with openai/ and sets api_base to {base_url}/v1. The API key is set to "lm-studio" (LM Studio doesn't require a real key). This tells LiteLLM to use the OpenAI-compatible API format.

HuggingFace Inference API

For hosted inference without running a local model.

Basic usage

export HF_TOKEN=hf_your_token_here
swival --provider huggingface --model meta-llama/Llama-3.3-70B-Instruct "task"

The --model flag is required and must be in org/model format. Authentication comes from HF_TOKEN in the environment or --api-key on the command line (which takes precedence).

Dedicated endpoints

For HuggingFace dedicated inference endpoints (private deployments):

swival --provider huggingface \
    --model meta-llama/Llama-3.3-70B-Instruct \
    --base-url https://xyz.endpoints.huggingface.cloud \
    --api-key hf_your_key \
    "task"

The --base-url points to your endpoint. The model identifier still needs to match what's deployed there.

How the LiteLLM call works

For HuggingFace, Swival prefixes the model with huggingface/ (stripping any existing prefix first) and passes the API key directly. If --base-url is set, it's passed as api_base to LiteLLM.

Future providers

Since Swival uses LiteLLM for the actual API call, adding new providers is straightforward -- it's mostly a matter of building the right model string and passing the right credentials. The provider-specific logic in call_llm() is about 10 lines per provider.