Temperature: controls randomness in responses (0 = deterministic, 2 = very random)
Top K: limits token selection to the K most likely tokens
Top P: nucleus sampling, considers only the most likely tokens whose cumulative probability reaches P
Repeat Penalty: penalty for repeating tokens (1 = no penalty, >1 = less repetition)
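
The four sampling settings described above can be illustrated with a small, self-contained Python sketch. It is not the server's implementation. The function name `sample_next_token`, its default values, and the order in which the filters are applied are all assumptions made for illustration; real inference backends may order or combine these steps differently.

```python
import math
import random


def sample_next_token(logits, previous_tokens=(), temperature=0.7,
                      top_k=40, top_p=0.9, repeat_penalty=1.1, rng=None):
    """Pick one token index from raw logits (hypothetical helper, illustrative only)."""
    rng = rng or random.Random()
    adjusted = list(logits)

    # Repeat penalty: lower the score of tokens that already appeared.
    # A value of 1 leaves the logits unchanged.
    for tok in set(previous_tokens):
        if adjusted[tok] > 0:
            adjusted[tok] /= repeat_penalty
        else:
            adjusted[tok] *= repeat_penalty

    # Temperature 0: always take the single most likely token.
    if temperature <= 0:
        return max(range(len(adjusted)), key=adjusted.__getitem__)

    # Higher temperature flattens the distribution, lower sharpens it.
    scaled = [x / temperature for x in adjusted]

    # Top K: keep only the K highest-scoring tokens (0 disables the filter here).
    order = sorted(range(len(scaled)), key=scaled.__getitem__, reverse=True)
    if top_k > 0:
        order = order[:top_k]

    # Softmax over the surviving tokens (subtract the peak for numerical stability).
    peak = scaled[order[0]]
    weights = [math.exp(scaled[i] - peak) for i in order]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Top P: keep the smallest prefix whose cumulative probability reaches P.
    kept_tokens, kept_probs, cumulative = [], [], 0.0
    for i, p in zip(order, probs):
        kept_tokens.append(i)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break

    # Draw one token from the remaining candidates, weighted by probability.
    return rng.choices(kept_tokens, weights=kept_probs, k=1)[0]
```

Calling `sample_next_token([1.0, 3.0, 2.0], temperature=0)` always selects index 1, the highest logit, which matches the "0 = deterministic" description. Raising the temperature or widening Top K and Top P lets lower-ranked tokens be chosen more often.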

Determines whether hybrid reasoning models, such as Qwen3, will use thinking.
🔥 Hot Models
🔧 By Recipe
llama.cpp
OGA Hybrid
OGA NPU
OGA CPU
FastFlowLM NPU
🏷️ By Category
Coding
Vision
Reasoning
Tool Calling
Reranking
Embeddings
Custom
Add a Model