Context length is the maximum amount of text, measured in tokens, that the model can keep in mind at once. Messages, replies, and system instructions all count toward it.
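As a rough illustration of how this limit plays out, the sketch below trims a chat history so that the system instructions, the retained messages, and some space reserved for the reply all fit within the context length. It assumes length is counted in tokens. `count_tokens`, `fit_to_context`, and `reply_budget` are hypothetical names, and the word-splitting token counter is only a stand-in for a real tokenizer.

```python
def count_tokens(text: str) -> int:
    # Hypothetical stand-in for a real tokenizer:
    # counts one token per whitespace-separated word.
    return len(text.split())


def fit_to_context(system: str, messages: list[str],
                   context_length: int, reply_budget: int) -> list[str]:
    """Keep the most recent messages that fit next to the system
    instructions while leaving reply_budget tokens for the reply."""
    budget = context_length - count_tokens(system) - reply_budget
    kept: list[str] = []
    # Walk from newest to oldest and stop at the first message that no longer fits.
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if cost > budget:
            break
        kept.append(msg)
        budget -= cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a b c", "d e", "f g h i"]
print(fit_to_context("be brief", history, context_length=12, reply_budget=4))
```

Dropping the oldest messages first is just one possible policy; summarizing older turns is another common choice.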
Weight/activation precision. Lower precision uses less memory and may be faster, but can affect quality and compatibility.
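To make the memory effect concrete, here is a back-of-the-envelope sketch that estimates the memory needed for the weights alone at a few common precisions. The precision names and the 7-billion-parameter figure are illustrative assumptions. Real quantized formats usually add some overhead (for example scaling factors), and activations and caches need memory on top of this.

```python
# Bits used to store one weight at each precision (illustrative set).
BITS_PER_WEIGHT = {"fp32": 32, "fp16": 16, "int8": 8, "int4": 4}


def weight_memory_gib(n_params: int, precision: str) -> float:
    """Approximate memory for the weights only, in GiB."""
    total_bytes = n_params * BITS_PER_WEIGHT[precision] / 8
    return total_bytes / 2**30


n_params = 7_000_000_000  # hypothetical 7B-parameter model
for name in BITS_PER_WEIGHT:
    print(f"{name}: {weight_memory_gib(n_params, name):.2f} GiB")
```

Halving the number of bits per weight halves this estimate, which is why lower precision lets larger models fit on the same hardware.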
Where inference runs. The available options depend on what the runtime supports.
Traditional attention is optimized for single requests; continuous batching is optimized for multiple parallel requests.
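The benefit of continuous batching for parallel requests can be sketched with a toy scheduler. This simulation is an assumption-laden illustration, not how any particular runtime works: each request needs a given number of decode steps, and the comparison baseline is fixed (static) batches that wait for their longest request, which is a contrast chosen here for clarity. Both function names are hypothetical.

```python
from collections import deque


def static_batching_steps(lengths: list[int], batch_size: int) -> int:
    """Fixed batches: each batch runs until its longest request finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths: list[int], batch_size: int) -> int:
    """Continuous batching: a waiting request takes over a slot
    as soon as a running request finishes."""
    waiting = deque(lengths)   # remaining decode steps per queued request
    running: list[int] = []    # remaining decode steps per active request
    steps = 0
    while waiting or running:
        # Fill any free slots before the next decode step.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        steps += 1
        # Every active request advances one step; finished ones leave the batch.
        running = [r - 1 for r in running if r > 1]
    return steps


requests = [8, 2, 2, 8]  # decode steps needed per request (made-up numbers)
print(static_batching_steps(requests, batch_size=2))      # 16
print(continuous_batching_steps(requests, batch_size=2))  # 12
```

With a single request the two schedules take the same number of steps, so the difference only shows up under concurrent load.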