Architecture

Temperature

A scalar applied to logits before softmax that controls output randomness: values <1 sharpen the distribution, values >1 flatten it.

Definition

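Temperature is a positive scalar T: the logits are divided by T before softmax is applied, so token i receives probability proportional to exp(z_i / T), where z_i is its logit. At T = 1 the model's distribution is unchanged. Lower temperatures (T < 1) concentrate probability mass on the most likely tokens, producing more deterministic and often more repetitive output; higher temperatures (T > 1) flatten the distribution, producing more diverse and creative but potentially less coherent output. T = 0 cannot be used in the division directly; as T approaches 0, sampling converges to greedy decoding (always picking the highest-probability token), and many implementations treat a temperature of 0 as a request for greedy decoding. Typical defaults are roughly 0.6–0.9 for chat applications and 0.2–0.4 for code generation, though the appropriate value depends on the model and the task.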
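
The scaling described above can be illustrated with a minimal Python sketch. It uses only the standard library; the function names `softmax_with_temperature` and `greedy_token` and the example logits are hypothetical, chosen for illustration rather than taken from any particular library.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        # Division by zero is undefined; treat T = 0 as greedy decoding instead.
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtracting the maximum avoids overflow and does not change the result.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_token(logits):
    """Index of the largest logit: the limiting behavior as T approaches 0."""
    return max(range(len(logits)), key=lambda i: logits[i])


# Hypothetical logits for a three-token vocabulary.
logits = [2.0, 1.0, 0.1]
for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}: {[round(p, 3) for p in probs]}")
print("greedy choice:", greedy_token(logits))
```

Running the loop shows the probability of the top token shrinking as T grows, while the ranking of tokens stays the same: temperature changes how peaked the distribution is, not which token is most likely.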
