Sampling

Stochastic token selection from the model's output probability distribution, as opposed to greedy (argmax) decoding.

Definition

Sampling selects the next token by drawing at random from the probability distribution obtained by applying the softmax to the logits. Common variants include top-k sampling, which restricts the distribution to the k most probable tokens, and nucleus (top-p) sampling, which restricts it to the smallest set of most probable tokens whose cumulative probability reaches or exceeds p; in both cases the remaining probabilities are renormalized before drawing. Sampling introduces controlled randomness that tends to improve output diversity and reduce repetition compared to greedy decoding. Temperature scaling divides the logits by a temperature T before the softmax to tune the sharpness of the distribution: T < 1 sharpens it toward the most probable tokens, while T > 1 flattens it.
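
The definition above can be sketched in a few lines of plain Python. This is a minimal illustration and not any particular library's implementation: the function names (softmax, filter_top_k, filter_top_p, sample_token) and the order in which the filters are applied are assumptions made for this example, and the nucleus cutoff is taken here as "cumulative probability reaches p".

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def _renormalize(probs, keep):
    """Zero every token not in `keep` and rescale the rest to sum to 1."""
    kept = set(keep)
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


def filter_top_k(probs, k):
    """Keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalize(probs, order[:k])


def filter_top_p(probs, p):
    """Keep the smallest set of most probable tokens whose cumulative
    probability reaches p (assumed cutoff convention for this sketch)."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalize(probs, keep)


def sample_token(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Draw one token index from the (optionally filtered) distribution."""
    probs = softmax(logits, temperature)
    if top_k is not None:
        probs = filter_top_k(probs, top_k)
    if top_p is not None:
        probs = filter_top_p(probs, top_p)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

For example, sample_token(logits, temperature=0.8, top_k=50, top_p=0.95) would draw one token index from a sharpened, truncated distribution, while top_k=1 reduces to greedy (argmax) decoding. Passing a seeded random.Random instance as rng makes the draws reproducible.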
