Architecture

Logits

Raw, unnormalised scores over the vocabulary produced by the model's final linear layer before softmax is applied.

Definition

Logits are the output of the language model head: a vector of length V (the vocabulary size, e.g., 128,256 for Llama 3) holding one unnormalised score per possible next token. Applying softmax to the logits yields a proper probability distribution, so each logit equals the token's log-probability up to an additive constant shared by all tokens. In inference engines such as vLLM or SGLang, the logits may be post-processed with penalties, biases, or grammar masks (as used in constrained decoding) before sampling occurs. Computing logits is a large matmul (hidden_dim × vocab), and during decoding at small batch sizes it is often memory-bandwidth bound, because the full weight matrix must be read to produce each new token.
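As a rough illustration of the definition above, the following minimal sketch turns a toy logit vector into probabilities and applies a bias and a mask before normalising. It uses only the Python standard library; the function names `softmax` and `apply_bias_and_mask`, the four-token vocabulary, and all numeric values are hypothetical and are not taken from vLLM, SGLang, or any other library.

```python
import math

def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]  # math.exp(-inf) evaluates to 0.0
    total = sum(exps)
    return [e / total for e in exps]

def apply_bias_and_mask(logits, bias=None, allowed=None):
    """Add per-token biases, then set every disallowed token to -inf.

    `bias` maps token id -> additive offset; `allowed` is a non-empty set of
    token ids that remain eligible (a stand-in for a grammar mask).
    """
    out = list(logits)
    for token_id, b in (bias or {}).items():
        out[token_id] += b
    if allowed is not None:
        out = [x if i in allowed else -math.inf for i, x in enumerate(out)]
    return out

# Toy vocabulary of 4 tokens (hypothetical values, not real model output).
logits = [2.0, 1.0, 0.5, -1.0]
probs = softmax(logits)

# Boost token 1 and allow only tokens 1 and 2, as a mask might.
masked = apply_bias_and_mask(logits, bias={1: 0.5}, allowed={1, 2})
masked_probs = softmax(masked)
```

Masked tokens receive a logit of negative infinity, so after softmax their probability is exactly zero and the remaining mass is renormalised over the allowed tokens. Real engines perform the same steps on tensors of length V rather than Python lists.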
