Logits
Raw, unnormalised scores over the vocabulary produced by the model's final linear layer before softmax is applied.
Definition
Logits are the output of the language model head: a vector of length V (the vocabulary size, e.g., 128,256 for Llama 3) holding one unnormalised score per candidate next token, equal to that token's log-probability up to an additive constant. Applying softmax to the logits yields a proper probability distribution. Inference engines such as vLLM or SGLang may post-process the logits with penalties, biases, or grammar masks (as in constrained decoding) before sampling occurs. Computing logits is a large matmul (hidden_dim × vocab) and, when decoding at small batch sizes, is often memory-bandwidth bound.
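The step from logits to probabilities, and the effect of a bias or mask, can be sketched in a few lines of plain Python. This is a minimal illustration, not how vLLM or SGLang implement it: the function names (`softmax`, `apply_bias`, `apply_mask`), the `temperature` parameter, and the four-token toy vocabulary are assumptions made for the example.

```python
import math

def softmax(logits, temperature=1.0):
    """Convert raw logits into a probability distribution (numerically stable)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def apply_bias(logits, bias):
    """Add a per-token offset, given as {token_id: offset} (hypothetical helper)."""
    out = list(logits)
    for token_id, offset in bias.items():
        out[token_id] += offset
    return out

def apply_mask(logits, allowed_ids):
    """Grammar-style mask: disallowed tokens get -inf, so softmax assigns them 0.

    Assumes at least one token id is allowed.
    """
    allowed = set(allowed_ids)
    return [x if i in allowed else float("-inf") for i, x in enumerate(logits)]

# Toy vocabulary of 4 tokens; a real model would have V entries.
logits = [2.0, 1.0, 0.1, -1.0]
probs = softmax(logits)
biased = apply_bias(logits, {3: 5.0})
masked_probs = softmax(apply_mask(logits, allowed_ids=[1, 2]))
```

Masking with negative infinity rather than deleting entries keeps the vector at length V, so token ids still line up with the vocabulary, and the surviving probabilities are renormalised by the softmax itself.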
Related
More Architecture terms
KV Cache
GPU memory buffer storing attention key/value tensors so they need not be recomputed for tokens already processed.
Multi-Head Attention (MHA)
Standard Transformer attention where every layer maintains separate Q, K, V projections for each attention head.
Grouped-Query Attention (GQA)
Attention variant that shares K/V heads across groups of query heads, shrinking KV cache size while retaining most of MHA's expressiveness.