
Embedding

Dense vector representation of a token in a high-dimensional space, learned during training to encode semantic and syntactic relationships.

Definition

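An embedding maps a discrete token ID to a continuous vector of dimension D (e.g., 4096 for Llama 3 8B). The embedding table has shape V × D, where V is the vocabulary size, and with large vocabularies it is often the largest single parameter tensor in the model. Some models share this table with the language-model head (weight tying), as GPT-2 does; others, including Llama 3 8B, keep a separate output matrix. At inference time, the input embedding lookup is trivial: it is a row gather with no arithmetic. The final projection from the model's hidden states back to the vocabulary (the unembedding, or lm_head, operation) is different. It multiplies each hidden state by a V × D matrix, and because decoding at small batch sizes is memory-bandwidth bound, reading that entire matrix for every generated token contributes meaningfully to latency.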
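The contrast between the cheap input lookup and the expensive output projection can be sketched in plain Python. This is a toy illustration under stated assumptions, not any framework's implementation: the tiny sizes, the random initialization, and the helper names `embed`, `unembed`, and `lm_head_cost` are hypothetical. The cost helper assumes 2 bytes per parameter (as for bf16 weights) and is applied below to the vocabulary size 128256 from the publicly released Llama 3 8B configuration, which is worth checking against the model card.

```python
import random

# Toy sizes (hypothetical), chosen only so the example runs instantly.
VOCAB_SIZE = 8   # V
HIDDEN_DIM = 4   # D

random.seed(0)
# Embedding table E: one row of length D per token ID (shape V x D).
embedding_table = [[random.gauss(0.0, 0.02) for _ in range(HIDDEN_DIM)]
                   for _ in range(VOCAB_SIZE)]


def embed(token_ids, table):
    """Input embedding: a plain row lookup (gather), no arithmetic."""
    return [table[t] for t in token_ids]


def unembed(hidden, weight):
    """lm_head: logits[v] = dot(weight[v], hidden) for every vocab entry v.

    Costs V * D multiply-adds per position and reads every row of weight.
    """
    return [sum(w * h for w, h in zip(row, hidden)) for row in weight]


def lm_head_cost(vocab_size, hidden_dim, bytes_per_param=2):
    """Return (parameter count, bytes read per decode step) for a V x D head.

    Assumes the whole matrix is streamed from memory once per step,
    which is the memory-bound regime of small-batch decoding.
    """
    params = vocab_size * hidden_dim
    return params, params * bytes_per_param


# Weight tying (hypothetical toggle): the lm_head reuses the embedding rows.
tied_lm_head = embedding_table
# Without tying, the model keeps a separate V x D output matrix.
untied_lm_head = [[random.gauss(0.0, 0.02) for _ in range(HIDDEN_DIM)]
                  for _ in range(VOCAB_SIZE)]

# Toy forward pass with no transformer layers: embed, then project back.
vectors = embed([3, 1, 3], embedding_table)
logits = unembed(vectors[-1], tied_lm_head)
next_token = max(range(VOCAB_SIZE), key=lambda v: logits[v])
```

For the Llama 3 8B sizes, `lm_head_cost(128256, 4096)` returns 525,336,576 parameters and about 1.05 GB in bf16, all of which is read for each decoded token when the batch is small. Tying saves those parameters but not the read, since the projection still touches the whole table.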
