Prefill Phase

The initial forward pass that processes the full input prompt in parallel, producing the first output token and populating the KV cache.

Definition

The prefill phase processes the entire input prompt (potentially thousands of tokens) in a single parallel forward pass through the model. Because every prompt token is known up front, all positions are computed at once, so the QKV projections and the other per-layer linear layers become large dense matrix multiplies. This makes prefill a compute-bound workload for all but very short prompts. The pass writes the KV cache entries for all prompt tokens and produces the logits from which the first generated token is chosen. Prefill latency is typically the dominant contributor to time-to-first-token (TTFT), and it grows with prompt length.
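A rough way to see why prefill is compute-bound is to estimate the arithmetic intensity (FLOPs per byte moved) of a single projection matmul. The sketch below is a back-of-the-envelope illustration, not a profiler: the function name, the 4096-wide model dimension, the 2048-token prompt, and the 2-byte element size are all assumptions chosen for the example. It also ignores caching effects and attention itself.

```python
def matmul_arithmetic_intensity(num_tokens, d_in, d_out, bytes_per_elem=2):
    """Estimate FLOPs per byte for a (num_tokens x d_in) @ (d_in x d_out) matmul.

    Assumes each operand is read once and the output is written once,
    which is an idealized memory-traffic model.
    """
    # One multiply and one add per (token, input, output) triple.
    flops = 2 * num_tokens * d_in * d_out
    # The weight matrix is read once regardless of how many tokens are batched.
    weight_bytes = d_in * d_out * bytes_per_elem
    # Input activations are read and output activations are written per token.
    activation_bytes = num_tokens * (d_in + d_out) * bytes_per_elem
    return flops / (weight_bytes + activation_bytes)


# Hypothetical sizes, chosen only for illustration.
D_MODEL = 4096
PROMPT_TOKENS = 2048

# Prefill: all prompt tokens go through the projection together.
prefill_intensity = matmul_arithmetic_intensity(PROMPT_TOKENS, D_MODEL, D_MODEL)
# For comparison: a projection applied to a single token.
single_token_intensity = matmul_arithmetic_intensity(1, D_MODEL, D_MODEL)

print(f"prefill:      {prefill_intensity:.1f} FLOPs/byte")
print(f"single token: {single_token_intensity:.4f} FLOPs/byte")
```

Under these assumptions the prefill matmul performs about 1024 FLOPs per byte moved, while the same projection applied to one token performs just under 1. The weights are read once either way, so batching all prompt tokens amortizes that memory traffic over far more arithmetic, which is what pushes the workload toward the compute limit of the hardware.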
