Optimization

Prefix Caching

Reusing KV cache blocks computed for a shared prompt prefix across multiple requests, eliminating redundant prefill computation.

Definition

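Prefix caching exploits the observation that many requests share a common prefix, such as a system prompt, few-shot examples, or a document context. By storing the KV cache blocks for that prefix and reusing them for subsequent requests, the server skips prefill computation for the shared portion and only computes the tokens that follow it. Reuse is limited to prefixes because each token's KV entries depend on all the tokens before it. vLLM and SGLang both implement automatic prefix caching. SGLang's RadixAttention organises cached prefixes in a radix tree with LRU eviction, so a new request can reuse the longest cached prefix it shares with any earlier request, including branches that diverge from a common root, rather than only a single fixed prefix.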
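The block-reuse idea can be illustrated with a minimal sketch. It is a toy model and not the vLLM or SGLang implementation: `BLOCK_SIZE`, `block_keys`, `PrefixCache`, and `prefill` are hypothetical names, the stored values are placeholder strings rather than KV tensors, and eviction is omitted. Each block key hashes the block's tokens together with its parent block's key, so a block only matches when everything before it matches too.

```python
import hashlib
from typing import Dict, List

BLOCK_SIZE = 4  # tokens per KV block (illustrative value, not a real default)


def block_keys(tokens: List[int]) -> List[bytes]:
    """Return one key per full block; each key also covers all earlier blocks."""
    keys: List[bytes] = []
    parent = b""
    full = len(tokens) - len(tokens) % BLOCK_SIZE  # ignore the partial tail block
    for start in range(0, full, BLOCK_SIZE):
        block = tokens[start:start + BLOCK_SIZE]
        payload = parent + ",".join(map(str, block)).encode()
        parent = hashlib.sha256(payload).digest()
        keys.append(parent)
    return keys


class PrefixCache:
    """Toy prefix cache: maps chained block keys to placeholder KV blocks."""

    def __init__(self) -> None:
        self.blocks: Dict[bytes, str] = {}

    def match(self, tokens: List[int]) -> int:
        """Return how many leading tokens already have cached KV blocks."""
        hit = 0
        for key in block_keys(tokens):
            if key not in self.blocks:
                break
            hit += BLOCK_SIZE
        return hit

    def insert(self, tokens: List[int]) -> None:
        """Record every full block of this request as cached."""
        for i, key in enumerate(block_keys(tokens)):
            self.blocks.setdefault(key, f"kv-block-{i}")


def prefill(cache: PrefixCache, tokens: List[int]) -> int:
    """Simulate prefill and return the number of tokens that must be computed."""
    cached = cache.match(tokens)
    cache.insert(tokens)
    return len(tokens) - cached


if __name__ == "__main__":
    cache = PrefixCache()
    system_prompt = list(range(8))            # shared 8-token prefix
    print(prefill(cache, system_prompt + [100, 101, 102]))  # cold cache
    print(prefill(cache, system_prompt + [200, 201]))       # prefix reused
```

In this sketch only whole blocks are shared, so a shared prefix shorter than `BLOCK_SIZE` yields no reuse. That granularity is a design choice of the toy model, picked to mirror the block-level wording above.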
