SGLang
LLM inference runtime, originally developed by researchers at Stanford and UC Berkeley and released through LMSYS, featuring RadixAttention, speculative execution, and structured generation support.
Definition
SGLang (Structured Generation Language) is a high-performance LLM serving framework originally developed by researchers at Stanford and UC Berkeley and released through LMSYS. Its distinguishing features are RadixAttention, which keeps the KV cache of earlier requests in a radix tree so that later requests sharing a prefix can reuse it automatically; native support for structured generation and constrained decoding (JSON, regex); and an efficient runtime for programs that make multiple LLM calls. This makes SGLang particularly well suited to agent workflows that reuse context across calls. Benchmarks published by the SGLang authors report throughput matching or exceeding vLLM on many workloads, especially those with shared prefixes.
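The prefix reuse behind RadixAttention can be illustrated with a toy example. The sketch below is not SGLang's implementation or API: it uses a plain token-level trie instead of a compressed radix tree, stores no actual KV tensors, and omits eviction. The names `PrefixCache`, `match_prefix`, `insert`, and `serve` are hypothetical and chosen only for this illustration. It shows how a second call that shares a system prompt with the first only needs to compute its new tokens.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Children keyed by token id. A real radix tree would compress chains of
    # single-child nodes into one edge; this toy trie does not.
    children: dict = field(default_factory=dict)


class PrefixCache:
    """Hypothetical token-level trie standing in for a KV-cache prefix index."""

    def __init__(self):
        self.root = Node()

    def match_prefix(self, tokens):
        """Return how many leading tokens are already cached."""
        node, matched = self.root, 0
        for tok in tokens:
            child = node.children.get(tok)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched

    def insert(self, tokens):
        """Record tokens as cached; return how many nodes were newly added."""
        node, added = self.root, 0
        for tok in tokens:
            if tok not in node.children:
                node.children[tok] = Node()
                added += 1
            node = node.children[tok]
        return added


def serve(cache, tokens):
    """Simulate one LLM call: return (reused, computed) token counts."""
    reused = cache.match_prefix(tokens)
    cache.insert(tokens)
    return reused, len(tokens) - reused


cache = PrefixCache()
system_prompt = [1, 2, 3, 4]  # made-up token ids shared by both calls

first = serve(cache, system_prompt + [10, 11])   # cold cache
second = serve(cache, system_prompt + [20])      # shares the system prompt

print(first)   # (0, 6): nothing cached yet, all six tokens computed
print(second)  # (4, 1): four prompt tokens reused, one new token computed
```

In an agent workflow, each step typically resends the same system prompt and conversation history, so the reused share grows with every call while the newly computed share stays small.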
Related
More Software terms
PagedAttention
vLLM's technique for storing the KV cache in non-contiguous, fixed-size memory pages, which nearly eliminates fragmentation and enables larger effective batch sizes.
FlashAttention
IO-aware exact attention algorithm that tiles computation so intermediate results stay in on-chip SRAM, cutting HBM reads/writes and speeding up attention by a reported 2–4× over standard implementations.
Continuous Batching
Scheduling technique that adds new requests to a running batch as soon as any sequence finishes, maximising GPU utilisation compared to static batching.