Continuous Batching

Scheduling technique that admits new requests into a running batch as soon as any sequence finishes, raising GPU utilisation relative to static batching.

Definition

Traditional static batching waits until every sequence in a batch has finished before starting the next batch, so the slots of sequences that finish early sit unused until the longest sequence completes. Continuous batching (also called iteration-level scheduling or in-flight batching) instead schedules at the granularity of a single decoding iteration: when a sequence finishes, its slot is handed to a waiting request before the next iteration. Because the GPU generates one new token for every active sequence per step, newly admitted requests begin contributing to throughput immediately. The gain in GPU utilisation is largest when sequence lengths vary widely, which is typical of production LLM workloads.
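
The difference between the two policies can be illustrated with a toy simulation. This is a minimal sketch, not how any particular serving engine implements scheduling. It assumes that all requests are already waiting at the start, that every decoding step costs the same, and that prefill cost and KV-cache memory limits can be ignored. The function names and the `lengths` and `batch_size` parameters are hypothetical, chosen only for this example.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Decode steps when each batch runs until its longest sequence finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        # The whole batch is held until its longest member is done.
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """Decode steps when freed slots are refilled before every iteration."""
    waiting = deque(lengths)  # tokens still to generate, per queued request
    active = []               # tokens still to generate, per running sequence
    steps = 0
    while waiting or active:
        # Admit queued requests into any free slots (iteration-level scheduling).
        while waiting and len(active) < batch_size:
            active.append(waiting.popleft())
        # One decoding iteration: every active sequence emits one token,
        # and sequences with nothing left to generate are retired.
        active = [remaining - 1 for remaining in active if remaining > 1]
        steps += 1
    return steps


def utilisation(lengths, batch_size, steps):
    """Fraction of batch slots that produced a token over the whole run."""
    return sum(lengths) / (steps * batch_size)


if __name__ == "__main__":
    lengths = [2, 8, 3, 8, 2, 3, 2, 2]  # hypothetical output lengths in tokens
    batch_size = 4
    for name, fn in [("static", static_batching_steps),
                     ("continuous", continuous_batching_steps)]:
        steps = fn(lengths, batch_size)
        print(f"{name}: {steps} steps, "
              f"utilisation {utilisation(lengths, batch_size, steps):.2f}")
```

With these example lengths and four slots, the static policy needs 11 decoding steps and the continuous policy needs 8, because the short sequences no longer wait for the two 8-token sequences in their batch. If every request has the same length, the two policies take the same number of steps, which is why the benefit depends on how much the lengths vary.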
