SLO (Service Level Objective)

A target performance threshold (e.g., p95 TTFT < 500 ms, TPS > 30) that a production LLM system must meet to satisfy quality-of-service requirements.

Definition

An SLO (Service Level Objective) defines the measurable performance targets a production system commits to, commonly expressed as percentile latency bounds (e.g., p99 TTFT < 1 s, where TTFT is time to first token) or minimum throughput floors (e.g., TPS > 20 tokens per second per client). In LLM serving, SLOs constrain the operating point on the latency-throughput Pareto curve: a tighter TTFT SLO limits how aggressively requests can be batched, because larger batches raise aggregate throughput but lengthen the queueing and prefill time each request sees. Capacity planning means provisioning enough GPU headroom that traffic spikes do not violate SLOs. SLOs also inform autoscaling triggers and hardware selection.
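
As a rough illustration of how such targets might be checked against measurements, the Python sketch below evaluates a p95 TTFT bound and a per-client TPS floor over a window of samples. The names Slo, percentile, and check_slo are hypothetical, the sample values are invented for illustration, and the nearest-rank percentile method is an assumption; a real monitoring stack may compute percentiles differently.

```python
import math
from dataclasses import dataclass


@dataclass
class Slo:
    """Hypothetical SLO definition: a percentile TTFT bound and a TPS floor."""
    ttft_percentile: float  # e.g. 95 for p95
    ttft_limit_ms: float    # observed percentile must be strictly below this
    min_tps: float          # every client must stay strictly above this


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))
    return ordered[max(rank, 1) - 1]


def check_slo(slo, ttft_ms, tps_per_client):
    """Compare one measurement window against the SLO and report each target."""
    observed_ttft = percentile(ttft_ms, slo.ttft_percentile)
    slowest_tps = min(tps_per_client)
    return {
        "ttft_ms": observed_ttft,
        "ttft_ok": observed_ttft < slo.ttft_limit_ms,
        "min_tps": slowest_tps,
        "tps_ok": slowest_tps > slo.min_tps,
    }


# Invented sample window: one slow request out of ten.
slo = Slo(ttft_percentile=95, ttft_limit_ms=500, min_tps=30)
ttft = [120, 180, 210, 250, 300, 320, 350, 400, 450, 900]
tps = [42, 38, 35, 33, 45, 40, 36, 31, 39, 44]

report = check_slo(slo, ttft, tps)
print(report)
```

In this invented window the single 900 ms request is enough to break the p95 TTFT target even though the throughput floor holds, which is the kind of tail behavior a percentile-based SLO is meant to expose.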