TPS / TPOT (Tokens per Second / Time per Output Token)
Output throughput metrics: TPS measures tokens generated per second; TPOT measures the time, in milliseconds, between successive output tokens.
Definition
TPS (tokens per second) is a throughput metric: the number of output tokens a system generates per unit time, summed across all active requests. TPOT (time per output token) is the inter-token interval from the user's perspective: the delay between receiving token N and token N+1. For a single request, TPOT in milliseconds is approximately 1000 / (per-request TPS); system-wide TPS is higher than per-request TPS whenever several requests are decoded concurrently. In streaming interfaces, TPOT governs the reading cadence; a TPOT under roughly 30–50 ms/token (about 20–33 TPS per request or more) is needed for a smooth reading experience. Both are key production SLO targets.
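As a rough illustration of the relationship between these metrics, the sketch below derives TPOT and TPS from per-token arrival timestamps. The function names (`tpot_ms`, `per_request_tps`, `system_tps`) and the input format (a list of arrival times in seconds per request) are assumptions made for this example, not part of any particular serving framework.

```python
from statistics import mean


def tpot_ms(token_times_s):
    """Mean gap between successive output tokens, in milliseconds.

    token_times_s: arrival timestamps (seconds) of one request's output tokens.
    The wait for the first token is not included; that is TTFT.
    """
    if len(token_times_s) < 2:
        raise ValueError("need at least two tokens to measure an interval")
    gaps = [later - earlier
            for earlier, later in zip(token_times_s, token_times_s[1:])]
    return 1000.0 * mean(gaps)


def per_request_tps(token_times_s):
    """Per-request decode rate implied by TPOT: 1000 / TPOT(ms)."""
    return 1000.0 / tpot_ms(token_times_s)


def system_tps(requests, window_s):
    """Aggregate throughput: all output tokens across requests / wall-clock window."""
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    return sum(len(times) for times in requests) / window_s


# Hypothetical trace: one request streaming a token every 40 ms.
one_request = [0.00, 0.04, 0.08, 0.12]
single_tpot = tpot_ms(one_request)
single_tps = per_request_tps(one_request)

# Four such requests served concurrently over a 0.16 s window.
aggregate_tps = system_tps([one_request] * 4, window_s=0.16)
```

With this made-up trace, the per-request TPOT is about 40 ms (about 25 TPS per request), while the aggregate figure is about 100 TPS, which is why a system-level TPS number alone says little about the cadence an individual user sees.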
Related
More Metrics terms
Arithmetic Intensity
The ratio of FLOPs to bytes of memory traffic for an operation, used to determine whether a workload is compute-bound or memory-bandwidth-bound.
Roofline Model
Visual performance model that shows achievable FLOP/s as a function of arithmetic intensity, with two ceilings: memory bandwidth and compute.
TTFT (Time to First Token)
Latency from sending the request to receiving the first generated token — primarily determined by prefill duration and queuing time.