
TPS / TPOT (Tokens per Second / Time per Output Token)

Output throughput metrics: TPS measures tokens generated per second, TPOT measures milliseconds between successive output tokens.

Definition

TPS (tokens per second) is a throughput metric: the number of output tokens a system generates per unit time, summed across all active requests. TPOT (time per output token) is the inter-token interval from the user's perspective: the delay between receiving token N and token N+1. For a single request the two are reciprocals: TPOT in milliseconds ≈ 1000 / (TPS per request). System-wide TPS is higher than the per-request figure whenever several requests are served concurrently, so it should not be substituted into this formula. For streaming interfaces, TPOT governs the reading cadence; a TPOT below roughly 30–50 ms/token (above about 20–33 TPS per request) is needed for a smooth reading experience. Both are key production SLO targets.
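As a rough illustration of how these quantities relate, the sketch below derives TPOT, per-request TPS, and system-wide TPS from token arrival timestamps. The function names and the input format (a list of arrival times in seconds per request) are assumptions made for this example, not part of any particular serving framework, and TPOT is taken here as the mean gap between successive tokens.

```python
from statistics import mean


def tpot_ms(token_times_s):
    """Mean gap between successive output tokens, in milliseconds.

    token_times_s: arrival times (seconds) of one request's output tokens,
    in order. This input format is an assumption for illustration.
    """
    if len(token_times_s) < 2:
        raise ValueError("need at least two token timestamps")
    gaps = [later - earlier
            for earlier, later in zip(token_times_s, token_times_s[1:])]
    return 1000.0 * mean(gaps)


def per_request_tps(token_times_s):
    """Per-request tokens per second, the reciprocal of TPOT."""
    return 1000.0 / tpot_ms(token_times_s)


def system_tps(requests, window_start_s, window_end_s):
    """Output tokens from all requests that arrived in the window, per second."""
    count = sum(
        1
        for times in requests
        for t in times
        if window_start_s <= t < window_end_s
    )
    return count / (window_end_s - window_start_s)


# Two concurrent requests, each streaming one token every 40 ms.
req_a = [0.50, 0.54, 0.58, 0.62]
req_b = [0.51, 0.55, 0.59, 0.63]

print(f"TPOT: {tpot_ms(req_a):.1f} ms")
print(f"per-request TPS: {per_request_tps(req_a):.1f}")
print(f"system TPS over [0, 1): {system_tps([req_a, req_b], 0.0, 1.0):.1f}")
```

With a 40 ms gap between tokens, each request sits inside the 30–50 ms band quoted above at about 25 TPS, while the system-wide figure counts tokens from both requests and so depends on the measurement window and on how many requests overlap.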
