TTFT (Time to First Token)
Latency from sending the request to receiving the first generated token — primarily determined by prefill duration and queuing time.
Definition
Time to First Token (TTFT) measures the delay a user experiences before any output appears. It encompasses network round-trip time, request queuing in the scheduler, and the prefill computation for the input prompt. TTFT increases with prompt length (longer prompts mean more prefill compute) and with system load (longer queues mean more wait time). The relationship to prompt length is not strictly proportional, because the attention portion of prefill scales faster than linearly with sequence length. For interactive applications, a low TTFT is critical for perceived responsiveness, with targets typically in the range of 200 to 500 ms, and it is one of the primary SLO metrics in production LLM serving.
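As a rough illustration of the definition, the sketch below times a streaming response from the client side. It is a minimal example under stated assumptions, not a reference implementation: `measure_ttft` and its `start_stream` parameter are hypothetical names, and `start_stream` stands in for whatever call opens a token stream in a given client library.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def measure_ttft(
    start_stream: Callable[[], Iterable[str]],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], Optional[str], Iterator[str]]:
    """Return (ttft_seconds, first_token, remaining_tokens).

    start_stream is a hypothetical callable that sends the request and
    returns an iterable of generated tokens as they arrive.
    """
    t_start = clock()               # timestamp just before the request is sent
    stream = iter(start_stream())   # assumption: this call issues the request
    first = next(stream, None)      # blocks until the first token arrives
    if first is None:
        # The stream ended without producing any token, so TTFT is undefined.
        return None, None, stream
    return clock() - t_start, first, stream
```

Measured this way, the result includes network round-trip time, queuing, and prefill together. Separating those components would need server-side timestamps, which this sketch does not assume are available.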
Related
More Metrics terms
Arithmetic Intensity
The ratio of FLOPs to bytes of memory traffic for an operation, used to determine whether a workload is compute-bound or memory-bandwidth-bound.
Roofline Model
Visual performance model that shows achievable FLOP/s as a function of arithmetic intensity, with two ceilings: memory bandwidth and compute.
TPS / TPOT (Tokens per Second / Time per Output Token)
Output throughput metrics: TPS measures tokens generated per second, and TPOT measures the time between successive output tokens, usually reported in milliseconds.