Metrics

TTFT (Time to First Token)

Latency from sending the request to receiving the first generated token — primarily determined by prefill duration and queuing time.

Definition

Time to First Token (TTFT) measures the delay a user experiences before any output appears. It encompasses network round-trip time, request queuing in the scheduler, and the prefill computation for the input prompt. TTFT grows with prompt length (longer prompts mean more prefill compute) and with system load (longer queues mean more wait time), though the relationship is not strictly proportional. For interactive applications, a low TTFT (under 200–500 ms) is critical for perceived responsiveness. TTFT is one of the primary SLO metrics in production LLM serving.
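As an illustration of the definition, here is a minimal sketch of measuring TTFT on the client side: start a clock when the request is issued and stop it when the first token arrives. The names `measure_ttft` and `fake_stream` are hypothetical, and `fake_stream` only simulates queuing and prefill with sleeps. A real client would pass in whatever streaming call it uses, and the measurement would then also include network round-trip time.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def measure_ttft(start_stream: Callable[[], Iterable[str]]) -> Tuple[Optional[str], float]:
    """Return (first_token, seconds from request start to first token)."""
    start = time.perf_counter()      # clock starts when the request is issued
    stream = iter(start_stream())    # issue the (hypothetical) streaming request
    first = next(stream, None)       # blocks until the first token arrives
    return first, time.perf_counter() - start


def fake_stream(queue_s: float = 0.02, prefill_s: float = 0.03) -> Iterator[str]:
    """Hypothetical stand-in for a streaming LLM response (assumption, not a real API)."""
    time.sleep(queue_s)    # simulated wait in the scheduler queue
    time.sleep(prefill_s)  # simulated prefill over the input prompt
    yield "Hello"          # first generated token: TTFT ends here
    time.sleep(0.01)       # simulated decode step, not part of TTFT
    yield " world"


first_token, ttft = measure_ttft(fake_stream)
print(f"first token: {first_token!r}, TTFT: {ttft * 1000:.1f} ms")
```

Only the time up to the first `yield` counts toward TTFT; later decode steps affect inter-token latency instead. In this simulation the measured value is roughly the sum of the queuing and prefill delays.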
