Arithmetic Intensity
The ratio of FLOPs to bytes of memory traffic for an operation, used to determine whether a workload is compute-bound or memory-bandwidth-bound.
Definition
Arithmetic intensity (measured in FLOPs/byte) characterises how computationally dense an operation is relative to the data it must move to and from memory. Operations with low arithmetic intensity (e.g., a batch-size-1 linear layer in LLM decode) require many bytes from HBM per useful FLOP and are memory-bandwidth-bound. Operations with high arithmetic intensity (e.g., prefill matmuls with a large batch) are compute-bound. What counts as low or high depends on the hardware: the crossover sits at the ratio of peak compute throughput to memory bandwidth. The roofline model plots achievable throughput as a function of arithmetic intensity, making it a standard tool for identifying inference bottlenecks.
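As a rough illustration of the definition above, the sketch below estimates the arithmetic intensity of a linear layer at two batch sizes and compares each against a hardware crossover point. The function names, the 4096-wide layer, and the hardware figures (300 TFLOP/s peak, 2 TB/s bandwidth) are illustrative assumptions, not measurements of any specific accelerator. The byte count also assumes each tensor crosses the memory bus exactly once, which ignores caching and kernel fusion.

```python
def linear_layer_intensity(batch, d_in, d_out, bytes_per_elem=2):
    """FLOPs per byte for y = x @ W, with x: (batch, d_in) and W: (d_in, d_out)."""
    # Each multiply-accumulate counts as 2 FLOPs.
    flops = 2 * batch * d_in * d_out
    # Assumption: read x and W once, write y once (no cache reuse modelled).
    bytes_moved = bytes_per_elem * (batch * d_in + d_in * d_out + batch * d_out)
    return flops / bytes_moved


def attainable_flops(intensity, peak_flops, mem_bw):
    """Roofline bound: the lower of the compute and memory-bandwidth ceilings."""
    return min(peak_flops, mem_bw * intensity)


# Hypothetical accelerator, chosen only to make the arithmetic easy to follow.
PEAK_FLOPS = 300e12  # FLOP/s
MEM_BW = 2e12        # bytes/s
RIDGE = PEAK_FLOPS / MEM_BW  # crossover intensity in FLOPs/byte

for batch in (1, 512):
    ai = linear_layer_intensity(batch, d_in=4096, d_out=4096)
    regime = "compute-bound" if ai >= RIDGE else "memory-bandwidth-bound"
    bound = attainable_flops(ai, PEAK_FLOPS, MEM_BW)
    print(f"batch={batch}: {ai:.1f} FLOPs/byte, {regime}, "
          f"at most {bound / 1e12:.0f} TFLOP/s")
```

Under these assumptions the batch-1 layer lands near 1 FLOP/byte, far below the crossover of 150, while the batch-512 layer lands near 410 FLOPs/byte, above it. The weight matrix dominates the traffic in both cases, so intensity grows almost linearly with batch size until activations become a significant share of the bytes moved.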
Related
More Metrics terms
Roofline Model
Visual performance model that shows achievable FLOP/s as a function of arithmetic intensity, with two ceilings: memory bandwidth and compute.
TTFT (Time to First Token)
Latency from sending the request to receiving the first generated token, primarily determined by prefill duration and queuing time.
TPS / TPOT (Tokens per Second / Time per Output Token)
Output throughput metrics: TPS measures tokens generated per second; TPOT measures the time, typically in milliseconds, between successive output tokens.
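The latency metrics listed above can be derived from per-token arrival timestamps. The following is a minimal sketch, assuming a single request stream; the function name and the choice to compute TPS over the decode phase only (so that it is the reciprocal of TPOT) are assumptions made for illustration, and serving systems may define TPS differently, for example across many concurrent requests.

```python
def latency_metrics(request_sent, token_times):
    """Return (ttft_s, tpot_ms, decode_tps) for one request.

    request_sent: time the request was sent, in seconds.
    token_times: arrival time of each output token, in seconds, ascending.
    """
    if len(token_times) < 2:
        raise ValueError("need at least two output tokens to measure TPOT")
    # TTFT: delay until the first generated token arrives.
    ttft_s = token_times[0] - request_sent
    # TPOT: mean gap between successive output tokens, in milliseconds.
    gaps = len(token_times) - 1
    decode_time_s = token_times[-1] - token_times[0]
    tpot_ms = decode_time_s / gaps * 1000.0
    # Decode-phase TPS (assumption: excludes the time before the first token).
    decode_tps = gaps / decode_time_s
    return ttft_s, tpot_ms, decode_tps


# Hypothetical timestamps: first token after 0.5 s, then one every 0.25 s.
ttft_s, tpot_ms, decode_tps = latency_metrics(0.0, [0.5, 0.75, 1.0, 1.25, 1.5])
print(f"TTFT={ttft_s:.2f} s, TPOT={tpot_ms:.0f} ms, TPS={decode_tps:.1f}")
```

With this definition, TPS and TPOT carry the same information for a single stream: a TPOT of 250 ms corresponds to 4 tokens per second.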