
FLOPS

Floating-point operations per second — the peak compute throughput of a GPU, determining how fast compute-bound operations (like prefill) run.

Definition

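FLOPS (or FLOP/s) measures the number of floating-point arithmetic operations a chip can execute per second; the lowercase-s form, FLOPs, counts operations rather than a rate. Peak GPU FLOPS scales with clock speed and core count and depends on numeric precision; the NVIDIA H100 (SXM) delivers a peak of 989 TFLOPS at BF16 on dense Tensor Core operations. Prefill, which processes the entire prompt in parallel, is typically compute-bound: it costs roughly 2 × prompt length × parameter count FLOPs, so its runtime is limited by the FLOPS the chip can sustain. Decode, by contrast, tends to be limited by memory bandwidth until batch sizes are large enough. Whether a workload is compute-bound or memory-bound depends on its arithmetic intensity (FLOPs performed per byte moved to or from memory) relative to the chip's ratio of peak FLOPS to memory bandwidth: above that ratio the workload is compute-bound, below it memory-bound.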
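The relationship between FLOPS, memory bandwidth, and arithmetic intensity can be sketched with a few lines of arithmetic. This is a rough roofline-style estimate, not a benchmark. The 989 TFLOPS figure comes from the definition above; the 3.35 TB/s memory bandwidth, the 7-billion-parameter model, and all function names are assumptions chosen for illustration. The sketch also assumes about 2 FLOPs per parameter per token, ignores attention cost, and assumes each weight is read from memory once per forward pass.

```python
# Rough roofline-style estimates. All hardware and model numbers below are
# illustrative assumptions, not measurements.

PEAK_FLOPS = 989e12      # BF16 Tensor Core peak quoted in the definition
MEM_BANDWIDTH = 3.35e12  # bytes/s; assumed HBM bandwidth for the same GPU
N_PARAMS = 7e9           # hypothetical 7B-parameter model


def ridge_point(peak_flops: float, mem_bandwidth: float) -> float:
    """FLOPs per byte above which a workload is compute-bound."""
    return peak_flops / mem_bandwidth


def forward_flops(tokens: int, n_params: float) -> float:
    """Approximate forward-pass cost: ~2 FLOPs per parameter per token."""
    return 2.0 * tokens * n_params


def min_compute_seconds(tokens: int, n_params: float, peak_flops: float) -> float:
    """Lower bound on runtime if the chip ran at peak FLOPS throughout."""
    return forward_flops(tokens, n_params) / peak_flops


def weight_intensity(tokens: int, bytes_per_param: float = 2.0) -> float:
    """FLOPs per byte of weights read, assuming weights are read once.

    With 2-byte (BF16) weights this simplifies to the number of tokens
    processed together in one forward pass.
    """
    return 2.0 * tokens / bytes_per_param


def is_compute_bound(intensity: float, peak_flops: float, mem_bandwidth: float) -> bool:
    """Compare a workload's arithmetic intensity with the hardware ridge point."""
    return intensity >= ridge_point(peak_flops, mem_bandwidth)


if __name__ == "__main__":
    ridge = ridge_point(PEAK_FLOPS, MEM_BANDWIDTH)
    print(f"ridge point: {ridge:.0f} FLOPs/byte")

    # Prefill: a 2048-token prompt is processed in one parallel pass.
    prefill = weight_intensity(2048)
    print("prefill compute-bound:", is_compute_bound(prefill, PEAK_FLOPS, MEM_BANDWIDTH))
    print(f"prefill lower bound: {min_compute_seconds(2048, N_PARAMS, PEAK_FLOPS) * 1e3:.1f} ms")

    # Decode: one new token per sequence, so intensity tracks batch size.
    for batch in (1, 64, 512):
        bound = is_compute_bound(weight_intensity(batch), PEAK_FLOPS, MEM_BANDWIDTH)
        print(f"decode batch {batch}: compute-bound = {bound}")
```

Under these assumptions the ridge point is close to 300 FLOPs per byte, so a long prompt sits well above it while single-sequence decode sits far below it. Real kernels rarely reach peak FLOPS, so the runtime figure is best read as a lower bound.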
