FLOPS
Floating-point operations per second — the peak compute throughput of a GPU, determining how fast compute-bound operations (like prefill) run.
Definition
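FLOPS (or FLOP/s) measures the number of floating-point arithmetic operations a chip can execute per second. Peak GPU FLOPS scales with clock speed and core count, and rises as numeric precision drops; the NVIDIA H100 delivers a peak of 989 TFLOPS at BF16 on its Tensor Cores. Prefill, which processes the entire prompt in parallel, is a compute-bound operation: the work it requires is roughly 2 × prompt length × model parameter count floating-point operations (FLOPs, a count rather than a rate), so its runtime is bounded below by that count divided by peak FLOPS. Decoding, by contrast, stays limited by memory bandwidth until batch sizes are large enough. Whether a workload is compute-bound or memory-bound depends on its arithmetic intensity (FLOPs performed per byte moved to or from memory) compared with the hardware's ratio of peak FLOPS to memory bandwidth: above that ratio the workload is compute-bound, below it memory-bound.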
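The arithmetic behind the definition can be sketched in a few lines of Python. This is a rough estimate under stated assumptions, not a benchmark: it uses the common approximation of about 2 FLOPs per parameter per token for a forward pass (attention costs ignored), assumes 3.35 TB/s of memory bandwidth for an H100 SXM, and counts only weight reads as memory traffic. The 7B-parameter model and 2048-token prompt are hypothetical.

```python
# Back-of-envelope roofline estimate. All constants below are assumptions
# for illustration except the 989 TFLOPS BF16 peak quoted in the definition.
PEAK_FLOPS = 989e12       # FLOP/s, H100 BF16 Tensor Core peak
MEM_BANDWIDTH = 3.35e12   # bytes/s, assumed H100 SXM HBM3 bandwidth
BYTES_PER_PARAM = 2       # BF16 weights


def forward_flops(n_params, n_tokens):
    """Approximate forward-pass FLOPs: ~2 per parameter per token."""
    return 2.0 * n_params * n_tokens


def min_seconds_at_peak(flops, peak_flops=PEAK_FLOPS):
    """Lower bound on runtime if the chip sustained its peak FLOPS."""
    return flops / peak_flops


def ridge_point(peak_flops=PEAK_FLOPS, mem_bandwidth=MEM_BANDWIDTH):
    """Hardware balance in FLOPs per byte; workloads above it are compute-bound."""
    return peak_flops / mem_bandwidth


def is_compute_bound(n_params, n_tokens):
    """Compare the workload's arithmetic intensity with the ridge point.

    Memory traffic is approximated as one read of all weights.
    """
    intensity = forward_flops(n_params, n_tokens) / (n_params * BYTES_PER_PARAM)
    return intensity > ridge_point()


if __name__ == "__main__":
    n_params, prompt_len = 7e9, 2048  # hypothetical model and prompt
    flops = forward_flops(n_params, prompt_len)
    print(f"prefill FLOPs: {flops:.3e}")
    print(f"lower bound:   {min_seconds_at_peak(flops) * 1e3:.1f} ms")
    print(f"ridge point:   {ridge_point():.0f} FLOPs/byte")
    print("prefill compute-bound:", is_compute_bound(n_params, prompt_len))
    print("single-token decode compute-bound:", is_compute_bound(n_params, 1))
```

Under these assumptions the prefill pass lands far above the ridge point (compute-bound), while a single-sequence decode step performs about one FLOP per byte of weights read and falls well below it (memory-bound). Real kernels rarely sustain peak FLOPS, so the time printed is a floor rather than a prediction.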
Related
More Hardware terms
HBM (High Bandwidth Memory)
3D-stacked DRAM technology used in data-centre GPUs, offering memory bandwidth 5–10× higher than GDDR at the cost of higher price and packaging complexity.
VRAM
Video RAM: the GPU's dedicated memory (on-package HBM on datacenter GPUs), which holds model weights, KV cache, and activations during inference.
Memory Bandwidth
The rate at which data can be read from or written to GPU memory, measured in TB/s — the primary bottleneck during autoregressive LLM decoding.