HBM (High Bandwidth Memory)
3D-stacked DRAM technology used in data-centre GPUs, offering memory bandwidth 5–10× higher than GDDR at the cost of smaller capacity.
Definition
High Bandwidth Memory (HBM) stacks multiple DRAM dies vertically using through-silicon vias (TSVs) and connects them directly to the GPU die via a wide interposer interface. The result is dramatically higher bandwidth (3.35 TB/s on H100 SXM) compared to GDDR6 (≈600 GB/s), at the cost of higher manufacturing complexity and lower maximum capacity per stack. Because LLM decoding is heavily memory-bandwidth bound, HBM bandwidth is the dominant determinant of decode throughput for a given model size.
Related
More Hardware terms
VRAM
Video RAM — the GPU's dedicated on-chip memory (HBM on datacenter GPUs) holding model weights, KV cache, and activations during inference.
Memory Bandwidth
The rate at which data can be read from or written to GPU memory, measured in TB/s — the primary bottleneck during autoregressive LLM decoding.
FLOPS
Floating-point operations per second — the peak compute throughput of a GPU, determining how fast compute-bound operations (like prefill) run.