Hardware

NVLink

NVIDIA's high-speed GPU-to-GPU interconnect, delivering up to 900 GB/s bidirectional bandwidth on H100 NVLink 4.0 for tensor and pipeline parallelism.

Definition

NVLink is NVIDIA's proprietary high-bandwidth interconnect for connecting multiple GPUs within a node. NVLink 4.0 (H100 SXM) provides 900 GB/s of bidirectional bandwidth per GPU, versus 128 GB/s for a PCIe 5.0 x16 link, which makes the all-reduce operations in tensor parallelism far cheaper. In an HGX node, up to 8 GPUs are fully connected through an NVSwitch fabric, giving roughly 7.2 TB/s of aggregate GPU-to-GPU bandwidth (8 × 900 GB/s). NVLink is essential for tensor parallelism of large models beyond 2 GPUs; without it, all-reduce becomes a throughput bottleneck.
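To make the NVLink vs. PCIe gap concrete, here is a back-of-the-envelope sketch of an idealized ring all-reduce, in which each GPU sends and receives 2(N−1)/N × S bytes for a message of S bytes across N GPUs. It assumes full link utilization, ignores latency and protocol overhead, and uses per-direction bandwidth (half of the bidirectional figures above). The 128 MiB activation size is a hypothetical example, not a measured workload, and real NCCL algorithm choices on NVSwitch may differ.

```python
def ring_allreduce_seconds(message_bytes: float, num_gpus: int, unidir_gb_per_s: float) -> float:
    """Idealized ring all-reduce time: each GPU sends 2*(N-1)/N * S bytes over one link direction."""
    bytes_per_gpu = 2 * (num_gpus - 1) / num_gpus * message_bytes
    return bytes_per_gpu / (unidir_gb_per_s * 1e9)

# Per-direction bandwidth is half the bidirectional figure.
NVLINK4_UNIDIR = 900 / 2   # GB/s, H100 SXM NVLink 4.0
PCIE5_UNIDIR = 128 / 2     # GB/s, PCIe 5.0 x16

# Hypothetical tensor-parallel activation: 8192 tokens x hidden size 8192, fp16 (2 bytes each).
message = 8192 * 8192 * 2  # 128 MiB

for name, bw in [("NVLink 4.0", NVLINK4_UNIDIR), ("PCIe 5.0", PCIE5_UNIDIR)]:
    t = ring_allreduce_seconds(message, num_gpus=8, unidir_gb_per_s=bw)
    print(f"{name}: {t * 1e3:.2f} ms")
```

Under these assumptions the sketch prints about 0.52 ms for NVLink and 3.67 ms for PCIe; since a tensor-parallel layer performs such all-reduces on every forward pass, the roughly 7× gap compounds across layers.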

See also: Tensor Parallelism, Pipeline Parallelism, Chapter 3: Hardware

More Hardware terms

HBM (High Bandwidth Memory)

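3D-stacked DRAM mounted on the same package as the GPU and connected through a very wide interface (1024 bits per stack), used in data-centre GPUs. It offers several times the memory bandwidth of GDDR, at the cost of higher price and packaging complexity.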
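HBM's bandwidth advantage comes mostly from interface width rather than per-pin speed: peak bandwidth is bus width × per-pin data rate ÷ 8. The sketch below applies that formula to two illustrative configurations; the stack count, bus widths, and pin rates are assumptions that roughly match public spec sheets for the H100 SXM and RTX 4090, not authoritative figures.

```python
def peak_bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s = bus width (bits) * per-pin data rate (Gb/s) / 8."""
    return bus_width_bits * gbps_per_pin / 8

# Assumed H100 SXM config: 5 active HBM3 stacks x 1024-bit interface, ~5.23 Gb/s per pin.
hbm = peak_bandwidth_gb_s(bus_width_bits=5 * 1024, gbps_per_pin=5.23)
# Assumed RTX 4090 config: 384-bit GDDR6X bus at 21 Gb/s per pin.
gddr = peak_bandwidth_gb_s(bus_width_bits=384, gbps_per_pin=21.0)

print(f"HBM3 (H100 SXM): {hbm:.0f} GB/s")
print(f"GDDR6X (RTX 4090): {gddr:.0f} GB/s")
print(f"ratio: {hbm / gddr:.1f}x")
```

Note that the GDDR part runs each pin about 4× faster, yet HBM still wins because its bus is over 13× wider. The overall ratio depends heavily on which parts are compared; this pairing comes out around 3.3×.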

VRAM

Video RAM: the GPU's dedicated device memory (HBM on data-centre GPUs, GDDR on consumer cards), holding model weights, KV cache, and activations during inference.
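A rough VRAM budget for inference is weights (parameters × bytes per parameter) plus KV cache (2 tensors × layers × KV heads × head dimension × sequence length × batch size × bytes per value). The sketch below assumes a hypothetical 7B-parameter model with 32 layers, 32 KV heads, and head dimension 128, stored in fp16; it deliberately ignores activations, framework overhead, and memory fragmentation, so real usage will be higher.

```python
def vram_bytes(num_params: float, bytes_per_param: int,
               num_layers: int, num_kv_heads: int, head_dim: int,
               seq_len: int, batch_size: int, kv_bytes: int = 2) -> dict:
    """Estimate VRAM for weights plus KV cache (activations and overhead excluded)."""
    weights = num_params * bytes_per_param
    # One K and one V vector of size kv_heads * head_dim per layer, per token, per sequence.
    kv_cache = 2 * num_layers * num_kv_heads * head_dim * seq_len * batch_size * kv_bytes
    return {"weights": weights, "kv_cache": kv_cache, "total": weights + kv_cache}

# Hypothetical 7B model, fp16 weights and KV cache, one 4096-token sequence.
est = vram_bytes(num_params=7e9, bytes_per_param=2, num_layers=32, num_kv_heads=32,
                 head_dim=128, seq_len=4096, batch_size=1)
for name, size in est.items():
    print(f"{name}: {size / 1e9:.2f} GB")
```

Here the weights take 14 GB and the KV cache about 2.15 GB; because the KV term scales linearly with both sequence length and batch size, it can overtake the weights at long contexts or large batches.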

Memory Bandwidth

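The rate at which data can be read from or written to GPU memory, typically quoted in TB/s for data-centre GPUs. It is the primary bottleneck during autoregressive LLM decoding, because generating each token requires streaming all model weights (and the KV cache) from memory while performing relatively little arithmetic per byte read.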
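Because every decode step must read all the weights at least once, memory bandwidth sets a hard floor on per-token latency. The sketch below computes that floor under stated assumptions: a hypothetical 7B-parameter fp16 model (14 GB of weights), batch size 1, KV cache traffic ignored, and a 3.35 TB/s peak bandwidth figure (the H100 SXM spec-sheet value). Real throughput will be lower, since sustained bandwidth never reaches the peak.

```python
def min_decode_latency_s(weight_bytes: float, kv_bytes: float, bandwidth_tb_s: float) -> float:
    """Lower bound on one decode step: every byte of weights and KV cache is read once."""
    return (weight_bytes + kv_bytes) / (bandwidth_tb_s * 1e12)

# Assumed: 7B params in fp16, batch size 1, KV cache ignored, 3.35 TB/s peak bandwidth.
t = min_decode_latency_s(weight_bytes=14e9, kv_bytes=0.0, bandwidth_tb_s=3.35)
print(f">= {t * 1e3:.2f} ms per token, <= {1 / t:.0f} tokens/s")

# Arithmetic intensity at batch 1: about one multiply-add (2 FLOPs) per 2-byte weight.
flops_per_token = 2 * 7e9
intensity = flops_per_token / 14e9  # FLOPs per byte read
print(f"arithmetic intensity: {intensity:.1f} FLOP/byte")
```

At roughly 1 FLOP per byte, the GPU's arithmetic units sit mostly idle while waiting on memory, which is why batching (reusing each weight read across many sequences) is the main lever for decode throughput.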
