NVLink
NVIDIA's high-speed GPU-to-GPU interconnect, delivering up to 900 GB/s of bidirectional bandwidth per GPU with NVLink 4.0 on H100, used for tensor and pipeline parallelism.
Definition
NVLink is NVIDIA's proprietary high-bandwidth interconnect for connecting multiple GPUs within the same node. NVLink 4.0 (H100 SXM) provides 900 GB/s of bidirectional bandwidth per GPU, versus 128 GB/s for a PCIe 5.0 x16 link, which makes the all-reduce operations in tensor parallelism far more efficient. Up to 8 GPUs can be fully connected through an NVSwitch fabric within an HGX node, for 7.2 TB/s of aggregate bandwidth (8 × 900 GB/s). NVLink is essential for large-model tensor parallelism beyond 2 GPUs; without it, all-reduce becomes a throughput bottleneck.
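To make the all-reduce comparison concrete, the sketch below estimates a bandwidth-only lower bound for a ring all-reduce over NVLink 4.0 and over PCIe 5.0 x16. It rests on several assumptions that are not part of the definition above: a ring algorithm (collective libraries may choose others), links running at their peak rate, half of the bidirectional figure available in each direction, and no latency or protocol overhead. The function name and the 1 GiB message size are hypothetical and chosen only for illustration.

```python
# Bandwidth-only estimate of ring all-reduce time (illustrative sketch).
# Assumptions: ring algorithm, peak link rates, no latency or protocol overhead.

NVLINK4_BIDIR_GB_PER_S = 900    # H100 SXM, per GPU (figure from the definition)
PCIE5_X16_BIDIR_GB_PER_S = 128  # PCIe 5.0 x16 (figure from the definition)


def ring_allreduce_seconds(message_bytes, num_gpus, bidir_gb_per_s):
    """Return the time needed to move ring all-reduce traffic at peak bandwidth."""
    # Assume half of the bidirectional bandwidth is available in each direction.
    unidir_bytes_per_s = bidir_gb_per_s / 2 * 1e9
    # In a ring all-reduce each GPU sends 2 * (N - 1) / N times the message size
    # (reduce-scatter phase followed by all-gather phase).
    traffic_per_gpu = 2 * (num_gpus - 1) / num_gpus * message_bytes
    return traffic_per_gpu / unidir_bytes_per_s


message_bytes = 1 * 1024**3  # hypothetical 1 GiB tensor
num_gpus = 8

t_nvlink = ring_allreduce_seconds(message_bytes, num_gpus, NVLINK4_BIDIR_GB_PER_S)
t_pcie = ring_allreduce_seconds(message_bytes, num_gpus, PCIE5_X16_BIDIR_GB_PER_S)
speedup = t_pcie / t_nvlink

print(f"NVLink 4.0: {t_nvlink * 1e3:.2f} ms")
print(f"PCIe 5.0:   {t_pcie * 1e3:.2f} ms")
print(f"Speedup:    {speedup:.2f}x")
```

Under this model the speedup equals the bandwidth ratio, 900 / 128 ≈ 7×, regardless of message size or GPU count. Measured all-reduce throughput will be lower than either peak figure, so treat the absolute times as a floor rather than a prediction.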
Related
More Hardware terms
HBM (High Bandwidth Memory)
3D-stacked DRAM technology used in datacenter GPUs, offering memory bandwidth 5–10× higher than GDDR at the cost of smaller capacity.
VRAM
Video RAM: the GPU's dedicated device memory (HBM on datacenter GPUs), which holds model weights, KV cache, and activations during inference.
Memory Bandwidth
The rate at which data can be read from or written to GPU memory, measured in TB/s — the primary bottleneck during autoregressive LLM decoding.