Tensor Core
Specialised matrix-multiply-accumulate units in NVIDIA GPUs that execute fused D=A×B+C on tiles at up to 16× the throughput of CUDA cores.
Definition
Tensor Cores are NVIDIA's specialised matrix-multiply hardware, introduced in Volta (V100) and refined in Ampere (A100), Hopper (H100), and Blackwell (B200). They execute a fused D = A × B + C matrix operation on small tiles (e.g., 16×16 or 8×16×16) in a single clock cycle using reduced-precision inputs (FP16, BF16, FP8, INT8, INT4). The key to reaching peak FLOPS is keeping Tensor Cores fed with data; FlashAttention and efficient kernel tiling strategies are designed specifically to maximise Tensor Core utilisation.
Related
More Hardware terms
HBM (High Bandwidth Memory)
3D-stacked DRAM technology used in data-centre GPUs, offering memory bandwidth 5–10× higher than GDDR at the cost of smaller capacity.
VRAM
Video RAM — the GPU's dedicated on-chip memory (HBM on datacenter GPUs) holding model weights, KV cache, and activations during inference.
Memory Bandwidth
The rate at which data can be read from or written to GPU memory, measured in TB/s — the primary bottleneck during autoregressive LLM decoding.