Ch 5 · Beginner
KV Cache Sizing Calculator
Calculate KV cache memory for different models, sequence lengths, batch sizes, and precisions.
Model
Sequence length: 4K tokens
Batch size: 1
KV Cache Precision: FP16
Total KV Cache: 1.34 GB (1.25 GiB), for 1 batch × 4,096 tokens
Per Token: 320.0 KiB (327,680 bytes), across 80 layers
Per Request: 1,342.2 MB, for 4,096 tokens
Architecture Details
80 layers × 8 KV heads × 128 head dim
8 KV heads × 128 head dim × 2 (K+V) × 2 bytes (FP16) = 4,096 bytes/token/layer
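The arithmetic behind these figures can be sketched in a few lines of Python. This is a minimal illustration, not the calculator's actual implementation: the class name `KVCacheConfig`, its method names, and the precision table are hypothetical, and only the FP16 entry is confirmed by the output above.

```python
from dataclasses import dataclass

# Bytes per stored element for each precision. Only FP16 appears in the
# calculator output above; the other entries are assumptions for illustration.
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2, "int8": 1}


@dataclass(frozen=True)
class KVCacheConfig:
    """Hypothetical container for the architecture values the calculator lists."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    precision: str = "fp16"

    def bytes_per_token_per_layer(self) -> int:
        # The factor 2 accounts for storing both the key and the value tensor.
        return 2 * self.num_kv_heads * self.head_dim * BYTES_PER_ELEMENT[self.precision]

    def bytes_per_token(self) -> int:
        # Every layer keeps its own K and V entries for each token.
        return self.bytes_per_token_per_layer() * self.num_layers

    def total_bytes(self, seq_len: int, batch_size: int = 1) -> int:
        # The cache grows linearly with sequence length and batch size.
        return self.bytes_per_token() * seq_len * batch_size


# Values taken from the Architecture Details above.
config = KVCacheConfig(num_layers=80, num_kv_heads=8, head_dim=128)
per_layer = config.bytes_per_token_per_layer()
per_token = config.bytes_per_token()
total = config.total_bytes(seq_len=4096, batch_size=1)

print(f"{per_layer} bytes/token/layer")     # 4096 bytes/token/layer
print(f"{per_token / 1024:.1f} KiB/token")  # 320.0 KiB/token
print(f"{total / 1e6:.1f} MB per request")  # 1342.2 MB per request
print(f"{total / 1e9:.2f} GB total")        # 1.34 GB total
```

Note that the per-token figure divides by 1,024 while the per-request and total figures divide by powers of 1,000, which is the only way to reproduce all three displayed numbers from the same byte counts.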