Interactive Exercises
Hands-on calculators to build intuition for inference engineering. Adjust parameters and see results in real time.
VRAM Calculator
Calculate memory requirements for model inference (weights, KV cache, activations, and overhead) and compare the total against GPU capacity.
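As a rough illustration of the accounting this kind of calculator performs, the sketch below sums the four components and compares the total with a GPU's capacity. The function names, the 10% overhead fraction, and the example figures are assumptions for illustration, not values taken from the calculator itself.

```python
def vram_estimate_gb(num_params, bytes_per_param, kv_cache_gb,
                     activations_gb, overhead_fraction=0.10):
    """Estimate total inference memory in GB (1 GB = 1e9 bytes).

    overhead_fraction is an assumed allowance for runtime buffers
    and fragmentation; real overhead varies by framework.
    """
    weights_gb = num_params * bytes_per_param / 1e9
    subtotal_gb = weights_gb + kv_cache_gb + activations_gb
    return subtotal_gb * (1.0 + overhead_fraction)


def fits_on_gpu(total_gb, gpu_capacity_gb):
    """Return True if the estimate fits within the GPU's memory."""
    return total_gb <= gpu_capacity_gb


# Hypothetical example: 7B parameters at 2 bytes each (16-bit weights),
# 2 GB of KV cache, 1 GB of activations.
total = vram_estimate_gb(7e9, 2, kv_cache_gb=2.0, activations_gb=1.0)
print(f"Estimated VRAM: {total:.1f} GB, fits in 24 GB: {fits_on_gpu(total, 24.0)}")
```

Treating overhead as a flat fraction is a simplification; a measured value from the serving stack in use would be more reliable.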
Arithmetic Intensity Calculator
Compute the ops:byte ratio and determine whether your workload is compute-bound or memory-bound with a roofline plot.
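The classification behind a roofline plot can be sketched in a few lines: a workload is memory-bound when its ops:byte ratio falls below the hardware's ridge point (peak compute divided by peak memory bandwidth). The function names and the hardware figures below are illustrative assumptions, not the specification of any particular GPU.

```python
def arithmetic_intensity(flops, bytes_moved):
    """Operations performed per byte moved to or from memory."""
    return flops / bytes_moved


def attainable_flops(intensity, peak_flops, peak_bandwidth):
    """Roofline model: throughput is capped by compute or by bandwidth."""
    return min(peak_flops, intensity * peak_bandwidth)


def classify(flops, bytes_moved, peak_flops, peak_bandwidth):
    """Label a workload relative to the hardware's ridge point."""
    ridge_point = peak_flops / peak_bandwidth
    intensity = arithmetic_intensity(flops, bytes_moved)
    return "compute-bound" if intensity >= ridge_point else "memory-bound"


# Hypothetical accelerator: 1e15 FLOP/s peak, 2e12 bytes/s bandwidth,
# so the ridge point is 500 ops per byte.
# A 16-bit matrix-vector product does about 2 FLOPs per 2-byte weight read.
print(classify(flops=2.0, bytes_moved=2.0, peak_flops=1e15, peak_bandwidth=2e12))
```

With an intensity of about 1 op per byte against a ridge point of 500, this example lands firmly in the memory-bound region.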
KV Cache Sizing Calculator
Calculate KV cache memory for different models, sequence lengths, batch sizes, and precisions.
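A minimal sketch of the usual sizing formula for a standard transformer decoder: two tensors (keys and values) per layer, each holding one vector of size `head_dim` per KV head, per token, per sequence in the batch. The parameter names and the example configuration are assumptions for illustration; architectures with compressed or shared caches need a different formula.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim,
                   seq_len, batch_size, bytes_per_elem=2):
    """KV cache size in bytes for a standard attention decoder.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem is 2 for 16-bit precision, 1 for 8-bit.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical configuration: 32 layers, 8 KV heads of dimension 128,
# a 4096-token sequence, batch size 1, 16-bit cache.
size = kv_cache_bytes(32, 8, 128, seq_len=4096, batch_size=1)
print(f"{size / 2**30:.2f} GiB")
```

Because the size is linear in both sequence length and batch size, doubling either one doubles the cache.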
Quantization Quality Estimator
Explore precision tradeoffs: memory savings, speedup estimates, and quality risk across quantization strategies.
Speculative Decoding Simulator
Simulate draft-verify cycles: adjust draft length, acceptance rate, and overhead to see the effective tokens-per-second (TPS) improvement.
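One common way to reason about these parameters is a closed-form estimate rather than a full simulation. The sketch below assumes each drafted token is accepted independently with the same probability, and that a draft step costs a fixed fraction of a target-model forward pass; both are simplifying assumptions, and the function and parameter names are hypothetical rather than taken from the simulator.

```python
def expected_tokens_per_cycle(acceptance_rate, draft_length):
    """Expected tokens produced by one draft-verify cycle.

    Assumes independent, identical per-token acceptance. The verify
    step always yields one token, so the result lies between 1 and
    draft_length + 1.
    """
    if acceptance_rate >= 1.0:
        return float(draft_length + 1)
    return (1.0 - acceptance_rate ** (draft_length + 1)) / (1.0 - acceptance_rate)


def estimated_speedup(acceptance_rate, draft_length, draft_cost_ratio):
    """Speedup over plain decoding.

    draft_cost_ratio is the assumed cost of one draft-model step
    relative to one target-model forward pass.
    """
    cycle_cost = draft_length * draft_cost_ratio + 1.0
    return expected_tokens_per_cycle(acceptance_rate, draft_length) / cycle_cost


# Hypothetical settings: 4 drafted tokens, 80% acceptance, draft step
# costing 5% of a target step.
print(f"{estimated_speedup(0.8, 4, 0.05):.2f}x")
```

Under these assumptions, a longer draft only helps while the extra accepted tokens outweigh the added drafting cost, which is why the acceptance rate and draft length are best tuned together.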
GPU Selection Advisor
Answer questions about your model, priorities, and budget to get ranked GPU recommendations with reasoning.
Model-to-Hardware Recommender
Input your model and performance requirements to get ranked GPU configuration recommendations with reasoning.