Ch 5 · intermediate
Speculative Decoding Simulator
Simulate draft-verify cycles: adjust draft length, acceptance rate, and overhead to see the effective tokens-per-second (TPS) improvement.
Base TPS: 60 tok/s
Draft length: 4 tokens
Acceptance rate: 70%
Overhead: 15%
Base TPS: 60 tok/s (without speculation)
Effective TPS: 145 tok/s (with speculation)
Speedup: 2.41x
2.8 tokens per forward pass (1.8 accepted + 1 generated)
Forward Pass Time: 19.2 ms (base 16.7 ms + 15% overhead)
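The figures above can be reproduced with a short closed-form calculation. The sketch below is an assumption about how the simulator works, inferred from the displayed numbers rather than taken from its source: each draft token is accepted independently with the same probability, a rejection discards every later draft token, and the verify pass always contributes one extra token. The function name speculative_tps and its parameter names are hypothetical.

```python
def speculative_tps(base_tps: float, draft_len: int,
                    accept_rate: float, overhead: float) -> dict:
    """Closed-form estimate of speculative decoding throughput.

    Assumed model (not confirmed by the simulator itself): draft token i
    survives only if tokens 1..i were all accepted, so it survives with
    probability accept_rate ** i.
    """
    # Expected number of accepted draft tokens per verify pass.
    accepted = sum(accept_rate ** i for i in range(1, draft_len + 1))
    # The verify pass always yields one token of its own.
    tokens_per_pass = accepted + 1.0
    # Time of one plain forward pass, in milliseconds.
    base_ms = 1000.0 / base_tps
    # Speculation adds a fractional overhead to each pass.
    pass_ms = base_ms * (1.0 + overhead)
    effective_tps = tokens_per_pass / (pass_ms / 1000.0)
    return {
        "accepted": accepted,
        "tokens_per_pass": tokens_per_pass,
        "pass_ms": pass_ms,
        "effective_tps": effective_tps,
        "speedup": effective_tps / base_tps,
    }


if __name__ == "__main__":
    # Settings shown in the panel: 60 tok/s, 4 draft tokens, 70%, 15%.
    for name, value in speculative_tps(60, 4, 0.70, 0.15).items():
        print(f"{name}: {value:.2f}")
```

With the panel's settings this model gives roughly 2.8 tokens per pass, a 19.2 ms pass, about 145 tok/s, and a 2.41x speedup, which matches the displayed values after rounding.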
Draft Token Acceptance (avg)
T1: 70%
T2: 49%
T3: 34%
T4: 24%
+1: 100% (generated)
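The per-position percentages are consistent with each draft token surviving only if every earlier draft token was accepted (0.7, 0.7^2, 0.7^3, 0.7^4). A minimal Monte Carlo sketch of the draft-verify cycle can check this, assuming independent acceptance at a fixed rate; the function simulate_cycles and its parameters are hypothetical and not part of the simulator.

```python
import random


def simulate_cycles(draft_len: int, accept_rate: float,
                    cycles: int, seed: int = 0):
    """Simulate draft-verify cycles under an assumed acceptance model.

    Returns the fraction of cycles in which each draft position survived,
    and the average number of tokens produced per verify pass.
    """
    rng = random.Random(seed)
    survived = [0] * draft_len
    total_tokens = 0
    for _ in range(cycles):
        accepted = 0
        for pos in range(draft_len):
            # The first rejection discards this and all later draft tokens.
            if rng.random() >= accept_rate:
                break
            survived[pos] += 1
            accepted += 1
        # The verify pass always contributes one token of its own.
        total_tokens += accepted + 1
    per_position = [count / cycles for count in survived]
    return per_position, total_tokens / cycles


if __name__ == "__main__":
    per_position, tokens_per_pass = simulate_cycles(4, 0.70, 200_000)
    for i, p in enumerate(per_position, start=1):
        print(f"T{i}: {p:.1%}")
    print(f"tokens per pass: {tokens_per_pass:.2f}")
```

Over many cycles the simulated fractions should settle near 70%, 49%, 34%, and 24%, and the average should approach the 2.8 tokens per forward pass shown above.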