Ch 5 · intermediate

Speculative Decoding Simulator

Simulate draft-verify cycles: adjust the draft length, acceptance rate, and overhead to see how the effective tokens per second (TPS) improves.

Base TPS: 60 tok/s
Draft length: 4 tokens
Acceptance rate: 70%
Overhead: 15%

Base TPS: 60 tok/s (without speculation)

Effective TPS: 145 tok/s (with speculation)

Speedup: 2.41x

2.8 tokens per forward pass (1.8 accepted + 1 generated)
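
The tokens-per-pass figure can be reproduced with a short sketch. It assumes each draft token is accepted independently with the same probability and that everything after the first rejection is discarded. That assumption is inferred from the displayed numbers and is not confirmed by the simulator itself. The function name `expected_tokens_per_pass` is hypothetical.

```python
def expected_tokens_per_pass(acceptance_rate: float, draft_length: int) -> float:
    """Expected tokens emitted per verification pass.

    Assumption: each draft token is accepted independently with the same
    probability, and drafting stops at the first rejection.
    """
    # Draft token k counts only if tokens 1..k were all accepted,
    # which happens with probability acceptance_rate ** k.
    expected_accepted = sum(
        acceptance_rate ** k for k in range(1, draft_length + 1)
    )
    # The verification pass always contributes one token of its own.
    return expected_accepted + 1.0


tokens = expected_tokens_per_pass(acceptance_rate=0.70, draft_length=4)
print(f"{tokens:.1f} tokens per forward pass")  # prints "2.8 tokens per forward pass"
```

With a 70% acceptance rate and 4 draft tokens, the sum 0.7 + 0.49 + 0.343 + 0.2401 gives about 1.8 accepted tokens, plus the 1 generated token.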

Forward Pass Time: 19.2 ms (base 16.7 ms + 15% overhead)
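
As a rough check, the sketch below combines the forward pass time with the expected tokens per pass to recover the effective TPS and speedup. It assumes the overhead is applied as a multiplier on the base pass time and that draft tokens are accepted independently. Both are assumptions that happen to match the displayed values. The function name `effective_tps` and its parameters are hypothetical.

```python
def effective_tps(base_tps: float, draft_length: int,
                  acceptance_rate: float, overhead: float):
    """Return (pass time in ms, effective TPS, speedup) under the assumed model."""
    # One base forward pass produces one token, so its duration is 1 / base_tps.
    base_pass_ms = 1000.0 / base_tps
    # Assumption: speculation overhead scales the pass time multiplicatively.
    pass_ms = base_pass_ms * (1.0 + overhead)
    # Assumption: independent acceptance, stop at first rejection, plus the
    # one token the verification pass always generates.
    tokens_per_pass = 1.0 + sum(
        acceptance_rate ** k for k in range(1, draft_length + 1)
    )
    tps = tokens_per_pass / (pass_ms / 1000.0)
    return pass_ms, tps, tps / base_tps


pass_ms, tps, speedup = effective_tps(
    base_tps=60, draft_length=4, acceptance_rate=0.70, overhead=0.15
)
print(f"{pass_ms:.1f} ms, {tps:.0f} tok/s, {speedup:.2f}x")  # prints "19.2 ms, 145 tok/s, 2.41x"
```

Under this model the speedup reduces to tokens per pass divided by (1 + overhead), so raising the overhead slider lowers the speedup even when the acceptance rate is unchanged.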

Draft Token Acceptance (avg)

T1: 70%
T2: 49%
T3: 34%
T4: 24%
+1: 100%
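
The per-position percentages above are consistent with each draft token surviving only if every earlier draft token was accepted. A minimal Monte Carlo sketch of draft-verify cycles, under the assumption of independent acceptance with a fixed rate, can estimate the same values. The names `simulate_cycle` and `position_survival` are hypothetical and are not taken from the simulator.

```python
import random


def simulate_cycle(rng: random.Random, draft_length: int,
                   acceptance_rate: float) -> int:
    """Return how many draft tokens are accepted in one draft-verify cycle."""
    accepted = 0
    for _ in range(draft_length):
        # Assumption: each token is accepted independently with a fixed rate.
        if rng.random() >= acceptance_rate:
            break  # first rejection discards the rest of the draft
        accepted += 1
    return accepted


def position_survival(draft_length: int, acceptance_rate: float,
                      cycles: int = 100_000, seed: int = 0) -> list:
    """Estimate the fraction of cycles in which draft position k is kept."""
    rng = random.Random(seed)
    counts = [0] * draft_length
    for _ in range(cycles):
        accepted = simulate_cycle(rng, draft_length, acceptance_rate)
        # Positions 0..accepted-1 survived this cycle.
        for k in range(accepted):
            counts[k] += 1
    return [c / cycles for c in counts]


estimates = position_survival(draft_length=4, acceptance_rate=0.70)
for k, estimate in enumerate(estimates, start=1):
    # Compare the simulated fraction with the closed form acceptance_rate ** k.
    print(f"T{k}: simulated {estimate:.1%}, closed form {0.70 ** k:.1%}")
```

The "+1" entry is not simulated because the token generated by the verification pass is always kept, which is why it shows 100%.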