
About Inference Engineering

A free, interactive guide to running large language models in production — grounded in real benchmarks and the Baseten Books title by Philip Kiely.

Mission

Inference engineering, the discipline of deploying, optimising, and scaling LLM serving infrastructure, is one of the fastest-moving areas in machine learning. Yet the knowledge is scattered across research papers, vendor documentation, and hard-won production experience. This site exists to make that knowledge accessible in one place, free, and interactive.

The goal is not a broad survey of AI topics but a deep, practical treatment of a single question: how do you efficiently run a trained model at scale? That means covering GPU hardware, memory management, attention optimisations, quantization, serving frameworks, and the operational concerns that only surface in production.

What’s on this site

Why trust this

Own benchmarks

Performance claims are backed by data from benchmarks we ran ourselves — not vendor datasheets. The benchmark methodology, hardware specs, and raw numbers are published openly.

Based on the book

Content is derived from Inference Engineering, a Baseten Books title by Philip Kiely (2026). The book reflects production inference work at Baseten, where Philip spent years optimising LLM serving for real customers.

Interactive-first

Every concept that can be demonstrated numerically has a calculator or diagram. Passive reading is supplemented by exercises with immediate feedback rather than end-of-chapter answer keys.

Kept current

LLM inference moves fast. Pages are updated when the state of the art changes — not just when the static book ships a new edition.

Attribution

This site is the interactive web companion to Inference Engineering by Philip Kiely, published by Baseten Books (2026). The official book page is at baseten.co/inference-engineering.

Benchmark data is collected by the Inference Engineering team on dedicated H100 SXM hardware. Methodology and raw results are published on the benchmarks page.