Inference is the most valuable category in the AI industry.
On November 30, 2022 — the day ChatGPT launched — there were perhaps a few hundred inference engineers in the world, though they didn't call themselves that at the time. These specialists mostly worked at frontier labs like OpenAI, Midjourney, and Anthropic or big tech companies like Google and NVIDIA.
Three years later, a Cambrian explosion of open models — more than two million and counting on Hugging Face — means that every engineer can now deploy their own intelligence to power their AI products.
Open models have publicly available weights (e.g., Llama, DeepSeek). Closed models keep their weights proprietary (e.g., GPT-5, Claude). Until December 2024 there was a meaningful gap in intelligence between the two; when DeepSeek V3 and R1 were released, that gap disappeared.
Why Open Models Matter
Even when open models trail closed models on benchmarks, they change the equation for AI product builders. Switching to open models unlocks the opportunity to use inference engineering to improve AI products along new dimensions:
- Latency: Closed model APIs are built for throughput, but open models can be optimized for real-time applications
- Availability: While APIs for GPT and Claude are stuck at two nines (99%) of uptime, four nines (99.99%) or better is possible with dedicated deployments
- Cost: Open models are often at least 80 percent less expensive at scale
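The availability and cost figures above translate into concrete numbers. A minimal sketch of the arithmetic (the per-token prices are illustrative assumptions, not quotes from any provider):

```python
def downtime_hours_per_year(nines: int) -> float:
    """Hours of downtime per year allowed at a given number of nines of uptime."""
    unavailability = 10 ** (-nines)  # e.g. two nines -> 1% downtime
    return unavailability * 365 * 24

# Two nines (99%) vs. four nines (99.99%) of uptime:
print(f"{downtime_hours_per_year(2):.1f} h/year")   # 87.6 h/year
print(f"{downtime_hours_per_year(4):.2f} h/year")   # 0.88 h/year (~53 min)

# Hypothetical cost comparison at scale (prices assumed for illustration):
closed_api_price = 10.00  # $ per 1M tokens, assumed
open_model_price = 2.00   # $ per 1M tokens on dedicated GPUs, assumed
savings = 1 - open_model_price / closed_api_price
print(f"{savings:.0%} cheaper")  # 80% cheaper
```

Two nines permits roughly 87.6 hours of downtime a year; four nines permits under an hour, which is why dedicated deployments matter for availability-sensitive products.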
Who Needs Inference Engineering
AI-native startups like Cursor, Clay, Gamma, and Mercor are redefining hypergrowth by building products that rely on open and in-house models. Leading digital-native companies like Notion and Superhuman are thriving by deeply integrating AI capabilities.
A new generation of blended research and engineering teams — World Labs, Writer, Mirage, and dozens of others — are building enormous businesses by training and productizing their own foundation models.
You are early. While the potential and impact of inference are becoming clear, the space is young. There are relatively few people working on inference, and newcomers can become experts quickly. There are enormous opportunities to solve novel, interesting, and deeply technical problems at all levels of the stack.
Key Takeaway
Inference Engineering is your guide to becoming an expert in inference — from GPU hardware and CUDA kernels to production autoscaling. Welcome to the early days of inference.