Skip to content
Software

Text Generation Inference (TGI)

Hugging Face's production LLM serving toolkit with continuous batching, tensor parallelism, and a Rust-based HTTP server.

Definition

Text Generation Inference (TGI) is Hugging Face's inference server for large language models. Built with a Rust HTTP server front-end and Python/C++ backend, TGI supports continuous batching, tensor parallelism, flash attention, and quantization (GPTQ, AWQ, bitsandbytes). It integrates tightly with the Hugging Face Hub for model downloading and authentication, and exposes the Messages API (OpenAI-compatible) by default. TGI is widely used in Hugging Face Inference Endpoints and is available as a Docker image for self-hosting.

More Software terms