
GPTQ

Layer-wise weight quantization that uses second-order (Hessian) information to minimize quantization error, most commonly at 4-bit and 8-bit precision.

Definition

GPTQ (Frantar et al., 2022) is a one-shot post-training quantization algorithm. It quantizes the weights of each layer one step at a time and, after each step, applies a closed-form update to the weights that are not yet quantized so that they compensate for the rounding error just introduced. The update is derived from the inverse Hessian of the layer's squared output error, which is estimated from a small calibration dataset. At INT4, GPTQ is reported to stay close to FP16 quality on large models, and it is the basis for several popular quantized model families available on Hugging Face.
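
The quantize-then-compensate loop described above can be sketched for a single linear layer in NumPy. This is a minimal illustration rather than the reference implementation: the function names (`make_grid`, `quantize_to_grid`, `gptq_quantize_layer`) are hypothetical, the per-row min/max grid and the damping value are assumptions, and the sketch leaves out the column blocking, activation ordering, and group-wise grids that practical implementations add. The synthetic calibration inputs are deliberately correlated, since the compensation step relies on correlations between input features.

```python
import numpy as np


def make_grid(W, bits):
    """Per-row asymmetric uniform grid (an assumed, simple choice of grid)."""
    maxq = 2 ** bits - 1
    w_min = np.minimum(W.min(axis=1), 0.0)
    w_max = np.maximum(W.max(axis=1), 0.0)
    scale = (w_max - w_min) / maxq
    scale[scale == 0] = 1.0  # avoid division by zero for all-zero rows
    zero = np.round(-w_min / scale)
    return scale, zero, maxq


def quantize_to_grid(w, scale, zero, maxq):
    """Round to the nearest grid point, then map back to floating point."""
    q = np.clip(np.round(w / scale) + zero, 0, maxq)
    return scale * (q - zero)


def gptq_quantize_layer(W, X, bits=4, damp=0.01):
    """W: (out_features, in_features); X: (in_features, n_samples).

    Returns dequantized weights whose values lie on each row's grid.
    """
    W = W.astype(np.float64).copy()
    n_cols = W.shape[1]
    scale, zero, maxq = make_grid(W, bits)  # grid is fixed before any update

    # Hessian of the squared output error ||W X - Q X||^2 for one weight row.
    H = 2.0 * X @ X.T
    H += damp * np.mean(np.diag(H)) * np.eye(n_cols)  # damping for stability

    # Row i of the upper Cholesky factor of H^-1 carries the inverse-Hessian
    # information needed once columns 0..i-1 have already been quantized.
    U = np.linalg.cholesky(np.linalg.inv(H)).T

    Q = np.zeros_like(W)
    for i in range(n_cols):
        Q[:, i] = quantize_to_grid(W[:, i], scale, zero, maxq)
        err = (W[:, i] - Q[:, i]) / U[i, i]
        # Closed-form compensation: spread the rounding error of column i
        # over the columns that have not been quantized yet.
        W[:, i:] -= np.outer(err, U[i, i:])
    return Q


# Synthetic layer and correlated calibration inputs (illustrative data only).
rng = np.random.default_rng(0)
out_f, in_f, n = 64, 32, 512
X = rng.standard_normal((1, n)) + 0.5 * rng.standard_normal((in_f, n))
W = rng.standard_normal((out_f, in_f))

Q_gptq = gptq_quantize_layer(W, X, bits=4)

# Baseline: plain round-to-nearest on the same grid, with no compensation.
scale, zero, maxq = make_grid(W, 4)
Q_rtn = quantize_to_grid(W, scale[:, None], zero[:, None], maxq)

gptq_err = np.linalg.norm(W @ X - Q_gptq @ X) ** 2
rtn_err = np.linalg.norm(W @ X - Q_rtn @ X) ** 2
print(f"round-to-nearest error: {rtn_err:.2f}")
print(f"GPTQ-style error:       {gptq_err:.2f}")
```

Both methods round onto the same grid, so any gap between the two printed errors comes from the compensation update alone. The error is measured on the calibration inputs themselves, which is the quantity the update is designed to reduce.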
