Quantization
Definition
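A model optimization technique that reduces the numerical precision of the values a neural network stores and computes with, such as weights and activations. Precision is typically lowered from 32-bit floating point to 16-bit floating point, 8-bit integers, or even 4-bit representations. An 8-bit value occupies a quarter of the memory of a 32-bit one, so quantization can substantially shrink model size. On hardware with fast low-precision arithmetic it can also speed up inference, often with only a small loss in accuracy.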
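The sketch below illustrates the core idea. It assumes symmetric per-tensor quantization to signed 8-bit integers, which is only one common scheme. Real toolkits also use zero points (asymmetric quantization), per-channel scales, and calibration data. The function names `quantize_int8` and `dequantize_int8` are hypothetical and do not belong to any library. Each float is divided by a shared scale, rounded to the nearest integer, and clamped to [-127, 127]. Multiplying back by the scale recovers an approximation whose error is at most half the scale.

```python
from array import array
from typing import List, Tuple


# Hypothetical helper names for illustration; not a library API.
def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Symmetric per-tensor quantization of floats to signed 8-bit codes.

    Returns the integer codes and the single scale needed to map them back.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Map the largest magnitude to 127; use scale 1.0 for an all-zero tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Approximately reconstruct the original floats from int8 codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    weights = [0.5, -1.27, 0.0, 1.0, -0.25]
    codes, scale = quantize_int8(weights)
    restored = dequantize_int8(codes, scale)
    max_err = max(abs(w - r) for w, r in zip(weights, restored))

    # Storage per value: a C float ("f") versus a signed char ("b").
    fp32_bytes = array("f", weights).itemsize * len(weights)
    int8_bytes = array("b", codes).itemsize * len(codes)
    print("codes:", codes, "scale:", scale, "max error:", max_err)
    print("bytes FP32:", fp32_bytes, "bytes int8:", int8_bytes)
```

The per-value storage falls from 4 bytes to 1, plus one stored scale per tensor. The rounding error is bounded by `scale / 2`, so tensors with a few large outliers get a coarse scale. This is one reason per-channel or outlier-aware schemes exist.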