Quantization — INT8/INT4 post-training quantization for inference.
Structs§
- QuantConfig - Full quantization configuration.
- QuantStats - Quantization statistics computed for a model.
- QuantizedLinear - A quantized linear layer that stores weights in INT8/INT4.
- QuantizedTensor - A quantized tensor storing integer weights with an associated scale/zero_point.
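To make the `QuantizedTensor` idea concrete, here is a minimal self-contained sketch of what such a type typically holds and how a value is recovered from it. The field names and the `dequantize_at` helper are assumptions for illustration, not the crate's actual definition.

```rust
// Hypothetical sketch of a quantized tensor: integer weights plus the
// scale/zero_point needed to map them back to floats. Field names are
// assumptions, not the crate's real layout.
struct QuantizedTensor {
    data: Vec<i8>,   // INT8 weight values
    scale: f32,      // float step size per integer increment
    zero_point: i32, // integer value that represents float 0.0
}

impl QuantizedTensor {
    /// Recover the approximate float at index `i`: (q - zero_point) * scale.
    fn dequantize_at(&self, i: usize) -> f32 {
        (self.data[i] as i32 - self.zero_point) as f32 * self.scale
    }
}

fn main() {
    let t = QuantizedTensor {
        data: vec![0, 64, 127],
        scale: 0.02,
        zero_point: 0,
    };
    // 64 * 0.02 ≈ 1.28
    println!("{}", t.dequantize_at(1));
}
```

The same scale/zero_point pair applies to every element here; the `QuantGranularity` enum below controls whether those parameters are shared per-tensor or vary per-channel.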
Enums§
- QuantBits - Bit-width for quantized values.
- QuantGranularity - Granularity of quantization parameters (scale/zero_point).
- QuantMode - Quantization mode (symmetric vs. asymmetric).
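The symmetric/asymmetric distinction in `QuantMode` comes down to how scale and zero_point are derived from a tensor's observed range. The sketch below shows the standard INT8 formulas; the function names are hypothetical and the crate's actual logic may differ (e.g. in rounding or range clipping).

```rust
// Standard derivations of INT8 quantization parameters from a min/max range.
// Symmetric mode pins zero_point to 0; asymmetric mode uses the full i8 range.
// These helpers are illustrative, not the crate's API.
fn symmetric_params(min: f32, max: f32) -> (f32, i32) {
    let amax = min.abs().max(max.abs());
    (amax / 127.0, 0) // zero_point fixed at 0
}

fn asymmetric_params(min: f32, max: f32) -> (f32, i32) {
    let scale = (max - min) / 255.0; // spread the range over all 256 levels
    let zero_point = (-128.0 - min / scale).round() as i32; // map `min` to -128
    (scale, zero_point)
}

fn main() {
    let (s, z) = symmetric_params(-1.0, 2.0);
    println!("symmetric:  scale={s}, zero_point={z}");
    let (s, z) = asymmetric_params(-1.0, 2.0);
    println!("asymmetric: scale={s}, zero_point={z}");
}
```

Symmetric mode wastes part of the integer range when the data is skewed (here, negatives only reach -1.0 but the scale must cover 2.0), while asymmetric mode uses all 256 levels at the cost of a nonzero zero_point in every dequantization.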
Functions§
- dequantize_tensor - Dequantize a QuantizedTensor back to a float tensor.
- quantization_stats - Compute quantization statistics for a model without actually quantizing.
- quantize_named_parameters - Quantize all Linear layers in a model's named_parameters.
- quantize_tensor - Quantize a float tensor to a QuantizedTensor.
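A quantize/dequantize round trip analogous to `quantize_tensor` followed by `dequantize_tensor` can be sketched as below. The free functions and their signatures are assumptions for illustration; the point is the bounded reconstruction error, which is at most half a quantization step per element.

```rust
// Illustrative round trip: float slice -> INT8 -> float. Signatures are
// hypothetical stand-ins for the crate's quantize_tensor/dequantize_tensor.
fn quantize(x: &[f32], scale: f32, zero_point: i32) -> Vec<i8> {
    x.iter()
        .map(|&v| ((v / scale).round() as i32 + zero_point).clamp(-128, 127) as i8)
        .collect()
}

fn dequantize(q: &[i8], scale: f32, zero_point: i32) -> Vec<f32> {
    q.iter()
        .map(|&v| (v as i32 - zero_point) as f32 * scale)
        .collect()
}

fn main() {
    let x = [0.0f32, 0.5, -0.25, 1.0];
    let scale = 1.0 / 127.0; // symmetric scale covering [-1.0, 1.0]
    let q = quantize(&x, scale, 0);
    let y = dequantize(&q, scale, 0);
    for (a, b) in x.iter().zip(&y) {
        // Reconstruction error is bounded by half a quantization step.
        assert!((a - b).abs() <= scale / 2.0 + 1e-6);
    }
    println!("q = {:?}", q);
}
```

This is the core invariant post-training quantization relies on: weights round-trip through INT8 with error no larger than `scale / 2`, so picking a scale that tightly fits the weight range directly controls accuracy loss.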