Module quantize

Quantization — INT8/INT4 post-training quantization for inference.
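
The scale/zero_point pair stored alongside the integer values corresponds to the usual affine mapping: a float x is quantized as q = round(x / scale) + zero_point, clamped to the integer range, and recovered as x ≈ (q - zero_point) * scale. A minimal standalone sketch of that round trip (illustrative only, not this module's API):

```rust
/// Quantize a single f32 value to i8 with a given scale and zero point.
/// (Illustrative helper, not part of this module.)
fn quantize_value(x: f32, scale: f32, zero_point: i32) -> i8 {
    let q = (x / scale).round() as i32 + zero_point;
    q.clamp(i8::MIN as i32, i8::MAX as i32) as i8
}

/// Dequantize back to f32; the round trip is lossy by at most ~scale/2 per value.
fn dequantize_value(q: i8, scale: f32, zero_point: i32) -> f32 {
    (q as i32 - zero_point) as f32 * scale
}

fn main() {
    let (scale, zero_point) = (0.05_f32, 0);
    let x = 1.23_f32;
    let q = quantize_value(x, scale, zero_point);
    let x_hat = dequantize_value(q, scale, zero_point);
    println!("x = {x}, q = {q}, x_hat = {x_hat}"); // q = 25, x_hat = 1.25
}
```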

Structs

QuantConfig
Full quantization configuration.
QuantStats
Quantization statistics computed for a model.
QuantizedLinear
A quantized linear layer that stores weights in INT8/INT4 (see the sketch after this list).
QuantizedTensor
A quantized tensor storing integer weights with associated scale/zero_point.
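
A conceptual sketch of what a quantized linear layer does on its simplest path: dequantize per-row weights on the fly and apply an ordinary matrix-vector product. This is illustrative only; the real QuantizedLinear may use integer kernels and a different weight layout.

```rust
/// Conceptual stand-in for a quantized linear layer: one i8 weight matrix
/// with a per-output-row scale (symmetric quantization, zero_point = 0).
struct TinyQuantLinear {
    weights: Vec<Vec<i8>>, // [out_features][in_features]
    scales: Vec<f32>,      // one scale per output row
}

impl TinyQuantLinear {
    /// Simplest possible forward pass: dequantize each weight on the fly.
    /// A production implementation would typically use integer matmul kernels.
    fn forward(&self, input: &[f32]) -> Vec<f32> {
        self.weights
            .iter()
            .zip(&self.scales)
            .map(|(row, &scale)| {
                row.iter()
                    .zip(input)
                    .map(|(&w, &x)| (w as f32 * scale) * x)
                    .sum::<f32>()
            })
            .collect()
    }
}

fn main() {
    let layer = TinyQuantLinear {
        weights: vec![vec![50, -20], vec![10, 30]],
        scales: vec![0.01, 0.02],
    };
    println!("{:?}", layer.forward(&[1.0, 2.0])); // [0.1, 1.4]
}
```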

Enums

QuantBits
Bit-width for quantized values.
QuantGranularity
Granularity of quantization parameters (scale / zero_point).
QuantMode
Quantization mode (symmetric vs. asymmetric); a parameter-computation sketch follows this list.
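
The two modes differ in how scale and zero_point are derived from the observed value range, while QuantBits fixes the target integer range. A standalone sketch of that computation (illustrative only; it assumes signed ranges, INT8 as [-128, 127] and INT4 as [-8, 7], which may not match this module's choice):

```rust
/// Integer range for a signed bit-width: 8 -> [-128, 127], 4 -> [-8, 7].
/// (Whether this crate uses signed or unsigned ranges is an assumption.)
fn int_range(bits: u32) -> (i32, i32) {
    let half = 1i32 << (bits - 1);
    (-half, half - 1)
}

/// Symmetric: zero_point is fixed to 0, scale covers max |x|.
fn symmetric_params(min: f32, max: f32, bits: u32) -> (f32, i32) {
    let (_, q_max) = int_range(bits);
    let abs_max = min.abs().max(max.abs());
    (abs_max / q_max as f32, 0)
}

/// Asymmetric: scale and zero_point map [min, max] onto [q_min, q_max].
fn asymmetric_params(min: f32, max: f32, bits: u32) -> (f32, i32) {
    let (q_min, q_max) = int_range(bits);
    let scale = (max - min) / (q_max - q_min) as f32;
    let zero_point = (q_min as f32 - min / scale).round() as i32;
    (scale, zero_point)
}

fn main() {
    // Weights roughly in [-0.4, 1.0]: asymmetric uses the range more efficiently.
    println!("{:?}", symmetric_params(-0.4, 1.0, 8));  // (~0.00787, 0)
    println!("{:?}", asymmetric_params(-0.4, 1.0, 8)); // (~0.00549, -55)
}
```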

Functions

dequantize_tensor
Dequantize a QuantizedTensor back to a float tensor.
quantization_stats
Compute quantization statistics for a model without actually quantizing.
quantize_named_parameters
Quantize all Linear layers in a model’s named_parameters.
quantize_tensor
Quantize a float tensor to a QuantizedTensor; see the usage sketch below.
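
Putting the pieces together, a typical post-training flow quantizes weights once and dequantizes (or runs integer kernels) at inference time. The sketch below is only a guess at the intended usage: the import path, the QuantConfig field names, the enum variant names, the tensor type, and the signatures of quantize_tensor/dequantize_tensor are all assumptions and must be checked against the individual item pages.

```rust
// Paths, field names, variant names, and signatures below are assumptions;
// consult the item documentation for the real API.
use mycrate::quantize::{
    dequantize_tensor, quantize_tensor, QuantBits, QuantConfig, QuantGranularity, QuantMode,
};

fn main() {
    // Assumed QuantConfig fields and variant names.
    let config = QuantConfig {
        bits: QuantBits::Int8,
        mode: QuantMode::Symmetric,
        granularity: QuantGranularity::PerTensor,
    };

    // Placeholder for whatever float tensor type the crate actually expects.
    let weights = vec![0.12_f32, -0.98, 0.45, 0.03];

    // Assumed signatures:
    //   quantize_tensor(&tensor, &config) -> QuantizedTensor
    //   dequantize_tensor(&QuantizedTensor) -> float tensor
    let q = quantize_tensor(&weights, &config);
    let restored = dequantize_tensor(&q);

    // The round trip is lossy: `restored` should match `weights` to within
    // roughly one quantization step per element.
    println!("{restored:?}");
}
```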