Quantization — INT8/INT4 post-training quantization for inference.
Structs§
- QuantConfig - Full quantization configuration.
- QuantStats - Quantization statistics computed for a model.
- QuantizedLinear - A quantized linear layer that stores weights in INT8/INT4.
- QuantizedTensor - A quantized tensor storing integer weights with an associated scale/zero_point.
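To make the `QuantizedTensor` idea concrete, here is a minimal self-contained sketch of what such a type typically holds and how a value is recovered from it. The field names and the `dequantize_at` helper are assumptions for illustration, not the crate's actual definition.

```rust
// Hypothetical sketch of a quantized tensor: integer weights plus the
// scale/zero_point needed to map them back to floats. Field names are
// assumptions, not the crate's real layout.
struct QuantizedTensor {
    data: Vec<i8>,   // INT8 weight values
    scale: f32,      // float step size per integer increment
    zero_point: i32, // integer value that represents float 0.0
}

impl QuantizedTensor {
    /// Recover the approximate float at index `i`: (q - zero_point) * scale.
    fn dequantize_at(&self, i: usize) -> f32 {
        (self.data[i] as i32 - self.zero_point) as f32 * self.scale
    }
}

fn main() {
    let t = QuantizedTensor {
        data: vec![0, 64, 127],
        scale: 0.02,
        zero_point: 0,
    };
    // 64 * 0.02 ≈ 1.28
    println!("{}", t.dequantize_at(1));
}
```

The same scale/zero_point pair applies to every element here; the `QuantGranularity` enum below controls whether those parameters are shared per-tensor or vary per-channel.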
Enums§
- QuantBits - Bit-width for quantized values.
- QuantGranularity - Granularity of quantization parameters (scale/zero_point).
- QuantMode - Quantization mode (symmetric vs. asymmetric).
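The symmetric/asymmetric distinction in `QuantMode` comes down to how scale and zero_point are derived from a tensor's observed range. The sketch below shows the standard INT8 formulas; the function names are hypothetical and the crate's actual logic may differ (e.g. in rounding or range clipping).

```rust
// Standard derivations of INT8 quantization parameters from a min/max range.
// Symmetric mode pins zero_point to 0; asymmetric mode uses the full i8 range.
// These helpers are illustrative, not the crate's API.
fn symmetric_params(min: f32, max: f32) -> (f32, i32) {
    let amax = min.abs().max(max.abs());
    (amax / 127.0, 0) // zero_point fixed at 0
}

fn asymmetric_params(min: f32, max: f32) -> (f32, i32) {
    let scale = (max - min) / 255.0; // spread the range over all 256 levels
    let zero_point = (-128.0 - min / scale).round() as i32; // map `min` to -128
    (scale, zero_point)
}

fn main() {
    let (s, z) = symmetric_params(-1.0, 2.0);
    println!("symmetric:  scale={s}, zero_point={z}");
    let (s, z) = asymmetric_params(-1.0, 2.0);
    println!("asymmetric: scale={s}, zero_point={z}");
}
```

Symmetric mode wastes part of the integer range when the data is skewed (here, negatives only reach -1.0 but the scale must cover 2.0), while asymmetric mode uses all 256 levels at the cost of a nonzero zero_point in every dequantization.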
Functions§
- dequantize_tensor - Dequantize a QuantizedTensor back to a float tensor.
- quantization_stats - Compute quantization statistics for a model without actually quantizing.
- quantize_named_parameters - Quantize all Linear layers in a model's named_parameters.
- quantize_tensor - Quantize a float tensor to a QuantizedTensor.
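A quantize/dequantize round trip analogous to `quantize_tensor` followed by `dequantize_tensor` can be sketched as below. The free functions and their signatures are assumptions for illustration; the point is the bounded reconstruction error, which is at most half a quantization step per element.

```rust
// Illustrative round trip: float slice -> INT8 -> float. Signatures are
// hypothetical stand-ins for the crate's quantize_tensor/dequantize_tensor.
fn quantize(x: &[f32], scale: f32, zero_point: i32) -> Vec<i8> {
    x.iter()
        .map(|&v| ((v / scale).round() as i32 + zero_point).clamp(-128, 127) as i8)
        .collect()
}

fn dequantize(q: &[i8], scale: f32, zero_point: i32) -> Vec<f32> {
    q.iter()
        .map(|&v| (v as i32 - zero_point) as f32 * scale)
        .collect()
}

fn main() {
    let x = [0.0f32, 0.5, -0.25, 1.0];
    let scale = 1.0 / 127.0; // symmetric scale covering [-1.0, 1.0]
    let q = quantize(&x, scale, 0);
    let y = dequantize(&q, scale, 0);
    for (a, b) in x.iter().zip(&y) {
        // Reconstruction error is bounded by half a quantization step.
        assert!((a - b).abs() <= scale / 2.0 + 1e-6);
    }
    println!("q = {:?}", q);
}
```

This is the core invariant post-training quantization relies on: weights round-trip through INT8 with error no larger than `scale / 2`, so picking a scale that tightly fits the weight range directly controls accuracy loss.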