Module distributed

Distributed training: data parallelism (DataParallel), mixed-precision training (MixedPrecisionTrainer), pipeline parallelism (PipelineParallel, PipelineStage), and gradient synchronization (reduce_gradients).

Structs§

DataParallel
Wraps a Module and splits each input batch across num_workers threads; a thread-based sketch of the idea follows this list.
LossScaleConfig
Configuration for dynamic loss scaling in mixed-precision training.
MixedPrecisionMetrics
Metrics from a single mixed-precision training step.
MixedPrecisionTrainer
Mixed-precision training: reduced-precision forward/backward with FP32 master weights; a loss-scaling sketch follows this list.
ParallelTrainer
High-level training loop with gradient accumulation; an accumulation sketch follows this list.
PipelineParallel
Pipeline-parallel executor using GPipe-style micro-batching; a channel-based sketch follows this list.
PipelineStage
A stage in a pipeline-parallel model.
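
The following is a minimal, self-contained sketch of what DataParallel does: split one input batch into num_workers chunks, run the forward pass on each chunk in its own thread, and concatenate the per-chunk outputs in order. The Model type, its forward signature, and the data_parallel_forward helper are illustrative stand-ins, not this crate's API.

```rust
use std::thread;

struct Model {
    weight: f32,
}

impl Model {
    // Stand-in for a Module's forward pass: an elementwise scale.
    fn forward(&self, batch: &[f32]) -> Vec<f32> {
        batch.iter().map(|x| x * self.weight).collect()
    }
}

fn data_parallel_forward(model: &Model, batch: &[f32], num_workers: usize) -> Vec<f32> {
    // Ceiling division so every element lands in some chunk; .max(1) guards
    // against an empty batch.
    let chunk_len = batch.len().div_ceil(num_workers).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = batch
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || model.forward(chunk)))
            .collect();
        // Join in spawn order so the concatenated output matches the input order.
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let model = Model { weight: 2.0 };
    let batch: Vec<f32> = (0..8).map(|i| i as f32).collect();
    let out = data_parallel_forward(&model, &batch, 4);
    assert_eq!(out, vec![0.0, 2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]);
    println!("{out:?}");
}
```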
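MixedPrecisionTrainer's dynamic loss scaling, configured by LossScaleConfig, can be sketched as a small state machine: scale the loss before backward, skip the optimizer step and shrink the scale when any gradient overflows, and grow the scale after a run of clean steps. MixedPrecisionMetrics presumably reports values like the current scale and skipped steps. The field names init_scale, growth_factor, backoff_factor, and growth_interval mirror common mixed-precision implementations and are assumptions, not this crate's confirmed API.

```rust
struct LossScaleConfig {
    init_scale: f32,
    growth_factor: f32,   // multiply scale after `growth_interval` good steps
    backoff_factor: f32,  // multiply scale after an overflow
    growth_interval: u32, // clean steps required before growing
}

struct LossScaler {
    cfg: LossScaleConfig,
    scale: f32,
    good_steps: u32,
}

impl LossScaler {
    fn new(cfg: LossScaleConfig) -> Self {
        let scale = cfg.init_scale;
        Self { cfg, scale, good_steps: 0 }
    }

    /// Unscale gradients in place; returns false (skip the step) on overflow.
    fn unscale_and_check(&mut self, grads: &mut [f32]) -> bool {
        if grads.iter().any(|g| !g.is_finite()) {
            // Overflow: shrink the scale and reset the good-step counter.
            self.scale *= self.cfg.backoff_factor;
            self.good_steps = 0;
            return false;
        }
        for g in grads.iter_mut() {
            *g /= self.scale; // back to the true gradient magnitude
        }
        self.good_steps += 1;
        if self.good_steps >= self.cfg.growth_interval {
            // Stable for a while: try a larger scale for more dynamic range.
            self.scale *= self.cfg.growth_factor;
            self.good_steps = 0;
        }
        true
    }
}

fn main() {
    let mut scaler = LossScaler::new(LossScaleConfig {
        init_scale: 65536.0,
        growth_factor: 2.0,
        backoff_factor: 0.5,
        growth_interval: 2000,
    });
    // Simulated scaled gradients: one overflowing step, then a healthy one.
    let mut bad = vec![f32::INFINITY, 1.0];
    assert!(!scaler.unscale_and_check(&mut bad)); // step skipped, scale halved
    assert_eq!(scaler.scale, 32768.0);
    let mut good = vec![32768.0, 16384.0];
    assert!(scaler.unscale_and_check(&mut good)); // grads unscaled in place
    assert_eq!(good, vec![1.0, 0.5]);
}
```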
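Gradient accumulation, as in ParallelTrainer, sums per-micro-batch gradients and applies one averaged optimizer step every accum_steps micro-batches, approximating a single step over the combined batch. A self-contained sketch with plain f32 slices; the GradAccumulator type is illustrative, not this crate's API.

```rust
struct GradAccumulator {
    buf: Vec<f32>,      // running sum of micro-batch gradients
    seen: usize,        // micro-batches accumulated so far
    accum_steps: usize, // micro-batches per optimizer step
}

impl GradAccumulator {
    fn new(n_params: usize, accum_steps: usize) -> Self {
        Self { buf: vec![0.0; n_params], seen: 0, accum_steps }
    }

    /// Add one micro-batch's gradients; apply an SGD step when the window is full.
    fn accumulate(&mut self, grads: &[f32], weights: &mut [f32], lr: f32) {
        for (b, g) in self.buf.iter_mut().zip(grads) {
            *b += *g;
        }
        self.seen += 1;
        if self.seen == self.accum_steps {
            let inv = 1.0 / self.accum_steps as f32;
            for (w, b) in weights.iter_mut().zip(self.buf.iter_mut()) {
                *w -= lr * *b * inv; // average, then step
                *b = 0.0;            // clear for the next accumulation window
            }
            self.seen = 0;
        }
    }
}

fn main() {
    let mut weights = vec![1.0_f32];
    let mut acc = GradAccumulator::new(1, 4);
    for _ in 0..4 {
        acc.accumulate(&[0.4], &mut weights, 0.1); // four identical micro-batches
    }
    // One step with the averaged gradient 0.4: w = 1.0 - 0.1 * 0.4 = 0.96.
    assert!((weights[0] - 0.96).abs() < 1e-6);
}
```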
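GPipe-style micro-batching, as in PipelineParallel, splits a batch into micro-batches and streams them through the stages, so stage k can work on micro-batch j+1 while stage k+1 works on micro-batch j. A sketch using mpsc channels, with plain closures standing in for PipelineStage modules (forward pass only, which is enough to show the schedule):

```rust
use std::sync::mpsc;
use std::thread;

fn main() {
    // Two assumed stages of a model, each a stand-in for a PipelineStage.
    let stages: Vec<fn(Vec<f32>) -> Vec<f32>> = vec![
        |x| x.into_iter().map(|v| v * 2.0).collect(), // stage 0
        |x| x.into_iter().map(|v| v + 1.0).collect(), // stage 1
    ];

    let batch: Vec<f32> = (0..8).map(|i| i as f32).collect();
    let micro_batches: Vec<Vec<f32>> = batch.chunks(2).map(|c| c.to_vec()).collect();

    // Wire the stages together with channels: feeder -> stage 0 -> stage 1 -> sink.
    let (feed_tx, mut prev_rx) = mpsc::channel::<Vec<f32>>();
    let mut handles = Vec::new();
    for stage in stages {
        let (tx, rx) = mpsc::channel::<Vec<f32>>();
        handles.push(thread::spawn(move || {
            // Each stage thread consumes micro-batches as they arrive,
            // overlapping its work with the other stages'.
            for mb in prev_rx {
                tx.send(stage(mb)).unwrap();
            }
        }));
        prev_rx = rx;
    }

    for mb in micro_batches {
        feed_tx.send(mb).unwrap();
    }
    // Dropping the feeder closes the first channel; each stage then drains,
    // drops its own sender, and the shutdown ripples down the pipeline.
    drop(feed_tx);

    let out: Vec<f32> = prev_rx.into_iter().flatten().collect();
    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(out[0], 1.0);  // (0 * 2) + 1
    assert_eq!(out[7], 15.0); // (7 * 2) + 1
    println!("{out:?}");
}
```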

Enums§

AllReduceOp
Strategy for combining gradients from multiple replicas; see the sketch under Functions§ below.

Functions§

reduce_gradients
Average (or sum) multiple GradStores into a single GradStore.
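
A self-contained sketch of reduce_gradients and the AllReduceOp strategy, modeling a gradient store as a HashMap from parameter name to gradient vector; the crate's GradStore is assumed to be richer, and the Sum/Mean variant names are assumptions rather than confirmed API.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy)]
enum AllReduceOp {
    Sum,
    Mean,
}

type Grads = HashMap<&'static str, Vec<f32>>;

fn reduce_gradients(stores: &[Grads], op: AllReduceOp) -> Grads {
    let mut out: Grads = HashMap::new();
    for store in stores {
        for (name, grad) in store {
            let acc = out.entry(*name).or_insert_with(|| vec![0.0; grad.len()]);
            for (a, g) in acc.iter_mut().zip(grad) {
                *a += *g; // elementwise sum across replicas
            }
        }
    }
    if let AllReduceOp::Mean = op {
        let n = stores.len() as f32;
        for grad in out.values_mut() {
            for g in grad.iter_mut() {
                *g /= n; // average instead of sum
            }
        }
    }
    out
}

fn main() {
    let replica_a: Grads = HashMap::from([("w", vec![1.0, 2.0])]);
    let replica_b: Grads = HashMap::from([("w", vec![3.0, 4.0])]);
    let mean = reduce_gradients(&[replica_a, replica_b], AllReduceOp::Mean);
    assert_eq!(mean["w"], vec![2.0, 3.0]); // (1+3)/2, (2+4)/2
}
```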