Distributed training — DataParallel, MixedPrecision, Pipeline, gradient sync.
Structs§
- DataParallel - Wraps a `Module` and splits each input batch across `num_workers` threads (see the batch-splitting sketch after this list).
- LossScaleConfig - Configuration for dynamic loss scaling in mixed-precision training (see the loss-scaling sketch after this list).
- MixedPrecisionMetrics - Metrics from a single mixed-precision training step.
- MixedPrecisionTrainer - Mixed-precision training: reduced-precision forward/backward passes with FP32 master weights.
- ParallelTrainer - High-level training loop with gradient accumulation (see the accumulation sketch after this list).
- PipelineParallel - Pipeline-parallel executor using GPipe-style micro-batching (see the scheduling sketch after this list).
- PipelineStage - A stage in a pipeline-parallel model.
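To make the `DataParallel` description concrete, here is a minimal batch-splitting sketch using only the standard library. The `forward` function is a stand-in for a real `Module`, and the sharding strategy shown is an assumption about the approach, not this crate's actual implementation.

```rust
use std::thread;

/// Stand-in for the crate's `Module` trait: anything with a forward pass.
fn forward(weights: &[f32], input: &[f32]) -> f32 {
    weights.iter().zip(input).map(|(w, x)| w * x).sum()
}

fn main() {
    let num_workers = 4;
    // A "batch" of 8 input rows, 3 features each.
    let batch: Vec<Vec<f32>> = (0..8).map(|i| vec![i as f32, 1.0, -1.0]).collect();
    let weights = vec![0.5, -0.25, 1.0]; // shared (replicated) parameters

    // Split the batch into one shard per worker and run forwards in parallel.
    let shard_size = batch.len().div_ceil(num_workers);
    let outputs: Vec<Vec<f32>> = thread::scope(|s| {
        batch
            .chunks(shard_size)
            .map(|shard| {
                let weights = &weights;
                s.spawn(move || shard.iter().map(|row| forward(weights, row)).collect())
            })
            .collect::<Vec<_>>()
            .into_iter()
            .map(|handle| handle.join().unwrap())
            .collect()
    });

    // Each replica produced outputs for its shard; shard order is preserved.
    println!("{outputs:?}");
}
```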
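The dynamic loss-scaling loop that `LossScaleConfig` presumably parameterizes can be sketched as follows. The field names (`init_scale`, `growth_factor`, `backoff_factor`, `growth_interval`) are illustrative guesses, not the crate's real fields, and plain `f32` gradients stand in for reduced-precision tensors.

```rust
/// Illustrative stand-in for `LossScaleConfig`; field names are guesses.
struct LossScaleConfig {
    init_scale: f32,      // starting loss scale, e.g. 2^16
    growth_factor: f32,   // multiply scale after enough clean steps
    backoff_factor: f32,  // shrink scale when gradients overflow
    growth_interval: u32, // clean steps required before growing
}

fn main() {
    let cfg = LossScaleConfig {
        init_scale: 65_536.0,
        growth_factor: 2.0,
        backoff_factor: 0.5,
        growth_interval: 2_000,
    };
    let mut scale = cfg.init_scale;
    let mut clean_steps = 0u32;

    // Pretend gradients from a few reduced-precision backward passes;
    // the NaN simulates a low-precision overflow.
    let grad_batches = [vec![0.1f32, -0.2], vec![f32::NAN, 0.3], vec![0.05, 0.0]];

    for grads in &grad_batches {
        let overflow = grads.iter().any(|g| !g.is_finite());
        if overflow {
            // Skip the optimizer step and back off the scale.
            scale *= cfg.backoff_factor;
            clean_steps = 0;
            println!("overflow: skip step, scale -> {scale}");
            continue;
        }
        // Gradients were computed on `loss * scale`, so unscale before use.
        let unscaled: Vec<f32> = grads.iter().map(|g| g / scale).collect();
        println!("apply FP32 master-weight update with {unscaled:?}");
        clean_steps += 1;
        if clean_steps >= cfg.growth_interval {
            scale *= cfg.growth_factor;
            clean_steps = 0;
        }
    }
}
```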
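Gradient accumulation, as mentioned for `ParallelTrainer`, amounts to summing micro-batch gradients and stepping the optimizer once every `accum_steps` batches. A minimal sketch with a single scalar weight (all names hypothetical):

```rust
fn main() {
    let accum_steps = 4;
    let lr = 0.1f32;
    let mut weight = 1.0f32;
    let mut grad_acc = 0.0f32;

    // Pretend per-micro-batch gradients from backward passes.
    let micro_grads = [0.2f32, -0.1, 0.3, 0.0, 0.4, 0.1, -0.2, 0.1];

    for (i, g) in micro_grads.iter().enumerate() {
        grad_acc += g; // accumulate instead of stepping immediately
        if (i + 1) % accum_steps == 0 {
            // One optimizer step per `accum_steps` micro-batches,
            // using the averaged gradient.
            weight -= lr * grad_acc / accum_steps as f32;
            grad_acc = 0.0;
            println!("step: weight = {weight}");
        }
    }
}
```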
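GPipe-style micro-batching, which `PipelineParallel` is described as using, splits each batch into micro-batches so that stages overlap: stage `s` can start micro-batch `m` at tick `s + m`, so a forward pass takes `num_stages + num_micro - 1` ticks instead of `num_stages * num_micro`. A schedule-only sketch, with no real model attached:

```rust
fn main() {
    let num_stages = 3;
    let num_micro = 4;

    // Naive pipeline schedule: at clock tick t, stage s runs micro-batch
    // m = t - s, if that micro-batch exists.
    for t in 0..(num_stages + num_micro - 1) {
        let work: Vec<String> = (0..num_stages)
            .filter_map(|s| {
                let m = t as isize - s as isize;
                (m >= 0 && (m as usize) < num_micro).then(|| format!("stage{s}:mb{m}"))
            })
            .collect();
        println!("tick {t}: {}", work.join("  "));
    }
}
```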
Enums§
- AllReduceOp - Strategy for combining gradients from multiple replicas (see the sketch after this list).
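A plausible shape for `AllReduceOp` and how it might drive the combine step; the variant names `Sum` and `Average` are guesses based on the description, not the enum's confirmed variants.

```rust
/// Illustrative stand-in; the actual variants may differ.
enum AllReduceOp {
    Sum,
    Average,
}

/// Element-wise reduce of per-replica gradient vectors.
fn combine(per_replica: &[Vec<f32>], op: AllReduceOp) -> Vec<f32> {
    let n = per_replica.len() as f32;
    let mut out = vec![0.0f32; per_replica[0].len()];
    for grads in per_replica {
        for (o, g) in out.iter_mut().zip(grads) {
            *o += g;
        }
    }
    if let AllReduceOp::Average = op {
        for o in &mut out {
            *o /= n;
        }
    }
    out
}

fn main() {
    let replicas = vec![vec![1.0f32, 2.0], vec![3.0, 4.0]];
    println!("{:?}", combine(&replicas, AllReduceOp::Average)); // [2.0, 3.0]
}
```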
Functions§
- reduce_gradients - Average (or sum) multiple `GradStore`s into a single `GradStore` (see the sketch below).
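A minimal sketch of the averaging behavior `reduce_gradients` is described as having, using a `HashMap` keyed by parameter name as a stand-in for the real `GradStore` type:

```rust
use std::collections::HashMap;

/// Stand-in for `GradStore`: parameter name -> gradient values.
type GradStore = HashMap<String, Vec<f32>>;

/// Average several replicas' gradients into one store.
fn reduce_gradients(stores: &[GradStore]) -> GradStore {
    let n = stores.len() as f32;
    let mut out = GradStore::new();
    for store in stores {
        for (name, grads) in store {
            let entry = out
                .entry(name.clone())
                .or_insert_with(|| vec![0.0; grads.len()]);
            // Divide as we accumulate so the result is the mean.
            for (o, g) in entry.iter_mut().zip(grads) {
                *o += g / n;
            }
        }
    }
    out
}

fn main() {
    let a = GradStore::from([("w".to_string(), vec![1.0, 2.0])]);
    let b = GradStore::from([("w".to_string(), vec![3.0, 4.0])]);
    println!("{:?}", reduce_gradients(&[a, b])); // {"w": [2.0, 3.0]}
}
```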