Re-export optimizers.
Structs§
- Adam
- Adam optimizer (Adaptive Moment Estimation).
- AdamW
- AdamW optimizer (Adam with decoupled weight decay).
- CosineAnnealingLR
- Cosine annealing from `initial_lr` to `min_lr` over `total_steps`.
- CosineWarmupLR
- Linear warmup from 0 to `initial_lr` over `warmup_steps`, then cosine decay from `initial_lr` to `min_lr` over the remaining steps (sketched after this list).
- EMA
- Exponential Moving Average of model parameters (update rule sketched after this list).
- ExponentialLR
- Multiply the learning rate by `gamma` every step.
- GradAccumulator
- Gradient accumulation helper.
- LinearLR
- Linearly interpolate the learning rate from `start_factor * initial_lr` to `end_factor * initial_lr` over `total_steps` steps.
- OptimizerState
- A serializable snapshot of an optimizer’s internal state.
- RAdam
- Rectified Adam (RAdam) optimizer.
- RMSProp
- RMSProp optimizer.
- ReduceLROnPlateau
- Reduce the learning rate when a monitored metric plateaus.
- SGD
- Stochastic Gradient Descent optimizer with optional momentum.
- StepLR
- Multiply the learning rate by `gamma` every `step_size` steps.
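
For orientation, here is a minimal, self-contained sketch of the schedule math that the CosineAnnealingLR and CosineWarmupLR entries describe. It is not this crate's API: the free function and its parameter names (`step`, `initial_lr`, `min_lr`, `warmup_steps`, `total_steps`) only mirror the wording of those descriptions, and the actual structs may index, round, or clamp differently.

```rust
use std::f64::consts::PI;

/// Illustrative only: learning rate at `step` for a warmup-then-cosine schedule.
/// Linear warmup from 0 to `initial_lr` over `warmup_steps`, then cosine decay
/// from `initial_lr` down to `min_lr` over the remaining steps.
fn cosine_warmup_lr(
    step: usize,
    initial_lr: f64,
    min_lr: f64,
    warmup_steps: usize,
    total_steps: usize,
) -> f64 {
    if step < warmup_steps {
        // Linear warmup: 0 -> initial_lr.
        initial_lr * step as f64 / warmup_steps as f64
    } else {
        // Cosine decay: initial_lr -> min_lr over the remaining steps.
        let decay_steps = (total_steps - warmup_steps).max(1) as f64;
        let progress = ((step - warmup_steps) as f64 / decay_steps).min(1.0);
        min_lr + 0.5 * (initial_lr - min_lr) * (1.0 + (PI * progress).cos())
    }
}

fn main() {
    // Rough shape check: near 0 at the start, peak after warmup, near min_lr at the end.
    for &step in &[0, 50, 100, 500, 999] {
        println!("step {step:>4}: lr = {:.6}", cosine_warmup_lr(step, 1e-3, 1e-5, 100, 1000));
    }
}
```

With `warmup_steps = 0` this reduces to the plain cosine annealing from `initial_lr` to `min_lr` over `total_steps` that CosineAnnealingLR describes; ExponentialLR, LinearLR, and StepLR are likewise simple closed-form factors of the step count.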
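
Similarly, the EMA entry maintains an Exponential Moving Average of model parameters. The sketch below shows the standard shadow-parameter update rule on plain `f64` buffers; the struct name, fields, and `update` method here are assumptions for illustration, not this module's actual `EMA` API.

```rust
/// Illustrative only: an exponential moving average over flat parameter buffers.
struct EmaSketch {
    decay: f64,       // e.g. 0.999; closer to 1.0 averages over a longer window
    shadow: Vec<f64>, // the averaged ("shadow") copy of the parameters
}

impl EmaSketch {
    fn new(params: &[f64], decay: f64) -> Self {
        Self { decay, shadow: params.to_vec() }
    }

    /// Standard EMA update: shadow = decay * shadow + (1 - decay) * param.
    fn update(&mut self, params: &[f64]) {
        for (s, &p) in self.shadow.iter_mut().zip(params) {
            *s = self.decay * *s + (1.0 - self.decay) * p;
        }
    }
}

fn main() {
    let mut ema = EmaSketch::new(&[0.0, 0.0], 0.9);
    ema.update(&[1.0, 2.0]);
    ema.update(&[1.0, 2.0]);
    // After two updates toward [1.0, 2.0]: shadow ≈ [0.19, 0.38].
    println!("{:?}", ema.shadow);
}
```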
Traits§
- LrScheduler
- Trait for learning rate schedulers.
- Optimizer
- Trait that all optimizers implement.
- Stateful
- Trait for optimizers that can save and restore their internal state.
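
To show how these traits typically fit together (a scheduler maps steps to learning rates, an optimizer consumes them and owns the update rule, and a stateful optimizer can snapshot its internals), here is a purely hypothetical shape for such a trio. Every name and signature below is an assumption for illustration; the real `Optimizer`, `LrScheduler`, and `Stateful` traits in this module are defined over the crate's own parameter and state types and will differ.

```rust
/// Hypothetical sketch only: none of these names or signatures come from this crate.
struct StateSnapshotSketch {
    step: usize,
    // A real snapshot would also carry per-parameter buffers (moments, etc.).
}

trait OptimizerSketch {
    /// Apply one update in place, given parameters and their gradients.
    fn step(&mut self, params: &mut [f64], grads: &[f64]);
    /// Let a scheduler (or the caller) change the learning rate between steps.
    fn set_lr(&mut self, lr: f64);
}

trait LrSchedulerSketch {
    /// Learning rate to use at a given global step.
    fn lr_at(&self, step: usize) -> f64;
}

trait StatefulSketch {
    fn save_state(&self) -> StateSnapshotSketch;
    fn load_state(&mut self, state: StateSnapshotSketch);
}

/// Plain SGD as a tiny implementor, to show how the pieces interact.
struct SgdSketch {
    lr: f64,
    step: usize,
}

impl OptimizerSketch for SgdSketch {
    fn step(&mut self, params: &mut [f64], grads: &[f64]) {
        for (p, &g) in params.iter_mut().zip(grads) {
            *p -= self.lr * g;
        }
        self.step += 1;
    }
    fn set_lr(&mut self, lr: f64) {
        self.lr = lr;
    }
}

impl StatefulSketch for SgdSketch {
    fn save_state(&self) -> StateSnapshotSketch {
        StateSnapshotSketch { step: self.step }
    }
    fn load_state(&mut self, state: StateSnapshotSketch) {
        self.step = state.step;
    }
}

/// A constant schedule, the simplest possible scheduler.
struct ConstantLrSketch(f64);
impl LrSchedulerSketch for ConstantLrSketch {
    fn lr_at(&self, _step: usize) -> f64 {
        self.0
    }
}

fn main() {
    let mut opt = SgdSketch { lr: 0.1, step: 0 };
    let mut params = vec![1.0, -2.0];
    opt.step(&mut params, &[0.5, -0.5]); // params become [0.95, -1.95]

    // Save, pretend to restart, then restore and hand the lr back to a scheduler.
    let snapshot = opt.save_state();
    let mut resumed = SgdSketch { lr: 0.0, step: 0 };
    resumed.load_state(snapshot);
    resumed.set_lr(ConstantLrSketch(0.05).lr_at(resumed.step));
    println!("{params:?}, resumed at step {} with lr {}", resumed.step, resumed.lr);
}
```

The split mirrors the listing above: the scheduler only maps steps to learning rates, the optimizer owns the update rule, and the stateful trait adds the snapshot/restore pair that something like OptimizerState would serialize.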
Functions§
- clip_grad_norm
- Clip gradients by their global L2 norm (sketched below).
- clip_grad_value
- Clamp each gradient element to `[-max_value, max_value]`.
- grad_norm
- Compute the global L2 norm of all gradients without clipping.
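
For reference, the sketch below shows what clipping by a global L2 norm means, on plain `f64` gradient buffers. The function name and signature are illustrative assumptions; the module's own `clip_grad_norm`, `clip_grad_value`, and `grad_norm` operate on the crate's gradient types and are not reproduced here.

```rust
/// Illustrative only: scale all gradients so their global L2 norm is at most `max_norm`.
/// Returns the norm measured before clipping (the quantity `grad_norm` is described
/// as computing, without the clipping step).
fn clip_grad_norm_sketch(grads: &mut [Vec<f64>], max_norm: f64) -> f64 {
    // Global L2 norm across every element of every gradient buffer.
    let total_norm: f64 = grads
        .iter()
        .flat_map(|g| g.iter())
        .map(|x| x * x)
        .sum::<f64>()
        .sqrt();

    // Rescale everything by the same factor when the norm exceeds the budget.
    if total_norm > max_norm {
        let scale = max_norm / total_norm;
        for g in grads.iter_mut() {
            for x in g.iter_mut() {
                *x *= scale;
            }
        }
    }
    total_norm
}

fn main() {
    let mut grads = vec![vec![3.0, 4.0], vec![0.0]]; // global norm = 5.0
    let before = clip_grad_norm_sketch(&mut grads, 1.0);
    println!("norm before: {before}, clipped grads: {grads:?}"); // rescaled to norm 1.0
}
```

By contrast, value clipping as described for clip_grad_value is element-wise: each gradient entry is clamped to `[-max_value, max_value]` independently, bounding individual entries rather than the overall norm.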