- Constant — simple but may overshoot or stall.
- Step decay — multiply $ \eta $ by a factor (e.g. 0.1) every $ k $ epochs.
- Exponential decay — $ \eta_t = \eta_0 \, e^{-\lambda t} $.
- Cosine annealing — smoothly reduces the rate following a cosine curve, often with warm restarts.
- Linear warm-up — ramp up from a small $ \eta $ during the first few iterations to stabilise early training.
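The schedules above can be sketched as plain Python functions of the step (or epoch) index $ t $. The function and parameter names below are illustrative, not from any particular library:

```python
import math

def step_decay(eta0, factor, k, epoch):
    # Multiply eta0 by `factor` once every k epochs (e.g. factor = 0.1).
    return eta0 * factor ** (epoch // k)

def exponential_decay(eta0, lam, t):
    # eta_t = eta0 * exp(-lam * t)
    return eta0 * math.exp(-lam * t)

def cosine_annealing(eta_min, eta_max, t, T):
    # Smoothly decrease from eta_max to eta_min over T steps
    # following a half-cosine curve; restarting t at 0 gives warm restarts.
    return eta_min + 0.5 * (eta_max - eta_min) * (1 + math.cos(math.pi * t / T))

def linear_warmup(eta_target, t, warmup_steps):
    # Ramp linearly from near zero up to eta_target over the first warmup_steps.
    return eta_target * min(1.0, (t + 1) / warmup_steps)
```

Warm-up and a decay schedule are commonly composed: the rate ramps up for the first few hundred steps, then follows cosine annealing or step decay for the rest of training.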