    • Constant — simple but may overshoot or stall.
    • Step decay — multiply $ \eta $ by a factor (e.g. 0.1) every $ k $ epochs.
    • Exponential decay — $ \eta_t = \eta_0 \, e^{-\lambda t} $.
    • Cosine annealing — smoothly reduces the rate following a cosine curve, often with warm restarts.
    • Linear warm-up — ramp up from a small $ \eta $ during the first few iterations to stabilise early training.
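The schedules above can be sketched as a single lookup function. This is a minimal illustration, not from the original text: the function name, parameter names, and default values (decay factor, period $ k $, decay constant $ \lambda $, annealing horizon, warm-up length) are all assumptions chosen for the example.

```python
import math

def learning_rate(t, schedule="constant", eta0=0.1, *,
                  factor=0.1, k=30, lam=0.01, T=100, warmup=5):
    """Return the learning rate at epoch t under the given schedule.

    All names and defaults here are illustrative, not prescriptive.
    """
    if schedule == "constant":
        # Fixed rate: simple, but may overshoot or stall.
        return eta0
    if schedule == "step":
        # Multiply eta by `factor` every k epochs.
        return eta0 * factor ** (t // k)
    if schedule == "exponential":
        # eta_t = eta_0 * exp(-lambda * t)
        return eta0 * math.exp(-lam * t)
    if schedule == "cosine":
        # Anneal smoothly from eta0 toward 0 over T epochs.
        return eta0 * 0.5 * (1 + math.cos(math.pi * min(t, T) / T))
    if schedule == "warmup":
        # Linear ramp from near 0 up to eta0 over the first `warmup` epochs.
        return eta0 * min(1.0, (t + 1) / warmup)
    raise ValueError(f"unknown schedule: {schedule}")
```

For cosine annealing with warm restarts, the epoch counter `t` would be reset to 0 at each restart; the sketch above shows only a single annealing cycle.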