Translations:Stochastic Gradient Descent/25/en

    From Marovi AI
    Revision as of 19:42, 27 April 2026 by FuzzyBot (talk | contribs) (Importing a new version from external source)
{| class="wikitable"
! Method !! Key idea !! Reference
|-
| Momentum || Accumulates an exponentially decaying moving average of past gradients || Polyak, 1964
|-
| Nesterov accelerated gradient || Evaluates the gradient at a "look-ahead" position || Nesterov, 1983
|-
| AdaGrad || Per-parameter learning rates that shrink for frequently updated features || Duchi et al., 2011
|-
| RMSProp || Fixes AdaGrad's diminishing rates using a moving average of squared gradients || Hinton (lecture notes), 2012
|-
| Adam || Combines momentum with RMSProp-style adaptive learning rates || Kingma & Ba, 2015
|-
| AdamW || Decouples weight decay from the adaptive gradient step || Loshchilov & Hutter, 2019
|}
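The update rules behind two of the methods in the table can be sketched in a few lines of NumPy. The snippet below is a minimal illustration, not a reference implementation: the function names, default hyperparameters, and the toy objective f(w) = w² are chosen here for demonstration only.

```python
import numpy as np

def momentum_step(w, grad, v, lr=0.01, beta=0.9):
    # Polyak momentum: v accumulates an exponentially decaying
    # moving average of past gradients; step along that average.
    v = beta * v + grad
    return w - lr * v, v

def adam_step(w, grad, m, v, t, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam: first moment (momentum-style) plus second moment
    # (RMSProp-style), each corrected for initialization bias.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Toy objective f(w) = w^2, whose gradient is 2w.
w_m = np.array([5.0]); v_m = np.zeros(1)
for _ in range(1000):
    w_m, v_m = momentum_step(w_m, 2 * w_m, v_m)

w_a = np.array([5.0]); m_a = np.zeros(1); v_a = np.zeros(1)
for t in range(1, 2001):
    w_a, m_a, v_a = adam_step(w_a, 2 * w_a, m_a, v_a, t)
```

Both loops drive w toward the minimum at zero; Adam's per-parameter normalization makes its step size roughly lr regardless of the gradient's scale, which is why it tolerates a much larger learning rate here.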