| Method | Key idea | Reference |
|---|---|---|
| Momentum | Accumulates an exponentially decaying moving average of past gradients | Polyak, 1964 |
| Nesterov accelerated gradient | Evaluates the gradient at a "look-ahead" position | Nesterov, 1983 |
| Adagrad | Per-parameter rates that shrink for frequently updated features | Duchi et al., 2011 |
| RMSProp | Fixes Adagrad's diminishing rates using a moving average of squared gradients | Hinton (lecture notes), 2012 |
| Adam | Combines momentum with RMSProp-style adaptive rates | Kingma & Ba, 2015 |
| AdamW | Decouples weight decay from the adaptive gradient step | Loshchilov & Hutter, 2019 |
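
Since the table describes each update rule only in words, a minimal NumPy sketch of two representative entries (momentum and Adam) may make the ideas concrete. The function names, signatures, and hyperparameter defaults below are illustrative assumptions, not taken from the source.

```python
import numpy as np

def momentum_step(w, g, v, lr=0.01, beta=0.9):
    # Momentum (Polyak, 1964): accumulate an exponentially decaying
    # moving average of past gradients, then step along it.
    v = beta * v + g
    return w - lr * v, v

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # Adam (Kingma & Ba, 2015): a momentum-style first moment combined
    # with an RMSProp-style second moment of squared gradients.
    m = beta1 * m + (1 - beta1) * g        # first moment (momentum term)
    v = beta2 * v + (1 - beta2) * g**2     # second moment (squared gradients)
    m_hat = m / (1 - beta1**t)             # bias correction for early steps
    v_hat = v / (1 - beta2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

# Example: run Adam on f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
m = v = np.zeros_like(w)
for t in range(1, 101):
    g = 2 * w
    w, m, v = adam_step(w, g, m, v, t)
print(w)  # moves toward the minimizer at the origin
```

The bias-correction terms divide by 1 - beta**t so that the moment estimates are not biased toward zero during the first few steps, when the moving averages have seen little data.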