Translations:Stochastic Gradient Descent/25/en

    Message definition (Stochastic Gradient Descent)
    {| class="wikitable"
    |-
    ! Method !! Key idea !! Reference
    |-
    | '''{{Term|momentum|Momentum}}''' || Accumulates an exponentially decaying moving average of past gradients || Polyak, 1964
    |-
    | '''Nesterov accelerated gradient''' || Evaluates the gradient at a "look-ahead" position || Nesterov, 1983
    |-
    | '''{{Term|adagrad|Adagrad}}''' || Per-parameter learning rates that shrink for frequently updated features || Duchi et al., 2011
    |-
    | '''RMSProp''' || Fixes {{Term|adagrad|Adagrad}}'s diminishing learning rates using a moving average of squared gradients || Hinton (lecture notes), 2012
    |-
    | '''{{Term|Adam}}''' || Combines {{Term|momentum}} with RMSProp-style adaptive learning rates || Kingma & Ba, 2015
    |-
    | '''AdamW''' || Decouples {{Term|weight decay}} from the adaptive gradient step || Loshchilov & Hutter, 2019
    |}
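
    As a concrete illustration of the update rules summarized above, here is a minimal NumPy sketch of the Adam step and its AdamW variant, since together they exercise the momentum term, the RMSProp-style moving average of squared gradients, and decoupled weight decay. The function name, hyperparameter names, and default values are illustrative conventions, not values taken from this page.

    <syntaxhighlight lang="python">
# Minimal sketch of the Adam update and its AdamW variant.
# Hyperparameter names and defaults (lr, beta1, beta2, eps, weight_decay)
# follow common conventions and are assumptions, not from this page.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999,
              eps=1e-8, weight_decay=0.0, decoupled=False):
    """One optimizer step. decoupled=False folds L2 regularization into the
    gradient (Adam); decoupled=True applies weight decay directly to the
    parameters, outside the adaptive gradient step (AdamW)."""
    if weight_decay and not decoupled:
        grad = grad + weight_decay * theta          # Adam: L2 enters the gradient
    m = beta1 * m + (1 - beta1) * grad              # momentum: moving average of gradients
    v = beta2 * v + (1 - beta2) * grad**2           # RMSProp-style: moving average of squared gradients
    m_hat = m / (1 - beta1**t)                      # bias correction (t starts at 1)
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    if weight_decay and decoupled:
        theta = theta - lr * weight_decay * theta   # AdamW: decay decoupled from the step
    return theta, m, v

# Usage: minimize f(theta) = ||theta||^2 / 2, whose gradient is theta.
theta = np.array([1.0, -2.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 501):
    grad = theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1,
                            weight_decay=1e-2, decoupled=True)
print(theta)  # values move toward the origin
    </syntaxhighlight>

    Note how the only difference between the two variants is where the weight-decay term enters: inside the gradient (and hence rescaled by the adaptive denominator) for Adam, or as a separate multiplicative shrinkage for AdamW.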