| Method | Key idea | Reference |
|---|---|---|
| Momentum | Accumulates an exponentially decaying moving average of past gradients | Polyak, 1964 |
| Nesterov accelerated gradient | Evaluates the gradient at a "look-ahead" position | Nesterov, 1983 |
| AdaGrad | Per-parameter learning rates that shrink for frequently updated features | Duchi et al., 2011 |
| RMSProp | Fixes AdaGrad's diminishing rates using a moving average of squared gradients | Hinton (lecture notes), 2012 |
| Adam | Combines momentum with RMSProp-style adaptive rates | Kingma & Ba, 2015 |
| AdamW | Decouples weight decay from the adaptive gradient step | Loshchilov & Hutter, 2019 |
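
The table compresses each method to one line, but the update rules themselves are short enough to sketch. Below is a minimal NumPy sketch (not part of the original page) of the rules named above; the function names, hyperparameter defaults, and one-function-per-step structure are illustrative assumptions, and the Nesterov variant follows the common machine-learning reformulation rather than the 1983 original.

```python
import numpy as np

def momentum_step(w, grad, v, lr=0.01, beta=0.9):
    """Momentum (Polyak, 1964): decaying accumulation of past gradients."""
    v = beta * v + grad               # velocity accumulates past gradients
    return w - lr * v, v

def nesterov_step(w, grad_fn, v, lr=0.01, beta=0.9):
    """Nesterov accelerated gradient: gradient taken at a look-ahead point."""
    g = grad_fn(w - lr * beta * v)    # evaluate at the "look-ahead" position
    v = beta * v + g
    return w - lr * v, v

def adagrad_step(w, grad, g2, lr=0.01, eps=1e-8):
    """AdaGrad (Duchi et al., 2011): rates shrink as squared grads accumulate."""
    g2 = g2 + grad ** 2               # monotonically growing accumulator
    return w - lr * grad / (np.sqrt(g2) + eps), g2

def rmsprop_step(w, grad, g2, lr=0.001, beta=0.9, eps=1e-8):
    """RMSProp (Hinton, 2012): moving average keeps rates from vanishing."""
    g2 = beta * g2 + (1 - beta) * grad ** 2   # decaying, not monotone
    return w - lr * grad / (np.sqrt(g2) + eps), g2

def adamw_step(w, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999,
               eps=1e-8, wd=0.01):
    """AdamW (Loshchilov & Hutter, 2019). With wd=0 this reduces to plain
    Adam (Kingma & Ba, 2015): momentum plus RMSProp-style adaptive rates."""
    m = beta1 * m + (1 - beta1) * grad        # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment (RMSProp-style)
    m_hat = m / (1 - beta1 ** t)              # bias correction for zero init
    v_hat = v / (1 - beta2 ** t)
    # Weight decay acts on w directly, decoupled from the adaptive step.
    return w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w), m, v

# Hypothetical usage: minimize f(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 501):
    w, m, v = adamw_step(w, 2 * w, m, v, t)
print(w)  # approaches [0, 0]
```

The decoupling the AdamW row describes is visible in the `wd * w` term: Adam folds L2 regularization into the gradient, where the adaptive denominator rescales it, while AdamW subtracts the decay directly from the weights.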