Adam: A Method for Stochastic Optimization is a 2015 paper by Kingma and Ba that introduced the Adam optimizer, an algorithm for first-order gradient-based optimization of stochastic objective functions. Adam combines the advantages of two earlier methods, AdaGrad (which adapts learning rates per parameter) and RMSProp (which uses a running average of squared gradients), into a single algorithm with bias-corrected moment estimates. Adam has become a default choice of optimizer for training neural networks across many domains.
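
The update rule can be summarized in a short sketch: keep exponential moving averages of the gradient and of its element-wise square, correct both for initialization bias, then step each parameter by the ratio of the two. The NumPy code below is a minimal illustration of that rule, not the paper's reference implementation; the function name adam_update and the toy quadratic objective are illustrative choices, while the default hyperparameters (alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8) are the values suggested in the paper.

<syntaxhighlight lang="python">
import numpy as np

def adam_update(theta, grad, m, v, t,
                alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam step for parameters theta given gradient grad.

    m and v are the running first- and second-moment estimates;
    t is the 1-based timestep used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad        # moving average of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # moving average of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # bias-corrected second moment
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy example: minimize f(theta) = theta^2 starting from theta = 5.0
theta = np.array([5.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 501):
    grad = 2 * theta                          # gradient of theta^2
    theta, m, v = adam_update(theta, grad, m, v, t, alpha=0.1)
print(theta)                                  # converges toward 0
</syntaxhighlight>

Because the step is scaled by the ratio of the two moment estimates, its magnitude stays roughly bounded by alpha regardless of the raw gradient scale, which is the per-parameter adaptivity that distinguishes Adam from plain SGD. In practice one would use a framework implementation such as torch.optim.Adam rather than hand-rolling the update.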