Translations:Adam A Method for Stochastic Optimization/5/en
Prior adaptive methods like AdaGrad accumulated squared gradients over the entire training run, causing learning rates to decay monotonically toward zero, which is problematic for non-convex problems. RMSProp addressed this with an exponential moving average of squared gradients but lacked bias correction. Adam unified these ideas with bias-corrected estimates of both the first moment (mean) and second moment (uncentered variance) of the gradients, providing an effective and computationally efficient optimizer with well-behaved default hyperparameters.
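Below is a minimal sketch of one Adam update in NumPy, using the paper's default hyperparameters (step size 0.001, β₁ = 0.9, β₂ = 0.999, ε = 1e-8). The function name adam_step and the functional state-passing style are illustrative choices, not part of the original paper.

<syntaxhighlight lang="python">
import numpy as np

def adam_step(params, grads, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; m and v are running moment estimates, t is the 1-based step index."""
    m = beta1 * m + (1 - beta1) * grads        # EMA of gradients (first moment)
    v = beta2 * v + (1 - beta2) * grads ** 2   # EMA of squared gradients (second moment)
    m_hat = m / (1 - beta1 ** t)               # bias correction: m and v start at zero,
    v_hat = v / (1 - beta2 ** t)               # so early averages are biased toward zero
    params = params - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, m, v

# Example (illustrative): minimize f(x) = x^2 for a few steps
x = np.array([1.0])
m = np.zeros_like(x)
v = np.zeros_like(x)
for t in range(1, 101):
    grad = 2.0 * x                             # gradient of x^2
    x, m, v = adam_step(x, grad, m, v, t)
</syntaxhighlight>

Note that unlike AdaGrad, v here is an exponential moving average rather than a running sum, so the effective learning rate does not decay monotonically to zero.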