- Adam optimizer: An adaptive learning rate method that maintains per-parameter learning rates based on bias-corrected estimates of the first and second moments of the gradients (a minimal code sketch of the update follows this list).
- Bias correction: A mechanism to counteract the initialization bias of the moment estimates toward zero, which is especially important in the initial steps of training.
- AdaMax variant: A generalization based on the infinity norm that can sometimes outperform Adam on problems with sparse gradients.
- Practical defaults: Recommended hyperparameter values ($ \beta_1 = 0.9 $, $ \beta_2 = 0.999 $, $ \epsilon = 10^{-8} $) that work well across a wide range of problems.
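The update rules behind these points fit in a few lines. Below is a minimal NumPy sketch, not the paper's reference pseudocode: the function names `adam_step` and `adamax_step`, the toy quadratic objective, and the larger step size used in the demo loop are illustrative choices, while the defaults $ \alpha = 0.001 $ (Adam), $ \alpha = 0.002 $ (AdaMax), $ \beta_1 = 0.9 $, $ \beta_2 = 0.999 $, and $ \epsilon = 10^{-8} $ follow the values recommended above.

```python
import numpy as np

def adam_step(theta, grad, m, v, t,
              alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of `theta` given gradient `grad`.

    `m` and `v` are the running first- and second-moment estimates
    (initialized to zeros); `t` is the 1-based step counter.
    """
    m = beta1 * m + (1 - beta1) * grad        # first moment: mean of gradients
    v = beta2 * v + (1 - beta2) * grad**2     # second moment: uncentered variance
    m_hat = m / (1 - beta1**t)                # bias correction: the moments start at
    v_hat = v / (1 - beta2**t)                # zero, so early estimates are scaled up
    theta = theta - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def adamax_step(theta, grad, m, u, t,
                alpha=0.002, beta1=0.9, beta2=0.999):
    """One AdaMax update: the second moment is replaced by an exponentially
    weighted infinity-norm estimate `u`, which needs no bias correction.
    (Sketch only: practical code also guards against `u` being zero.)"""
    m = beta1 * m + (1 - beta1) * grad
    u = np.maximum(beta2 * u, np.abs(grad))   # infinity-norm estimate
    theta = theta - (alpha / (1 - beta1**t)) * m / u
    return theta, m, u

# Toy usage: minimize f(x) = ||x - 3||^2 starting from the origin.
theta = np.zeros(2)
m, v = np.zeros_like(theta), np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2.0 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t, alpha=0.01)
print(theta)  # close to [3., 3.]
```

Because each coordinate's step is roughly bounded by $ \alpha $, the demo loop uses a larger step size ($ \alpha = 0.01 $) so the toy problem converges within a couple of thousand iterations; with the default $ \alpha = 0.001 $ it would simply take proportionally longer.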