All translations

Found 3 translations.

Name	Current message text
English (en)	* '''Adam optimizer''': An adaptive {{Term\|learning rate}} method that maintains per-parameter {{Term\|learning rate\|learning rates}} based on bias-corrected estimates of the first and second moments of the gradients. * '''Bias correction''': A rescaling of the moment estimates that counteracts their bias toward zero, which arises because they are initialized at zero and is most pronounced in the first steps of training. * '''AdaMax variant''': A variant of Adam that scales updates by an infinity-norm estimate of past gradients instead of the second moment, and that can sometimes outperform Adam on problems with sparse gradients. * '''Practical defaults''': Recommended {{Term\|hyperparameter}} values (<math>\beta_1 = 0.9</math>, <math>\beta_2 = 0.999</math>, <math>\epsilon = 10^{-8}</math>) that work well across a wide range of problems.
Spanish (es)	* '''Optimizador Adam''': Un método con {{Term\|learning rate\|tasa de aprendizaje}} adaptativa que mantiene {{Term\|learning rate\|tasas de aprendizaje}} por parámetro basadas en estimaciones corregidas por sesgo del primer y segundo momentos de los gradientes. * '''Corrección de sesgo''': Un mecanismo para contrarrestar el sesgo de inicialización de las estimaciones de momento hacia cero, especialmente importante en los pasos iniciales del entrenamiento. * '''Variante AdaMax''': Una generalización basada en la norma infinito que en ocasiones puede superar a Adam en problemas con gradientes dispersos. * '''Valores por defecto prácticos''': Valores recomendados de los {{Term\|hyperparameter\|hiperparámetros}} (<math>\beta_1 = 0.9</math>, <math>\beta_2 = 0.999</math>, <math>\epsilon = 10^{-8}</math>) que funcionan bien en una amplia variedad de problemas.
Chinese (zh)	* '''Adam 优化器'''：一种自适应 {{Term\|learning rate\|学习率}} 方法，基于对梯度一阶矩和二阶矩的偏差校正估计，为每个参数维护各自的 {{Term\|learning rate\|学习率}}。 * '''偏差校正'''：一种用于抵消矩估计在初始化时偏向零的偏差的机制，在训练的最初几步尤为重要。 * '''AdaMax 变体'''：基于无穷范数的一种推广，在稀疏梯度问题上有时能够优于 Adam。 * '''实用的默认值'''：推荐的 {{Term\|hyperparameter\|超参数}}取值（<math>\beta_1 = 0.9</math>、<math>\beta_2 = 0.999</math>、<math>\epsilon = 10^{-8}</math>），在各种问题中都能良好工作。
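
The four items in the message text (the Adam update, bias correction, the AdaMax variant, and the default hyperparameters) can be illustrated with a minimal pure-Python sketch. The function names `adam_step` and `adamax_step`, the list-based parameter layout, the step sizes (0.001 and 0.002), and the small `eps` guard in the AdaMax denominator are assumptions made for this illustration and do not come from the message text; only the values of <math>\beta_1</math>, <math>\beta_2</math>, and <math>\epsilon</math> do.

```python
import math


def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update over lists of floats; t is the 1-based step count."""
    new_theta, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(theta, grad, m, v):
        m_i = beta1 * m_i + (1 - beta1) * g      # first-moment estimate
        v_i = beta2 * v_i + (1 - beta2) * g * g  # second-moment estimate
        # Bias correction: both estimates start at zero, so early values
        # are too small; dividing by (1 - beta^t) rescales them.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        new_theta.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_theta, new_m, new_v


def adamax_step(theta, grad, m, u, t, lr=0.002, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdaMax update: the second moment is replaced by an infinity-norm estimate."""
    new_theta, new_m, new_u = [], [], []
    for p, g, m_i, u_i in zip(theta, grad, m, u):
        m_i = beta1 * m_i + (1 - beta1) * g
        u_i = max(beta2 * u_i, abs(g))           # exponentially weighted infinity norm
        # Only the first moment needs bias correction here.
        # eps is an assumption of this sketch, guarding against u_i == 0.
        new_theta.append(p - (lr / (1 - beta1 ** t)) * m_i / (u_i + eps))
        new_m.append(m_i)
        new_u.append(u_i)
    return new_theta, new_m, new_u


if __name__ == "__main__":
    # Toy problem (hypothetical): minimize f(x) = x^2, whose gradient is 2x.
    theta, m, v = [1.0], [0.0], [0.0]
    for t in range(1, 2001):
        grad = [2.0 * x for x in theta]
        theta, m, v = adam_step(theta, grad, m, v, t)
    print(theta)
```

On the first step the bias-corrected estimates equal the raw gradient and its square, so the Adam update has magnitude close to the step size regardless of the gradient's scale. Without the correction, the first update would be noticeably larger than the step size because the second-moment estimate is shrunk more strongly than the first-moment estimate.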