All translations

Found 3 translations.

Name	Current message text
English (en)
{| class="wikitable"
|-
! Method !! Key idea !! Reference
|-
| '''{{Term|momentum|Momentum}}''' || Accumulates an exponentially decaying moving average of past gradients || Polyak, 1964
|-
| '''Nesterov accelerated gradient''' || Evaluates the gradient at a "look-ahead" position || Nesterov, 1983
|-
| '''{{Term|adagrad}}''' || Per-parameter learning rates that shrink for frequently updated features || Duchi et al., 2011
|-
| '''RMSProp''' || Fixes {{Term|adagrad}}'s diminishing learning rates using a moving average of squared gradients || Hinton (lecture notes), 2012
|-
| '''{{Term|Adam}}''' || Combines {{Term|momentum}} with RMSProp-style adaptive learning rates || Kingma & Ba, 2015
|-
| '''AdamW''' || Decouples {{Term|weight decay}} from the adaptive gradient step || Loshchilov & Hutter, 2019
|}
Spanish (es)
{| class="wikitable"
|-
! Método !! Idea principal !! Referencia
|-
| '''{{Term|momentum|Momentum}}''' || Acumula un promedio móvil con decaimiento exponencial de los gradientes pasados || Polyak, 1964
|-
| '''Gradiente acelerado de Nesterov''' || Evalúa el gradiente en una posición de "anticipación" || Nesterov, 1983
|-
| '''Adagrad''' || Tasas por parámetro que disminuyen para características que se actualizan con frecuencia || Duchi et al., 2011
|-
| '''RMSProp''' || Corrige las tasas decrecientes de Adagrad usando un promedio móvil de gradientes al cuadrado || Hinton (notas de clase), 2012
|-
| '''{{Term|Adam}}''' || Combina {{Term|momentum}} con tasas adaptativas al estilo de RMSProp || Kingma y Ba, 2015
|-
| '''AdamW''' || Desacopla la regularización por decaimiento de pesos del paso de gradiente adaptativo || Loshchilov y Hutter, 2019
|}
Chinese (zh)
{| class="wikitable"
|-
! 方法 !! 核心思想 !! 文献
|-
| '''{{Term|momentum|Momentum}}''' || 对历史梯度累积指数衰减的移动平均 || Polyak, 1964
|-
| '''Nesterov 加速梯度''' || 在“前瞻”位置上计算梯度 || Nesterov, 1983
|-
| '''Adagrad''' || 为每个参数设置学习率，对频繁更新的特征逐步减小 || Duchi et al., 2011
|-
| '''RMSProp''' || 利用平方梯度的移动平均修正 Adagrad 学习率不断衰减的问题 || Hinton（讲义），2012
|-
| '''{{Term|Adam}}''' || 将 {{Term|momentum}} 与 RMSProp 风格的自适应学习率结合 || Kingma 与 Ba, 2015
|-
| '''AdamW''' || 将权重衰减与自适应梯度更新解耦 || Loshchilov 与 Hutter, 2019
|}
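
The "key idea" column of the table can be made concrete with a rough single-parameter sketch of each update rule. This is an illustration only, not code from the cited papers: the function names, argument order, and default hyperparameters are assumptions chosen for readability, the momentum step uses the moving-average form the table describes (other formulations accumulate a plain decaying sum), and Nesterov's look-ahead variant is left out because it needs a gradient callback.

```python
import math

# Each function takes a scalar weight w and its gradient g, plus the
# optimizer state, and returns the updated weight and state.
# All names and defaults here are hypothetical, chosen for illustration.

def momentum_step(w, g, v, lr=0.1, beta=0.9):
    # v is an exponentially decaying moving average of past gradients.
    v = beta * v + (1 - beta) * g
    return w - lr * v, v

def adagrad_step(w, g, s, lr=0.1, eps=1e-8):
    # s only grows, so the effective rate shrinks for parameters
    # that receive frequent or large gradients.
    s = s + g * g
    return w - lr * g / (math.sqrt(s) + eps), s

def rmsprop_step(w, g, s, lr=0.01, rho=0.9, eps=1e-8):
    # A moving average of squared gradients replaces Adagrad's running
    # sum, so the effective rate does not decay toward zero.
    s = rho * s + (1 - rho) * g * g
    return w - lr * g / (math.sqrt(s) + eps), s

def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8,
              weight_decay=0.0):
    # First moment (momentum-style) and second moment (RMSProp-style).
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    # Bias correction for the zero-initialised averages; t starts at 1.
    m_hat = m / (1 - b1 ** t)
    v_hat = v / (1 - b2 ** t)
    # With weight_decay > 0 this is the AdamW-style decoupled decay:
    # the decay term is applied to w directly and is not rescaled by
    # the adaptive denominator.
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v
```

For example, to minimise f(w) = w² (gradient 2w), one would call `adam_step(w, 2 * w, m, v, t)` in a loop with `t` counting up from 1, carrying `m` and `v` between iterations.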