Translations:Stochastic Gradient Descent/25/zh

    Revision as of 03:38, 27 April 2026 by DeployBot (talk | contribs) (Batch translate Stochastic Gradient Descent unit 25 → zh)
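    | Method | Core idea | Reference |
    |---|---|---|
    | Momentum | Accumulates an exponentially decaying moving average of past gradients | Polyak, 1964 |
    | Nesterov accelerated gradient | Evaluates the gradient at a "look-ahead" position | Nesterov, 1983 |
    | Adagrad | Gives each parameter its own learning rate, which shrinks progressively for frequently updated features | Duchi et al., 2011 |
    | RMSProp | Uses a moving average of squared gradients to fix Adagrad's continually decaying learning rate | Hinton (lecture notes), 2012 |
    | Adam | Combines momentum with RMSProp-style adaptive learning rates | Kingma and Ba, 2015 |
    | AdamW | Decouples weight decay from the adaptive gradient update | Loshchilov and Hutter, 2019 |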
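
The core ideas in the table can be sketched as single-parameter update rules. The block below is a minimal illustration, not a reference implementation: the function names, the scalar (one-weight) setting, and all default hyperparameters are assumptions made for this example, and production libraries differ in details such as where `eps` is added or how momentum is scaled.

```python
import math


def momentum_step(w, g, v, lr=0.1, beta=0.9):
    """Momentum: v is an exponentially decaying accumulation of past gradients."""
    v = beta * v + g
    return w - lr * v, v


def nesterov_step(w, grad_fn, v, lr=0.1, beta=0.9):
    """Nesterov: the gradient is evaluated at the look-ahead point, not at w."""
    g = grad_fn(w - lr * beta * v)
    v = beta * v + g
    return w - lr * v, v


def adagrad_step(w, g, s, lr=0.1, eps=1e-8):
    """Adagrad: s sums all squared gradients, so the effective step only shrinks."""
    s = s + g * g
    return w - lr * g / (math.sqrt(s) + eps), s


def rmsprop_step(w, g, s, lr=0.01, rho=0.9, eps=1e-8):
    """RMSProp: s is a moving average, so old squared gradients are forgotten."""
    s = rho * s + (1 - rho) * g * g
    return w - lr * g / (math.sqrt(s) + eps), s


def adam_step(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8,
              weight_decay=0.0):
    """Adam: momentum-style first moment m plus RMSProp-style second moment v.

    With weight_decay > 0 this becomes an AdamW-style step: the decay term
    acts on the weight directly and is not passed through m or v.
    The step counter t starts at 1.
    """
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat = m / (1 - b1 ** t)  # bias correction for the zero-initialised m
    v_hat = v / (1 - b2 ** t)  # bias correction for the zero-initialised v
    w = w - lr * (m_hat / (math.sqrt(v_hat) + eps) + weight_decay * w)
    return w, m, v


def minimize_with_momentum(grad_fn, w0, steps=300):
    """Hypothetical driver: run momentum_step repeatedly from w0."""
    w, v = w0, 0.0
    for _ in range(steps):
        w, v = momentum_step(w, grad_fn(w), v)
    return w
```

For example, `minimize_with_momentum(lambda w: 2 * w, 5.0)` runs the momentum rule on f(w) = w², whose gradient is 2w, and drives the weight toward the minimum at 0. Keeping every optimizer as a pure function that returns its new state makes the differences in the table visible line by line: only the way the state (`v`, `s`, or `m` and `v`) is updated changes between methods.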