Translations:Stochastic Gradient Descent/3/zh

    From Marovi AI
    Revision as of 03:38, 27 April 2026 by DeployBot (Batch translate Stochastic Gradient Descent unit 3 → zh)

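    In classical gradient descent, the full gradient of the loss function is computed over the entire training set before every parameter update. When the dataset is large, this becomes prohibitively expensive. SGD addresses the problem by estimating the gradient at each step from a single randomly chosen example (or a small mini-batch), trading a noisier estimate for a much lower cost per iteration.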
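
The contrast between the two gradient computations can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration rather than something taken from this page: the mean-squared-error loss on a linear model, the synthetic data, the hypothetical helper names `full_gradient`, `minibatch_gradient`, and `sgd`, and the values chosen for the learning rate, batch size, and step count.

```python
import numpy as np


def full_gradient(w, X, y):
    """Gradient of the mean squared error over the entire training set."""
    residual = X @ w - y
    return 2.0 * X.T @ residual / len(y)


def minibatch_gradient(w, X, y, idx):
    """Same gradient formula, evaluated only on the rows listed in idx."""
    Xb, yb = X[idx], y[idx]
    residual = Xb @ w - yb
    return 2.0 * Xb.T @ residual / len(idx)


def sgd(X, y, lr=0.01, batch_size=1, steps=2000, seed=0):
    """Plain SGD: each step uses a small random sample instead of all rows."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        # Draw batch_size row indices uniformly at random (with replacement).
        idx = rng.integers(0, len(y), size=batch_size)
        w -= lr * minibatch_gradient(w, X, y, idx)
    return w


# Synthetic regression problem (illustrative assumption, not real data).
data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(1000, 3))
true_w = np.array([1.5, -2.0, 0.5])
y = X @ true_w + 0.1 * data_rng.normal(size=1000)

# Each SGD step touches 8 rows; a full-gradient step would touch all 1000.
w_sgd = sgd(X, y, batch_size=8)
```

With `batch_size=8`, each update reads 8 rows rather than 1000, which is the per-iteration saving described above. The cost is that individual steps follow a noisy gradient estimate, so the iterates fluctuate around the solution instead of settling on it exactly.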