All translations


Found 3 translations.

Name	Current message text
^h English (en)	In classical {{Term|gradient descent}}, the full gradient of the {{Term|loss function}} is computed over the entire training set before each parameter update. When the dataset is large this becomes prohibitively expensive. SGD addresses the problem by estimating the gradient from a single randomly chosen sample (or a small '''{{Term|mini-batch}}''') at each step, trading a noisier estimate for dramatically lower per-iteration cost.
^h Spanish (es)	En el {{Term|gradient descent|descenso de gradiente}} clásico, el gradiente completo de la {{Term|loss function|función de pérdida}} se calcula sobre todo el conjunto de entrenamiento antes de cada actualización de parámetros. Cuando el conjunto de datos es grande, esto se vuelve prohibitivamente costoso. SGD aborda el problema estimando el gradiente a partir de una única muestra elegida al azar (o un pequeño '''{{Term|mini-batch}}''') en cada paso, intercambiando una estimación más ruidosa por un costo por iteración drásticamente menor.
^h Chinese (zh)	在经典的{{Term|gradient descent|梯度下降}}中，每次参数更新前都要在整个训练集上计算{{Term|loss function|损失函数}}的完整梯度。当数据集很大时，这种做法的代价高得难以承受。SGD 通过在每一步从单个随机选取的样本（或一个小的 '''{{Term|mini-batch}}'''）估计梯度来解决该问题，以较高噪声的估计换取每次迭代成本的大幅降低。
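
The mini-batch idea described in the message text can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not part of the translated message: the one-dimensional linear model, the mean-squared-error loss, the noiseless data on the line y = 3x + 1, and the names `mse_gradient` and `sgd` with their default parameters.

```python
import random


def mse_gradient(w, b, samples):
    """Gradient of the mean squared error of y ≈ w*x + b, averaged over `samples`."""
    n = len(samples)
    grad_w = sum(2 * (w * x + b - y) * x for x, y in samples) / n
    grad_b = sum(2 * (w * x + b - y) for x, y in samples) / n
    return grad_w, grad_b


def sgd(data, lr=0.05, batch_size=4, steps=2000, seed=0):
    """Mini-batch SGD: each update uses `batch_size` random samples, not all of `data`."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        # Noisy gradient estimate from a small random subset of the training set.
        batch = rng.sample(data, batch_size)
        grad_w, grad_b = mse_gradient(w, b, batch)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Hypothetical training set: 41 noiseless points on the line y = 3x + 1.
data = [(k / 10, 3 * (k / 10) + 1) for k in range(-20, 21)]

w, b = sgd(data)
print(f"estimated w={w:.3f}, b={b:.3f}")
```

Calling `mse_gradient(w, b, data)` on the whole list would give the full gradient used by classical gradient descent, at a cost of 41 samples per update in this sketch. The loop above touches only 4 samples per update, which is the lower per-iteration cost the message text refers to.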