Training deep neural networks requires minimizing a high-dimensional, non-convex {{Term|loss function|objective function}} using stochastic gradient estimates. Standard {{Term|stochastic gradient descent}} ({{Term|stochastic gradient descent|SGD}}) uses a single global {{Term|learning rate}} for all parameters. This can be suboptimal when different parameters have gradients of very different magnitudes, or when the loss surface has highly anisotropic curvature.
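The anisotropic-curvature point can be seen in a small numerical sketch. Several things here are hypothetical illustration choices, not taken from the text:

* the two-parameter quadratic loss and its curvatures (100 and 1);
* the 100-step budget;
* the helper names `grad` and `run`.

The exact gradient stands in for a stochastic estimate so that the result is deterministic. For this loss, plain gradient descent is stable only if the global rate is below 2 divided by the largest curvature. A rate that small makes slow progress along the flat direction. Scaling each coordinate's step by the inverse of its curvature removes that tension. That scaling is an idealized diagonal preconditioner, not any specific published optimizer.

```python
import numpy as np

# Hypothetical anisotropic quadratic loss: f(w) = 0.5 * sum(CURV * w**2).
# Curvature along axis 0 is 100x steeper than along axis 1.
CURV = np.array([100.0, 1.0])


def grad(w: np.ndarray) -> np.ndarray:
    """Exact gradient of the quadratic; stands in for a stochastic estimate."""
    return CURV * w


def run(step_sizes: np.ndarray, steps: int = 100) -> np.ndarray:
    """Gradient descent from w = (1, 1).

    A 0-d array broadcasts as one global learning rate; a length-2 array
    gives each coordinate its own step size.
    """
    w = np.array([1.0, 1.0])
    for _ in range(steps):
        w = w - step_sizes * grad(w)
    return w


# A global rate must satisfy lr < 2 / max(CURV) = 0.02 to stay stable on the steep axis.
w_global_stable = run(np.array(0.019))
w_global_unstable = run(np.array(0.021))

# Per-coordinate rates equal to 1 / curvature (idealized diagonal preconditioning).
w_per_param = run(1.0 / CURV)

print("global lr=0.019:", w_global_stable)
print("global lr=0.021:", w_global_unstable)
print("per-parameter:  ", w_per_param)
```

With these settings, the three runs behave as follows after 100 steps:

* **Stable global rate (0.019):** the steep coordinate shrinks to nearly zero, but the flat coordinate is still roughly 0.15.
* **Slightly larger global rate (0.021):** the steep coordinate diverges.
* **Per-coordinate steps:** both coordinates reach the minimum.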