- Data shuffling — Re-shuffle the dataset each epoch to avoid cyclic sampling patterns (see the training-loop sketch after this list).
- Gradient clipping — Cap the gradient norm to prevent exploding updates, especially in recurrent networks (also shown in the sketch below).
- Batch normalisation — Normalising layer inputs reduces sensitivity to the learning rate (sketched below).
- Mixed-precision training — Using half-precision floats accelerates SGD on modern GPUs with minimal accuracy loss (sketched below).
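
A minimal PyTorch sketch of the first two points. The toy model, data, and hyperparameters are illustrative assumptions, not taken from the page: per-epoch re-shuffling comes from `DataLoader(shuffle=True)`, and the gradient norm is capped with `torch.nn.utils.clip_grad_norm_` before each SGD step.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy regression data; any Dataset works the same way.
X, y = torch.randn(1024, 20), torch.randn(1024, 1)
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)  # new random order every epoch

model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = nn.MSELoss()

for epoch in range(5):
    for xb, yb in loader:                  # shuffled mini-batches
        optimizer.zero_grad()
        loss = loss_fn(model(xb), yb)
        loss.backward()
        # Cap the global gradient norm at 1.0 to prevent exploding updates.
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
        optimizer.step()
```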
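
For batch normalisation, a sketch of the usual placement (an assumed architecture, not the page's own): a `BatchNorm1d` layer between the linear layer and its activation normalises each mini-batch, which typically lets SGD tolerate a wider range of learning rates than the plain network.

```python
import torch
from torch import nn

def mlp(use_batchnorm: bool) -> nn.Sequential:
    layers = [nn.Linear(20, 64)]
    if use_batchnorm:
        layers.append(nn.BatchNorm1d(64))  # normalise activations per mini-batch
    layers += [nn.ReLU(), nn.Linear(64, 1)]
    return nn.Sequential(*layers)

# The normalised model is usually stable at learning rates that would make
# the plain model diverge; the exact values are problem-dependent.
model = mlp(use_batchnorm=True)
optimizer = torch.optim.SGD(model.parameters(), lr=1.0)
```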
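
For mixed-precision training, a sketch using `torch.cuda.amp` (it assumes a CUDA GPU; the model and data are placeholders): `autocast` runs the forward pass in half precision where it is safe, and `GradScaler` scales the loss so small fp16 gradients do not underflow before the SGD step.

```python
import torch
from torch import nn

device = "cuda"
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1)).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()
loss_fn = nn.MSELoss()

xb = torch.randn(32, 20, device=device)
yb = torch.randn(32, 1, device=device)

optimizer.zero_grad()
with torch.cuda.amp.autocast():       # forward pass in half precision where safe
    loss = loss_fn(model(xb), yb)
scaler.scale(loss).backward()         # scale the loss to avoid fp16 underflow
scaler.step(optimizer)                # unscale gradients, then take the SGD step
scaler.update()
```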