Translations:Stochastic Gradient Descent/27/en

    Message definition (Stochastic Gradient Descent)
    * '''Data shuffling''' — Re-shuffle the dataset each {{Term|epoch}} to avoid cyclic patterns.
    * '''{{Term|gradient clipping|Gradient clipping}}''' — Cap the gradient norm to prevent exploding updates, especially in recurrent networks.
    * '''{{Term|batch normalization|Batch normalisation}}''' — Normalising layer inputs reduces sensitivity to the {{Term|learning rate}}.
    * '''Mixed-precision training''' — Using half-precision floats accelerates SGD on modern GPUs with minimal accuracy loss (see the combined sketch after this list).
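    A minimal PyTorch-style sketch combining these four practices is given below. The network, synthetic data, and hyperparameters (hidden width, learning rate, clipping threshold of 1.0) are illustrative assumptions rather than part of the article itself; the loop falls back to full precision when no GPU is available.

<syntaxhighlight lang="python">
import torch
from torch import nn
from torch.nn.utils import clip_grad_norm_
from torch.utils.data import DataLoader, TensorDataset

# Illustrative synthetic data and a small classifier with batch normalisation.
X = torch.randn(1024, 20)
y = torch.randint(0, 2, (1024,))
model = nn.Sequential(
    nn.Linear(20, 64),
    nn.BatchNorm1d(64),   # batch normalisation: reduces sensitivity to the learning rate
    nn.ReLU(),
    nn.Linear(64, 2),
)

# shuffle=True re-shuffles the dataset at the start of every epoch.
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()

# Mixed precision: autocast runs the forward pass in half precision on the GPU,
# and GradScaler rescales the loss so small gradients do not underflow.
use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"
model.to(device)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)

for epoch in range(5):
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        optimizer.zero_grad()
        with torch.cuda.amp.autocast(enabled=use_amp):
            loss = loss_fn(model(xb), yb)
        scaler.scale(loss).backward()
        scaler.unscale_(optimizer)                # undo loss scaling before clipping
        clip_grad_norm_(model.parameters(), 1.0)  # gradient clipping: cap the gradient norm
        scaler.step(optimizer)
        scaler.update()
</syntaxhighlight>

    Note that the gradients are unscaled before clipping so that the threshold applies to the true gradient norm rather than to the loss-scaled one.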