Translations:Stochastic Gradient Descent/27/en

    • Data shuffling — Re-shuffle the dataset each epoch to avoid cyclic patterns.
    • Gradient clipping — Cap the gradient norm to prevent exploding updates, especially in recurrent networks.
    • Batch normalisation — Normalising layer inputs reduces sensitivity to the learning rate.
    • Mixed-precision training — Computing in half precision (FP16/BF16) while keeping FP32 master weights accelerates SGD on modern GPUs with minimal accuracy loss.
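The first two tips above can be sketched in a minimal SGD loop. This is a hypothetical example, not code from the article: it fits a least-squares model with mini-batch SGD, re-shuffling the data each epoch and clipping the gradient norm before every update. The function name `sgd_train` and all hyperparameter defaults are illustrative assumptions.

```python
import numpy as np

def sgd_train(X, y, lr=0.1, epochs=20, batch_size=4, clip_norm=1.0, seed=0):
    """Mini-batch SGD for least squares (illustrative sketch).

    Demonstrates per-epoch data shuffling and gradient-norm clipping;
    names and defaults are assumptions, not from the source article.
    """
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        # Data shuffling: draw a fresh permutation every epoch
        # so batches never repeat in a fixed cyclic order.
        order = rng.permutation(len(X))
        for start in range(0, len(X), batch_size):
            idx = order[start:start + batch_size]
            xb, yb = X[idx], y[idx]
            # Mini-batch gradient of the mean squared error.
            grad = 2.0 * xb.T @ (xb @ w - yb) / len(idx)
            # Gradient clipping: rescale so the norm never
            # exceeds clip_norm, preventing exploding updates.
            norm = np.linalg.norm(grad)
            if norm > clip_norm:
                grad *= clip_norm / norm
            w -= lr * grad
    return w
```

In a deep-learning framework the same two ideas appear as a shuffling data loader plus a clip-by-global-norm call before the optimizer step; the NumPy version just makes both mechanisms explicit.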