    • Data shuffling — Re-shuffle the dataset at the start of each epoch so minibatches are never visited in the same cyclic order, which can bias the optimisation trajectory (first sketch below).
    • Gradient clipping — Cap the gradient norm before each parameter update to prevent exploding gradients, especially in recurrent networks (second sketch below).
    • Batch normalisation — Normalising layer inputs reduces sensitivity to the learning rate (third sketch below).
    • Mixed-precision training — Using half-precision floats accelerates SGD on modern GPUs with minimal accuracy loss (fourth sketch below).
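A minimal sketch of per-epoch shuffling, assuming PyTorch; the dataset, tensor shapes, and batch size are hypothetical placeholders, not from the original text. With shuffle=True, the DataLoader draws a fresh random permutation every epoch, so no cyclic minibatch order can persist.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical dataset: 1024 examples, 10 features, binary labels.
features = torch.randn(1024, 10)
labels = torch.randint(0, 2, (1024,))
dataset = TensorDataset(features, labels)

# shuffle=True re-permutes the sample order at the start of every epoch,
# so minibatches never repeat in the same cyclic pattern.
loader = DataLoader(dataset, batch_size=32, shuffle=True)

for epoch in range(3):
    for batch_features, batch_labels in loader:
        pass  # forward pass, backward pass, and optimizer step would go here
```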
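A minimal sketch of gradient-norm clipping, again assuming PyTorch; the LSTM model, stand-in loss, and max_norm value are illustrative choices. The key call is torch.nn.utils.clip_grad_norm_, applied between the backward pass and the optimizer step.

```python
import torch
import torch.nn as nn

# A small recurrent network, the setting where exploding gradients are most common.
model = nn.LSTM(input_size=8, hidden_size=16, batch_first=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

x = torch.randn(4, 20, 8)        # hypothetical batch: (batch, sequence, features)
output, _ = model(x)
loss = output.pow(2).mean()      # stand-in loss for illustration

optimizer.zero_grad()
loss.backward()
# Rescale all gradients so their combined L2 norm is at most max_norm,
# preventing a single oversized update from destabilising training.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```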
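A minimal sketch of batch normalisation, assuming PyTorch; the layer widths are arbitrary. Placing nn.BatchNorm1d after a linear layer normalises each feature over the minibatch, which keeps layer inputs in a stable range across a wide span of learning rates.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(10, 64),
    nn.BatchNorm1d(64),   # normalises each of the 64 features over the minibatch
    nn.ReLU(),
    nn.Linear(64, 2),
)

x = torch.randn(32, 10)   # hypothetical minibatch of 32 examples
logits = model(x)         # in training mode, batch statistics are used here
```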
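A minimal sketch of mixed-precision training, assuming PyTorch and a CUDA-capable GPU; the model, data, and learning rate are illustrative. autocast runs eligible operations in half precision while keeping numerically sensitive ones in float32, and GradScaler guards against float16 gradient underflow.

```python
import torch
import torch.nn as nn

device = "cuda"  # assumes a CUDA-capable GPU is available
model = nn.Linear(10, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()

x = torch.randn(32, 10, device=device)        # hypothetical minibatch
y = torch.randint(0, 2, (32,), device=device)

optimizer.zero_grad()
# Run the forward pass and loss in mixed precision.
with torch.cuda.amp.autocast():
    loss = nn.functional.cross_entropy(model(x), y)
# Scale the loss before backward to keep small float16 gradients from
# underflowing, then unscale before the optimizer step.
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```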