    • Feature scaling — normalising input features so they have similar ranges dramatically improves convergence, because the loss surface becomes more isotropic (first sketch below).
    • Gradient clipping — capping the norm of the gradient at a chosen threshold (rescaling it whenever the norm exceeds that threshold) prevents excessively large updates from destabilising training (second sketch below).
    • Random initialisation — starting from a reasonable random initialisation (e.g. Xavier or He initialisation for neural networks) breaks the symmetry between units that would otherwise compute identical gradients and never learn distinct features (third sketch below).
    • Monitoring the loss curve — plotting the training loss over iterations is the simplest diagnostic: a smoothly decreasing curve indicates healthy training, while oscillations suggest the learning rate is too high (final sketch below).
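
To make the first point concrete, here is a minimal sketch of feature standardisation in NumPy. The function name standardise and the toy data are illustrative, not taken from the original article.

```python
import numpy as np

def standardise(X):
    """Rescale each feature (column) to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0.0] = 1.0  # guard against constant features
    return (X - mu) / sigma, mu, sigma

# Toy example: two features on very different scales.
X = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 1500.0]])
X_scaled, mu, sigma = standardise(X)
print(X_scaled.mean(axis=0))  # approximately [0, 0]
print(X_scaled.std(axis=0))   # approximately [1, 1]
```

Because both columns now have unit variance, a single learning rate behaves similarly along every axis of the loss surface.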
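Gradient clipping by norm can be sketched in a few lines, assuming the gradient is a NumPy vector; the function name clip_by_norm and the threshold value are illustrative.

```python
import numpy as np

def clip_by_norm(grad, max_norm=1.0):
    """Rescale grad so its L2 norm never exceeds max_norm."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad

g = np.array([3.0, 4.0])              # norm 5
print(clip_by_norm(g, max_norm=1.0))  # [0.6, 0.8], norm 1
```

Note that clipping preserves the gradient's direction; only its magnitude is reduced.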
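The two initialisation schemes named in the third point can be sketched as follows; the layer sizes are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def xavier_init(fan_in, fan_out):
    """Xavier/Glorot: variance 2 / (fan_in + fan_out), suited to tanh/sigmoid."""
    std = np.sqrt(2.0 / (fan_in + fan_out))
    return rng.normal(0.0, std, size=(fan_in, fan_out))

def he_init(fan_in, fan_out):
    """He: variance 2 / fan_in, suited to ReLU activations."""
    std = np.sqrt(2.0 / fan_in)
    return rng.normal(0.0, std, size=(fan_in, fan_out))

W1 = xavier_init(784, 256)  # hidden layer with tanh
W2 = he_init(256, 10)       # hidden layer with ReLU
```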
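Finally, a sketch of loss-curve monitoring inside a plain gradient-descent loop; the quadratic loss, learning rate, and step count are illustrative choices, and the recorded losses list can be plotted with any plotting library.

```python
import numpy as np

# Quadratic loss f(w) = 0.5 * ||w||^2 with gradient f'(w) = w.
w = np.array([5.0, -3.0])
lr = 0.1
losses = []
for step in range(50):
    grad = w
    w = w - lr * grad
    losses.append(0.5 * np.dot(w, w))

# With lr = 0.1 the recorded losses decrease smoothly (healthy training);
# rerunning with lr = 2.1 makes the same loop diverge, and the curve
# oscillates and grows — the "learning rate too high" signature.
print(losses[0], losses[-1])
```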