Translations:Gradient Descent/27/en

    From Marovi AI
    • Too large — the iterates oscillate or diverge.
    • Too small — convergence is unacceptably slow.
    • Learning rate schedules — many practitioners start with a larger rate and reduce it over time (step decay, exponential decay, cosine annealing).
    • Line search — classical numerical methods choose $ \eta $ at each step to satisfy conditions such as the Wolfe or Armijo conditions, though this is rare in deep learning.
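The strategies above can be sketched in code. The following is a minimal illustration, not from the original text: gradient descent on the toy objective $ f(x) = x^2 $, first with an exponential-decay learning-rate schedule and then with a backtracking line search enforcing the Armijo (sufficient-decrease) condition. The objective, starting point, and all constants (`eta0`, `decay`, `beta`, `c`) are illustrative choices.

```python
# Toy objective and its gradient (illustrative choice).
f = lambda x: x * x
grad = lambda x: 2.0 * x

def descend_with_decay(x, eta0=0.4, decay=0.9, steps=50):
    """Exponential decay: step size eta_t = eta0 * decay^t shrinks geometrically."""
    for t in range(steps):
        x -= eta0 * (decay ** t) * grad(x)
    return x

def descend_with_armijo(x, eta0=1.0, beta=0.5, c=1e-4, steps=50):
    """Backtracking line search: shrink eta until the Armijo condition
    f(x - eta g) <= f(x) - c * eta * g^2 holds, then take the step."""
    for _ in range(steps):
        g = grad(x)
        eta = eta0
        while f(x - eta * g) > f(x) - c * eta * g * g:
            eta *= beta  # insufficient decrease: try a smaller step
        x -= eta * g
    return x

print(descend_with_decay(5.0), descend_with_armijo(5.0))
```

Note the trade-off the schedule makes: the decayed step sizes sum to a finite value, so the iterate stops making progress after enough steps, whereas the line search adapts the step to the local geometry at each iteration at the cost of extra function evaluations.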