- Too large: the iterates overshoot the minimum, oscillating or diverging outright.
- Too small: convergence is unacceptably slow. (Both failure modes are illustrated in the first sketch after this list.)
- Learning rate schedules: many practitioners start with a larger rate and reduce it over time, e.g. step decay, exponential decay, or cosine annealing (sketched below).
- Line search: classical numerical methods choose $\eta$ at each step to satisfy a sufficient-decrease condition such as the Armijo or Wolfe conditions, though this is rare in deep learning (a backtracking sketch follows the schedules below).
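To make the two failure modes concrete, here is a minimal sketch on the one-dimensional quadratic $f(x) = x^2$, where the update $x \leftarrow x - 2\eta x$ multiplies $x$ by $(1 - 2\eta)$, so it diverges for $\eta > 1$ and crawls for tiny $\eta$. The specific step sizes below are illustrative assumptions, not values taken from the text above.

```python
# Minimal illustration of step-size failure modes on f(x) = x**2,
# whose gradient is 2*x. The update x <- x - eta*2*x multiplies x by
# (1 - 2*eta): divergence for eta > 1, very slow progress for tiny eta.
# The step sizes below are illustrative choices, not canonical values.

def gradient_descent(eta, x0=1.0, steps=10):
    x = x0
    for _ in range(steps):
        x -= eta * 2 * x  # one gradient step on f(x) = x**2
    return x

for eta in (1.1, 0.01, 0.4):  # too large, too small, well chosen
    print(f"eta={eta}: x after 10 steps = {gradient_descent(eta):.4g}")
```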
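The three schedules named above can be sketched as simple functions of the step counter $t$; the base rate `eta0`, decay constants, and horizon `T` below are illustrative assumptions, not prescribed values.

```python
import math

def step_decay(t, eta0=0.1, drop=0.5, every=30):
    """Halve the rate every `every` steps (piecewise constant)."""
    return eta0 * drop ** (t // every)

def exponential_decay(t, eta0=0.1, k=0.01):
    """Smooth exponential decay: eta0 * exp(-k*t)."""
    return eta0 * math.exp(-k * t)

def cosine_annealing(t, eta0=0.1, eta_min=0.0, T=100):
    """Decay from eta0 to eta_min along a half cosine over T steps."""
    return eta_min + 0.5 * (eta0 - eta_min) * (1 + math.cos(math.pi * t / T))

# Compare the three schedules at a few checkpoints:
for t in (0, 30, 60, 90):
    print(t, step_decay(t), round(exponential_decay(t), 4),
          round(cosine_annealing(t), 4))
```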
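For line search, a minimal sketch of the backtracking variant enforcing the Armijo sufficient-decrease condition is below; the function names, starting step `eta0`, shrink factor `beta`, and the constant `c` (a common textbook default) are assumptions for illustration.

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, eta0=1.0, beta=0.5, c=1e-4):
    """Shrink eta until the Armijo condition
    f(x - eta*g) <= f(x) - c*eta*||g||^2
    holds for the steepest-descent direction d = -grad_f(x)."""
    g = grad_f(x)
    fx = f(x)
    eta = eta0
    while f(x - eta * g) > fx - c * eta * (g @ g):
        eta *= beta  # backtrack: the trial step was too aggressive
    return eta

# Usage on f(x) = ||x||^2 (an illustrative choice of objective):
f = lambda x: float(x @ x)
grad_f = lambda x: 2 * x
x = np.array([3.0, -1.0])
eta = backtracking_line_search(f, grad_f, x)
print(f"accepted eta = {eta}, next iterate = {x - eta * grad_f(x)}")
```

Because `eta` is halved until the condition holds, the loop always terminates for a descent direction on a smooth function, at the cost of extra objective evaluations per step, which is one reason this approach is uncommon in deep learning.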