- Too large: the iterates overshoot the minimum, oscillating or diverging outright.
- Too small: convergence is unacceptably slow. (Both failure modes are illustrated in the first sketch after this list.)
- Learning rate schedules: many practitioners start with a larger rate and reduce it over time, e.g. step decay, exponential decay, or cosine annealing (sketched below).
- Line search: classical numerical methods choose $\eta$ at each step to satisfy a sufficient-decrease condition such as the Armijo or Wolfe conditions, though this is rare in deep learning (a backtracking sketch follows the schedules below).
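To make the two failure modes concrete, here is a minimal sketch on the one-dimensional quadratic $f(x) = x^2$, where the update $x \leftarrow x - 2\eta x$ multiplies $x$ by $(1 - 2\eta)$, so it diverges for $\eta > 1$ and crawls for tiny $\eta$. The specific step sizes below are illustrative assumptions, not values taken from the text above.

```python
# Minimal illustration of step-size failure modes on f(x) = x**2,
# whose gradient is 2*x. The update x <- x - eta*2*x multiplies x by
# (1 - 2*eta): divergence for eta > 1, very slow progress for tiny eta.
# The step sizes below are illustrative choices, not canonical values.

def gradient_descent(eta, x0=1.0, steps=10):
    x = x0
    for _ in range(steps):
        x -= eta * 2 * x  # one gradient step on f(x) = x**2
    return x

for eta in (1.1, 0.01, 0.4):  # too large, too small, well chosen
    print(f"eta={eta}: x after 10 steps = {gradient_descent(eta):.4g}")
```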
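The three schedules named above can be sketched as simple functions of the step counter $t$; the base rate `eta0`, decay constants, and horizon `T` below are illustrative assumptions, not prescribed values.

```python
import math

def step_decay(t, eta0=0.1, drop=0.5, every=30):
    """Halve the rate every `every` steps (piecewise constant)."""
    return eta0 * drop ** (t // every)

def exponential_decay(t, eta0=0.1, k=0.01):
    """Smooth exponential decay: eta0 * exp(-k*t)."""
    return eta0 * math.exp(-k * t)

def cosine_annealing(t, eta0=0.1, eta_min=0.0, T=100):
    """Decay from eta0 to eta_min along a half cosine over T steps."""
    return eta_min + 0.5 * (eta0 - eta_min) * (1 + math.cos(math.pi * t / T))

# Compare the three schedules at a few checkpoints:
for t in (0, 30, 60, 90):
    print(t, step_decay(t), round(exponential_decay(t), 4),
          round(cosine_annealing(t), 4))
```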
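For line search, a minimal sketch of the backtracking variant enforcing the Armijo sufficient-decrease condition is below; the function names, starting step `eta0`, shrink factor `beta`, and the constant `c` (a common textbook default) are assumptions for illustration.

```python
import numpy as np

def backtracking_line_search(f, grad_f, x, eta0=1.0, beta=0.5, c=1e-4):
    """Shrink eta until the Armijo condition
    f(x - eta*g) <= f(x) - c*eta*||g||^2
    holds for the steepest-descent direction d = -grad_f(x)."""
    g = grad_f(x)
    fx = f(x)
    eta = eta0
    while f(x - eta * g) > fx - c * eta * (g @ g):
        eta *= beta  # backtrack: the trial step was too aggressive
    return eta

# Usage on f(x) = ||x||^2 (an illustrative choice of objective):
f = lambda x: float(x @ x)
grad_f = lambda x: 2 * x
x = np.array([3.0, -1.0])
eta = backtracking_line_search(f, grad_f, x)
print(f"accepted eta = {eta}, next iterate = {x - eta * grad_f(x)}")
```

Because `eta` is halved until the condition holds, the loop always terminates for a descent direction on a smooth function, at the cost of extra objective evaluations per step, which is one reason this approach is uncommon in deep learning.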