- Momentum — accumulates a velocity vector from past gradients, helping to accelerate convergence in ravine-like loss landscapes (see the update-rule sketch after this list).
- Nesterov accelerated gradient — a momentum variant that evaluates the gradient at a look-ahead position, yielding better theoretical convergence rates (also sketched below).
- Adaptive methods (AdaGrad, RMSProp, Adam) — maintain per-parameter learning rates that adapt based on the history of gradients (an Adam-style sketch follows the list).
- Second-order methods — algorithms such as Newton's method and L-BFGS use curvature information (the Hessian or an approximation of it) for faster convergence, but are often too expensive for large-scale problems (a Newton-step sketch closes this section).
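A minimal sketch of the momentum and Nesterov update rules, assuming a small ill-conditioned quadratic objective f(x) = ½ xᵀAx − bᵀx chosen purely for illustration; the matrix `A`, vector `b`, and hyperparameter values are not from the original text.

```python
import numpy as np

A = np.array([[10.0, 0.0], [0.0, 1.0]])   # ill-conditioned quadratic: a "ravine"
b = np.array([1.0, 1.0])

def grad(x):
    return A @ x - b                        # gradient of f(x) = 0.5 x^T A x - b^T x

def momentum(lr=0.05, beta=0.9, steps=200):
    x, v = np.zeros(2), np.zeros(2)
    for _ in range(steps):
        v = beta * v - lr * grad(x)         # accumulate velocity from past gradients
        x = x + v
    return x

def nesterov(lr=0.05, beta=0.9, steps=200):
    x, v = np.zeros(2), np.zeros(2)
    for _ in range(steps):
        v = beta * v - lr * grad(x + beta * v)  # gradient taken at the look-ahead point
        x = x + v
    return x

# Both approach the closed-form minimizer A^{-1} b = [0.1, 1.0]
print(momentum(), nesterov(), np.linalg.solve(A, b))
```

The only difference between the two is where the gradient is evaluated: momentum uses the current point, Nesterov uses the point the velocity is about to carry it to.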
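A minimal Adam-style sketch showing how per-parameter step sizes arise from running gradient statistics; the objective and hyperparameter values are illustrative assumptions, not a reference implementation.

```python
import numpy as np

def adam(grad, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)                    # first moment: running mean of gradients
    v = np.zeros_like(x)                    # second moment: running mean of squared gradients
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g ** 2
        m_hat = m / (1 - beta1 ** t)        # bias correction for the warm-up phase
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)   # per-parameter effective learning rate
    return x

# Usage on the same kind of ill-conditioned quadratic as above
A = np.array([[10.0, 0.0], [0.0, 1.0]])
b = np.array([1.0, 1.0])
print(adam(lambda x: A @ x - b, x0=[0.0, 0.0]))    # approaches the minimizer [0.1, 1.0]
```

Dividing by the square root of the second moment gives parameters with consistently large gradients a smaller step and rarely-updated parameters a larger one, which is the "per-parameter learning rate" behaviour described above.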
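A minimal Newton-step sketch on the same kind of quadratic, showing how curvature information rescales the step; for a quadratic the method reaches the minimizer in a single iteration. The Hessian here is just the matrix `A` of the assumed objective.

```python
import numpy as np

A = np.array([[10.0, 0.0], [0.0, 1.0]])    # Hessian of the quadratic objective
b = np.array([1.0, 1.0])

def grad(x):
    return A @ x - b

def newton_step(x):
    # Solve H p = -grad(x) for the Newton direction rather than forming H^{-1}
    p = np.linalg.solve(A, -grad(x))
    return x + p

x = newton_step(np.zeros(2))               # lands exactly at A^{-1} b for a quadratic
print(x, np.linalg.solve(A, b))
```

The cost is the linear solve with the Hessian, which is why full Newton steps are rarely practical at large scale and quasi-Newton approximations such as L-BFGS are used instead.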