Full-batch gradient descent computes the exact gradient over the whole dataset and therefore follows a smooth trajectory toward the minimum. Stochastic gradient descent uses a single sample to estimate the gradient, drastically reducing the computation per step at the cost of a noisier trajectory. Mini-batch gradient descent strikes a balance between the two and is the most common choice in practice, with typical batch sizes between 32 and 512.
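The three variants differ only in how many samples feed each gradient estimate. As a minimal sketch (the least-squares problem, learning rate, and batch size here are illustrative assumptions, not from the original text), mini-batch gradient descent can be written so that `batch_size=1` recovers stochastic gradient descent and `batch_size=len(y)` recovers full-batch gradient descent:

```python
import numpy as np

# Illustrative setup: synthetic least-squares data (assumed for the example).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
true_w = rng.normal(size=5)
y = X @ true_w + 0.1 * rng.normal(size=1000)

def gradient(w, Xb, yb):
    # Gradient of the mean squared error 0.5 * mean((Xb @ w - yb)**2)
    return Xb.T @ (Xb @ w - yb) / len(yb)

def minibatch_gd(X, y, batch_size=64, lr=0.1, epochs=20):
    w = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(epochs):
        perm = rng.permutation(n)              # reshuffle samples each epoch
        for start in range(0, n, batch_size):
            idx = perm[start:start + batch_size]
            w -= lr * gradient(w, X[idx], y[idx])
    return w

w_hat = minibatch_gd(X, y)   # batch_size=1 -> SGD; batch_size=len(y) -> full batch
```

On this well-conditioned synthetic problem the estimate converges close to `true_w`; smaller batches make each step cheaper but noisier, which is the trade-off described above.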