The authors also observed that batch normalization reduces the dependence on precise initialization, permits higher learning rates without divergence, and provides a mild regularization effect because each sample's normalized value depends on the other samples in its mini-batch, introducing stochastic noise.
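The mini-batch dependence can be made concrete with a small sketch. The following is a minimal NumPy illustration (not code from the paper; the function name <code>batch_norm_train</code> and all values are hypothetical): the same sample, normalized inside two different mini-batches, produces two different outputs, which is the stochastic noise behind the mild regularization effect.

<syntaxhighlight lang="python">
import numpy as np

def batch_norm_train(x, gamma, beta, eps=1e-5):
    """Training-mode batch norm for x of shape (batch_size, features).

    Normalization uses the mini-batch mean and variance, so each
    sample's output depends on the other samples in its mini-batch.
    """
    mu = x.mean(axis=0)                     # per-feature mini-batch mean
    var = x.var(axis=0)                     # per-feature mini-batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalize to zero mean, unit variance
    return gamma * x_hat + beta             # learned scale and shift

rng = np.random.default_rng(0)
sample = rng.normal(size=(1, 4))            # one fixed sample
batch_a = np.vstack([sample, rng.normal(size=(31, 4))])  # two mini-batches that
batch_b = np.vstack([sample, rng.normal(size=(31, 4))])  # both contain the sample

gamma, beta = np.ones(4), np.zeros(4)
out_a = batch_norm_train(batch_a, gamma, beta)[0]
out_b = batch_norm_train(batch_b, gamma, beta)[0]

# The same sample normalizes differently depending on its mini-batch --
# the source of the stochastic noise noted above.
print(out_a)
print(out_b)
</syntaxhighlight>

At inference time this noise disappears, since the batch statistics are replaced by fixed population estimates accumulated during training.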