Training deep neural networks is complicated by the fact that each layer's input distribution changes during training as the parameters of all preceding layers are updated. This phenomenon, which the authors called internal covariate shift, forces the use of lower learning rates and careful parameter initialization, slowing training considerably.
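To make the phenomenon concrete, the following is a minimal sketch (not from the paper) using NumPy and a hypothetical two-layer toy network with a sigmoid nonlinearity: it shows that the distribution of the second layer's inputs shifts as soon as the first layer's weights are updated, even though the data itself never changes. The random perturbation standing in for a gradient step is an assumption made purely for illustration.

```python
# Illustrative sketch of internal covariate shift on a toy network.
# Assumptions: sigmoid hidden layer, random perturbation in place of a
# real gradient update; values are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(10_000, 100))           # fixed input batch

W1 = rng.normal(scale=0.1, size=(100, 100))  # layer-1 weights (toy values)

def layer2_input(W1):
    """Activations of layer 1, i.e. the inputs fed into layer 2."""
    return 1.0 / (1.0 + np.exp(-(x @ W1)))

h_before = layer2_input(W1)

# Simulate a parameter update to layer 1 (a random step stands in for the
# true gradient; the point is only that W1 moves during training).
W1_updated = W1 + 0.5 * rng.normal(scale=0.1, size=W1.shape)
h_after = layer2_input(W1_updated)

print("layer-2 input mean/std before update: %.3f / %.3f"
      % (h_before.mean(), h_before.std()))
print("layer-2 input mean/std after  update: %.3f / %.3f"
      % (h_after.mean(), h_after.std()))
# Layer 2 must continually adapt to this moving input distribution,
# which is the difficulty that motivates normalizing layer inputs.
```

Running the sketch prints two different mean/standard-deviation pairs, showing that every update to earlier layers changes the statistics that later layers see; this is the moving target that batch normalization is designed to stabilize.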