Translations:ImageNet Classification with Deep CNNs/15/en

The network was trained using stochastic gradient descent with a batch size of 128, momentum of 0.9, and weight decay of 0.0005. The learning rate was initialized at 0.01 and manually reduced by a factor of 10 when the validation error stopped improving. Training took approximately five to six days on two NVIDIA GTX 580 GPUs.