- Large-scale CNN training on GPUs: One of the first successful demonstrations of training deep convolutional networks on GPUs, using a model split across two NVIDIA GTX 580 GPUs with 3 GB of memory each.
- ReLU activation function: Adoption of rectified linear units ($ f(x) = \max(0, x) $) instead of the saturating sigmoid or tanh activations, enabling the deep network to train several times faster (sketched below).
- Data augmentation: Use of random translations (224×224 crops drawn from the 256×256 images), horizontal reflections, and PCA-based color augmentation to artificially enlarge the training set and reduce overfitting (sketched below).
- Dropout regularization: Application of dropout (with probability 0.5) in the first two fully connected layers, one of the earliest uses of this technique in a large convolutional network (sketched below).
- Local response normalization: A normalization scheme inspired by lateral inhibition in biological neurons, applied after the ReLU activations of the first two convolutional layers (sketched below).
- Overlapping pooling: Use of max-pooling with a stride (2) smaller than the kernel size (3×3), which slightly reduced overfitting compared to non-overlapping pooling (sketched below).
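To make the contrast with saturating activations concrete, here is a minimal NumPy sketch of ReLU and its gradient; the function names are illustrative, not from the paper. The gradient is 1 for any positive input, so it never shrinks toward zero the way sigmoid and tanh gradients do.

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x): identity for positive inputs, zero otherwise.
    return np.maximum(0.0, x)

def relu_grad(x):
    # The derivative is 1 where x > 0 and 0 elsewhere, so gradients do not
    # saturate for positive activations (unlike sigmoid or tanh).
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))       # [0.  0.  0.  1.5 3. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```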
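The augmentation pipeline can be sketched in a few lines of NumPy. This is an illustrative reconstruction rather than the paper's code: `random_crop_and_flip` and `pca_color_augment` are hypothetical names, and `eigvals`/`eigvecs` stand in for the eigenvalues and eigenvectors of the 3×3 covariance of RGB pixel values over the training set, which the paper computes once in advance.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop_and_flip(img, size=224):
    # Random translation: take a size x size patch from a larger image
    # (the paper crops 224x224 patches from 256x256 images), then flip
    # it horizontally half of the time.
    h, w, _ = img.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    patch = img[top:top + size, left:left + size]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal reflection
    return patch

def pca_color_augment(img, eigvals, eigvecs, sigma=0.1):
    # Shift every pixel along the principal components of the RGB values:
    # offset = [p1 p2 p3] @ [a1*l1, a2*l2, a3*l3], with a_i ~ N(0, sigma^2).
    alphas = rng.normal(0.0, sigma, size=3)
    offset = eigvecs @ (alphas * eigvals)  # shape (3,), one shift per channel
    return img + offset                    # broadcast over all pixels
```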
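Dropout itself is only a few lines; the sketch below follows the paper's scheme of zeroing units at training time and scaling activations by 0.5 at test time (the helper name is ours).

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5, train=True):
    # Training: zero each unit independently with probability p, so no unit
    # can rely on the presence of any particular other unit.
    if train:
        mask = rng.random(activations.shape) >= p
        return activations * mask
    # Test: keep every unit but scale by (1 - p), approximating the average
    # prediction of the many "thinned" networks sampled during training.
    return activations * (1.0 - p)
```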
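The normalization divides each ReLU output by a term built from the squared activations of neighboring channels at the same spatial position: $ b^i_{x,y} = a^i_{x,y} \big/ \big(k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} (a^j_{x,y})^2\big)^{\beta} $, with $ k=2 $, $ n=5 $, $ \alpha=10^{-4} $, $ \beta=0.75 $ in the paper. A slow but readable NumPy version (illustrative only, not optimized):

```python
import numpy as np

def local_response_norm(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    # a: ReLU outputs with shape (channels, height, width).
    # Each channel is damped by the squared activations of up to n
    # neighboring channels at the same (x, y) position, a rough analogue
    # of lateral inhibition between real neurons.
    channels = a.shape[0]
    out = np.empty_like(a, dtype=float)
    for i in range(channels):
        lo = max(0, i - n // 2)
        hi = min(channels - 1, i + n // 2)
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        out[i] = a[i] / denom
    return out
```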
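Finally, overlapping pooling differs from ordinary max-pooling only in that the stride is smaller than the window, so adjacent 3×3 windows share a row or column of pixels. A minimal NumPy sketch (function name illustrative):

```python
import numpy as np

def overlapping_max_pool(a, kernel=3, stride=2):
    # a: feature map with shape (channels, height, width).
    # Because stride < kernel, neighboring pooling windows overlap,
    # which the paper found made the network slightly harder to overfit.
    c, h, w = a.shape
    out_h = (h - kernel) // stride + 1
    out_w = (w - kernel) // stride + 1
    out = np.empty((c, out_h, out_w), dtype=a.dtype)
    for i in range(out_h):
        for j in range(out_w):
            hs, ws = i * stride, j * stride
            out[:, i, j] = a[:, hs:hs + kernel, ws:ws + kernel].max(axis=(1, 2))
    return out
```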