- Large-scale CNN training on GPUs: One of the first successful demonstrations of training deep convolutional networks on GPUs, using a model split across two NVIDIA GTX 580 GPUs with 3 GB of memory each.
- ReLU activation function: Adoption of rectified linear units ($ f(x) = \max(0, x) $) instead of the saturating sigmoid or tanh activations, enabling the deep network to train several times faster (sketched below).
- Data augmentation: Use of random translations (224×224 crops drawn from the 256×256 images), horizontal reflections, and PCA-based color augmentation to artificially enlarge the training set and reduce overfitting (sketched below).
- Dropout regularization: Application of dropout (with probability 0.5) in the first two fully connected layers, one of the earliest uses of this technique in a large convolutional network (sketched below).
- Local response normalization: A normalization scheme inspired by lateral inhibition in biological neurons, applied after the ReLU activations of the first two convolutional layers (sketched below).
- Overlapping pooling: Use of max-pooling with a stride (2) smaller than the kernel size (3×3), which slightly reduced overfitting compared to non-overlapping pooling (sketched below).
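To make the contrast with saturating activations concrete, here is a minimal NumPy sketch of ReLU and its gradient; the function names are illustrative, not from the paper. The gradient is 1 for any positive input, so it never shrinks toward zero the way sigmoid and tanh gradients do.

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x): identity for positive inputs, zero otherwise.
    return np.maximum(0.0, x)

def relu_grad(x):
    # The derivative is 1 where x > 0 and 0 elsewhere, so gradients do not
    # saturate for positive activations (unlike sigmoid or tanh).
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))       # [0.  0.  0.  1.5 3. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```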
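The augmentation pipeline can be sketched in a few lines of NumPy. This is an illustrative reconstruction rather than the paper's code: `random_crop_and_flip` and `pca_color_augment` are hypothetical names, and `eigvals`/`eigvecs` stand in for the eigenvalues and eigenvectors of the 3×3 covariance of RGB pixel values over the training set, which the paper computes once in advance.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop_and_flip(img, size=224):
    # Random translation: take a size x size patch from a larger image
    # (the paper crops 224x224 patches from 256x256 images), then flip
    # it horizontally half of the time.
    h, w, _ = img.shape
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    patch = img[top:top + size, left:left + size]
    if rng.random() < 0.5:
        patch = patch[:, ::-1]  # horizontal reflection
    return patch

def pca_color_augment(img, eigvals, eigvecs, sigma=0.1):
    # Shift every pixel along the principal components of the RGB values:
    # offset = [p1 p2 p3] @ [a1*l1, a2*l2, a3*l3], with a_i ~ N(0, sigma^2).
    alphas = rng.normal(0.0, sigma, size=3)
    offset = eigvecs @ (alphas * eigvals)  # shape (3,), one shift per channel
    return img + offset                    # broadcast over all pixels
```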
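Dropout itself is only a few lines; the sketch below follows the paper's scheme of zeroing units at training time and scaling activations by 0.5 at test time (the helper name is ours).

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p=0.5, train=True):
    # Training: zero each unit independently with probability p, so no unit
    # can rely on the presence of any particular other unit.
    if train:
        mask = rng.random(activations.shape) >= p
        return activations * mask
    # Test: keep every unit but scale by (1 - p), approximating the average
    # prediction of the many "thinned" networks sampled during training.
    return activations * (1.0 - p)
```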
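The normalization divides each ReLU output by a term built from the squared activations of neighboring channels at the same spatial position: $ b^i_{x,y} = a^i_{x,y} \big/ \big(k + \alpha \sum_{j=\max(0,\, i-n/2)}^{\min(N-1,\, i+n/2)} (a^j_{x,y})^2\big)^{\beta} $, with $ k=2 $, $ n=5 $, $ \alpha=10^{-4} $, $ \beta=0.75 $ in the paper. A slow but readable NumPy version (illustrative only, not optimized):

```python
import numpy as np

def local_response_norm(a, n=5, k=2.0, alpha=1e-4, beta=0.75):
    # a: ReLU outputs with shape (channels, height, width).
    # Each channel is damped by the squared activations of up to n
    # neighboring channels at the same (x, y) position, a rough analogue
    # of lateral inhibition between real neurons.
    channels = a.shape[0]
    out = np.empty_like(a, dtype=float)
    for i in range(channels):
        lo = max(0, i - n // 2)
        hi = min(channels - 1, i + n // 2)
        denom = (k + alpha * np.sum(a[lo:hi + 1] ** 2, axis=0)) ** beta
        out[i] = a[i] / denom
    return out
```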
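Finally, overlapping pooling differs from ordinary max-pooling only in that the stride is smaller than the window, so adjacent 3×3 windows share a row or column of pixels. A minimal NumPy sketch (function name illustrative):

```python
import numpy as np

def overlapping_max_pool(a, kernel=3, stride=2):
    # a: feature map with shape (channels, height, width).
    # Because stride < kernel, neighboring pooling windows overlap,
    # which the paper found made the network slightly harder to overfit.
    c, h, w = a.shape
    out_h = (h - kernel) // stride + 1
    out_w = (w - kernel) // stride + 1
    out = np.empty((c, out_h, out_w), dtype=a.dtype)
    for i in range(out_h):
        for j in range(out_w):
            hs, ws = i * stride, j * stride
            out[:, i, j] = a[:, hs:hs + kernel, ws:ws + kernel].max(axis=(1, 2))
    return out
```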