- Use pretrained models (transfer learning) when labelled data is limited.
- Prefer small kernels ($ 3 \times 3 $) stacked in depth — two $ 3 \times 3 $ layers have the same receptive field as one $ 5 \times 5 $ layer but with fewer parameters.
- Apply batch normalisation after convolution and before activation.
- Use data augmentation generously to reduce overfitting.
- Replace fully connected layers with global average pooling to reduce parameters.
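The parameter saving from stacking small kernels can be checked with quick arithmetic. The sketch below assumes an illustrative channel width of 64 (not a value from the text) and counts the weights of two stacked $ 3 \times 3 $ layers against one $ 5 \times 5 $ layer with the same input and output channels:

```python
def conv_params(kernel, c_in, c_out, bias=True):
    """Weight count of one 2-D convolution: k*k*C_in*C_out (+ C_out biases)."""
    return kernel * kernel * c_in * c_out + (c_out if bias else 0)

# Illustrative channel width; any value shows the same ratio.
C = 64

# Two stacked 3x3 layers cover a 5x5 receptive field...
stacked = 2 * conv_params(3, C, C)   # 73,856 parameters

# ...while a single 5x5 layer covers the same field with more weights.
single = conv_params(5, C, C)        # 102,464 parameters

print(stacked, single)
```

Ignoring biases the ratio is $ 18C^2 $ versus $ 25C^2 $, so the stacked design is always cheaper, and the extra nonlinearity between the two layers adds representational power for free.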
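The same kind of count shows why global average pooling beats a fully connected classifier head. The dimensions below are illustrative VGG-style numbers (a $ 7 \times 7 \times 512 $ final feature map, 1000 classes), not values taken from the text:

```python
# Hypothetical network tail: 7x7x512 feature map classified into 1000 classes.
H, W, C, CLASSES = 7, 7, 512, 1000

# Flatten + fully connected: every spatial position gets its own weights.
fc_params = H * W * C * CLASSES + CLASSES   # 25,089,000

# Global average pooling collapses HxW to 1x1, then one small linear layer.
gap_params = C * CLASSES + CLASSES          # 513,000

print(fc_params // gap_params)
```

With these dimensions the pooled head uses roughly 48 times fewer parameters, and because pooling has no spatial weights, the network also becomes less tied to a fixed input resolution.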