Translations:Overfitting and Regularization/24/en
Dropout can be interpreted as an approximate ensemble method: each training step samples a different random subnetwork, all of which share the same weights, and at test time the full network with suitably scaled activations approximates the average prediction of the exponentially many subnetworks.
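
The ensemble reading can be checked by brute force on a toy case. The sketch below assumes a single linear unit with four inputs and a keep probability of 0.8. The function names `ensemble_average` and `scaled_full_network` are illustrative and not taken from any library. It enumerates all 2^4 dropout masks, weights each subnetwork's output by the probability of its mask, and compares the result with the full unit scaled by the keep probability. For a linear unit the two agree exactly; with nonlinear activations the scaled network is only an approximation of the ensemble average.

```python
import itertools
import numpy as np


def ensemble_average(x, w, keep_prob):
    """Probability-weighted average output of a linear unit over all 2^n dropout masks."""
    n = x.shape[0]
    total = 0.0
    for bits in itertools.product([0, 1], repeat=n):
        mask = np.array(bits, dtype=float)
        kept = mask.sum()
        # Probability of this mask under independent Bernoulli(keep_prob) draws.
        prob = keep_prob ** kept * (1.0 - keep_prob) ** (n - kept)
        # Output of the subnetwork: dropped inputs contribute nothing.
        total += prob * np.dot(w, mask * x)
    return total


def scaled_full_network(x, w, keep_prob):
    """Test-time rule: keep every input and scale the output by keep_prob."""
    return keep_prob * np.dot(w, x)


# Toy data (assumed for illustration only).
rng = np.random.default_rng(0)
x = rng.normal(size=4)
w = rng.normal(size=4)

avg = ensemble_average(x, w, keep_prob=0.8)
full = scaled_full_network(x, w, keep_prob=0.8)
print(np.isclose(avg, full))
```

Enumerating masks is feasible only for very small layers, since the number of subnetworks doubles with every unit. This is why practical implementations rely on the scaling rule instead of explicit averaging.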