Translations:Dropout A Simple Way to Prevent Overfitting/16/en
The authors showed that dropout can be interpreted as training an ensemble of $ 2^n $ sub-networks that share weights. At test time, the scaled full network provides a geometric mean approximation to the ensemble prediction, which the authors proved is exact for a single layer with softmax output.