Translations:Dropout A Simple Way to Prevent Overfitting/7/en
* '''{{Term|dropout}} {{Term|regularization}}''': A training procedure that randomly omits neurons during each forward and backward pass, preventing neurons from developing overly specialized co-adaptations (a minimal code sketch follows this list).
* '''Ensemble interpretation''': Theoretical motivation of {{Term|dropout}} as approximate model averaging over <math>2^n</math> possible thinned networks (where <math>n</math> is the number of droppable units), with shared weights.
* '''Comprehensive empirical evaluation''': Demonstration of consistent improvements across diverse domains, including vision, speech recognition, text classification, and computational biology.
* '''Practical guidelines''': Recommendations for {{Term|dropout}} retention probabilities (<math>p = 0.5</math> for hidden units, <math>p = 0.8</math> for input units, where <math>p</math> is the probability that a unit is kept) and interactions with other {{Term|hyperparameter|hyperparameters}}.
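The following is a minimal NumPy sketch of the training-time masking and test-time scaling summarized above. The helper name <code>dropout_forward</code>, the two-layer toy network, and the layer sizes are illustrative assumptions, not from the paper; only the convention that <math>p</math> is the retention probability (0.8 for inputs, 0.5 for hidden units) comes from the source.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(x, p, train=True):
    # Illustrative helper (not from the paper). p is the probability
    # of *retaining* a unit, matching the paper's convention.
    if train:
        mask = rng.binomial(1, p, size=x.shape)  # Bernoulli(p) keep-mask
        return x * mask                          # thinned activations
    # Test time: no units are dropped; activations are scaled by p so
    # their expected value matches the training-time average.
    return x * p

# Toy two-layer forward pass using the suggested retention rates:
# p = 0.8 on inputs, p = 0.5 on hidden units (shapes are assumptions).
x = rng.standard_normal((4, 10))           # batch of 4 examples
W1 = rng.standard_normal((10, 32))
W2 = rng.standard_normal((32, 1))

h = dropout_forward(x, p=0.8, train=True)  # drop ~20% of inputs
h = np.maximum(0.0, h @ W1)                # ReLU hidden layer
h = dropout_forward(h, p=0.5, train=True)  # drop ~50% of hidden units
y = h @ W2
</syntaxhighlight>

The test-time scaling is what connects the procedure to the ensemble view: if each input to a unit is kept with probability <math>p</math> via a mask <math>r_i \sim \mathrm{Bernoulli}(p)</math>, then <math>\mathbb{E}\!\left[\sum_i r_i w_i x_i\right] = p \sum_i w_i x_i</math>, so multiplying weights (or activations) by <math>p</math> at test time approximates averaging the predictions of the <math>2^n</math> shared-weight thinned networks.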