<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://marovi.ai/index.php?action=history&amp;feed=atom&amp;title=Dropout_A_Simple_Way_to_Prevent_Overfitting%2Fes</id>
	<title>Dropout A Simple Way to Prevent Overfitting/es - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://marovi.ai/index.php?action=history&amp;feed=atom&amp;title=Dropout_A_Simple_Way_to_Prevent_Overfitting%2Fes"/>
	<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Dropout_A_Simple_Way_to_Prevent_Overfitting/es&amp;action=history"/>
	<updated>2026-04-27T16:59:51Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.39.1</generator>
	<entry>
		<id>https://marovi.ai/index.php?title=Dropout_A_Simple_Way_to_Prevent_Overfitting/es&amp;diff=4552&amp;oldid=prev</id>
		<title>DeployBot: Batch translate Dropout A Simple Way to Prevent Overfitting unit 28 → es</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Dropout_A_Simple_Way_to_Prevent_Overfitting/es&amp;diff=4552&amp;oldid=prev"/>
		<updated>2026-04-27T02:53:29Z</updated>

		<summary type="html">&lt;p&gt;Batch translate Dropout A Simple Way to Prevent Overfitting unit 28 → es&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 02:53, 27 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l1&quot;&gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;languages /&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;languages /&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;{{LanguageBar | page = Dropout A Simple Way to Prevent Overfitting}}&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;{{PaperInfobox&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;{{PaperInfobox&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key mediawiki:diff::1.12:old-4549:rev-4552 --&gt;
&lt;/table&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
	<entry>
		<id>https://marovi.ai/index.php?title=Dropout_A_Simple_Way_to_Prevent_Overfitting/es&amp;diff=4549&amp;oldid=prev</id>
		<title>DeployBot: Batch translate Dropout A Simple Way to Prevent Overfitting unit 10 → es</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Dropout_A_Simple_Way_to_Prevent_Overfitting/es&amp;diff=4549&amp;oldid=prev"/>
		<updated>2026-04-27T02:51:03Z</updated>

		<summary type="html">&lt;p&gt;Batch translate Dropout A Simple Way to Prevent Overfitting unit 10 → es&lt;/p&gt;
&lt;a href=&quot;https://marovi.ai/index.php?title=Dropout_A_Simple_Way_to_Prevent_Overfitting/es&amp;amp;diff=4549&amp;amp;oldid=4488&quot;&gt;Show changes&lt;/a&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
	<entry>
		<id>https://marovi.ai/index.php?title=Dropout_A_Simple_Way_to_Prevent_Overfitting/es&amp;diff=4488&amp;oldid=prev</id>
		<title>DeployBot: test</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Dropout_A_Simple_Way_to_Prevent_Overfitting/es&amp;diff=4488&amp;oldid=prev"/>
		<updated>2026-04-27T02:50:26Z</updated>

		<summary type="html">&lt;p&gt;test&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 02:50, 27 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l2&quot;&gt;Line 2:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 2:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;{{LanguageBar | page = Dropout A Simple Way to Prevent Overfitting}}&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;{{LanguageBar | page = Dropout A Simple Way to Prevent Overfitting}}&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;div lang=&quot;en&quot; dir=&quot;ltr&quot; class=&quot;mw-content-ltr&quot;&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;{{PaperInfobox&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;{{PaperInfobox&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| topic_area  = Deep Learning&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| topic_area  = Deep Learning&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| difficulty  = Research&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| difficulty  = Research&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| authors     = Nitish Srivastava&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;; Geoffrey Hinton; Alex Krizhevsky; Ilya Sutskever; Ruslan Salakhutdinov&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| authors     = Nitish Srivastava&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| year        = 2014&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;| year        = 2014&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;| venue       = JMLR&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;| arxiv_id    = 1207.0580&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;| source_url  = https://arxiv.org/abs/1207.0580&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;| pdf_url     = https://arxiv.org/pdf/1207.0580&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;}}&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;}}&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;{{ContentMeta | generated_by = &lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;claude-opus | model_used = claude-opus-4-6 | generated_date = 2026-03-13&lt;/del&gt;}}&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;{{ContentMeta | generated_by = &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;test&lt;/ins&gt;}}&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;/div&amp;gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
	<entry>
		<id>https://marovi.ai/index.php?title=Dropout_A_Simple_Way_to_Prevent_Overfitting/es&amp;diff=4484&amp;oldid=prev</id>
		<title>DeployBot: test</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Dropout_A_Simple_Way_to_Prevent_Overfitting/es&amp;diff=4484&amp;oldid=prev"/>
		<updated>2026-04-27T02:49:52Z</updated>

		<summary type="html">&lt;p&gt;test&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&amp;lt;languages /&amp;gt;&lt;br /&gt;
{{LanguageBar | page = Dropout A Simple Way to Prevent Overfitting}}&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
{{PaperInfobox&lt;br /&gt;
| topic_area  = Deep Learning&lt;br /&gt;
| difficulty  = Research&lt;br /&gt;
| authors     = Nitish Srivastava; Geoffrey Hinton; Alex Krizhevsky; Ilya Sutskever; Ruslan Salakhutdinov&lt;br /&gt;
| year        = 2014&lt;br /&gt;
| venue       = JMLR&lt;br /&gt;
| arxiv_id    = 1207.0580&lt;br /&gt;
| source_url  = https://arxiv.org/abs/1207.0580&lt;br /&gt;
| pdf_url     = https://arxiv.org/pdf/1207.0580&lt;br /&gt;
}}&lt;br /&gt;
{{ContentMeta | generated_by = claude-opus | model_used = claude-opus-4-6 | generated_date = 2026-03-13}}&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Dropout: A Simple Way to Prevent Neural Networks from Overfitting&amp;#039;&amp;#039;&amp;#039; is a 2014 paper by Srivastava et al. published in the Journal of Machine Learning Research. The paper formalized and extensively evaluated &amp;#039;&amp;#039;&amp;#039;dropout&amp;#039;&amp;#039;&amp;#039;, a regularization technique in which randomly selected neurons are temporarily removed during training. Dropout prevents complex co-adaptations between neurons, effectively training an exponentially large ensemble of sub-networks within a single architecture, and became one of the most widely used regularization methods in deep learning.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
Deep neural networks with many parameters are powerful function approximators but are prone to overfitting, especially when training data is limited. Traditional regularization methods such as L2 weight decay and early stopping provided some relief but were often insufficient for large networks. Model combination (training multiple models and averaging their predictions) was known to reduce overfitting but was computationally expensive.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
Dropout provides an efficient approximation to model combination. During each training step, each neuron (including input units) is retained with a probability &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt; and dropped (set to zero) with probability &amp;lt;math&amp;gt;1 - p&amp;lt;/math&amp;gt;. This means that on each training case, a different &amp;quot;thinned&amp;quot; sub-network is sampled. At test time, all neurons are used but their outputs are scaled by &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt; to approximate the expected output of the ensemble.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
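The train- and test-time behavior just described can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's code; the function names and the choice p = 0.5 are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_train(h, p):
    """Training pass: keep each unit with probability p, zero it otherwise."""
    mask = rng.binomial(1, p, size=h.shape)  # one thinned sub-network per call
    return mask * h

def dropout_test(h, p):
    """Test pass: use every unit, scaled by p (the expected training output)."""
    return p * h

h = np.ones(10)
thinned = dropout_train(h, 0.5)  # roughly half the entries are zeroed
scaled = dropout_test(h, 0.5)    # every entry equals 0.5
```

Each call to dropout_train samples a fresh mask, which is what makes every training case see a different thinned sub-network.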
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
== Key Contributions ==&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Dropout regularization&amp;#039;&amp;#039;&amp;#039;: A training procedure that randomly omits neurons during each forward and backward pass, preventing neurons from developing overly specialized co-adaptations.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Ensemble interpretation&amp;#039;&amp;#039;&amp;#039;: Theoretical motivation of dropout as approximate model averaging over &amp;lt;math&amp;gt;2^n&amp;lt;/math&amp;gt; possible thinned networks (where &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; is the number of droppable units), with shared weights.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Comprehensive empirical evaluation&amp;#039;&amp;#039;&amp;#039;: Demonstration of consistent improvements across diverse domains including vision, speech recognition, text classification, and computational biology.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Practical guidelines&amp;#039;&amp;#039;&amp;#039;: Recommendations for retention probabilities (&amp;lt;math&amp;gt;p = 0.5&amp;lt;/math&amp;gt; for hidden units, &amp;lt;math&amp;gt;p = 0.8&amp;lt;/math&amp;gt; for input units, where &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt; is the probability of keeping a unit) and guidance on interactions with other hyperparameters.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
== Methods ==&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
During training, for each training example and each layer, each neuron&amp;#039;s output is independently set to zero with probability &amp;lt;math&amp;gt;1 - p&amp;lt;/math&amp;gt;. If &amp;lt;math&amp;gt;h_i&amp;lt;/math&amp;gt; is the output of neuron &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;, the dropout operation applies:&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;math&amp;gt;r_i \sim \text{Bernoulli}(p)&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;math&amp;gt;\tilde{h}_i = r_i \cdot h_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
where &amp;lt;math&amp;gt;r_i&amp;lt;/math&amp;gt; is a random mask variable. The dropped-out network is then used for the forward pass and backpropagation on that training case. Different random masks are drawn for each training example and each gradient step.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
At test time, no units are dropped. Instead, the output of each neuron is multiplied by &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt; to match the expected value during training:&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
&amp;lt;math&amp;gt;h_i^{\text{test}} = p \cdot h_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
This &amp;#039;&amp;#039;&amp;#039;weight scaling inference rule&amp;#039;&amp;#039;&amp;#039; ensures that the expected output of each neuron at test time equals its expected output during training. An equivalent alternative, &amp;#039;&amp;#039;&amp;#039;inverted dropout&amp;#039;&amp;#039;&amp;#039;, scales activations by &amp;lt;math&amp;gt;1/p&amp;lt;/math&amp;gt; during training so that no modification is needed at test time. This approach is more common in modern implementations.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
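Inverted dropout, as described above, can be sketched the same way. Again an illustrative fragment with assumed names, not a library implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def inverted_dropout(h, p, training=True):
    """Scale by 1/p during training so no rescaling is needed at test time."""
    if not training:
        return h  # identity at test time
    mask = rng.binomial(1, p, size=h.shape)
    return mask * h / p

h = np.ones(4)
train_out = inverted_dropout(h, 0.5)                 # entries are 0.0 or 2.0
test_out = inverted_dropout(h, 0.5, training=False)  # unchanged
```

Because surviving activations are divided by p, the expected training output already equals the clean activation, which is why modern frameworks favor this form.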
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
The authors showed that dropout can be interpreted as training an ensemble of &amp;lt;math&amp;gt;2^n&amp;lt;/math&amp;gt; sub-networks that share weights. At test time, the scaled full network provides a geometric mean approximation to the ensemble prediction, which the authors proved is exact for a single layer with softmax output.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
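For a single linear layer of activations the averaging claim can be checked numerically: the mean output over many sampled thinned networks converges to the p-scaled full output. This sketch relies on that linearity; for deep nonlinear networks the scaled network is only an approximation to the ensemble.

```python
import numpy as np

rng = np.random.default_rng(0)
p = 0.5
h = np.array([1.0, 2.0, 3.0])

# Monte Carlo average over many sampled thinned sub-networks.
samples = np.stack([rng.binomial(1, p, size=h.shape) * h for _ in range(20000)])
mc_mean = samples.mean(axis=0)

# The single full network with outputs scaled by p.
scaled = p * h
# mc_mean lands within a couple of percent of scaled.
```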
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
The paper also explored dropout with other regularizers, finding that combining dropout with max-norm constraints (clipping the weight vector to have a maximum L2 norm) and large decayed learning rates produced the best results.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
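The max-norm constraint mentioned here is a projection applied after each weight update. A minimal sketch, where the cap c = 3.0 and the one-column-per-unit layout are assumptions for illustration:

```python
import numpy as np

def max_norm(W, c=3.0):
    """Project each unit's incoming weight vector onto the L2 ball of radius c."""
    norms = np.linalg.norm(W, axis=0, keepdims=True)
    scale = np.minimum(1.0, c / np.maximum(norms, 1e-12))
    return W * scale

W = np.full((4, 2), 10.0)      # each column has L2 norm 20, above the cap
W_capped = max_norm(W, c=3.0)  # each column norm is clipped to exactly 3
```

Columns already inside the ball are left untouched; only oversized weight vectors are rescaled, which is what lets training use large learning rates without the weights blowing up.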
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
== Results ==&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
Dropout was evaluated across multiple benchmarks and consistently reduced test error:&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;MNIST&amp;#039;&amp;#039;&amp;#039; (handwritten digits): Error reduced from 1.60% to 1.25% with dropout on a standard feedforward network.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;CIFAR-10/CIFAR-100&amp;#039;&amp;#039;&amp;#039;: Significant error reductions on convolutional networks; relative improvement of approximately 15-25% on CIFAR-100.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;SVHN&amp;#039;&amp;#039;&amp;#039; (Street View House Numbers): Error reduced from 2.80% to 2.68%.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;ImageNet&amp;#039;&amp;#039;&amp;#039;: Dropout improved the top-1 error of a large convolutional network by approximately 2 percentage points.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;TIMIT&amp;#039;&amp;#039;&amp;#039; (speech recognition): Consistent improvements across various architecture sizes.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Reuters&amp;#039;&amp;#039;&amp;#039; (text classification): Improved performance on a bag-of-words text classification task.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
The paper also analyzed the features learned by networks trained with dropout, finding that hidden units developed more distinct and individually meaningful features compared to networks without dropout, which tended to learn redundant co-adapted features.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
== Impact ==&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
Dropout became standard practice in neural network training throughout the 2010s, included by default in most deep learning frameworks. Its conceptual simplicity and consistent effectiveness made it one of the most cited papers in machine learning. The idea of stochastic regularization through random perturbation during training influenced many subsequent techniques, including DropConnect, DropBlock, stochastic depth, and data augmentation strategies.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
While batch normalization and other techniques have reduced the necessity of dropout in some convolutional architectures, dropout remains widely used in fully connected layers, Transformer models, and whenever overfitting is a concern. The paper established randomized regularization as a core principle in deep learning methodology.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
== See also ==&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
* [[ImageNet Classification with Deep CNNs]]&lt;br /&gt;
* [[Batch Normalization Accelerating Deep Network Training]]&lt;br /&gt;
* [[Deep Residual Learning for Image Recognition]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
== References ==&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
* Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., &amp;amp; Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. &amp;#039;&amp;#039;Journal of Machine Learning Research 15&amp;#039;&amp;#039;, 1929-1958.&lt;br /&gt;
* Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., &amp;amp; Salakhutdinov, R. (2012). Improving Neural Networks by Preventing Co-adaptation of Feature Detectors. &amp;#039;&amp;#039;arXiv:1207.0580&amp;#039;&amp;#039;.&lt;br /&gt;
* Wan, L., Zeiler, M., Zhang, S., Le Cun, Y., &amp;amp; Fergus, R. (2013). Regularization of Neural Networks using DropConnect. &amp;#039;&amp;#039;ICML 2013&amp;#039;&amp;#039;.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
[[Category:Deep Learning]] [[Category:Research]] [[Category:Research Papers]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;/div&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
</feed>