<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://marovi.ai/index.php?action=history&amp;feed=atom&amp;title=Dropout_A_Simple_Way_to_Prevent_Overfitting%2Fzh</id>
	<title>Dropout A Simple Way to Prevent Overfitting/zh - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://marovi.ai/index.php?action=history&amp;feed=atom&amp;title=Dropout_A_Simple_Way_to_Prevent_Overfitting%2Fzh"/>
	<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Dropout_A_Simple_Way_to_Prevent_Overfitting/zh&amp;action=history"/>
	<updated>2026-04-27T16:59:44Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.39.1</generator>
	<entry>
		<id>https://marovi.ai/index.php?title=Dropout_A_Simple_Way_to_Prevent_Overfitting/zh&amp;diff=4551&amp;oldid=prev</id>
		<title>DeployBot: Batch translate Dropout A Simple Way to Prevent Overfitting unit 10 → zh</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Dropout_A_Simple_Way_to_Prevent_Overfitting/zh&amp;diff=4551&amp;oldid=prev"/>
		<updated>2026-04-27T02:53:28Z</updated>

		<summary type="html">&lt;p&gt;Batch translate Dropout A Simple Way to Prevent Overfitting unit 10 → zh&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 02:53, 27 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l1&quot;&gt;Line 1:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 1:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;languages /&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;languages /&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;{{LanguageBar | page = Dropout A Simple Way to Prevent Overfitting}}&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;{{PaperInfobox&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;{{PaperInfobox&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key mediawiki:diff::1.12:old-4548:rev-4551 --&gt;
&lt;/table&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
	<entry>
		<id>https://marovi.ai/index.php?title=Dropout_A_Simple_Way_to_Prevent_Overfitting/zh&amp;diff=4548&amp;oldid=prev</id>
		<title>DeployBot: Batch translate Dropout A Simple Way to Prevent Overfitting unit 20 → zh</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Dropout_A_Simple_Way_to_Prevent_Overfitting/zh&amp;diff=4548&amp;oldid=prev"/>
		<updated>2026-04-27T02:51:03Z</updated>

		<summary type="html">&lt;p&gt;Batch translate Dropout A Simple Way to Prevent Overfitting unit 20 → zh&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 02:51, 27 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l70&quot;&gt;Line 70:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 70:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== 另见 ==&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== 另见 ==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;div lang=&quot;en&quot; dir=&quot;ltr&quot; class=&quot;mw-content-ltr&quot;&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[ImageNet Classification with Deep CNNs]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[ImageNet Classification with Deep CNNs]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[Batch Normalization Accelerating Deep Network Training]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[Batch Normalization Accelerating Deep Network Training]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[Deep Residual Learning for Image Recognition]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* [[Deep Residual Learning for Image Recognition]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;/div&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;div lang&lt;/del&gt;=&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;quot;en&amp;quot; dir&lt;/del&gt;=&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;quot;ltr&amp;quot; class&lt;/del&gt;=&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;== &lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;参考文献 &lt;/ins&gt;==&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;=&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;= References ==&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;/div&amp;gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;div lang=&quot;en&quot; dir=&quot;ltr&quot; class=&quot;mw-content-ltr&quot;&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., &amp;amp; Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. &amp;#039;&amp;#039;Journal of Machine Learning Research 15&amp;#039;&amp;#039;, 1929-1958. [https://arxiv.org/abs/1207.0580 arXiv:1207.0580]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., &amp;amp; Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. &amp;#039;&amp;#039;Journal of Machine Learning Research 15&amp;#039;&amp;#039;, 1929-1958. [https://arxiv.org/abs/1207.0580 arXiv:1207.0580]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., &amp;amp; Salakhutdinov, R. (2012). Improving Neural Networks by Preventing Co-adaptation of Feature Detectors. &amp;#039;&amp;#039;arXiv:1207.0580&amp;#039;&amp;#039;.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., &amp;amp; Salakhutdinov, R. (2012). Improving Neural Networks by Preventing Co-adaptation of Feature Detectors. &amp;#039;&amp;#039;arXiv:1207.0580&amp;#039;&amp;#039;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Wan, L., Zeiler, M., Zhang, S., Le Cun, Y., &amp;amp; Fergus, R. (2013). Regularization of Neural Networks using DropConnect. &amp;#039;&amp;#039;ICML 2013&amp;#039;&amp;#039;.&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;* Wan, L., Zeiler, M., Zhang, S., Le Cun, Y., &amp;amp; Fergus, R. (2013). Regularization of Neural Networks using DropConnect. &amp;#039;&amp;#039;ICML 2013&amp;#039;&amp;#039;.&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;/div&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;br/&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;div lang=&quot;en&quot; dir=&quot;ltr&quot; class=&quot;mw-content-ltr&quot;&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Deep Learning]] [[Category:Research]] [[Category:Research Papers]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Deep Learning]] [[Category:Research]] [[Category:Research Papers]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;/div&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key mediawiki:diff::1.12:old-4542:rev-4548 --&gt;
&lt;/table&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
	<entry>
		<id>https://marovi.ai/index.php?title=Dropout_A_Simple_Way_to_Prevent_Overfitting/zh&amp;diff=4542&amp;oldid=prev</id>
		<title>DeployBot: Batch translate Dropout A Simple Way to Prevent Overfitting unit 21 → zh</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Dropout_A_Simple_Way_to_Prevent_Overfitting/zh&amp;diff=4542&amp;oldid=prev"/>
		<updated>2026-04-27T02:50:53Z</updated>

		<summary type="html">&lt;p&gt;Batch translate Dropout A Simple Way to Prevent Overfitting unit 21 → zh&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&amp;lt;languages /&amp;gt;&lt;br /&gt;
{{LanguageBar | page = Dropout A Simple Way to Prevent Overfitting}}&lt;br /&gt;
&lt;br /&gt;
{{PaperInfobox&lt;br /&gt;
| topic_area  = Deep Learning&lt;br /&gt;
| difficulty  = Research&lt;br /&gt;
| authors     = Nitish Srivastava; Geoffrey Hinton; Alex Krizhevsky; Ilya Sutskever; Ruslan Salakhutdinov&lt;br /&gt;
| year        = 2014&lt;br /&gt;
| venue       = JMLR&lt;br /&gt;
| arxiv_id    = 1207.0580&lt;br /&gt;
| source_url  = https://arxiv.org/abs/1207.0580&lt;br /&gt;
| pdf_url     = https://arxiv.org/pdf/1207.0580&lt;br /&gt;
}}&lt;br /&gt;
{{ContentMeta | generated_by = claude-opus | model_used = claude-opus-4-6 | generated_date = 2026-03-13}}&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Dropout: A Simple Way to Prevent Neural Networks from Overfitting&amp;#039;&amp;#039;&amp;#039; 是 Srivastava 等人于 2014 年发表在《机器学习研究杂志》（Journal of Machine Learning Research）上的论文。该论文形式化并广泛评估了 &amp;#039;&amp;#039;&amp;#039;dropout&amp;#039;&amp;#039;&amp;#039;，这是一种在训练期间随机选择并临时移除神经元的正则化技术。Dropout 防止神经元之间形成复杂的共适应，相当于在单一架构内训练一个指数级大的子网络集成，并成为深度学习中应用最广泛的正则化方法之一。&lt;br /&gt;
&lt;br /&gt;
== 概述 ==&lt;br /&gt;
&lt;br /&gt;
具有大量参数的深度神经网络是强大的函数近似器，但容易出现过拟合，尤其是在训练数据有限时。传统的正则化方法（如 L2 权重衰减和早停）能在一定程度上缓解过拟合，但对于大型网络往往不够。模型组合——即训练多个模型并对它们的预测取平均——被认为可以减少过拟合，但计算代价高昂。&lt;br /&gt;
&lt;br /&gt;
Dropout 提供了一种高效的模型组合近似方法。在每个训练步骤中，每个神经元（包括输入单元）以概率 &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt; 被保留，以概率 &amp;lt;math&amp;gt;1 - p&amp;lt;/math&amp;gt; 被丢弃（置零）。这意味着在每个训练样本上都会采样出一个不同的“变薄”子网络。在测试时使用所有神经元，但其输出会被 &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt; 缩放，以近似集成的期望输出。&lt;br /&gt;
&lt;br /&gt;
== 主要贡献 ==&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Dropout 正则化&amp;#039;&amp;#039;&amp;#039;：在每次前向和反向传播过程中随机省略神经元的训练流程，防止神经元形成过度专门化的共适应。&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;集成解释&amp;#039;&amp;#039;&amp;#039;：从理论上将 dropout 视为对 &amp;lt;math&amp;gt;2^n&amp;lt;/math&amp;gt; 个可能的变薄网络（其中 &amp;lt;math&amp;gt;n&amp;lt;/math&amp;gt; 为可丢弃单元的数量）进行近似模型平均，且这些网络共享权重。&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;全面的实证评估&amp;#039;&amp;#039;&amp;#039;：在视觉、语音识别、文本分类和计算生物学等多个领域中均一致地观察到性能提升。&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;实用指南&amp;#039;&amp;#039;&amp;#039;：关于 dropout 比率（隐藏层 &amp;lt;math&amp;gt;p = 0.5&amp;lt;/math&amp;gt;，输入层 &amp;lt;math&amp;gt;p = 0.8&amp;lt;/math&amp;gt;）以及与其他超参数交互方式的建议。&lt;br /&gt;
&lt;br /&gt;
== 方法 ==&lt;br /&gt;
&lt;br /&gt;
在训练期间，对于每个训练样本和每一层，每个神经元的输出都会以概率 &amp;lt;math&amp;gt;1 - p&amp;lt;/math&amp;gt; 独立地被置零。如果 &amp;lt;math&amp;gt;h_i&amp;lt;/math&amp;gt; 是神经元 &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; 的输出，则 dropout 操作如下：&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;r_i \sim \text{Bernoulli}(p)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;\tilde{h}_i = r_i \cdot h_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
其中 &amp;lt;math&amp;gt;r_i&amp;lt;/math&amp;gt; 是随机掩码变量。然后将丢弃后的网络用于该训练样本的前向传播和反向传播。每个训练样本、每个梯度步都会独立采样一个新的随机掩码。&lt;br /&gt;
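上述掩码采样与前向传播可以用 NumPy 写成如下最小示意（其中的激活数值与保留概率均为本示例的假设，并非论文原始实现）：

```python
import numpy as np

def dropout_forward(h, p, rng):
    """以保留概率 p 采样 Bernoulli 掩码 r_i，并逐元素乘到激活 h 上。"""
    r = rng.binomial(1, p, size=h.shape)   # r_i ~ Bernoulli(p)
    return r * h, r                        # 返回掩码，反向传播需复用同一掩码

rng = np.random.default_rng(0)
h = np.array([0.5, -1.2, 3.0, 0.7])        # 某一层的激活（假设值）
h_tilde, r = dropout_forward(h, p=0.5, rng=rng)
```

注意掩码在前向传播时采样一次并保存，反向传播沿同一个变薄子网络计算梯度。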
&lt;br /&gt;
在测试时不丢弃任何单元。相反，每个神经元的输出会乘以 &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt;，以匹配训练期间的期望值：&lt;br /&gt;
&lt;br /&gt;
&amp;lt;math&amp;gt;h_i^{\text{test}} = p \cdot h_i&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
这种 &amp;#039;&amp;#039;&amp;#039;权重缩放推断规则&amp;#039;&amp;#039;&amp;#039; 确保每个神经元在测试时的期望输出与训练期间的期望输出相等。一种等价的替代方法 &amp;#039;&amp;#039;&amp;#039;反向 dropout&amp;#039;&amp;#039;&amp;#039;（inverted dropout）在训练期间将激活值缩放 &amp;lt;math&amp;gt;1/p&amp;lt;/math&amp;gt;，从而在测试时无需进行任何修改。这种做法在现代实现中更为常见。&lt;br /&gt;
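两种推断方案的对应关系可以用蒙特卡罗估计验证：标准 dropout 的训练期望输出约为 p·h（与测试时乘以 p 对应），反向 dropout 的训练期望输出约为 h（与测试时不做修改对应）。以下为一个示意（激活数值为假设）：

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.5
h = np.array([2.0, -0.4, 1.5])                # 某一层的激活（假设值）
masks = rng.binomial(1, p, size=(200_000, h.size))

# 标准 dropout：训练时仅置零；测试时把输出乘以 p（权重缩放推断规则）
mean_standard = (masks * h).mean(axis=0)      # 训练期望，约等于 p * h
test_standard = p * h

# 反向 dropout：训练时把保留的激活除以 p；测试时不做任何修改
mean_inverted = (masks * h / p).mean(axis=0)  # 训练期望，约等于 h
test_inverted = h
```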
&lt;br /&gt;
作者证明，dropout 可以被解释为训练 &amp;lt;math&amp;gt;2^n&amp;lt;/math&amp;gt; 个共享权重的子网络的集成。在测试时，按比例缩放的完整网络提供了对集成预测的几何均值近似；作者证明，对于具有 softmax 输出的单层网络，这一近似是精确的。&lt;br /&gt;
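在 p = 0.5 的单层 softmax 情形下，这一精确性可以直接数值验证：此时全部 2^n 个掩码等概率，对各子网络 softmax 输出取归一化几何平均，结果与权重缩放网络的输出逐项相等（示例中的权重矩阵与输入均为随机假设值）：

```python
import numpy as np
from itertools import product

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(2)
n, k = 4, 3                              # n 个可丢弃输入单元，k 个类别（假设的小规模）
W = rng.normal(size=(k, n))
x = rng.normal(size=n)

# 枚举全部 2^n 个等概率掩码，对各子网络 softmax 输出取几何平均并重新归一化
log_probs = [np.log(softmax(W @ (np.array(m) * x))) for m in product([0, 1], repeat=n)]
geo = np.exp(np.mean(log_probs, axis=0))
geo = geo / geo.sum()

scaled = softmax(W @ (0.5 * x))          # 权重缩放后的完整网络
```

几何平均在对数域是算术平均，而各掩码下 logits 的平均恰为 0.5·W·x，因此归一化后两者完全一致。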
&lt;br /&gt;
该论文还探讨了 dropout 与其他正则化方法的组合，发现将 dropout 与最大范数约束（将权重向量裁剪为具有最大 L2 范数）以及较大且带衰减的学习率结合使用，能产生最佳效果。&lt;br /&gt;
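其中的最大范数约束通常在每次梯度更新后通过一次投影实现：若某神经元的传入权重向量的 L2 范数超过上限 c，就把它缩放回范数恰为 c。一个最小示意（矩阵形状与 c 的取值均为假设）：

```python
import numpy as np

def max_norm_project(W, c):
    """把每个神经元的传入权重向量（W 的行）投影回 L2 范数不超过 c 的球内。"""
    norms = np.linalg.norm(W, axis=1, keepdims=True)
    scale = np.minimum(1.0, c / np.maximum(norms, 1e-12))
    return W * scale

rng = np.random.default_rng(3)
W = rng.normal(scale=3.0, size=(8, 16))  # 假设的权重矩阵：8 个神经元，各 16 个输入
W_proj = max_norm_project(W, c=2.0)      # 通常在每次梯度更新之后调用一次
```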
&lt;br /&gt;
== 结果 ==&lt;br /&gt;
&lt;br /&gt;
Dropout 在多个基准上进行了评估，并一致地降低了测试误差：&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;MNIST&amp;#039;&amp;#039;&amp;#039;（手写数字）：在标准前馈网络上使用 dropout 后，错误率从 1.60% 降至 1.25%。&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;CIFAR-10/CIFAR-100&amp;#039;&amp;#039;&amp;#039;：在卷积网络上显著降低错误率；在 CIFAR-100 上的相对改进约为 15-25%。&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;SVHN&amp;#039;&amp;#039;&amp;#039;（街景门牌号）：错误率从 2.80% 降至 2.68%。&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;ImageNet&amp;#039;&amp;#039;&amp;#039;：dropout 将一个大型卷积网络的 top-1 错误率降低约 2 个百分点。&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;TIMIT&amp;#039;&amp;#039;&amp;#039;（语音识别）：在不同规模的架构中均观察到一致的提升。&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Reuters&amp;#039;&amp;#039;&amp;#039;（文本分类）：在词袋文本分类任务上性能改善。&lt;br /&gt;
&lt;br /&gt;
论文还分析了使用 dropout 训练的网络所学到的特征：隐藏单元的激活更稀疏，各单元学到的特征也更易单独解释；而不使用 dropout 的网络则倾向于学习冗余且相互共适应的特征。&lt;br /&gt;
&lt;br /&gt;
== 影响 ==&lt;br /&gt;
&lt;br /&gt;
在 2010 年代，dropout 成为神经网络训练的标准做法，并在大多数深度学习框架中默认启用。其概念上的简洁性以及一贯的有效性，使其成为机器学习领域被引用次数最多的论文之一。在训练期间通过随机扰动进行随机正则化的思想，影响了许多后续技术，包括 DropConnect、DropBlock、随机深度（stochastic depth）和数据增强策略。&lt;br /&gt;
&lt;br /&gt;
虽然批归一化（batch normalization）和其他技术在一些卷积架构中降低了对 dropout 的需求，但 dropout 在全连接层、Transformer 模型以及任何存在过拟合风险的场景中仍然被广泛使用。该论文确立了随机化正则化作为深度学习方法论中的核心原则。&lt;br /&gt;
&lt;br /&gt;
== 另见 ==&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
* [[ImageNet Classification with Deep CNNs]]&lt;br /&gt;
* [[Batch Normalization Accelerating Deep Network Training]]&lt;br /&gt;
* [[Deep Residual Learning for Image Recognition]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
== References ==&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
* Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., &amp;amp; Salakhutdinov, R. (2014). Dropout: A Simple Way to Prevent Neural Networks from Overfitting. &amp;#039;&amp;#039;Journal of Machine Learning Research 15&amp;#039;&amp;#039;, 1929-1958. [https://arxiv.org/abs/1207.0580 arXiv:1207.0580]&lt;br /&gt;
* Hinton, G. E., Srivastava, N., Krizhevsky, A., Sutskever, I., &amp;amp; Salakhutdinov, R. (2012). Improving Neural Networks by Preventing Co-adaptation of Feature Detectors. &amp;#039;&amp;#039;arXiv:1207.0580&amp;#039;&amp;#039;.&lt;br /&gt;
* Wan, L., Zeiler, M., Zhang, S., Le Cun, Y., &amp;amp; Fergus, R. (2013). Regularization of Neural Networks using DropConnect. &amp;#039;&amp;#039;ICML 2013&amp;#039;&amp;#039;.&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;div lang=&amp;quot;en&amp;quot; dir=&amp;quot;ltr&amp;quot; class=&amp;quot;mw-content-ltr&amp;quot;&amp;gt;&lt;br /&gt;
[[Category:Deep Learning]] [[Category:Research]] [[Category:Research Papers]]&lt;br /&gt;
&amp;lt;/div&amp;gt;&lt;/div&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
</feed>