DeployBot: Batch translate Recurrent Neural Networks unit 16 → zh

2026-04-27T04:01:10Z


DeployBot: Batch translate Recurrent Neural Networks unit 32 → zh

2026-04-27T03:42:07Z


<languages />
{{ArticleInfobox | topic_area = Deep Learning | difficulty = Intermediate | prerequisites = [[Neural Networks]], [[Backpropagation]]}}
{{ContentMeta | generated_by = claude-opus | model_used = claude-opus-4-6 | generated_date = 2026-03-13}}

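'''Recurrent neural networks''' ('''RNNs''') are a class of [[Neural Networks|neural networks]] designed to process '''sequential data''': data in which the order of the elements matters. Unlike feedforward networks, RNNs contain recurrent connections that let information persist across time steps, giving them a form of memory.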

== Sequence modeling ==

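Many real-world problems involve sequences: text is a sequence of words, speech is a sequence of audio frames, stock prices form a time series, and DNA is a sequence of nucleotides. Standard feedforward networks require fixed-size inputs and process each input independently, which makes them ill-suited to variable-length sequences in which context matters.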

RNNs address this by processing the input one element at a time while maintaining a '''hidden state''' that summarizes the information seen so far.

== Vanilla RNN ==

At each time step <math>t</math>, a vanilla RNN computes:

:<math>\mathbf{h}_t = \tanh(\mathbf{W}_{hh}\,\mathbf{h}_{t-1} + \mathbf{W}_{xh}\,\mathbf{x}_t + \mathbf{b}_h)</math>

:<math>\mathbf{y}_t = \mathbf{W}_{hy}\,\mathbf{h}_t + \mathbf{b}_y</math>

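where <math>\mathbf{x}_t</math> is the input at time <math>t</math>, <math>\mathbf{h}_t</math> is the hidden state, <math>\mathbf{y}_t</math> is the output, and <math>\mathbf{W}_{hh}, \mathbf{W}_{xh}, \mathbf{W}_{hy}</math> are weight matrices shared across all time steps (as are the biases <math>\mathbf{b}_h, \mathbf{b}_y</math>). The initial hidden state <math>\mathbf{h}_0</math> is usually set to the zero vector.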

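The key idea is '''weight sharing across time''': the same parameters are applied at every time step, which lets the network generalize across different positions in the sequence and process sequences of any length.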
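A minimal NumPy sketch of the forward recurrence above, assuming the input sequence arrives as a <code>(T, input_dim)</code> array. The function name <code>rnn_forward</code>, the random initialization and the sizes are illustrative choices, not part of the original text. The loop applies the same <code>W_hh</code>, <code>W_xh</code> and <code>b_h</code> at every step, which is the weight sharing described above.

```python
import numpy as np

def rnn_forward(xs, W_hh, W_xh, W_hy, b_h, b_y, h0=None):
    """Run a vanilla (Elman) RNN over a sequence.

    xs has shape (T, input_dim). Returns (hs, ys): hs has shape (T, hidden_dim)
    and holds h_1..h_T; ys has shape (T, output_dim) and holds y_1..y_T.
    """
    hidden_dim = W_hh.shape[0]
    # h_0 defaults to the zero vector, as in the text
    h = np.zeros(hidden_dim) if h0 is None else h0
    hs, ys = [], []
    for x_t in xs:
        # The same W_hh, W_xh, b_h are reused at every step (weight sharing)
        h = np.tanh(W_hh @ h + W_xh @ x_t + b_h)
        hs.append(h)
        ys.append(W_hy @ h + b_y)
    return np.stack(hs), np.stack(ys)

rng = np.random.default_rng(0)
input_dim, hidden_dim, output_dim, T = 3, 4, 2, 5
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_hy = rng.normal(scale=0.5, size=(output_dim, hidden_dim))
b_h, b_y = np.zeros(hidden_dim), np.zeros(output_dim)
xs = rng.normal(size=(T, input_dim))
hs, ys = rnn_forward(xs, W_hh, W_xh, W_hy, b_h, b_y)
print(hs.shape, ys.shape)  # (5, 4) (5, 2)
```

Because nothing in the loop depends on <code>T</code>, the same parameters handle a longer sequence unchanged, and a longer sequence that starts with the same five inputs reproduces the same first five hidden states.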

== Backpropagation through time (BPTT) ==

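Training an RNN requires the gradient of the loss with respect to the shared weights. '''Backpropagation through time''' (BPTT) "unrolls" the RNN over its time steps, producing a deep feedforward network whose layers share weights, and then applies standard [[Backpropagation|backpropagation]]. Because each shared weight appears once per time step in the unrolled graph, its gradient is the sum of the contributions from every step.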

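For a sequence of length <math>T</math> with total loss <math>L = \sum_{t=1}^{T} L_t</math>, the gradient of the loss with respect to <math>\mathbf{W}_{hh}</math> involves products of Jacobian matrices: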

:<math>\frac{\partial L}{\partial \mathbf{W}_{hh}} = \sum_{t=1}^{T}\frac{\partial L_t}{\partial \mathbf{W}_{hh}} = \sum_{t=1}^{T}\sum_{k=1}^{t}\frac{\partial L_t}{\partial \mathbf{h}_t}\left(\prod_{j=k+1}^{t}\frac{\partial \mathbf{h}_j}{\partial \mathbf{h}_{j-1}}\right)\frac{\partial \mathbf{h}_k}{\partial \mathbf{W}_{hh}}</math>

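Here <math>\partial \mathbf{h}_k / \partial \mathbf{W}_{hh}</math> denotes the immediate partial derivative at step <math>k</math>, treating <math>\mathbf{h}_{k-1}</math> as a constant. The product of Jacobians <math>\prod \partial \mathbf{h}_j / \partial \mathbf{h}_{j-1}</math> is the source of the vanishing and exploding gradient problems: for the tanh RNN above, each factor is <math>\partial \mathbf{h}_j / \partial \mathbf{h}_{j-1} = \mathrm{diag}(1 - \mathbf{h}_j \odot \mathbf{h}_j)\,\mathbf{W}_{hh}</math>, so a dependency spanning <math>t-k</math> steps is scaled by a product of <math>t-k</math> such matrices.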
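To make the double sum concrete, here is a small BPTT sketch for the tanh RNN with biases omitted, assuming a per-step squared-error loss <math>L_t = \frac{1}{2}\|\mathbf{y}_t - \mathbf{y}^{*}_t\|^2</math> against hypothetical targets <math>\mathbf{y}^{*}_t</math> (an illustrative choice; the text does not fix a loss). The backward loop adds the immediate contribution of <code>W_hh</code> at each step and multiplies the running gradient by the transposed Jacobian, which evaluates the product of Jacobians without forming it explicitly. A central finite-difference check then compares the result with numerical derivatives.

```python
import numpy as np

def loss_and_grad_Whh(xs, targets, W_hh, W_xh, W_hy):
    """Summed squared-error loss and its BPTT gradient w.r.t. W_hh (no biases, h_0 = 0)."""
    T, H = len(xs), W_hh.shape[0]
    hs = [np.zeros(H)]                      # hs[t] is h_t; hs[0] is h_0
    dys, loss = [], 0.0
    # Forward pass: unroll the network over all T steps
    for t in range(T):
        h = np.tanh(W_hh @ hs[-1] + W_xh @ xs[t])
        hs.append(h)
        dy = W_hy @ h - targets[t]          # dL_t/dy_t for L_t = 0.5 * ||y_t - target_t||^2
        dys.append(dy)
        loss += 0.5 * float(dy @ dy)
    # Backward pass: walk the unrolled graph from t = T down to t = 1
    dW_hh = np.zeros_like(W_hh)
    dh_next = np.zeros(H)                   # gradient reaching h_t from step t + 1
    for t in range(T, 0, -1):
        dh = W_hy.T @ dys[t - 1] + dh_next  # dL/dh_t: from the output and from the future
        da = dh * (1.0 - hs[t] ** 2)        # back through tanh
        dW_hh += np.outer(da, hs[t - 1])    # immediate contribution of W_hh at step t
        dh_next = W_hh.T @ da               # multiply by the transposed Jacobian dh_t/dh_{t-1}
    return loss, dW_hh

rng = np.random.default_rng(1)
T, D, H, O = 6, 3, 4, 2
xs, targets = rng.normal(size=(T, D)), rng.normal(size=(T, O))
W_hh = rng.normal(scale=0.4, size=(H, H))
W_xh = rng.normal(scale=0.4, size=(H, D))
W_hy = rng.normal(scale=0.4, size=(O, H))

_, analytic = loss_and_grad_Whh(xs, targets, W_hh, W_xh, W_hy)
numeric, eps = np.zeros_like(W_hh), 1e-6
for i in range(H):
    for j in range(H):
        Wp, Wm = W_hh.copy(), W_hh.copy()
        Wp[i, j] += eps
        Wm[i, j] -= eps
        lp, _ = loss_and_grad_Whh(xs, targets, Wp, W_xh, W_hy)
        lm, _ = loss_and_grad_Whh(xs, targets, Wm, W_xh, W_hy)
        numeric[i, j] = (lp - lm) / (2 * eps)   # central difference
print(np.max(np.abs(analytic - numeric)))       # should be very small
```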

== The vanishing gradient problem ==

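When the recurrent Jacobians have norm below 1 (for example, whenever the largest singular value of <math>\mathbf{W}_{hh}</math> is below 1, since the derivative of tanh is at most 1), the gradient signal decays exponentially with the time gap: this is the '''vanishing gradient problem'''. It makes it extremely difficult for a vanilla RNN to learn dependencies that span more than roughly 10–20 time steps.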
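The decay can be observed numerically. This sketch makes illustrative assumptions (random Gaussian weights, biases omitted, <code>W_hh</code> rescaled so that its largest singular value is 0.5) and tracks the spectral norm of <math>\partial \mathbf{h}_k / \partial \mathbf{h}_0</math> along one trajectory. Because every tanh derivative is at most 1, that norm is bounded by <math>0.5^k</math>; the helper name <code>jacobian_product_norms</code> is hypothetical.

```python
import numpy as np

def jacobian_product_norms(W_hh, W_xh, xs):
    """Spectral norm of dh_k/dh_0 = J_k ... J_1 for k = 1..T along one trajectory."""
    H = W_hh.shape[0]
    h = np.zeros(H)
    P = np.eye(H)                          # running product, starts as the identity
    norms = []
    for x_t in xs:
        h = np.tanh(W_hh @ h + W_xh @ x_t)
        J = np.diag(1.0 - h ** 2) @ W_hh   # dh_t/dh_{t-1} for the tanh RNN
        P = J @ P                          # extend the product on the left
        norms.append(np.linalg.norm(P, 2)) # largest singular value
    return norms

rng = np.random.default_rng(0)
H, D, T = 8, 3, 30
A = rng.normal(size=(H, H))
W_hh = 0.5 * A / np.linalg.norm(A, 2)      # largest singular value rescaled to 0.5
W_xh = rng.normal(size=(H, D))
xs = rng.normal(size=(T, D))
norms = jacobian_product_norms(W_hh, W_xh, xs)
for k in (1, 10, 20, 30):
    print(k, norms[k - 1])                 # shrinks at least as fast as 0.5 ** k
```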

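Conversely, when the spectral radius of <math>\mathbf{W}_{hh}</math> exceeds 1, gradients can grow exponentially: this is the '''exploding gradient problem'''. Exploding gradients are usually handled with '''gradient clipping''' (rescaling the gradient whenever its norm exceeds a threshold), whereas vanishing gradients call for architectural solutions.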
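Gradient clipping can be sketched as follows. The function name <code>clip_by_global_norm</code> is hypothetical, and clipping the combined norm of all gradients (rather than each tensor or each element separately) is one common variant among several. When the combined norm exceeds the threshold, every gradient is multiplied by the same factor, so the direction of the update is preserved and only its length shrinks.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm):
    """Rescale a list of gradient arrays so that their combined L2 norm is at most max_norm.

    Returns (clipped_grads, norm_before_clipping).
    """
    total = np.sqrt(sum(float(np.sum(g ** 2)) for g in grads))
    if total <= max_norm:
        return grads, total                  # already within the threshold
    scale = max_norm / total                 # one shared factor keeps the direction
    return [g * scale for g in grads], total

grads = [np.array([3.0, 4.0]), np.array([[0.0]])]
clipped, norm_before = clip_by_global_norm(grads, max_norm=2.5)
print(norm_before)           # 5.0
print(clipped[0].tolist())   # [1.5, 2.0]
```

PyTorch provides the same idea as <code>torch.nn.utils.clip_grad_norm_</code>.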

== Long short-term memory (LSTM) ==

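'''LSTM''' (Hochreiter and Schmidhuber, 1997) introduces a '''cell state''' <math>\mathbf{c}_t</math>, which flows through time with minimal interference, together with three '''gates''' that control the flow of information: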

:<math>\mathbf{f}_t = \sigma(\mathbf{W}_f[\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_f)</math> ('''forget gate''')

:<math>\mathbf{i}_t = \sigma(\mathbf{W}_i[\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_i)</math> ('''input gate''')

:<math>\tilde{\mathbf{c}}_t = \tanh(\mathbf{W}_c[\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_c)</math> ('''candidate cell state''')

:<math>\mathbf{c}_t = \mathbf{f}_t \odot \mathbf{c}_{t-1} + \mathbf{i}_t \odot \tilde{\mathbf{c}}_t</math> ('''cell state update''')

:<math>\mathbf{o}_t = \sigma(\mathbf{W}_o[\mathbf{h}_{t-1}, \mathbf{x}_t] + \mathbf{b}_o)</math> ('''output gate''')

:<math>\mathbf{h}_t = \mathbf{o}_t \odot \tanh(\mathbf{c}_t)</math>

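Here <math>[\cdot, \cdot]</math> denotes concatenation, <math>\sigma</math> the logistic sigmoid and <math>\odot</math> the elementwise product. The cell state acts like a conveyor belt: the forget gate decides which old information to discard, the input gate decides which new information to store, and the output gate controls which part of the cell state is exposed as the hidden state <math>\mathbf{h}_t</math>, which is passed both to the next layer and to the next time step. Because the cell state is updated additively rather than by repeated multiplication with a weight matrix, gradients flow more easily across long sequences: along the direct path, <math>\partial \mathbf{c}_t / \partial \mathbf{c}_{t-1} = \mathrm{diag}(\mathbf{f}_t)</math>, so the signal is preserved as long as the forget gate stays close to 1.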
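A single LSTM step written directly from the equations above, as a NumPy sketch. Stacking <code>W_f</code>, <code>W_i</code>, <code>W_c</code>, <code>W_o</code> into one matrix <code>W</code> (and the biases into <code>b</code>) is an implementation convenience assumed here, not something the text prescribes. The second part uses hypothetical bias values that saturate the forget gate near 1 and the input gate near 0, and the cell state is then carried through essentially unchanged.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step.

    W has shape (4H, H + D) and stacks W_f, W_i, W_c, W_o row-wise;
    b has shape (4H,) and stacks b_f, b_i, b_c, b_o in the same order.
    """
    H = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b  # all four pre-activations at once
    f = sigmoid(z[0:H])                        # forget gate
    i = sigmoid(z[H:2 * H])                    # input gate
    c_tilde = np.tanh(z[2 * H:3 * H])          # candidate cell state
    o = sigmoid(z[3 * H:4 * H])                # output gate
    c = f * c_prev + i * c_tilde               # additive cell-state update
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
H, D = 4, 3
W = rng.normal(scale=0.3, size=(4 * H, H + D))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(10, D)):
    h, c = lstm_step(x_t, h, c, W, b)

# Hypothetical "latch" setting: zero weights, forget bias +20, input bias -20
W_latch = np.zeros((4 * H, H + D))
b_latch = np.zeros(4 * H)
b_latch[0:H] = 20.0
b_latch[H:2 * H] = -20.0
c_prev = np.array([0.5, -1.0, 2.0, 0.0])
_, c_next = lstm_step(np.ones(D), np.zeros(H), c_prev, W_latch, b_latch)
print(c_next)  # approximately equal to c_prev
```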

== Gated recurrent unit (GRU) ==

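The '''GRU''' (Cho et al., 2014) simplifies the LSTM by merging the cell state and the hidden state and by using only two gates (bias terms are omitted below):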

:<math>\mathbf{z}_t = \sigma(\mathbf{W}_z[\mathbf{h}_{t-1}, \mathbf{x}_t])</math> ('''update gate''')

:<math>\mathbf{r}_t = \sigma(\mathbf{W}_r[\mathbf{h}_{t-1}, \mathbf{x}_t])</math> ('''reset gate''')

:<math>\tilde{\mathbf{h}}_t = \tanh(\mathbf{W}[\mathbf{r}_t \odot \mathbf{h}_{t-1}, \mathbf{x}_t])</math>

:<math>\mathbf{h}_t = (1 - \mathbf{z}_t) \odot \mathbf{h}_{t-1} + \mathbf{z}_t \odot \tilde{\mathbf{h}}_t</math>

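The GRU has fewer parameters than the LSTM (three weight matrices instead of four) and often achieves comparable performance; in practice, the choice between the two is usually made empirically. Conventions for the update gate vary: the original formulation of Cho et al. (2014) writes <math>\mathbf{h}_t = \mathbf{z}_t \odot \mathbf{h}_{t-1} + (1 - \mathbf{z}_t) \odot \tilde{\mathbf{h}}_t</math>, which swaps the roles of <math>\mathbf{z}_t</math> and <math>1 - \mathbf{z}_t</math> relative to the equation above; the two forms are equivalent up to how the gate is labeled.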
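The GRU update can be sketched the same way. As in the equations above, biases are omitted; the helper <code>n_weights</code> is hypothetical. Counting the weights of the three GRU matrices against the four LSTM gate matrices (hidden size 128, input size 64, biases excluded) illustrates the parameter saving.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, W_r, W):
    """One GRU step (no biases). Each weight matrix has shape (H, H + D)."""
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ hx)                                     # update gate
    r = sigmoid(W_r @ hx)                                     # reset gate
    h_tilde = np.tanh(W @ np.concatenate([r * h_prev, x_t]))  # candidate state
    return (1.0 - z) * h_prev + z * h_tilde                   # interpolate old and new

def n_weights(H, D, n_blocks):
    """Weights in n_blocks matrices of shape (H, H + D): 3 for the GRU, 4 for the LSTM."""
    return n_blocks * H * (H + D)

rng = np.random.default_rng(0)
H, D = 4, 3
W_z, W_r, W = (rng.normal(scale=0.3, size=(H, H + D)) for _ in range(3))
h = np.zeros(H)
for x_t in rng.normal(size=(10, D)):
    h = gru_step(x_t, h, W_z, W_r, W)
print(n_weights(128, 64, 3), n_weights(128, 64, 4))  # 73728 98304
```

Since <math>\mathbf{h}_t</math> is a convex combination of <math>\mathbf{h}_{t-1}</math> and a tanh output, the state stays inside <math>(-1, 1)</math> when it starts at zero.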

== Bidirectional RNNs ==

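A '''bidirectional RNN''' processes the sequence in both directions, forward (left to right) and backward (right to left), with two separate RNNs, and concatenates their hidden states: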

:<math>\mathbf{h}_t = [\overrightarrow{\mathbf{h}}_t;\; \overleftarrow{\mathbf{h}}_t]</math>

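This lets the model use both past and future context at every time step, which benefits tasks such as named entity recognition and the encoder of a machine translation system, where the meaning of a word depends on the words around it. Because the backward direction needs the entire input before producing any output, bidirectional RNNs do not suit tasks that must predict online, one step at a time.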
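A bidirectional layer can be assembled from two independent unidirectional RNNs, as in this sketch (bias-free tanh cells and illustrative random weights; the function names are hypothetical). The backward RNN reads the reversed sequence and its outputs are flipped back into time order, so row <math>t</math> pairs <math>\overrightarrow{\mathbf{h}}_t</math> with <math>\overleftarrow{\mathbf{h}}_t</math>.

```python
import numpy as np

def rnn_states(xs, W_hh, W_xh):
    """Hidden states h_1..h_T of a bias-free tanh RNN run left to right, with h_0 = 0."""
    h = np.zeros(W_hh.shape[0])
    states = []
    for x_t in xs:
        h = np.tanh(W_hh @ h + W_xh @ x_t)
        states.append(h)
    return np.stack(states)

def birnn_states(xs, fwd_params, bwd_params):
    """Row t of the result is the concatenation [h_fwd_t ; h_bwd_t]."""
    h_fwd = rnn_states(xs, *fwd_params)
    # Run a separate RNN on the reversed sequence, then flip its outputs back
    # so that row t of h_bwd summarizes x_t..x_T.
    h_bwd = rnn_states(xs[::-1], *bwd_params)[::-1]
    return np.concatenate([h_fwd, h_bwd], axis=1)

rng = np.random.default_rng(0)
T, D, H = 5, 3, 4
fwd = (rng.normal(scale=0.5, size=(H, H)), rng.normal(scale=0.5, size=(H, D)))
bwd = (rng.normal(scale=0.5, size=(H, H)), rng.normal(scale=0.5, size=(H, D)))
xs = rng.normal(size=(T, D))
states = birnn_states(xs, fwd, bwd)
print(states.shape)  # (5, 8)
```

Changing only the last input leaves the forward half of row 0 untouched but alters its backward half, which is exactly the extra future context the bidirectional layer provides.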

== Applications ==

RNNs and their gated variants have been applied to a wide range of sequence tasks:

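* '''Language modeling''': predicting the next word in a sequence.
* '''Machine translation''': encoder-decoder architectures for sequence-to-sequence translation (Sutskever et al., 2014).
* '''Speech recognition''': transcribing audio into text, often combined with the connectionist temporal classification (CTC) loss.
* '''Sentiment analysis''': classifying the sentiment expressed in a text.
* '''Time series forecasting''': predicting future values of financial or sensor data.
* '''Music generation''': generating sequences of notes.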
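For language modeling, training commonly shifts the targets by one position so that the output at step <math>t</math> is scored against token <math>t+1</math>. The sketch below assumes the RNN outputs <math>\mathbf{y}_t</math> are read as unnormalized scores (logits) over a vocabulary and uses softmax cross-entropy, a standard choice that is assumed here rather than stated in the text; <code>next_token_loss</code> is a hypothetical name.

```python
import numpy as np

def next_token_loss(logits, tokens):
    """Mean cross-entropy of predicting tokens[t + 1] from the output at position t.

    logits: array (T, V), the RNN output y_t at each position, read as unnormalized
    scores over a vocabulary of size V. tokens: integer array (T,).
    """
    scores = logits[:-1]      # the last position has no next token to predict
    targets = tokens[1:]      # targets are the inputs shifted left by one
    # Numerically stable log-softmax over the vocabulary axis
    shifted = scores - scores.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

tokens = np.array([2, 0, 4, 1])
uniform = np.zeros((len(tokens), 5))     # equal scores for all 5 vocabulary entries
print(next_token_loss(uniform, tokens))  # log(5), about 1.609
```

A model that spreads its scores evenly over <math>V</math> tokens gets a loss of <math>\log V</math>, a useful baseline when checking that training is making progress.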

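Note that for many NLP tasks, '''Transformers''' (Vaswani et al., 2017) have largely replaced RNNs, because they process all positions of a sequence in parallel and capture long-range dependencies more effectively through self-attention.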

== See also ==

* [[Neural Networks]]
* [[Backpropagation]]
* [[Convolutional Neural Networks]]
* [[Word Embeddings]]
* [[Overfitting and Regularization]]

== References ==

<div lang="en" dir="ltr" class="mw-content-ltr">
* Elman, J. L. (1990). "Finding Structure in Time". ''Cognitive Science'', 14(2), 179–211.
* Hochreiter, S. and Schmidhuber, J. (1997). "Long Short-Term Memory". ''Neural Computation'', 9(8), 1735–1780.
* Cho, K. et al. (2014). "Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation". ''EMNLP''.
* Sutskever, I., Vinyals, O. and Le, Q. V. (2014). "Sequence to Sequence Learning with Neural Networks". ''NeurIPS''.
* Goodfellow, I., Bengio, Y. and Courville, A. (2016). ''Deep Learning'', Chapter 10. MIT Press.
</div>

[[Category:Deep Learning]]
[[Category:Intermediate]]
[[Category:Neural Networks]]
