Translations:Attention Mechanisms/3/zh: Difference between revisions

Latest revision as of 23:36, 27 April 2026

Message definition (Attention Mechanisms)

Early {{Term|sequence-to-sequence}} models encoded an entire input sequence into a single fixed-dimensional vector using a [[Recurrent Neural Networks|recurrent neural network]]. This ''bottleneck'' forced long-range dependencies to be compressed into a vector of constant size, degrading performance on long sequences. Attention resolves this by letting the decoder consult every encoder hidden state at each generation step, weighting them by learned relevance scores.

早期的序列到序列模型使用循环神经网络将整个输入序列编码为单一的固定维向量。这种瓶颈迫使长程依赖被压缩到恒定大小的向量中，从而降低了在长序列上的性能。注意力机制通过让解码器在每个生成步骤查询编码器的每个隐藏状态，并以学习到的相关性分数对其加权，从而解决了这一问题。
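
The weighting step described in this unit can be illustrated with a minimal NumPy sketch of one decoder step. It is not taken from the article: the function name `attend`, the bilinear scoring form, and the matrix `W` (a random stand-in for parameters that would normally be learned) are all assumptions made for illustration.

```python
import numpy as np

def attend(decoder_state, encoder_states, W):
    """One attention step: weight every encoder hidden state by relevance.

    decoder_state:  shape (d,)   current decoder hidden state
    encoder_states: shape (T, d) one hidden state per input position
    W:              shape (d, d) hypothetical scoring matrix (learned in practice)
    """
    # Relevance score of each encoder state with respect to the decoder state.
    scores = encoder_states @ (W @ decoder_state)      # shape (T,)
    # Softmax, shifted by the maximum for numerical stability.
    scores = scores - scores.max()
    weights = np.exp(scores)
    weights = weights / weights.sum()                  # non-negative, sums to 1
    # Context vector: weighted average of all encoder hidden states.
    context = weights @ encoder_states                 # shape (d,)
    return context, weights

# Illustrative usage with random values (T = 5 positions, d = 4 dimensions).
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 4))
decoder_state = rng.normal(size=4)
W = rng.normal(size=(4, 4))
context, weights = attend(decoder_state, encoder_states, W)
```

Because the context vector is recomputed from all `T` encoder states at every step, no single fixed-size vector has to carry the whole input, which is the bottleneck the unit describes.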

Revision as of 21:58, 27 April 2026, DeployBot (Batch translate Attention Mechanisms unit 3 → zh), Tag: translation

早期的序列到序列模型使用[[Recurrent Neural Networks|循环神经网络]]将整个输入序列编码为一个固定维度的向量。这种''瓶颈''迫使长程依赖被压缩到一个固定大小的向量中,从而降低了长序列上的性能。注意力机制通过让解码器在每个生成步骤都能查询编码器的所有隐藏状态、并按学习到的相关性得分对它们加权,从而解决了这一问题。

Latest revision as of 23:36, 27 April 2026, DeployBot (Batch translate Attention Mechanisms unit 3 → zh), Tag: translation

早期的{{Term|sequence-to-sequence|序列到序列}}模型使用[[Recurrent Neural Networks|循环神经网络]]将整个输入序列编码为单一的固定维向量。这种''瓶颈''迫使长程依赖被压缩到恒定大小的向量中，从而降低了在长序列上的性能。注意力机制通过让解码器在每个生成步骤查询编码器的每个隐藏状态，并以学习到的相关性分数对其加权，从而解决了这一问题。