Translations:Attention Mechanisms/3/zh: Difference between revisions

    (Batch translate Attention Mechanisms unit 3 → zh)
    Tag: translation
    早期的序列到序列模型使用[[Recurrent Neural Networks|循环神经网络]]将整个输入序列编码为单个固定维度的向量。这种''瓶颈''迫使长程依赖被压缩到一个大小恒定的向量中,从而降低了在长序列上的性能。注意力通过让解码器在每个生成步骤都参考每个编码器隐藏状态,并根据学习到的相关性分数对它们进行加权,从而解决了这一问题。
    早期的序列到序列模型使用 [[Recurrent Neural Networks|循环神经网络]] 将整个输入序列编码为一个固定维度的向量。这种''瓶颈''迫使长程依赖被压缩到一个固定大小的向量中,从而降低了长序列上的性能。注意力机制通过让解码器在每个生成步骤都能查询编码器的所有隐藏状态、并按学习到的相关性得分对它们加权,从而解决了这一问题。

    Revision as of 21:58, 27 April 2026

    Message definition (Attention Mechanisms)
    Early {{Term|sequence-to-sequence}} models encoded an entire input sequence into a single fixed-dimensional vector using a [[Recurrent Neural Networks|recurrent neural network]]. This ''bottleneck'' forced long-range dependencies to be compressed into a vector of constant size, degrading performance on long sequences. Attention resolves this by letting the decoder consult every encoder hidden state at each generation step, weighting them by learned relevance scores.
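    The weighting described in the message definition can be sketched in a few lines. The definition does not specify a scoring function, so plain dot-product scoring is used here as an illustrative assumption; the function name and shapes are likewise hypothetical:

    <syntaxhighlight lang="python">
    import numpy as np

    def attention_context(decoder_state, encoder_states):
        """Score every encoder hidden state against the current decoder
        state, softmax the scores into relevance weights, and return the
        weighted sum of encoder states (the context vector)."""
        scores = encoder_states @ decoder_state       # one score per encoder step, shape (T,)
        weights = np.exp(scores - scores.max())       # numerically stable softmax
        weights /= weights.sum()                      # weights sum to 1
        context = weights @ encoder_states            # weighted sum, shape (d,)
        return context, weights

    # Toy example: 4 encoder time steps, hidden size 3
    rng = np.random.default_rng(0)
    H = rng.normal(size=(4, 3))   # encoder hidden states
    s = rng.normal(size=3)        # decoder state at one generation step
    ctx, w = attention_context(s, H)
    </syntaxhighlight>

    Because the context vector is recomputed at every generation step, the decoder is no longer limited to a single fixed-size summary of the input.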