Early {{Term|sequence-to-sequence}} models encoded an entire input sequence into a single fixed-dimensional vector using a [[Recurrent Neural Networks|recurrent neural network]]. This ''bottleneck'' forced long-range dependencies to be compressed into a vector of constant size, degrading performance on long sequences. Attention resolves this by letting the decoder consult every encoder hidden state at each generation step, weighting them by learned relevance scores.
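A minimal sketch of one decoding step (illustrative only, not part of the original article): the helper <code>attend</code> is a hypothetical name, and plain dot-product scoring stands in here for the learned relevance function (for example, the small feed-forward scorer used in additive attention).

<syntaxhighlight lang="python">
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(decoder_state, encoder_states):
    """One attention step: weight every encoder hidden state by its
    relevance to the current decoder state and sum them.

    decoder_state:  shape (d,)   -- current decoder hidden state
    encoder_states: shape (T, d) -- hidden states for all T input positions
    """
    # Relevance score per input position (dot product used for brevity;
    # a learned scoring network would produce these in a trained model).
    scores = encoder_states @ decoder_state      # (T,)
    weights = softmax(scores)                    # attention distribution over positions
    context = weights @ encoder_states           # (d,) weighted sum of encoder states
    return context, weights

# Toy usage: 5 input positions, hidden size 8.
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 8))
decoder_state = rng.normal(size=(8,))
context, weights = attend(decoder_state, encoder_states)
print(weights.round(3), context.shape)
</syntaxhighlight>

Because the context vector is recomputed at every generation step, no single fixed-size vector has to summarize the entire input.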