Translations:Attention Mechanisms/3/en: Difference between revisions
Revision as of 21:57, 27 April 2026
Early sequence-to-sequence models used a recurrent neural network to encode the entire input sequence into a single fixed-dimensional vector. This bottleneck forced everything the decoder needed, including long-range dependencies, into a vector whose size stayed constant regardless of input length, which degraded performance on long sequences. Attention resolves this by letting the decoder consult every encoder hidden state at each generation step, weighting them by learned relevance scores.
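
The weighting step described above can be illustrated with a minimal NumPy sketch of one decoding step. It rests on several assumptions that are not in the text: the relevance score is a plain dot product between the decoder state and each encoder hidden state (real models typically learn the scoring parameters), the function name attention_context is hypothetical, and the encoder and decoder states are random placeholders rather than outputs of a trained network.

```python
import numpy as np


def attention_context(decoder_state, encoder_states):
    """Compute an attention-weighted context vector for one decoding step.

    decoder_state:  array of shape (d,), the current decoder hidden state.
    encoder_states: array of shape (T, d), one hidden state per input position.
    Returns (context, weights) with shapes (d,) and (T,).
    """
    # Relevance score of each encoder position. A dot product is assumed here;
    # trained models usually use a learned scoring function instead.
    scores = encoder_states @ decoder_state            # shape (T,)

    # Softmax over positions, shifted by the maximum for numerical stability.
    exp_scores = np.exp(scores - scores.max())
    weights = exp_scores / exp_scores.sum()            # shape (T,), sums to 1

    # Context vector: weighted sum of all encoder hidden states.
    context = weights @ encoder_states                 # shape (d,)
    return context, weights


# Placeholder data: 5 input positions, hidden size 4 (illustrative values only).
rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 4))
decoder_state = rng.normal(size=4)

context, weights = attention_context(decoder_state, encoder_states)
```

Because the context vector is recomputed from all encoder states at every step, its content can change from one generated token to the next, unlike the single fixed vector of the earlier encoder-decoder design.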