Translations:Attention Mechanisms/3/en - Revision history

FuzzyBot: Importing a new version from external source

2026-04-27T23:33:26Z

Revision as of 23:33, 27 April 2026
Line 1: "Early sequence-to-sequence models" became "Early {{Term|sequence-to-sequence}} models"; the rest of the paragraph is unchanged.

FuzzyBot: Importing a new version from external source

2026-04-27T21:57:35Z

Revision as of 21:57, 27 April 2026
Line 1: "Early {{Term|sequence-to-sequence}} models" became "Early sequence-to-sequence models"; the rest of the paragraph is unchanged.

FuzzyBot: Importing a new version from external source

2026-04-27T19:41:38Z

Revision as of 19:41, 27 April 2026
Line 1: "Early sequence-to-sequence models" became "Early {{Term|sequence-to-sequence}} models"; the rest of the paragraph is unchanged.

FuzzyBot: Importing a new version from external source

2026-04-27T00:30:19Z

New page

Early sequence-to-sequence models encoded an entire input sequence into a single fixed-dimensional vector using a [[Recurrent Neural Networks|recurrent neural network]]. This ''bottleneck'' forced long-range dependencies to be compressed into a vector of constant size, degrading performance on long sequences. Attention resolves this by letting the decoder consult every encoder hidden state at each generation step, weighting them by learned relevance scores.
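
The weighting step described above can be illustrated with a minimal sketch for a single decoder step. It assumes relevance is scored with a plain dot product and normalized with a softmax; in a trained model the scoring function would have learned parameters. The names `attend`, `softmax`, `decoder_state`, and `encoder_states` are hypothetical and chosen only for this illustration.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(decoder_state, encoder_states):
    """Return (weights, context) for one decoder step.

    Assumption: relevance is an unparameterized dot product between the
    decoder state and each encoder hidden state. A real model would learn
    the scoring function.
    """
    scores = [
        sum(d * h for d, h in zip(decoder_state, state))
        for state in encoder_states
    ]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # The context vector is the weighted sum of all encoder hidden states.
    context = [
        sum(w * state[i] for w, state in zip(weights, encoder_states))
        for i in range(dim)
    ]
    return weights, context


# Toy data: three encoder hidden states of dimension 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 0.0]
weights, context = attend(decoder_state, encoder_states)
```

Because the context vector is recomputed from all encoder states at every step, its content is not limited to what a single fixed-size encoding can hold. In this toy input the first and third encoder states score equally against the decoder state, so they receive equal weight, and the second state receives less.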