Cross-attention is used when queries come from one sequence and keys/values come from another. In encoder-decoder Transformers, the decoder attends to encoder outputs via cross-attention, enabling the model to condition its generation on the full input context.
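
As a rough illustration (not part of the original page), the sketch below shows single-head cross-attention in NumPy: queries are projected from decoder states, while keys and values are projected from encoder outputs, so every decoder position can attend over the entire input. The function name `cross_attention`, the weight names `Wq`, `Wk`, `Wv`, and the toy shapes are all illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_states, encoder_outputs, Wq, Wk, Wv):
    """Single-head cross-attention sketch: queries from the decoder,
    keys/values from the encoder (hypothetical helper, for illustration)."""
    Q = decoder_states @ Wq       # (T_dec, d_k): one query per decoder position
    K = encoder_outputs @ Wk      # (T_enc, d_k)
    V = encoder_outputs @ Wv      # (T_enc, d_v)
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (T_dec, T_enc): decoder-to-encoder affinities
    weights = softmax(scores, axis=-1)  # each decoder step attends over all encoder steps
    return weights @ V                  # (T_dec, d_v): context vectors for the decoder

# Toy example: 4 encoder positions, 2 decoder positions, model dim 8.
rng = np.random.default_rng(0)
d_model = 8
enc = rng.standard_normal((4, d_model))   # encoder outputs (full input context)
dec = rng.standard_normal((2, d_model))   # decoder hidden states
Wq, Wk, Wv = (rng.standard_normal((d_model, d_model)) for _ in range(3))
out = cross_attention(dec, enc, Wq, Wk, Wv)
print(out.shape)  # (2, 8): one context vector per decoder position
```

In practice this is done with multiple heads and learned projections; for example, PyTorch's `nn.MultiheadAttention` exposes the same split directly by taking separate `query` and `key`/`value` tensors in its forward call.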