Translations:Attention Is All You Need/7/en

    From Marovi AI
    Revision as of 21:39, 27 April 2026 by FuzzyBot (importing a new version from external source)
    • Introduction of the Transformer, the first sequence transduction model based entirely on attention, dispensing with recurrence and convolution.
    • The scaled dot-product attention mechanism and multi-head attention, which allow the model to jointly attend to information from different representation subspaces at different positions.
    • Positional encodings using sinusoidal functions, providing the model with information about token order without recurrence.
    • Demonstration that attention-only models can achieve state-of-the-art results on machine translation while being more parallelizable and faster to train.
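The scaled dot-product attention mentioned above computes softmax(QKᵀ/√d_k)V; a minimal NumPy sketch (function name and unbatched 2-D shapes are illustrative assumptions, not the paper's code):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)          # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V                        # weighted average of value vectors
```

The 1/√d_k scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with vanishing gradients; multi-head attention applies this same operation in parallel over several learned projections of Q, K, and V.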
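The sinusoidal positional encodings in the list above follow PE(pos, 2i) = sin(pos/10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos/10000^(2i/d_model)); a small sketch, assuming an even d_model (function name is illustrative):

```python
import numpy as np

def positional_encoding(max_len, d_model):
    # One row per position, one column per encoding dimension.
    # Even columns get sines, odd columns get cosines, at
    # geometrically spaced wavelengths from 2*pi to 10000*2*pi.
    pos = np.arange(max_len)[:, None]            # (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]        # (1, d_model/2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe
```

These encodings are added to the token embeddings, giving the otherwise order-agnostic attention layers a signal about each token's position without any recurrent state.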