- Introduction of the Transformer, the first sequence transduction model based entirely on attention without recurrence or convolution.
- The scaled dot-product attention mechanism and multi-head attention, which allow the model to jointly attend to information from different representation subspaces at different positions (see the attention sketch after this list).
- Positional encodings using sinusoidal functions, providing the model with information about token order without recurrence (see the encoding sketch after this list).
- Demonstration that attention-only models can achieve state-of-the-art results on machine translation while being more parallelizable and faster to train than recurrent architectures.
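
To make the second bullet concrete, here is a minimal NumPy sketch of scaled dot-product attention, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, and of multi-head attention built on top of it. The function names, the unbatched single-sequence setup, and the absence of masking are simplifying assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)   # (..., seq_q, seq_k)
    # Numerically stable row-wise softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                                # (..., seq_q, d_v)

def multi_head_attention(x, W_q, W_k, W_v, W_o, num_heads):
    """Project x into num_heads subspaces, attend in each head in
    parallel, concatenate the results, and project back to d_model.
    (Illustrative sketch: no batching, no mask, square weight matrices.)"""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads
    # Linear projections, then split the model dimension into heads.
    Q = (x @ W_q).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    K = (x @ W_k).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    V = (x @ W_v).reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)
    heads = scaled_dot_product_attention(Q, K, V)     # (heads, seq, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ W_o                               # (seq, d_model)

# Example usage with random inputs (shapes only, not trained weights).
rng = np.random.default_rng(0)
d_model, seq_len, num_heads = 64, 10, 8
x = rng.normal(size=(seq_len, d_model))
W = [rng.normal(size=(d_model, d_model)) * 0.1 for _ in range(4)]
out = multi_head_attention(x, *W, num_heads=num_heads)
print(out.shape)  # (10, 64)
```

Because each head attends in its own learned subspace, the heads can specialize, which is what "jointly attend to information from different representation subspaces" refers to.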
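
The sinusoidal positional encodings from the third bullet follow the paper's definition: PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The sketch below is a straightforward NumPy rendering of that formula; the function name and the even-d_model assumption are illustrative choices.

```python
import numpy as np

def sinusoidal_positional_encoding(max_len, d_model):
    """Build the (max_len, d_model) table of sinusoidal encodings.
    Even columns hold sin, odd columns hold cos, with geometrically
    increasing wavelengths. Assumes d_model is even."""
    pos = np.arange(max_len)[:, None]          # (max_len, 1)
    i = np.arange(0, d_model, 2)[None, :]      # (1, d_model / 2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# The encodings are simply added to the token embeddings,
# giving the model access to token order without recurrence.
pe = sinusoidal_positional_encoding(max_len=50, d_model=64)
print(pe.shape)  # (50, 64)
```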