Translations:Attention Is All You Need/23/en

Ablation studies showed that multi-head attention outperformed single-head attention, that the scaling factor 1/√d_k was important for large key dimensions d_k, and that learned positional embeddings performed comparably to the sinusoidal encodings.
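
The point about the scaling factor can be illustrated with a small numerical sketch. This is not the ablation setup itself: it assumes queries and keys with independent standard-normal components, and the helper names `softmax` and `mean_max_attention_weight` are hypothetical, chosen only for this example. Under that assumption a dot product over d_k components has variance close to d_k, so without dividing by √d_k the softmax tends to put almost all of its weight on a single key as d_k grows.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mean_max_attention_weight(d_k, n_keys=16, trials=50, scale=True, seed=0):
    """Average of the largest attention weight over random queries and keys.

    Assumption (illustrative only): every query/key component is drawn
    independently from a standard normal distribution.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        q = [rng.gauss(0.0, 1.0) for _ in range(d_k)]
        logits = []
        for _ in range(n_keys):
            k = [rng.gauss(0.0, 1.0) for _ in range(d_k)]
            dot = sum(qi * ki for qi, ki in zip(q, k))
            # Scaled dot-product attention divides the logit by sqrt(d_k).
            logits.append(dot / math.sqrt(d_k) if scale else dot)
        total += max(softmax(logits))
    return total / trials


# Compare the two variants for a small, medium, and large key dimension.
for d_k in (4, 64, 512):
    unscaled = mean_max_attention_weight(d_k, scale=False)
    scaled = mean_max_attention_weight(d_k, scale=True)
    print(f"d_k={d_k:4d}  unscaled={unscaled:.3f}  scaled={scaled:.3f}")
```

A largest weight near 1 means the softmax is saturated, which is the regime where its gradients become very small. With the scaling applied, the largest weight should stay roughly the same across key dimensions in this toy setting.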