Multi-head attention applies several attention functions in parallel, each with different learned linear projections:
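Following the formulation in the original paper, this can be written as:

\[
\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\, W^O,
\qquad
\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q,\; K W_i^K,\; V W_i^V)
\]

where \(W_i^Q, W_i^K \in \mathbb{R}^{d_{\text{model}} \times d_k}\), \(W_i^V \in \mathbb{R}^{d_{\text{model}} \times d_v}\), and \(W^O \in \mathbb{R}^{h d_v \times d_{\text{model}}}\) are the learned projection matrices. Each head attends over a different projected subspace, and the concatenated outputs are projected back to the model dimension.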