All translations
Found 3 translations.
| Name | Current message text |
|---|---|
| English (en) | Ablation studies showed that multi-head {{Term|attention}} outperformed single-head {{Term|attention}}, that the scaling factor was important for large key dimensions, and that learned positional {{Term|embedding|embeddings}} performed comparably to the sinusoidal encodings. |
| Spanish (es) | Los estudios de ablación mostraron que la {{Term|attention|atención}} multi-cabeza superaba a la {{Term|attention|atención}} de cabeza única, que el factor de escala era importante para dimensiones de clave grandes, y que los {{Term|embedding|embeddings}} posicionales aprendidos rendían de forma comparable a las codificaciones sinusoidales. |
| Chinese (zh) | 消融研究表明，多头 {{Term|attention|注意力}} 优于单头 {{Term|attention|注意力}}；对于较大的键维度，缩放因子至关重要；可学习的位置 {{Term|embedding|嵌入}} 的表现与正弦编码相当。 |