All translations

Name	Current message text
^h English (en)	For context, the authors note that an entire year of architectural tuning between {{Term\|inception}}-v3 and {{Term\|inception}}-{{Term\|residual network\|ResNet}}-v2 yielded a 1.3% accuracy improvement, so the gains from a one-line {{Term\|activation function\|activation}} swap are economically meaningful. On a 12-layer "Base [[Transformer (machine learning model)\|Transformer]]" trained on WMT 2014 English→German, {{Term\|swish}}-1 also matches or exceeds every baseline across four newstest sets, with the largest gain on newstest2016 (+0.6 {{Term\|bleu}} over the next-best baseline).
^h Spanish (es)	Para dar contexto, los autores señalan que un año entero de ajuste arquitectónico entre {{Term\|inception}}-v3 e {{Term\|inception}}-{{Term\|residual network\|ResNet}}-v2 produjo una mejora del 1.3%, por lo que las ganancias provenientes de un cambio de una línea en la {{Term\|activation function\|función de activación}} son económicamente significativas. En un [[Transformer (machine learning model)\|Transformer]] base de 12 capas entrenado en WMT 2014 inglés→alemán, {{Term\|swish}}-1 también iguala o supera a todas las líneas base en los cuatro conjuntos newstest, con la mayor ganancia en newstest2016 (+0.6 {{Term\|bleu}} sobre la siguiente mejor).
^h Chinese (zh)	作为参考,作者指出从 {{Term\|inception}}-v3 到 {{Term\|inception}}-{{Term\|residual network\|ResNet}}-v2 整整一年的架构调优带来了 1.3% 的提升,因此仅一行的{{Term\|activation function\|激活函数}}替换所带来的收益在经济上是有意义的。在 WMT 2014 英语→德语上训练的 12 层“Base [[Transformer (machine learning model)\|Transformer]]”上,{{Term\|swish}}-1 在四个 newstest 集上同样匹配或超越所有基线,最大增益出现在 newstest2016 上(比次优值高 +0.6 {{Term\|bleu}})。