All translations
Found 3 translations.
| Name | Current message text |
|---|---|
| English (en) | GPT-3 uses the same architecture as GPT-2 (a decoder-only {{Term|transformer}} with pre-normalization) but scaled to 175 billion parameters across 96 layers, with a hidden size of 12,288 and 96 {{Term|attention}} heads. Alternating dense and locally banded sparse {{Term|attention}} patterns were used in the layers. |
| Spanish (es) | GPT-3 utiliza la misma arquitectura que GPT-2 (un {{Term|transformer|transformer}} solo decodificador con prenormalización) pero escalado a 175 mil millones de parámetros distribuidos en 96 capas, con un tamaño oculto de 12 288 y 96 cabezas de {{Term|attention|atención}}. En las capas se emplearon patrones alternados de {{Term|attention|atención}} densa y dispersa con bandas locales. |
| Chinese (zh) | GPT-3 采用与 GPT-2 相同的架构(具有预归一化的仅解码器 {{Term|transformer|transformer}}),但扩展到 96 层共 1750 亿参数,隐藏维度为 12,288,并具有 96 个 {{Term|attention|注意力}}头。各层中交替使用了密集与局部带状稀疏的 {{Term|attention|注意力}}模式。 |
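
The figures quoted in the message are mutually consistent: in a decoder-only transformer, the weight count is dominated by roughly 12 · n_layers · d_model² parameters in the attention and feed-forward blocks, plus the token-embedding matrix. The sketch below is a rough sanity check only, assuming that standard approximation and a GPT-2-style BPE vocabulary of about 50,257 tokens (neither assumption appears in the message text):

```python
# Rough parameter-count sanity check for the architecture described above.
# Assumptions not stated in the message: ~12 * n_layers * d_model**2 weights
# for the attention + feed-forward blocks, and a GPT-2/GPT-3-style BPE
# vocabulary of about 50,257 tokens for the embedding matrix.
n_layers = 96
d_model = 12_288
vocab_size = 50_257  # assumed vocabulary size

block_params = 12 * n_layers * d_model ** 2   # attention + feed-forward weights
embedding_params = vocab_size * d_model       # token embedding matrix
total = block_params + embedding_params

print(f"~{total / 1e9:.1f}B parameters")      # prints ~174.6B
```

Running this gives roughly 174.6 billion, which matches the quoted 175 billion once biases, layer norms, and positional embeddings are also counted.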