All translations
Found 3 translations.
| Language | Current message text |
|---|---|
| English (en) | {{Term|bert}} uses the {{Term|encoder}} portion of the {{Term|transformer}} architecture. The model takes a sequence of tokens as input and produces a contextualized {{Term|embedding}} for each token. Two model sizes were released: {{Term|bert}}-Base (12 layers, 768 hidden units, 12 {{Term|attention}} heads, 110M parameters) and {{Term|bert}}-Large (24 layers, 1024 hidden units, 16 {{Term|attention}} heads, 340M parameters). |
| Spanish (es) | {{Term|bert|BERT}} utiliza la parte del {{Term|encoder|encoder}} de la arquitectura {{Term|transformer|Transformer}}. El modelo toma una secuencia de tokens como entrada y produce un {{Term|embedding|embedding}} contextualizado para cada token. Se publicaron dos tamaños de modelo: {{Term|bert|BERT}}-Base (12 capas, 768 unidades ocultas, 12 cabezas de {{Term|attention|atención}}, 110M de parámetros) y {{Term|bert|BERT}}-Large (24 capas, 1024 unidades ocultas, 16 cabezas de {{Term|attention|atención}}, 340M de parámetros). |
| Chinese (zh) | {{Term|bert|BERT}} 使用 {{Term|transformer|Transformer}} 架构的 {{Term|encoder|编码器}} 部分。该模型以 token 序列作为输入,并为每个 token 生成上下文化的 {{Term|embedding|嵌入}}。发布了两种模型规模:{{Term|bert|BERT}}-Base(12 层,768 隐藏单元,12 个 {{Term|attention|注意力}} 头,1.1 亿参数)和 {{Term|bert|BERT}}-Large(24 层,1024 隐藏单元,16 个 {{Term|attention|注意力}} 头,3.4 亿参数)。 |
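
The message quotes concrete hyperparameters for both model sizes. The sketch below (a minimal illustration assuming the Hugging Face `transformers` library and its `bert-base-uncased` checkpoint, neither of which the message names) shows the encoder returning one contextualized embedding per token, with the BERT-Base figures visible in the model config.

```python
# Minimal sketch: inspect BERT-Base's output shape and hyperparameters.
# Assumes the Hugging Face transformers library and the "bert-base-uncased"
# checkpoint, neither of which is named in the message above.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

inputs = tokenizer("BERT produces one embedding per token.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# One contextualized embedding per input token:
# shape is (batch=1, sequence_length, hidden_size=768)
print(out.last_hidden_state.shape)

cfg = model.config
print(cfg.num_hidden_layers)    # 12 layers
print(cfg.hidden_size)          # 768 hidden units
print(cfg.num_attention_heads)  # 12 attention heads
print(sum(p.numel() for p in model.parameters()))  # roughly 110M parameters
```

The last line should land near the 110M quoted for BERT-Base; the exact count depends on which embedding and pooler weights the checkpoint includes.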