All translations

Found 3 translations.

Name | Current message text
English (en) | {{Term|bert}} uses the {{Term|encoder}} portion of the {{Term|transformer}} architecture. The model takes a sequence of tokens as input and produces a contextualized {{Term|embedding}} for each token. Two model sizes were released: {{Term|bert}}-Base (12 layers, 768 hidden units, 12 {{Term|attention}} heads, 110M parameters) and {{Term|bert}}-Large (24 layers, 1024 hidden units, 16 {{Term|attention}} heads, 340M parameters).
Spanish (es) | {{Term|bert|BERT}} utiliza la parte del {{Term|encoder|encoder}} de la arquitectura {{Term|transformer|Transformer}}. El modelo toma una secuencia de tokens como entrada y produce un {{Term|embedding|embedding}} contextualizado para cada token. Se publicaron dos tamaños de modelo: {{Term|bert|BERT}}-Base (12 capas, 768 unidades ocultas, 12 cabezas de {{Term|attention|atención}}, 110M de parámetros) y {{Term|bert|BERT}}-Large (24 capas, 1024 unidades ocultas, 16 cabezas de {{Term|attention|atención}}, 340M de parámetros).
Chinese (zh) | {{Term|bert|BERT}} 使用 {{Term|transformer|Transformer}} 架构的 {{Term|encoder|编码器}} 部分。该模型以 token 序列作为输入,并为每个 token 生成上下文化的 {{Term|embedding|嵌入}}。发布了两种模型规模:{{Term|bert|BERT}}-Base(12 层,768 隐藏单元,12 个 {{Term|attention|注意力}} 头,1.1 亿参数)和 {{Term|bert|BERT}}-Large(24 层,1024 隐藏单元,16 个 {{Term|attention|注意力}} 头,3.4 亿参数)。
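The sizes quoted in the message can be sanity-checked with a rough parameter count. A minimal sketch, assuming the standard BERT hyperparameters the message does not state (WordPiece vocabulary of 30,522 tokens, 512 positions, two segment types, feed-forward width of 4× the hidden size) and ignoring the pooler and task heads:

```python
def encoder_params(layers, hidden, vocab=30522, max_pos=512, type_vocab=2):
    """Rough parameter count for a BERT-style transformer encoder."""
    ffn = 4 * hidden  # feed-forward inner width (standard BERT ratio)
    # Token, position, and segment embeddings, plus the embedding layer norm.
    embeddings = (vocab + max_pos + type_vocab) * hidden + 2 * hidden
    # Per layer: Q/K/V/output projections (weights + biases),
    # two feed-forward matrices, and two layer norms.
    attention = 4 * (hidden * hidden + hidden)
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    layer_norms = 2 * 2 * hidden
    return embeddings + layers * (attention + feed_forward + layer_norms)

base = encoder_params(layers=12, hidden=768)    # ~109M, matching the quoted 110M
large = encoder_params(layers=24, hidden=1024)  # ~334M, near the quoted 340M
```

The count lands slightly under 340M for BERT-Large because the released checkpoint also includes the pooler and pre-training heads, which this sketch deliberately omits.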