All translations
| Language | Current message text |
|---|---|
| English (en) | {{Term|pre-training|Pre-training}} used the BooksCorpus (800M words) and English Wikipedia (2,500M words), running for 1M {{Term|training step|steps}} with a {{Term|batch size}} of 256 sequences. The total {{Term|pre-training}} compute was substantial for its time, requiring four days on 4 Cloud TPUs for Base and 16 for Large. |
| Spanish (es) | El {{Term|pre-training|preentrenamiento}} utilizó BooksCorpus (800M de palabras) y la Wikipedia en inglés (2500M de palabras), ejecutándose durante 1M de {{Term|training step|pasos}} con un {{Term|batch size|tamaño de lote}} de 256 secuencias. El cómputo total de {{Term|pre-training|preentrenamiento}} fue considerable para su época: requirió cuatro días en 4 Cloud TPUs para Base y 16 para Large. |
| Chinese (zh) | {{Term|pre-training|预训练}} 使用了 BooksCorpus(8 亿单词)和英文维基百科(25 亿单词),以 256 个序列的 {{Term|batch size|批量大小}} 运行了 100 万 {{Term|training step|训练步}}。在当时,{{Term|pre-training|预训练}} 的总计算量相当庞大,需要在 4 个(Base)到 16 个(Large)Cloud TPU 上运行四天。 |
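
The figures quoted in the message above lend themselves to a quick sanity check. Below is a minimal Python sketch, with illustrative names only (the dictionary layout is not taken from any BERT codebase or framework), that gathers the stated hyperparameters and derives the total number of sequences processed during pre-training.

```python
# Minimal sketch: the pre-training figures quoted above, gathered in one
# place. Keys and names are illustrative, not an actual framework config.
pretraining = {
    "corpora_words": {
        "BooksCorpus": 800_000_000,          # 800M words
        "English Wikipedia": 2_500_000_000,  # 2,500M words
    },
    "training_steps": 1_000_000,   # 1M steps
    "batch_size": 256,             # sequences per step
    "wall_clock_days": 4,          # on 4 (Base) to 16 (Large) Cloud TPUs
}

# Total sequences processed over the full run: 1M steps x 256 sequences/step.
total_sequences = pretraining["training_steps"] * pretraining["batch_size"]
total_corpus_words = sum(pretraining["corpora_words"].values())

print(f"Sequences seen during pre-training: {total_sequences:,}")    # 256,000,000
print(f"Total corpus size in words:         {total_corpus_words:,}") # 3,300,000,000
```

The arithmetic is straightforward: 1M steps at 256 sequences per step means roughly 256 million sequences are processed, over a combined corpus of about 3.3 billion words.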