BERT Pre-training of Deep Bidirectional Transformers
Ablation studies demonstrated that both pre-training tasks were important, and that bidirectionality was the most significant factor — removing it caused large drops across all tasks. Increasing model size consistently improved results, even on small-scale tasks when fine-tuned appropriately.
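
To make the bidirectional pre-training objective concrete, here is a minimal sketch of BERT-style masked-language-model (MLM) masking in PyTorch. The 15% selection rate and the 80%/10%/10% replacement split follow the paper; the function name, the -100 ignore index, and the token IDs in the usage lines are illustrative assumptions, not part of the original text.

<syntaxhighlight lang="python">
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_prob=0.15):
    """BERT-style MLM masking: select ~15% of positions to predict; of those,
    replace 80% with [MASK], 10% with a random token, and leave 10% as-is."""
    labels = input_ids.clone()

    # Pick the positions whose original tokens the model must predict.
    selected = torch.bernoulli(torch.full(input_ids.shape, mlm_prob)).bool()
    labels[~selected] = -100  # common convention: ignore these in the loss

    # 80% of the selected positions become the [MASK] token.
    masked = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & selected
    input_ids[masked] = mask_token_id

    # Half of the remainder (10% overall) become a random vocabulary token.
    randomized = (torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool()
                  & selected & ~masked)
    input_ids[randomized] = torch.randint(vocab_size, input_ids.shape)[randomized]

    # The last 10% keep their original token, so the model cannot rely on
    # [MASK] always marking the prediction sites.
    return input_ids, labels

# Usage with hypothetical token IDs (103 and 30522 match bert-base-uncased's
# [MASK] id and vocabulary size, but any tokenizer's values would do):
ids = torch.randint(1000, 2000, (2, 8))
masked_ids, labels = mask_tokens(ids.clone(), mask_token_id=103, vocab_size=30522)
</syntaxhighlight>

Because the prediction targets sit inside the sequence, the Transformer encoder attends to context on both sides of each masked position; this is the bidirectionality whose removal caused the large drops reported in the ablations.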