BERT demonstrated that large-scale unsupervised pre-training could effectively transfer linguistic knowledge to downstream tasks, reducing the need for large task-specific labeled datasets and task-specific architecture engineering. This pre-train-then-fine-tune paradigm remains foundational to modern NLP practice.
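To make the paradigm concrete, here is a minimal sketch of fine-tuning a pre-trained BERT checkpoint on a toy classification task. It assumes the Hugging Face transformers library and PyTorch; the checkpoint name, toy dataset, label count, and hyperparameters are illustrative assumptions, not details from the original text.

```python
# A minimal sketch of the pre-train-then-fine-tune paradigm, assuming the
# Hugging Face transformers library and PyTorch. The dataset, label count,
# and hyperparameters below are illustrative placeholders.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Load weights produced by large-scale unsupervised pre-training.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # a new task-specific head is initialized
)

# A toy labeled dataset standing in for a downstream task (e.g. sentiment).
texts = ["a great movie", "a dull, lifeless film"]
labels = torch.tensor([1, 0])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = AdamW(model.parameters(), lr=2e-5)

# Fine-tune: the pre-trained encoder and the new head are updated jointly.
model.train()
for _ in range(3):  # a few epochs often suffice for downstream tasks
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Only the small classification head is trained from scratch; the bidirectional encoder carries over the linguistic knowledge acquired during pre-training, which is why comparatively little labeled data is needed downstream.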