BERT addressed this limitation by introducing a novel pre-training objective, masked language modeling (MLM), that enables genuinely bidirectional pre-training. Combined with a next sentence prediction (NSP) task, BERT learned rich contextual representations that could be transferred to downstream tasks through simple fine-tuning, eliminating the need for heavily engineered task-specific architectures.
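To make the MLM objective concrete, below is a minimal sketch of the masking procedure described in the BERT paper: about 15% of input positions are selected for prediction, and of those, 80% are replaced with a [MASK] token, 10% with a random token, and 10% left unchanged. The function name mask_tokens, the MASK_TOKEN constant, and the toy vocabulary are illustrative choices, not part of the original text or any official implementation.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, rng=random):
    """BERT-style masking: select ~15% of positions; of those,
    replace 80% with [MASK], 10% with a random vocabulary token,
    and leave 10% unchanged. The model is trained to predict the
    original token at every selected position."""
    masked = list(tokens)
    labels = [None] * len(tokens)  # None = position not predicted
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            labels[i] = tok          # target: the original token
            r = rng.random()
            if r < 0.8:
                masked[i] = MASK_TOKEN           # 80%: mask out
            elif r < 0.9:
                masked[i] = rng.choice(vocab)    # 10%: random token
            # else: 10% of the time, keep the original token
    return masked, labels

# Toy example
vocab = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
tokens = ["the", "cat", "sat", "on", "the", "mat"]
print(mask_tokens(tokens, vocab))
```

Because the model cannot tell which selected positions were masked, randomized, or kept, it must maintain a contextual representation of every input token, which is what yields the bidirectional conditioning the paragraph above describes.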