- Masked language modeling (MLM): A pre-training objective that randomly masks a fraction of the input tokens and trains the model to predict them from their bidirectional context, enabling truly bidirectional representation learning (see the masking sketch after this list).
- Next sentence prediction (NSP): A binary classification pre-training task in which the model predicts whether one sentence follows another in the original text, teaching it relationships between sentence pairs.
- A simple and effective fine-tuning paradigm: Adding a single output layer to the pre-trained model suffices for a wide range of NLP tasks, from classification to question answering (a sketch of such a head follows this list).
- Demonstration that deep bidirectional pre-training is critically important for learning general-purpose language representations.
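The masking procedure can be illustrated with a minimal sketch. The 15% selection rate and the 80/10/10 split (replace with [MASK], replace with a random token, or keep unchanged) follow the original paper, but the `mask_tokens` helper and the toy vocabulary here are hypothetical, not the paper's actual preprocessing code:

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15):
    """BERT-style masking: each position is selected with probability
    mask_prob; of the selected positions, 80% become [MASK], 10% become
    a random token, and 10% are left unchanged."""
    masked = list(tokens)
    labels = [None] * len(tokens)          # None = position is not predicted
    for i, tok in enumerate(tokens):
        if random.random() < mask_prob:
            labels[i] = tok                # the model must recover the original token
            r = random.random()
            if r < 0.8:
                masked[i] = MASK_TOKEN     # 80%: replace with the mask token
            elif r < 0.9:
                masked[i] = random.choice(vocab)  # 10%: replace with a random token
            # else 10%: keep the original token
    return masked, labels

# Toy example (hypothetical vocabulary and sentence)
tokens = ["the", "man", "went", "to", "the", "store"]
vocab = ["the", "man", "went", "to", "store", "dog", "cat"]
print(mask_tokens(tokens, vocab))
```

Because a masked position may contain [MASK], a random token, or the original token, the model cannot rely on seeing the mask symbol at inference time and must use context from both directions to make its prediction.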
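The fine-tuning paradigm can likewise be sketched as a single task-specific layer on top of the pre-trained encoder. This is a minimal PyTorch-style illustration, not the authors' implementation; it assumes an `encoder` object that returns per-token hidden states of size `hidden_size`, with the first position corresponding to the [CLS] token:

```python
import torch
import torch.nn as nn

class BertForClassification(nn.Module):
    """Minimal sketch of BERT fine-tuning for sentence classification:
    the only new parameters are a single linear output layer applied
    to the [CLS] representation."""
    def __init__(self, encoder, hidden_size=768, num_labels=2):
        super().__init__()
        self.encoder = encoder                          # pre-trained BERT-style encoder (assumed)
        self.classifier = nn.Linear(hidden_size, num_labels)  # the single added output layer

    def forward(self, input_ids, attention_mask=None):
        # Assumed interface: encoder returns (batch, seq_len, hidden_size)
        hidden_states = self.encoder(input_ids, attention_mask)
        cls_vector = hidden_states[:, 0]                # representation of the [CLS] token
        return self.classifier(cls_vector)              # task logits
```

During fine-tuning, all parameters (encoder and the new layer) are updated end to end on the downstream task, which is what makes the paradigm both simple and effective.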