BERT: Pre-training of Deep Bidirectional Transformers

    • Masked language modeling (MLM): A pre-training objective that randomly masks input tokens and trains the model to predict them from bidirectional context, enabling true bidirectional representation learning.
    • Next sentence prediction (NSP): A binary classification pre-training task in which the model predicts whether the second sentence of a pair actually follows the first in the original text, teaching it relationships between sentence pairs.
    • A simple and effective fine-tuning paradigm: Adding a single output layer to the pre-trained model suffices for a wide range of NLP tasks, from classification to question answering.
    • Demonstration that deep bidirectional pre-training is critically important for learning general-purpose language representations.
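The MLM objective above can be sketched as a small input-corruption routine. This is a simplified illustration (the toy vocabulary, token strings, and function name are assumptions; BERT operates on WordPiece token IDs), but the selection rule follows the paper: about 15% of positions become prediction targets, and of those, 80% are replaced with [MASK], 10% with a random token, and 10% are left unchanged.

```python
import random

# Assumed toy vocabulary for illustration; BERT uses a WordPiece vocabulary.
VOCAB = ["the", "cat", "sat", "on", "mat", "dog", "ran"]
MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Select ~mask_prob of positions as MLM targets; of the selected
    positions, replace 80% with [MASK], 10% with a random vocabulary
    token, and leave 10% unchanged (the 80/10/10 rule from the paper)."""
    rng = random.Random(seed)
    corrupted = list(tokens)
    targets = {}  # position -> original token the model must predict
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK_TOKEN          # 80%: mask
            elif r < 0.9:
                corrupted[i] = rng.choice(VOCAB)   # 10%: random token
            # else 10%: keep the original token
    return corrupted, targets

sentence = ["the", "cat", "sat", "on", "the", "mat"]
corrupted, targets = mask_tokens(sentence)
```

Because the model never knows which unchanged tokens are targets, it must maintain a contextual representation of every input token, which is what forces the learned representations to be bidirectional.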
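The fine-tuning paradigm from the bullets above can likewise be sketched in miniature. Here, a single task-specific linear layer maps a pooled [CLS] representation to class logits; the hidden size, the random initialization, and the `classify` helper are illustrative assumptions (BERT-base uses a 768-dimensional hidden state, and in real fine-tuning all pre-trained Transformer weights are updated together with this head).

```python
import random

HIDDEN = 8        # assumed hidden size for illustration (BERT-base: 768)
NUM_CLASSES = 2   # e.g. a binary sentiment task

rng = random.Random(0)
# The single new output layer: a weight matrix and bias are the only
# task-specific parameters added on top of the pre-trained model.
W = [[rng.uniform(-0.1, 0.1) for _ in range(HIDDEN)] for _ in range(NUM_CLASSES)]
b = [0.0] * NUM_CLASSES

def classify(cls_vector):
    """Compute class logits as W @ cls_vector + b from the pooled [CLS]
    representation produced by the pre-trained encoder."""
    return [sum(w * x for w, x in zip(W[k], cls_vector)) + b[k]
            for k in range(NUM_CLASSES)]

# Stand-in for the [CLS] vector a pre-trained encoder would produce.
cls_vector = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN)]
logits = classify(cls_vector)
```

The design point is that the head is deliberately thin: because the pre-trained representations are already general-purpose, one linear layer per task suffices across classification, tagging, and question answering.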