BERT Pre-training of Deep Bidirectional Transformers

    • Masked language modeling (MLM): A pre-training objective that randomly masks input tokens and trains the model to predict them from bidirectional context, enabling true bidirectional representation learning (see the masking sketch after this list).
    • Next sentence prediction (NSP): A binary classification pre-training task that teaches the model to understand relationships between sentence pairs (see the pair-construction sketch after this list).
    • A simple and effective fine-tuning paradigm: Adding a single output layer to the pre-trained model suffices for a wide range of NLP tasks, from classification to question answering (see the classifier sketch after this list).
    • Demonstration that deep bidirectional pre-training is critically important for learning general-purpose language representations.
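
To make the MLM objective concrete, here is a minimal sketch of the masking procedure described in the paper: 15% of positions are selected, of which 80% are replaced with [MASK], 10% with a random token, and 10% left unchanged. The `MASK_ID` and `VOCAB_SIZE` constants below are illustrative placeholders, not the actual vocabulary values.

```python
import random

MASK_ID = 103       # placeholder ID for the [MASK] token (assumption)
VOCAB_SIZE = 30522  # placeholder vocabulary size (assumption)

def mask_tokens(token_ids, mask_prob=0.15):
    """BERT-style MLM masking: select ~15% of positions, then replace
    80% of them with [MASK], 10% with a random token, and leave 10%
    unchanged. Returns (masked inputs, labels); labels are -100 at
    unselected positions so the loss can ignore them."""
    inputs = list(token_ids)
    labels = [-100] * len(inputs)
    for i, tok in enumerate(token_ids):
        if random.random() < mask_prob:
            labels[i] = tok  # the model must predict the original token
            r = random.random()
            if r < 0.8:
                inputs[i] = MASK_ID                       # 80%: [MASK]
            elif r < 0.9:
                inputs[i] = random.randrange(VOCAB_SIZE)  # 10%: random token
            # remaining 10%: keep the original token unchanged
    return inputs, labels
```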
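For NSP, training pairs are built so that half the time sentence B actually follows sentence A and half the time it is a random sentence. A hedged sketch of that construction follows, assuming `doc_sentences` is an ordered list of at least two sentences from one document and `corpus_sentences` is a pool for sampling negatives (both names are illustrative).

```python
import random

def make_nsp_example(doc_sentences, corpus_sentences):
    """Build one NSP pair: with probability 0.5 take two consecutive
    sentences from the same document (label 1, IsNext); otherwise pair
    a sentence with a random one from the corpus (label 0, NotNext)."""
    i = random.randrange(len(doc_sentences) - 1)  # needs >= 2 sentences
    sent_a = doc_sentences[i]
    if random.random() < 0.5:
        sent_b = doc_sentences[i + 1]   # true continuation
        label = 1                        # IsNext
    else:
        sent_b = random.choice(corpus_sentences)  # random negative
        label = 0                        # NotNext
    # The model is fed: [CLS] sent_a [SEP] sent_b [SEP]
    return sent_a, sent_b, label
```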
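The fine-tuning paradigm can be illustrated with a rough PyTorch sketch: one new linear layer on top of the pre-trained encoder. The `encoder` interface here is a simplifying assumption (it is taken to map token IDs to per-token hidden states), not an actual library API.

```python
import torch.nn as nn

class BertForClassification(nn.Module):
    """Sketch of the fine-tuning recipe: a pre-trained encoder plus a
    single task-specific linear layer over the [CLS] representation.
    `encoder` is assumed to return hidden states of size `hidden_size`."""
    def __init__(self, encoder, hidden_size=768, num_labels=2):
        super().__init__()
        self.encoder = encoder  # pre-trained BERT body (assumed interface)
        self.classifier = nn.Linear(hidden_size, num_labels)  # the one new layer

    def forward(self, input_ids):
        hidden_states = self.encoder(input_ids)  # (batch, seq_len, hidden)
        cls_vector = hidden_states[:, 0]         # [CLS] token representation
        return self.classifier(cls_vector)       # task logits
```

The same pattern covers sentence- and pair-level classification; span tasks such as question answering instead apply small output layers over every token position to predict answer start and end.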