Translations:Language Models are Few-Shot Learners/4/en

The dominant paradigm in NLP at the time involved pre-training a model on large corpora and then fine-tuning on task-specific labeled datasets. While effective, this approach required curated datasets for every new task, introduced the possibility of spurious correlations with narrow training distributions, and did not match how humans learn tasks from minimal instruction.