Translations:Language Models are Few-Shot Learners/4/en
The dominant paradigm in NLP at the time involved pre-training a model on large corpora and then fine-tuning on task-specific labeled datasets. While effective, this approach required curated datasets for every new task, introduced the possibility of spurious correlations with narrow training distributions, and did not match how humans learn tasks from minimal instruction.
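As a rough illustration of that workflow, the sketch below fine-tunes a small stand-in "pre-trained" encoder with a newly added classification head on a dummy labeled dataset. The TinyEncoder class, the data, and all hyperparameters are illustrative assumptions, not the setup used in the paper.

<syntaxhighlight lang="python">
# Minimal sketch of the pre-train-then-fine-tune paradigm described above,
# in plain PyTorch. Everything here (encoder, data, hyperparameters) is a
# hypothetical stand-in, not the paper's actual setup.
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Stand-in for a language model pre-trained on a large corpus."""

    def __init__(self, vocab_size=1000, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
            num_layers=2,
        )

    def forward(self, tokens):
        # Pool token representations into one vector per example.
        return self.encoder(self.embed(tokens)).mean(dim=1)


class Classifier(nn.Module):
    """Task-specific head added during fine-tuning."""

    def __init__(self, encoder, num_labels=2):
        super().__init__()
        self.encoder = encoder                   # weights come from pre-training
        self.head = nn.Linear(64, num_labels)    # new, randomly initialised head

    def forward(self, tokens):
        return self.head(self.encoder(tokens))


# Fine-tuning loop on a (dummy) curated, task-specific labeled dataset.
model = Classifier(TinyEncoder())
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, 1000, (8, 16))   # 8 examples, 16 token ids each
labels = torch.randint(0, 2, (8,))         # task labels collected for this dataset

for _ in range(3):
    optimizer.zero_grad()
    loss = loss_fn(model(tokens), labels)
    loss.backward()
    optimizer.step()
</syntaxhighlight>

The point of the sketch is the dependency it makes explicit: every new task needs its own labeled dataset and its own fine-tuning run of this kind, which is the cost the paper's few-shot approach tries to avoid.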