Translations:Language Models are Few-Shot Learners/4/en
The dominant paradigm in NLP at the time involved pre-training a model on large corpora and then fine-tuning on task-specific labeled datasets. While effective, this approach required curated datasets for every new task, introduced the possibility of spurious correlations with narrow training distributions, and did not match how humans learn tasks from minimal instruction.
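As a rough illustration of that workflow, the sketch below fine-tunes a small stand-in "pre-trained" encoder with a newly added classification head on a dummy labeled dataset. The TinyEncoder class, the data, and all hyperparameters are illustrative assumptions, not the setup used in the paper.

<syntaxhighlight lang="python">
# Minimal sketch of the pre-train-then-fine-tune paradigm described above,
# in plain PyTorch. Everything here (encoder, data, hyperparameters) is a
# hypothetical stand-in, not the paper's actual setup.
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Stand-in for a language model pre-trained on a large corpus."""

    def __init__(self, vocab_size=1000, hidden=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model=hidden, nhead=4, batch_first=True),
            num_layers=2,
        )

    def forward(self, tokens):
        # Pool token representations into one vector per example.
        return self.encoder(self.embed(tokens)).mean(dim=1)


class Classifier(nn.Module):
    """Task-specific head added during fine-tuning."""

    def __init__(self, encoder, num_labels=2):
        super().__init__()
        self.encoder = encoder                   # weights come from pre-training
        self.head = nn.Linear(64, num_labels)    # new, randomly initialised head

    def forward(self, tokens):
        return self.head(self.encoder(tokens))


# Fine-tuning loop on a (dummy) curated, task-specific labeled dataset.
model = Classifier(TinyEncoder())
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, 1000, (8, 16))   # 8 examples, 16 token ids each
labels = torch.randint(0, 2, (8,))         # task labels collected for this dataset

for _ in range(3):
    optimizer.zero_grad()
    loss = loss_fn(model(tokens), labels)
    loss.backward()
    optimizer.step()
</syntaxhighlight>

The point of the sketch is the dependency it makes explicit: every new task needs its own labeled dataset and its own fine-tuning run of this kind, which is the cost the paper's few-shot approach tries to avoid.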