Translations:Attention Mechanisms/17/en


    The scaling factor $ \sqrt{d_k} $ prevents the dot products from growing large in magnitude as the key dimension $ d_k $ increases: if the components of the query and key vectors are independent with mean 0 and variance 1, their dot product has variance $ d_k $, so unscaled scores would push the softmax into regions where its gradients are extremely small.
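The effect can be illustrated with a small NumPy sketch (the dimension $ d_k = 512 $ and the number of keys are arbitrary choices for this example). Without scaling, the score with the largest value dominates the softmax almost entirely, leaving near-zero gradients for the rest; dividing by $ \sqrt{d_k} $ keeps the scores at unit variance and the distribution usable.

```python
import numpy as np

rng = np.random.default_rng(0)

d_k = 512                           # key dimension (illustrative choice)
q = rng.standard_normal(d_k)        # one query vector
K = rng.standard_normal((8, d_k))   # eight key vectors

scores = K @ q                      # raw dot products, variance ~ d_k
scaled = scores / np.sqrt(d_k)      # scaled scores, variance ~ 1

def softmax(x):
    e = np.exp(x - x.max())         # subtract max for numerical stability
    return e / e.sum()

p_unscaled = softmax(scores)
p_scaled = softmax(scaled)

# Unscaled scores typically saturate the softmax (almost all probability
# mass on one key); the scaled version stays far less peaked.
print("max prob, unscaled:", p_unscaled.max())
print("max prob, scaled:  ", p_scaled.max())
```

Because dividing the logits by a factor greater than 1 always flattens the softmax, the scaled distribution is strictly less peaked than the unscaled one for the same scores.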