Translations:Attention Mechanisms/17/zh

    From Marovi AI
    Revision as of 03:21, 27 April 2026 by DeployBot (talk | contribs) (Batch translate Attention Mechanisms unit 17 → zh)

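    Dividing by the scaling factor $ \sqrt{d_k} $ keeps the dot products from growing too large as the key dimension $ d_k $ increases; otherwise they would push the softmax into regions where the gradients are extremely small.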
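
    The effect can be checked numerically. The following is a minimal sketch, not taken from the original text: it assumes query and key components are independent draws from a standard normal distribution, and the helper names (`softmax`, `random_scores`, `variance`) are hypothetical. Under that assumption the variance of a raw dot product is roughly $ d_k $, while the scaled scores have variance close to 1, so the softmax is less sharply peaked.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_scores(d_k, n_keys, rng):
    """Raw dot products q . k_j for one random query and n_keys random keys.

    Assumption: every component is an independent N(0, 1) sample.
    """
    q = [rng.gauss(0.0, 1.0) for _ in range(d_k)]
    keys = [[rng.gauss(0.0, 1.0) for _ in range(d_k)] for _ in range(n_keys)]
    return [sum(qi * ki for qi, ki in zip(q, key)) for key in keys]


def variance(xs):
    """Population variance of a list of floats."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / len(xs)


rng = random.Random(0)  # fixed seed so the run is repeatable
d_k = 64

raw = random_scores(d_k, n_keys=2000, rng=rng)
scaled = [s / math.sqrt(d_k) for s in raw]

raw_var = variance(raw)        # grows roughly linearly with d_k
scaled_var = variance(scaled)  # stays near 1 regardless of d_k

# Compare how peaked the attention weights are over a small set of keys.
raw_weights = softmax(raw[:8])
scaled_weights = softmax(scaled[:8])

print(f"variance of raw scores:    {raw_var:.2f}")
print(f"variance of scaled scores: {scaled_var:.2f}")
print(f"largest raw weight:    {max(raw_weights):.4f}")
print(f"largest scaled weight: {max(scaled_weights):.4f}")
```

    Changing `d_k` and rerunning shows the raw variance tracking the dimension while the scaled variance stays roughly constant. A softmax whose largest weight is close to 1 is saturated, and its gradients with respect to the scores are close to zero.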