Translations:Attention Mechanisms/17/zh

    From Marovi AI
    Revision as of 21:58, 27 April 2026 by DeployBot (talk | contribs) (Batch translate Attention Mechanisms unit 17 → zh)

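    Dividing by the scaling factor $ \sqrt{d_k} $ keeps the dot products from growing too large in magnitude as the key dimension $ d_k $ increases; without it, the softmax would be pushed into regions where its gradients are extremely small.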
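
The effect can be illustrated with a minimal sketch. It assumes, purely for illustration, that query and key components are independent draws from a standard normal distribution, in which case the dot product of a query and a key has variance $ d_k $ and dividing by $ \sqrt{d_k} $ brings it back to unit variance. The helper names (`softmax`, `attention_weights`, `max_weight`) and the choice of 8 keys and 200 trials are hypothetical and not taken from the source. The sketch reports the average largest attention weight with and without scaling; a largest weight close to 1 means the softmax is saturated, which is where its gradients become very small.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys, scale=True):
    """Softmax over query-key dot products, optionally divided by sqrt(d_k)."""
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    if scale:
        scores = [s / math.sqrt(d_k) for s in scores]
    return softmax(scores)


def max_weight(d_k, scale, n_keys=8, trials=200, seed=0):
    """Average largest attention weight over random queries and keys.

    Assumption: components are i.i.d. standard normal (illustrative only).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        query = [rng.gauss(0, 1) for _ in range(d_k)]
        keys = [[rng.gauss(0, 1) for _ in range(d_k)] for _ in range(n_keys)]
        total += max(attention_weights(query, keys, scale))
    return total / trials


if __name__ == "__main__":
    # Columns: d_k, unscaled average max weight, scaled average max weight
    for d_k in (4, 64, 256):
        unscaled = max_weight(d_k, scale=False)
        scaled = max_weight(d_k, scale=True)
        print(d_k, round(unscaled, 3), round(scaled, 3))
```

Under these assumptions, the unscaled column should move toward 1 as $ d_k $ grows, while the scaled column should stay roughly constant, since scaling keeps the score variance independent of $ d_k $.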