Translations:Attention Mechanisms/17/zh: Difference between revisions

    From Marovi AI
    (Batch translate Attention Mechanisms unit 17 → zh)
    Tag: translation
     
    Line 1:
    − 缩放因子 <math>\sqrt{d_k}</math> 防止点积随着键维度 <math>d_k</math> 的增加而变得过大,否则会使 softmax 进入梯度极小的区域。
    + 缩放因子 <math>\sqrt{d_k}</math> 防止点积在键维度 <math>d_k</math> 增大时数值过大,否则会使 softmax 进入梯度极小的区域。

    Revision as of 21:58, 27 April 2026

    Message definition (Attention Mechanisms)
    The scaling factor <math>\sqrt{d_k}</math> prevents the dot products from growing large in magnitude as the key dimension <math>d_k</math> increases, which would push the {{Term|softmax}} into regions of extremely small gradients.

    缩放因子 <math>\sqrt{d_k}</math> 防止点积在键维度 <math>d_k</math> 增大时数值过大,否则会使 softmax 进入梯度极小的区域。
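
The effect described in the message definition can be checked numerically. The following is a minimal sketch, not taken from the source article: it assumes query and key entries are independent with zero mean and unit variance, in which case the raw dot product has standard deviation of about sqrt(d_k). The function names (`softmax`, `attention_weights`, `score_std`) and the use of NumPy are illustrative choices.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - np.max(x, axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def attention_weights(q, K, scale=True):
    """Softmax weights of one query q (d_k,) against keys K (n_keys, d_k)."""
    d_k = K.shape[-1]
    scores = K @ q  # one dot product per key
    if scale:
        scores = scores / np.sqrt(d_k)  # the scaling factor from the text
    return softmax(scores)


def score_std(d_k, n_samples=5000, scale=True, seed=0):
    """Empirical standard deviation of q.k for random q, k.

    Assumption: entries are i.i.d. standard normal, so the unscaled
    dot product has variance d_k.
    """
    rng = np.random.default_rng(seed)
    q = rng.standard_normal((n_samples, d_k))
    k = rng.standard_normal((n_samples, d_k))
    scores = np.sum(q * k, axis=-1)
    if scale:
        scores = scores / np.sqrt(d_k)
    return scores.std()


if __name__ == "__main__":
    for d_k in (16, 64, 256):
        print(d_k, score_std(d_k, scale=False), score_std(d_k, scale=True))

    rng = np.random.default_rng(1)
    q = rng.standard_normal(256)
    K = rng.standard_normal((8, 256))
    # Largest attention weight without and with scaling.
    print(attention_weights(q, K, scale=False).max())
    print(attention_weights(q, K, scale=True).max())
```

Under these assumptions, `score_std(256, scale=False)` should come out near 16 (that is, sqrt(256)), while the scaled value stays near 1 regardless of `d_k`. Dividing the scores by a constant greater than 1 can only lower the largest softmax weight, so the unscaled weights are at least as peaked as the scaled ones; a softmax close to one-hot is the saturated regime where gradients become very small.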