Translations:Attention Mechanisms/17/en: Difference between revisions

The scaling factor <math>\sqrt{d_k}</math> prevents the dot products from growing large in magnitude as the key dimension <math>d_k</math> increases, which would push the {{Term|softmax}} into regions of extremely small gradients.
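The effect described above can be sketched numerically. In this illustrative example (not part of the source page), query and key components are drawn i.i.d. with unit variance, so each dot product is a sum of <math>d_k</math> unit-variance terms and its variance grows linearly with <math>d_k</math>; dividing by <math>\sqrt{d_k}</math> restores roughly unit variance, keeping the softmax inputs in a well-conditioned range. The dimension <math>d_k = 512</math> and sample count are arbitrary choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
d_k = 512      # key dimension (arbitrary choice for illustration)
n = 10_000     # number of query/key pairs to sample

# Random queries and keys with zero-mean, unit-variance components.
q = rng.standard_normal((n, d_k))
k = rng.standard_normal((n, d_k))

# Row-wise dot products q_i . k_i: each is a sum of d_k unit-variance
# terms, so the variance of the dot products is approximately d_k.
dots = np.einsum("ij,ij->i", q, k)
print(f"unscaled variance: {np.var(dots):.1f} (d_k = {d_k})")

# Scaling by sqrt(d_k) brings the variance back to roughly 1, so the
# softmax inputs stay O(1) regardless of the key dimension.
scaled = dots / np.sqrt(d_k)
print(f"scaled variance:   {np.var(scaled):.3f}")
```

With large unscaled logits, the softmax concentrates nearly all mass on the maximum entry, and the gradient of the remaining entries vanishes; keeping the logits at unit scale avoids this saturation.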

    Revision as of 19:41, 27 April 2026

