Translations:Attention Mechanisms/17/es

    From Marovi AI
    Revision as of 04:29, 28 April 2026 by DeployBot (talk | contribs) (Batch translate Attention Mechanisms unit 17 → es)
    (diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

    El factor de escala $ \sqrt{d_k} $ evita que los productos punto crezcan en magnitud a medida que aumenta la dimensión de las claves $ d_k $, lo cual empujaría al softmax a regiones con gradientes extremadamente pequeños.