Translations:Attention Mechanisms/17/zh: Difference between revisions
(Batch translate Attention Mechanisms unit 17 → zh) Tag: translation
Latest revision as of 23:36, 27 April 2026
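The scaling factor $\sqrt{d_k}$ keeps the dot products from growing in magnitude as the key dimension $d_k$ increases; without it, large dot products would push the softmax into regions where its gradients are extremely small.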
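A rough numerical check of this claim, offered as a sketch rather than as part of the original unit. It assumes query and key entries are i.i.d. standard normal. Under that assumption $q \cdot k$ is a sum of $d_k$ unit-variance terms, so its variance is $d_k$ and its typical size grows like $\sqrt{d_k}$. Dividing by $\sqrt{d_k}$ brings the variance back to about 1. The helper names (`empirical_score_variance`, `mean_jacobian_norm`, etc.) are hypothetical. Gradient size is measured by the Frobenius norm of the softmax Jacobian $\operatorname{diag}(p) - pp^\top$, which shrinks toward 0 as the output approaches one-hot.

```python
import math
import random
import statistics


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def random_vector(d, rng):
    # Assumption for illustration: entries are i.i.d. standard normal.
    return [rng.gauss(0.0, 1.0) for _ in range(d)]


def empirical_score_variance(d_k, scaled, samples=2000, seed=0):
    """Sample variance of q . k, optionally divided by sqrt(d_k)."""
    rng = random.Random(seed)
    scale = math.sqrt(d_k) if scaled else 1.0
    scores = [dot(random_vector(d_k, rng), random_vector(d_k, rng)) / scale
              for _ in range(samples)]
    return statistics.variance(scores)


def softmax(xs):
    m = max(xs)  # subtract the max so exp() cannot overflow
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian_norm(p):
    # Frobenius norm of d softmax / d scores = diag(p) - p p^T.
    # It tends to 0 as p approaches a one-hot vector (a saturated softmax).
    n = len(p)
    return math.sqrt(sum(
        ((p[i] if i == j else 0.0) - p[i] * p[j]) ** 2
        for i in range(n) for j in range(n)))


def mean_jacobian_norm(d_k, scaled, n_keys=8, trials=100, seed=0):
    """Average softmax-gradient size for one query attending over n_keys keys."""
    rng = random.Random(seed)
    scale = math.sqrt(d_k) if scaled else 1.0
    total = 0.0
    for _ in range(trials):
        q = random_vector(d_k, rng)
        scores = [dot(q, random_vector(d_k, rng)) / scale for _ in range(n_keys)]
        total += softmax_jacobian_norm(softmax(scores))
    return total / trials


if __name__ == "__main__":
    for d_k in (4, 64, 256):
        print(f"d_k={d_k:4d}  "
              f"var raw={empirical_score_variance(d_k, False):8.2f}  "
              f"var scaled={empirical_score_variance(d_k, True):5.2f}  "
              f"|J| raw={mean_jacobian_norm(d_k, False):.4f}  "
              f"|J| scaled={mean_jacobian_norm(d_k, True):.4f}")
```

Under these assumptions the unscaled variance should track $d_k$ and the scaled variance should stay near 1. As $d_k$ grows, the unscaled Jacobian norm should fall well below the scaled one, which is the "extremely small gradients" effect described above.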