Translations:Attention Mechanisms/33/zh

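Masking: in autoregressive decoding, future positions are masked (their scores are set to $-\infty$ before the softmax) to preserve the causal structure.
Attention dropout: randomly dropping attention weights during training acts as a regularizer and reduces overfitting to specific alignment patterns.
Key-value caching: during inference, previously computed key and value vectors are cached to avoid redundant computation, which significantly speeds up autoregressive generation.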
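
The three techniques above can be illustrated with a minimal NumPy sketch of single-head scaled dot-product attention. This is an assumption-laden illustration and not a reference implementation: the function names `causal_attention` and `decode_with_cache`, the single-head unbatched `(T, d)` layout, and the use of inverted dropout (rescaling kept weights by `1 / (1 - p)`) are all choices made here for clarity.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability; exp(-inf) becomes 0.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def causal_attention(q, k, v, dropout_p=0.0, rng=None):
    """Full-sequence attention with a causal mask (hypothetical helper).

    q, k, v: arrays of shape (T, d). Position i may attend only to j <= i.
    """
    t, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    # True strictly above the diagonal, i.e. at future positions.
    future = np.triu(np.ones((t, t), dtype=bool), k=1)
    # Masking: set future scores to -inf BEFORE the softmax.
    scores = np.where(future, -np.inf, scores)
    weights = softmax(scores)
    # Attention dropout (training only): zero random weights and rescale
    # the survivors so the expected value is unchanged (inverted dropout).
    if dropout_p > 0.0:
        if rng is None:
            rng = np.random.default_rng()
        keep = rng.random(weights.shape) >= dropout_p
        weights = weights * keep / (1.0 - dropout_p)
    return weights @ v


def decode_with_cache(q, k, v):
    """Step-by-step decoding with a key-value cache (hypothetical helper).

    At step i only the new key/value row is appended; earlier rows are
    reused from the cache instead of being recomputed.
    """
    t, d = q.shape
    k_cache, v_cache, outputs = [], [], []
    for i in range(t):
        k_cache.append(k[i])
        v_cache.append(v[i])
        keys = np.stack(k_cache)      # shape (i + 1, d)
        values = np.stack(v_cache)    # shape (i + 1, d)
        # The cache holds only past and current positions, so no explicit
        # mask is needed here.
        scores = keys @ q[i] / np.sqrt(d)
        outputs.append(softmax(scores) @ values)
    return np.stack(outputs)
```

With dropout disabled, `decode_with_cache(q, k, v)` should agree with `causal_attention(q, k, v)` up to floating-point error, which is the sense in which caching avoids redundant work without changing the result. In a real model the cached rows would be the projected keys and values produced by earlier decoding steps; here they are simply sliced from precomputed arrays.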
