
Translations:Attention Mechanisms/33/zh


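Masking: In autoregressive decoding, future positions are masked (their scores are set to $-\infty$ before the softmax) to preserve the causal structure.
Attention dropout: Randomly dropping attention weights during training acts as a regularizer and reduces overfitting to specific alignment patterns.
Key-value caching: During inference, previously computed key and value vectors are cached to avoid redundant computation, which significantly speeds up autoregressive generation.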
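
The masking and key-value caching points above can be illustrated with a minimal, dependency-free sketch. It is an illustration under stated assumptions, not a reference implementation: it assumes single-head scaled dot-product attention over plain Python lists, and the names `causal_attention` and `KVCache` are hypothetical helpers introduced here. Attention dropout is left out because it applies only during training.

```python
import math


def softmax(scores):
    """Numerically stable softmax; math.exp(-inf) evaluates to 0.0."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, values):
    """Combine value vectors using the attention weights."""
    width = len(values[0])
    return [sum(w * v[c] for w, v in zip(weights, values)) for c in range(width)]


def causal_attention(queries, keys, values):
    """Full-sequence attention with future positions masked before softmax."""
    scale = math.sqrt(len(queries[0]))
    outputs = []
    for i, q in enumerate(queries):
        # Position i may only attend to positions j <= i; the rest get -inf.
        scores = [
            dot(q, k) / scale if j <= i else -math.inf
            for j, k in enumerate(keys)
        ]
        outputs.append(weighted_sum(softmax(scores), values))
    return outputs


class KVCache:
    """Hypothetical cache: stores past keys/values for step-by-step decoding."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, q, k, v):
        # Append only the newest key/value; earlier ones are reused as-is.
        self.keys.append(k)
        self.values.append(v)
        scale = math.sqrt(len(q))
        scores = [dot(q, past_k) / scale for past_k in self.keys]
        return weighted_sum(softmax(scores), self.values)
```

Decoding one token at a time through `KVCache.step` gives the same outputs as `causal_attention` over the full sequence, because the cache never contains future positions; the mask becomes implicit. Each step then scores one query against the stored keys instead of recomputing keys and values for the whole prefix.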
