All translations

Found 3 translations.

Name	Current message text
English (en)	* '''Masking''': In autoregressive decoding, future positions are masked (set to <math>-\infty</math> before softmax) to preserve the causal structure. * '''Attention dropout''': Dropping attention weights randomly during training acts as a regulariser and reduces overfitting to specific alignment patterns. * '''Key-value caching''': During inference, previously computed key and value vectors are cached to avoid redundant computation, significantly speeding up autoregressive generation.
Spanish (es)	* '''Enmascaramiento''': En la decodificación autorregresiva, las posiciones futuras se enmascaran (se establecen en <math>-\infty</math> antes del softmax) para preservar la estructura causal. * '''Dropout de atención''': Eliminar pesos de atención aleatoriamente durante el entrenamiento actúa como un regularizador y reduce el sobreajuste a patrones de alineación específicos. * '''Almacenamiento en caché de claves y valores''': Durante la inferencia, los vectores de clave y valor calculados previamente se almacenan en caché para evitar cómputos redundantes, acelerando significativamente la generación autorregresiva.
Chinese (zh)	* '''掩码''':在自回归解码中,未来位置会被掩码(在 softmax 之前设置为 <math>-\infty</math>)以保持因果结构。 * '''注意力 dropout''':在训练期间随机丢弃注意力权重起到正则化作用,减少对特定对齐模式的过拟合。 * '''键值缓存''':在推理期间,先前计算的键向量和值向量会被缓存,以避免冗余计算,显著加速自回归生成。
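
The three techniques named in the message above can be illustrated with a minimal, dependency-free Python sketch. It is not taken from the source page. All names here (`causal_attention`, `attention_dropout`, `KVCache`) are hypothetical, the scores are assumed to be scaled dot products, and the dropout helper assumes the common "inverted" convention of rescaling surviving weights by 1/(1 - p).

```python
import math
import random


def softmax(scores):
    # Entries equal to -inf contribute exp(-inf) = 0, so masked positions get zero weight.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, values):
    width = len(values[0])
    return [sum(w * v[c] for w, v in zip(weights, values)) for c in range(width)]


def causal_attention(queries, keys, values):
    """Masking: position i may only attend to positions j <= i."""
    scale = math.sqrt(len(queries[0]))
    outputs = []
    for i, q in enumerate(queries):
        # Future positions (j > i) are set to -inf before the softmax.
        scores = [dot(q, k) / scale if j <= i else -math.inf
                  for j, k in enumerate(keys)]
        outputs.append(weighted_sum(softmax(scores), values))
    return outputs


def attention_dropout(weights, p, rng):
    """Attention dropout (training only): zero each weight with probability p.

    Assumption: survivors are rescaled by 1 / (1 - p) (inverted dropout).
    """
    return [0.0 if rng.random() < p else w / (1.0 - p) for w in weights]


class KVCache:
    """Key-value caching: keep earlier keys and values between decoding steps."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, q, k, v):
        # Only the newest key/value pair is appended; earlier ones are reused.
        self.keys.append(k)
        self.values.append(v)
        scale = math.sqrt(len(q))
        scores = [dot(q, cached_k) / scale for cached_k in self.keys]
        return weighted_sum(softmax(scores), self.values)


# Toy inputs (made up for illustration): 3 positions, dimension 2.
Q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K = [[1.0, 2.0], [0.5, -1.0], [2.0, 0.0]]
V = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

full = causal_attention(Q, K, V)
cache = KVCache()
incremental = [cache.step(q, k, v) for q, k, v in zip(Q, K, V)]
```

In this sketch, step-by-step decoding with the cache yields the same outputs as masked attention over the whole sequence, because at step i the cache holds exactly the keys and values the mask would leave visible. The cache needs no explicit mask, since future positions have not been computed yet.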