All translations

Found 3 translations.

Name	Current message text
English (en)	* '''Memory''' — the forward pass must store all intermediate {{Term|activation function|activations}} for the backward pass. For very deep networks this can be prohibitive; '''{{Term|gradient checkpointing}}''' trades compute for memory by recomputing {{Term|activation function|activations}} during the backward pass instead of storing them. * '''Numerical stability''' — using log-sum-exp tricks and fused {{Term|softmax}}-{{Term|categorical cross-entropy|cross-entropy}} implementations avoids overflow and underflow. * '''Higher-order gradients''' — differentiating through the backward pass itself yields second-order information (Hessian-{{Term|vector}} products), useful for methods like natural {{Term|gradient descent}} and {{Term|meta-learning}}. * '''Mixed {{Term|precision}}''' — computing the forward pass in {{Term|fp16|half precision}} while keeping a master copy of the weights in full {{Term|precision}} speeds up training on modern GPUs.
Spanish (es)	* '''Memoria''' — el paso hacia adelante debe almacenar todas las {{Term|activation function|activaciones}} intermedias para el paso hacia atrás. En redes muy profundas esto puede ser prohibitivo; el '''{{Term|gradient checkpointing|checkpointing de gradientes}}''' intercambia cómputo por memoria al recomputar las {{Term|activation function|activaciones}} durante el paso hacia atrás en lugar de almacenarlas. * '''Estabilidad numérica''' — el uso de trucos log-sum-exp e implementaciones fusionadas de {{Term|softmax|softmax}}-{{Term|categorical cross-entropy|entropía cruzada}} evita el desbordamiento y el subdesbordamiento. * '''Gradientes de orden superior''' — diferenciar a través del propio paso hacia atrás produce información de segundo orden (productos Hessiano-{{Term|vector|vector}}), útil para métodos como el {{Term|gradient descent|descenso de gradiente}} natural y el {{Term|meta-learning|meta-aprendizaje}}. * '''{{Term|precision|Precisión}} mixta''' — calcular el paso hacia adelante en {{Term|fp16|precisión media}} mientras se mantiene una copia maestra de los pesos en {{Term|precision|precisión}} completa acelera el entrenamiento en las GPU modernas.
Chinese (zh)	* '''内存''' — 前向传播必须存储所有中间{{Term|activation function|激活}}以供反向传播使用。对于非常深的网络,这可能难以承受;'''{{Term|gradient checkpointing|梯度检查点}}'''通过在反向传播期间重新计算{{Term|activation function|激活}}而不是存储它们,以计算换取内存。 * '''数值稳定性''' — 使用 log-sum-exp 技巧和融合的 {{Term|softmax|softmax}}-{{Term|categorical cross-entropy|交叉熵}} 实现可以避免上溢和下溢。 * '''高阶梯度''' — 对反向传播本身进行微分会产生二阶信息(Hessian-{{Term|vector|向量}}积),对自然{{Term|gradient descent|梯度下降}}和{{Term|meta-learning|元学习}}等方法很有用。 * '''混合{{Term|precision|精度}}''' — 在{{Term|fp16|半精度}}下计算前向传播,同时以完整{{Term|precision|精度}}保留权重的主副本,可加速现代 GPU 上的训练。
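
The "Memory" item in the message text describes gradient checkpointing as recomputing activations during the backward pass instead of storing them. A minimal sketch of that trade-off follows, assuming a toy chain of scalar layers y = tanh(w * x). The function names (`layer`, `grad_full`, `grad_checkpointed`) and the `segment` parameter are hypothetical and are not taken from any library. Each function returns the gradient of the output with respect to the input, together with a rough count of how many layer inputs were held in memory at once.

```python
import math

def layer(w, x):
    # Toy scalar layer (an assumption for illustration): y = tanh(w * x)
    return math.tanh(w * x)

def layer_grad_x(w, x):
    # d/dx tanh(w * x) = w * (1 - tanh(w * x)^2)
    t = math.tanh(w * x)
    return w * (1.0 - t * t)

def grad_full(weights, x0):
    """Store every layer input, then backpropagate.

    Returns (dy/dx0, number of stored inputs)."""
    inputs = []
    x = x0
    for w in weights:
        inputs.append(x)          # kept for the backward pass
        x = layer(w, x)
    g = 1.0
    for w, xin in zip(reversed(weights), reversed(inputs)):
        g *= layer_grad_x(w, xin)
    return g, len(inputs)

def grad_checkpointed(weights, x0, segment):
    """Store only every `segment`-th input and recompute the rest.

    Returns (dy/dx0, upper bound on inputs held at once)."""
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x    # the only values kept from the forward pass
        x = layer(w, x)
    g = 1.0
    peak = len(checkpoints)
    # Walk the segments from last to first, recomputing one at a time.
    for start in sorted(checkpoints, reverse=True):
        end = min(start + segment, len(weights))
        x = checkpoints[start]
        inputs = []
        for w in weights[start:end]:
            inputs.append(x)      # recomputed, discarded after this segment
            x = layer(w, x)
        peak = max(peak, len(checkpoints) + len(inputs))
        for w, xin in zip(reversed(weights[start:end]), reversed(inputs)):
            g *= layer_grad_x(w, xin)
    return g, peak
```

With 16 layers and `segment=4`, the checkpointed version holds at most 8 inputs rather than 16, and pays for it with one extra forward computation per segment.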
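
The "Numerical stability" item mentions log-sum-exp tricks and fused softmax and cross-entropy implementations. As a rough illustration using only the Python standard library, the sketch below computes the cross-entropy as log-sum-exp minus the target logit, so the softmax probabilities are never formed explicitly. The function names are hypothetical, and `naive_cross_entropy` is included only as a contrast.

```python
import math

def log_sum_exp(logits):
    # Subtracting the maximum keeps every exponent <= 0, so exp() cannot overflow.
    m = max(logits)
    return m + math.log(sum(math.exp(z - m) for z in logits))

def softmax_cross_entropy(logits, target):
    # Fused form: -log softmax(logits)[target] = logsumexp(logits) - logits[target]
    return log_sum_exp(logits) - logits[target]

def naive_cross_entropy(logits, target):
    # Unfused form: exponentiate first, then normalise and take the log.
    # math.exp raises OverflowError once a logit is large enough (e.g. 1000.0).
    exps = [math.exp(z) for z in logits]
    return -math.log(exps[target] / sum(exps))
```

For moderate logits the two functions agree to rounding error. For logits such as `[1000.0, 0.0]` the naive version fails inside `math.exp`, while the fused version still returns a finite loss.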
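
The "Mixed precision" item mentions keeping a master copy of the weights in full precision. One common reason for that copy is that small updates can round away entirely in fp16. The sketch below simulates this with the standard library's `struct` half-precision format and covers only the weight-update side, not a full training loop. The helper names (`to_fp16`, `to_fp32`, `train`) and the constant update value are assumptions made for illustration.

```python
import struct

def to_fp16(x):
    # Round a Python float to the nearest IEEE 754 half-precision value.
    return struct.unpack("<e", struct.pack("<e", x))[0]

def to_fp32(x):
    # Round a Python float to the nearest IEEE 754 single-precision value.
    return struct.unpack("<f", struct.pack("<f", x))[0]

def train(steps, update, use_master):
    """Apply `update` to a weight starting at 1.0, `steps` times.

    Returns the fp16 weight that the forward pass would see."""
    w16 = to_fp16(1.0)
    master = to_fp32(1.0)  # full-precision master copy
    for _ in range(steps):
        if use_master:
            master = to_fp32(master + update)  # accumulate in fp32
            w16 = to_fp16(master)              # cast down for the forward pass
        else:
            w16 = to_fp16(w16 + update)        # accumulate directly in fp16
    return w16
```

Near 1.0 the spacing between adjacent fp16 values is 2^-10 (about 0.00098), so an update of 1e-4 applied directly to the fp16 weight rounds back to 1.0 on every step. With the fp32 master copy the same updates accumulate, and the fp16 weight moves once their sum is large enough to be representable.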