All translations

Found 3 translations.

Name	Current message text
English (en)	{{Term|adamw}} has become the standard optimizer for a large fraction of contemporary {{Term|deep learning}}, particularly for [[Transformer (machine learning model)|transformers]] in language and vision. Mainstream frameworks ship native implementations (<code>torch.optim.AdamW</code> in PyTorch since version 1.2, <code>tf.keras.optimizers.AdamW</code> in TensorFlow/Keras), and the optimizer is the default in popular training stacks such as Hugging Face {{Term|transformer|Transformers}} and timm. Practitioners typically pair {{Term|adamw}} with a small weight-decay coefficient (often around 0.01 to 0.1) and a learning-rate schedule that uses cosine decay or linear {{Term|learning rate warmup|warmup}}, echoing the AdamWR recipe.
Spanish (es)	{{Term|adamw}} se ha convertido en el optimizador estándar para una gran parte del {{Term|deep learning|aprendizaje profundo}} contemporáneo, en particular para los [[Transformer (machine learning model)|transformers]] en lenguaje y visión. Los frameworks principales incluyen implementaciones nativas (<code>torch.optim.AdamW</code> en PyTorch desde 1.2, <code>tf.keras.optimizers.AdamW</code> en TensorFlow/Keras), y el optimizador es el predeterminado en stacks de entrenamiento populares como Hugging Face {{Term|transformer|Transformers}} y timm. Quienes lo utilizan suelen ajustar {{Term|adamw}} con un coeficiente de decaimiento de pesos pequeño (con frecuencia entre 0,01 y 0,1) y una programación de tasa de aprendizaje cosenoidal o con {{Term|learning rate warmup|calentamiento}} lineal, en paralelo con la receta AdamWR.
Chinese (zh)	{{Term|adamw}} 已成为当代很大一部分{{Term|deep learning|深度学习}}的标准优化器，特别是在语言和视觉领域的 [[Transformer (machine learning model)|Transformer]] 模型中。主流框架内置了原生实现（PyTorch 自 1.2 起提供 <code>torch.optim.AdamW</code>，TensorFlow/Keras 中提供 <code>tf.keras.optimizers.AdamW</code>），并且该优化器是 Hugging Face {{Term|transformer|Transformers}} 和 timm 等流行训练栈的默认选项。从业者通常使用较小的权重衰减系数（通常在 0.01 到 0.1 之间）以及余弦或线性{{Term|learning rate warmup|预热}}的学习率调度来调优 {{Term|adamw}}，与 AdamWR 的配方相对应。
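
The message text above mentions a small weight-decay coefficient combined with linear warmup and cosine decay. A minimal sketch of such a schedule follows, in plain Python so that it runs without any framework. The function name <code>warmup_cosine_factor</code>, its parameters, and the constants <code>BASE_LR</code> and <code>WEIGHT_DECAY</code> are illustrative assumptions, not values or names taken from the message or from any library.

```python
import math

# Hypothetical settings for illustration only; 0.01 sits at the low end of
# the weight-decay range quoted in the message text.
BASE_LR = 1e-3
WEIGHT_DECAY = 0.01


def warmup_cosine_factor(step, warmup_steps, total_steps, min_factor=0.0):
    """Return the multiplier applied to the base learning rate at `step`.

    The factor rises linearly to 1.0 during warmup, then follows half a
    cosine wave down to `min_factor` at `total_steps`.
    """
    if warmup_steps > 0 and step < warmup_steps:
        # Linear warmup: 1/warmup_steps at step 0, 1.0 at the last warmup step.
        return (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_factor + (1.0 - min_factor) * cosine


def learning_rate(step, warmup_steps=10, total_steps=100):
    """Scale the (assumed) base learning rate by the schedule factor."""
    return BASE_LR * warmup_cosine_factor(step, warmup_steps, total_steps)
```

In PyTorch, a factor function like this one could be passed as <code>lr_lambda</code> to <code>torch.optim.lr_scheduler.LambdaLR</code>, wrapping an optimizer created with <code>torch.optim.AdamW(params, lr=BASE_LR, weight_decay=WEIGHT_DECAY)</code>. Suitable values depend on the model and dataset.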