The network was trained using {{Term|stochastic gradient descent}} with a batch size of 128, {{Term|momentum}} of 0.9, and {{Term|weight decay}} of 0.0005. The {{Term|learning rate}} was initialized at 0.01 and manually reduced by a factor of 10 when the validation error stopped improving. Training took approximately five to six days on two NVIDIA GTX 580 GPUs.
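The update described above can be sketched in plain Python. This is a minimal illustration, not the original implementation: the function names, the `patience` threshold for deciding that validation error has "stopped improving", and the exact placement of weight decay inside the velocity update are assumptions; the hyperparameter values (learning rate 0.01, momentum 0.9, weight decay 0.0005, 10x reduction) are the ones stated in the text.

```python
def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    # One SGD update with momentum and L2 weight decay (scalars for
    # clarity; the same formula applies element-wise to weight arrays).
    # Weight decay adds 0.0005 * w to the gradient before the step.
    v = momentum * v - lr * (grad + weight_decay * w)
    return w + v, v

def drop_lr_on_plateau(lr, val_errors, patience=3, factor=0.1):
    # Hypothetical helper mimicking the manual schedule: cut the
    # learning rate by 10x when the last `patience` validation errors
    # show no improvement over the best earlier value. The `patience`
    # window is an assumption; the text says the drop was done by hand.
    if len(val_errors) > patience and \
            min(val_errors[-patience:]) >= min(val_errors[:-patience]):
        return lr * factor
    return lr
```

For example, a single step from `w = 1.0` with zero initial velocity and gradient `0.5` moves the weight by `-0.01 * (0.5 + 0.0005 * 1.0)`, and a validation-error history that stops decreasing triggers the 10x reduction.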