The network was trained using {{Term|stochastic gradient descent}} with a batch size of 128, {{Term|momentum}} of 0.9, and {{Term|weight decay}} of 0.0005. The {{Term|learning rate}} was initialized at 0.01 and manually reduced by a factor of 10 when the validation error stopped improving. Training took approximately five to six days on two NVIDIA GTX 580 GPUs.
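The update described above can be sketched in plain Python. This is a minimal illustration, not the original implementation: the function names, the `patience` threshold for deciding that validation error has "stopped improving", and the exact placement of weight decay inside the velocity update are assumptions; the hyperparameter values (learning rate 0.01, momentum 0.9, weight decay 0.0005, 10x reduction) are the ones stated in the text.

```python
def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    # One SGD update with momentum and L2 weight decay (scalars for
    # clarity; the same formula applies element-wise to weight arrays).
    # Weight decay adds 0.0005 * w to the gradient before the step.
    v = momentum * v - lr * (grad + weight_decay * w)
    return w + v, v

def drop_lr_on_plateau(lr, val_errors, patience=3, factor=0.1):
    # Hypothetical helper mimicking the manual schedule: cut the
    # learning rate by 10x when the last `patience` validation errors
    # show no improvement over the best earlier value. The `patience`
    # window is an assumption; the text says the drop was done by hand.
    if len(val_errors) > patience and \
            min(val_errors[-patience:]) >= min(val_errors[:-patience]):
        return lr * factor
    return lr
```

For example, a single step from `w = 1.0` with zero initial velocity and gradient `0.5` moves the weight by `-0.01 * (0.5 + 0.0005 * 1.0)`, and a validation-error history that stops decreasing triggers the 10x reduction.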