    • Logistic regression on MNIST: Adam converged faster than SGD with momentum, AdaGrad, and RMSProp (a minimal sketch of this setup follows the list).
    • Multi-layer neural networks on MNIST: Adam achieved the lowest training cost, with convergence speed comparable to or better than competing methods.
    • Convolutional neural networks on CIFAR-10: Adam performed comparably to SGD with carefully tuned momentum and learning rate schedules.
    • Variational autoencoders (VAEs): Adam was used successfully to optimize the variational lower bound, demonstrating its applicability to generative models.
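
To make the logistic regression experiment concrete, here is a minimal sketch of the Adam update rule in plain NumPy, using the paper's default hyperparameters (alpha = 0.001, beta1 = 0.9, beta2 = 0.999, epsilon = 1e-8). The synthetic binary-classification data, batch size, and step count are assumptions chosen for brevity; the original experiment used multiclass logistic regression on MNIST.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for MNIST (assumption for brevity): 1000 examples,
# 20 features, binary labels from a random linear rule plus noise.
X = rng.normal(size=(1000, 20))
true_w = rng.normal(size=20)
y = (X @ true_w + rng.normal(size=1000) > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

w = np.zeros(20)   # logistic regression weights
m = np.zeros(20)   # first-moment estimate (mean of gradients)
v = np.zeros(20)   # second-moment estimate (uncentered variance)
alpha, beta1, beta2, eps = 1e-3, 0.9, 0.999, 1e-8  # paper defaults

for t in range(1, 5001):
    idx = rng.integers(0, 1000, size=64)                 # stochastic minibatch
    g = X[idx].T @ (sigmoid(X[idx] @ w) - y[idx]) / 64   # log-loss gradient
    m = beta1 * m + (1 - beta1) * g                      # update biased first moment
    v = beta2 * v + (1 - beta2) * g**2                   # update biased second moment
    m_hat = m / (1 - beta1**t)                           # bias-corrected first moment
    v_hat = v / (1 - beta2**t)                           # bias-corrected second moment
    w -= alpha * m_hat / (np.sqrt(v_hat) + eps)          # Adam parameter update
```

The division by sqrt(v_hat) scales the step for each parameter individually, which is why a single global step size works across features with very different gradient magnitudes.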