Prior to the transformer, dominant sequence transduction models relied on recurrent neural networks (RNNs), particularly LSTMs and GRUs, often enhanced with attention mechanisms. These architectures processed tokens sequentially, creating a fundamental bottleneck that prevented parallelization during training. The Transformer architecture eliminated this constraint by relying solely on attention mechanisms to draw global dependencies between input and output sequences, enabling far greater parallelism and reducing training times from days to hours on contemporary hardware.
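To illustrate why attention removes the sequential bottleneck, the scaled dot-product attention introduced in the paper, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V, can be expressed as a few matrix operations that cover every position at once. The minimal NumPy sketch below is illustrative only; the function name and toy dimensions are not taken from the paper's code.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.

    Q, K have shape (seq_len, d_k); V has shape (seq_len, d_v).
    Every output row attends to all positions in a single matrix
    computation, so there is no recurrence over time steps.
    """
    d_k = K.shape[-1]
    # Similarity of every query with every key: (seq_len, seq_len)
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Weighted sum of values, computed for all positions in parallel
    return weights @ V

# Toy self-attention example: 4 tokens with 8-dimensional representations
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V = x
print(out.shape)  # (4, 8)
```

Because the whole sequence is handled by dense matrix products rather than a step-by-step recurrence, the computation maps directly onto parallel hardware such as GPUs, which is the source of the training-time reduction described above.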