where $ \mathbf{a}^{(l-1)} $ is the activation from the previous layer (with $ \mathbf{a}^{(0)} = \mathbf{x} $), $ \mathbf{W}^{(l)} $ and $ \mathbf{b}^{(l)} $ are the weights and biases, and $ g^{(l)} $ is the activation function. The forward pass stores all intermediate values $ \mathbf{z}^{(l)} $ and $ \mathbf{a}^{(l)} $, because they are needed to compute the gradients during the backward pass.
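The caching described above can be sketched as follows. This is a minimal illustration assuming NumPy; the function and variable names (`forward`, `weights`, `biases`, `zs`) are hypothetical, not part of the original text.

```python
import numpy as np

def forward(x, weights, biases, activations):
    """Run the forward pass, caching every z^(l) and a^(l).

    weights[l], biases[l], activations[l] define layer l+1;
    the caches are exactly what the backward pass later consumes.
    """
    a = x
    zs = []          # pre-activations z^(1), ..., z^(L)
    acts = [a]       # activations a^(0) = x, ..., a^(L)
    for W, b, g in zip(weights, biases, activations):
        z = W @ a + b   # z^(l) = W^(l) a^(l-1) + b^(l)
        a = g(z)        # a^(l) = g^(l)(z^(l))
        zs.append(z)
        acts.append(a)
    return zs, acts
```

Note that `acts` holds one more entry than `zs`, since the input $ \mathbf{a}^{(0)} = \mathbf{x} $ has no associated pre-activation.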