where $ \mathbf{a}^{(l-1)} $ is the activation from the previous layer (with $ \mathbf{a}^{(0)} = \mathbf{x} $), $ \mathbf{W}^{(l)} $ and $ \mathbf{b}^{(l)} $ are the weights and biases, and $ g^{(l)} $ is the activation function. The forward pass stores all intermediate values $ \mathbf{z}^{(l)} $ and $ \mathbf{a}^{(l)} $, because they are needed to compute the gradients during the backward pass.
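The caching described above can be sketched as follows. This is a minimal illustration assuming NumPy; the function and variable names (`forward`, `weights`, `biases`, `zs`) are hypothetical, not part of the original text.

```python
import numpy as np

def forward(x, weights, biases, activations):
    """Run the forward pass, caching every z^(l) and a^(l).

    weights[l], biases[l], activations[l] define layer l+1;
    the caches are exactly what the backward pass later consumes.
    """
    a = x
    zs = []          # pre-activations z^(1), ..., z^(L)
    acts = [a]       # activations a^(0) = x, ..., a^(L)
    for W, b, g in zip(weights, biases, activations):
        z = W @ a + b   # z^(l) = W^(l) a^(l-1) + b^(l)
        a = g(z)        # a^(l) = g^(l)(z^(l))
        zs.append(z)
        acts.append(a)
    return zs, acts
```

Note that `acts` holds one more entry than `zs`, since the input $ \mathbf{a}^{(0)} = \mathbf{x} $ has no associated pre-activation.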