In a neural network, the loss $L$ depends on the output, which depends on the activations of the last hidden layer, which depend on the activations of the previous layer, and so on back to the input. The chain rule allows us to decompose the gradient of the loss with respect to any layer's parameters into a product of local derivatives, one for each layer.
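
To make the decomposition concrete, here is one common way to write it out; the notation is introduced here for illustration, assuming a feedforward network with activations $a^{(1)}, \dots, a^{(N)}$ and weights $W^{(\ell)}$ at layer $\ell$ (these symbols are not from the original text). The gradient of the loss with respect to $W^{(\ell)}$ then factors as

$$
\frac{\partial L}{\partial W^{(\ell)}}
= \frac{\partial L}{\partial a^{(N)}}\,
  \frac{\partial a^{(N)}}{\partial a^{(N-1)}}\,
  \cdots\,
  \frac{\partial a^{(\ell+1)}}{\partial a^{(\ell)}}\,
  \frac{\partial a^{(\ell)}}{\partial W^{(\ell)}}.
$$

Each factor involves only two adjacent layers, so it can be computed locally; backpropagation evaluates the product from left to right, reusing the accumulated partial product as it moves from the output back toward the input rather than recomputing it for every layer.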