Translations:Neural Networks/14/en
where $ g $ and $ f $ are activation functions, $ \mathbf{W}_1, \mathbf{W}_2 $ are weight matrices, and $ \mathbf{b}_1, \mathbf{b}_2 $ are bias vectors. The hidden layer enables the network to learn nonlinear relationships that a single perceptron cannot capture.