| Function | Formula | Range | Notes |
|---|---|---|---|
| Sigmoid | $ \sigma(z) = \frac{1}{1+e^{-z}} $ | (0, 1) | Historically popular; suffers from vanishing gradients |
| Tanh | $ \tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}} $ | (−1, 1) | Zero-centred; still saturates for large inputs |
| ReLU | $ \max(0, z) $ | [0, ∞) | Default choice in modern networks; can cause "dead neurons" |
| Leaky ReLU | $ \max(\alpha z, z) $ for small $ \alpha > 0 $ | (−∞, ∞) | Addresses the dead-neuron problem |
| Softmax | $ \frac{e^{z_i}}{\sum_j e^{z_j}} $ | (0, 1) | Used in the output layer for multi-class classification; outputs sum to 1 |
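
The following is a minimal NumPy sketch of the activations in the table above, provided for illustration; the function names and the default $ \alpha = 0.01 $ for Leaky ReLU are assumptions, not part of the original article. The softmax subtracts the row maximum before exponentiating, a standard trick to avoid overflow that leaves the result unchanged.

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z}); outputs in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    # Zero-centred; outputs in (-1, 1)
    return np.tanh(z)

def relu(z):
    # max(0, z); outputs in [0, inf)
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    # max(alpha * z, z); alpha = 0.01 is a common (assumed) default
    return np.where(z > 0, z, alpha * z)

def softmax(z):
    # Shift by the max for numerical stability; outputs sum to 1
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp_z = np.exp(shifted)
    return exp_z / np.sum(exp_z, axis=-1, keepdims=True)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print("sigmoid:   ", sigmoid(z))
print("tanh:      ", tanh(z))
print("relu:      ", relu(z))
print("leaky_relu:", leaky_relu(z))
print("softmax:   ", softmax(z))   # entries sum to 1
```

Note how the outputs reflect the ranges listed in the table: sigmoid and softmax values stay strictly between 0 and 1, tanh is symmetric around 0, and ReLU zeroes out the negative inputs while Leaky ReLU keeps them as small negative values.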