| Function | Formula | Range | Notes |
|----------|---------|-------|-------|
| Sigmoid | $ \sigma(z) = \frac{1}{1+e^{-z}} $ | (0, 1) | Historically popular; suffers from vanishing gradients |
| Tanh | $ \tanh(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}} $ | (−1, 1) | Zero-centred; still saturates for large inputs |
| ReLU | $ \max(0, z) $ | [0, ∞) | Default choice in modern networks; can cause "dead neurons" |
| Leaky ReLU | $ \max(\alpha z, z) $ for small $ \alpha > 0 $ | (−∞, ∞) | Addresses the dead-neuron problem |
| Softmax | $ \frac{e^{z_i}}{\sum_j e^{z_j}} $ | (0, 1) | Used in the output layer for multi-class classification; outputs sum to 1 |
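
To make the formulas concrete, here is a minimal NumPy sketch of the five activations. The function names, the `alpha=0.01` default, and the numerical-stability tricks are illustrative choices, not prescribed by the table:

```python
import numpy as np

def sigmoid(z):
    """Logistic sigmoid, sigma(z) = 1 / (1 + e^{-z}), computed stably."""
    # e^{-|z|} is always <= 1, so exp() never overflows; the two branches
    # are algebraically equal to 1 / (1 + e^{-z}).
    ez = np.exp(-np.abs(z))
    return np.where(z >= 0, 1.0 / (1.0 + ez), ez / (1.0 + ez))

def tanh(z):
    """Hyperbolic tangent; the library routine is already stable."""
    return np.tanh(z)

def relu(z):
    """ReLU: max(0, z), applied elementwise."""
    return np.maximum(0.0, z)

def leaky_relu(z, alpha=0.01):
    """Leaky ReLU: max(alpha * z, z). alpha = 0.01 is a common default;
    the table only requires alpha > 0."""
    return np.maximum(alpha * z, z)

def softmax(z, axis=-1):
    """Softmax: e^{z_i} / sum_j e^{z_j} along the given axis."""
    # Subtracting the max before exponentiating is the standard stability
    # trick; softmax is invariant to shifting all inputs by a constant.
    shifted = z - np.max(z, axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=axis, keepdims=True)

z = np.array([-2.0, 0.0, 3.0])
print(sigmoid(z))   # ~[0.119, 0.5, 0.953] -- each value in (0, 1)
print(softmax(z))   # components in (0, 1) and summing to 1
```

The stability tweaks matter in practice: the naive `1 / (1 + np.exp(-z))` overflows for large negative `z`, and softmax without the max-subtraction overflows for large logits, even though both are mathematically identical to the table's formulas.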