The activation function introduces nonlinearity; without it, a multi-layer network would collapse to a single linear transformation. Common choices include: