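A naive implementation of softmax can overflow when logits are large. For example, $e^{1000}$ exceeds the largest finite double-precision value (about $1.8 \times 10^{308}$) and evaluates to infinity, so the normalization becomes $\infty/\infty$, which is NaN. The standard fix subtracts the maximum logit before exponentiating: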