Translations:Cross-Entropy Loss/29/en

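where $m = \max_j z_j$. Subtracting the maximum logit makes the largest exponent zero, so every term $e^{z_j - m}$ lies in $(0, 1]$ and the sum cannot overflow; because the largest term is exactly 1, the sum is at least 1 and its logarithm is well defined. Major deep learning frameworks implement this fused operation (e.g., PyTorch's CrossEntropyLoss accepts raw logits and applies log-softmax internally).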
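
The effect of the shift can be seen in a small standard-library sketch. It assumes the loss for a single example is written as $\log \sum_j e^{z_j} - z_y$ and uses the identity $\log \sum_j e^{z_j} = m + \log \sum_j e^{z_j - m}$. The function names `stable_cross_entropy` and `naive_cross_entropy` are hypothetical, chosen for illustration, and are not part of any framework.

```python
import math


def stable_cross_entropy(logits, target):
    """Cross-entropy of one example from raw logits, using the max shift.

    Hypothetical helper for illustration: `logits` is a list of floats z_j,
    `target` is the index y of the true class.
    """
    m = max(logits)  # m = max_j z_j
    # Every exponent z_j - m is <= 0, so each exp() result lies in (0, 1].
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[target]


def naive_cross_entropy(logits, target):
    """Same quantity without the shift; exp() overflows for large logits."""
    return math.log(sum(math.exp(z) for z in logits)) - logits[target]


if __name__ == "__main__":
    large = [1000.0, 1001.0, 1002.0]
    print(stable_cross_entropy(large, 2))  # finite result
    try:
        naive_cross_entropy(large, 2)
    except OverflowError as err:
        print("naive version failed:", err)
```

Because the shift cancels algebraically, the stable version returns the same value for `[1000.0, 1001.0, 1002.0]` as for `[0.0, 1.0, 2.0]`, while the unshifted version raises `OverflowError` on the first list.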