    1. A neural network produces raw logits $ \mathbf{z} $ from its final linear layer.
    2. Softmax converts logits to probabilities: $ \hat{\mathbf{y}} = \sigma(\mathbf{z}) $.
    3. The predicted class is $ \hat{c} = \arg\max_k \hat{y}_k $.
4. Training minimizes the cross-entropy loss between the predicted distribution $ \hat{\mathbf{y}} $ and the true labels.
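The four steps above can be sketched in plain Python. This is a minimal illustration, not a training loop: the logits `z` are assumed example values standing in for a network's final linear layer, and only the single-example loss is computed.

```python
import math

def softmax(z):
    # Subtract the max logit before exponentiating for numerical stability;
    # this leaves the output unchanged but avoids overflow in exp().
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, true_class):
    # Negative log-likelihood of the true class under the predicted distribution.
    return -math.log(probs[true_class])

# Hypothetical raw logits z from a final linear layer (step 1).
z = [2.0, 1.0, 0.1]

# Step 2: convert logits to a probability distribution.
y_hat = softmax(z)

# Step 3: the predicted class is the argmax of the probabilities.
c_hat = max(range(len(y_hat)), key=lambda k: y_hat[k])

# Step 4: cross-entropy loss against a (hypothetical) true label of class 0.
loss = cross_entropy(y_hat, true_class=0)
```

Because softmax is monotone in each logit, taking the argmax of `y_hat` gives the same class as taking the argmax of `z` directly; the probabilities matter for the loss, not for the hard prediction.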