where $ \alpha $ is a small constant (commonly 0.1). This prevents the model from becoming overconfident, improves calibration, and often yields better generalization. It is standard practice in training large image classifiers and transformer models.
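As a minimal sketch of how label smoothing modifies the cross-entropy target (the function name and NumPy usage here are illustrative, not part of any particular library's API): the one-hot target is replaced by $ (1-\alpha)\,\text{one-hot} + \alpha / K $ before computing the loss.

<syntaxhighlight lang="python">
import numpy as np

def smoothed_cross_entropy(logits, target, alpha=0.1):
    """Cross-entropy with label smoothing over K classes (illustrative sketch)."""
    K = logits.shape[-1]
    # Numerically stable log-softmax
    z = logits - logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    # Smoothed target: (1 - alpha) on the true class, alpha spread uniformly
    smooth = np.full(K, alpha / K)
    smooth[target] += 1.0 - alpha
    return -(smooth * log_probs).sum()

# Example: 3-class problem, true class 0
logits = np.array([2.0, 0.5, -1.0])
print(smoothed_cross_entropy(logits, target=0, alpha=0.1))
</syntaxhighlight>

With $ \alpha = 0 $ this reduces to the ordinary cross-entropy loss; larger $ \alpha $ pulls the target distribution toward uniform, which is what discourages overconfident predictions.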