The gradient of the regularization term is <math>\lambda \theta</math>, so each weight is multiplicatively shrunk toward zero at every update — hence the name '''{{Term|weight decay}}'''. The {{Term|hyperparameter}} <math>\lambda</math> controls the regularization strength.
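To see why the shrinkage is multiplicative, it helps to write out one gradient-descent step with learning rate <math>\eta</math>; this is a sketch assuming the penalty has the common form <math>\tfrac{\lambda}{2}\lVert\theta\rVert^2</math>, which is consistent with the stated gradient <math>\lambda \theta</math>:

<math>\theta \leftarrow \theta - \eta\bigl(\nabla_\theta L(\theta) + \lambda\theta\bigr) = (1 - \eta\lambda)\,\theta - \eta\,\nabla_\theta L(\theta).</math>

Under these assumptions, every update first rescales the weights by the constant factor <math>(1 - \eta\lambda) < 1</math> and only then applies the data-driven gradient, which is exactly the decay the name describes.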