Batch Normalization Accelerating Deep Network Training
- A batch-normalized network matched the accuracy of the original Inception model in only 7% of the training steps (14x acceleration).
- BN-Inception (with batch normalization and other modifications) achieved a top-5 validation error of 4.82%, improving on the original GoogLeNet's 6.67% and approaching human performance.
- Using batch normalization allowed training with a learning rate 10x higher than the baseline without divergence.
- On some configurations, batch normalization eliminated the need for dropout without any loss of accuracy, simplifying the architecture and further reducing training time; a minimal sketch of the normalization transform itself follows this list.
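
For context, the batch normalization transform behind these results normalizes each feature over the mini-batch (zero mean, unit variance) and then applies a learned scale and shift. Below is a minimal NumPy sketch of that transform; the function name, array shapes, and epsilon value are illustrative assumptions rather than details taken from this page.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-5):
    """Batch-normalize a mini-batch of activations, per feature.

    x:      (batch_size, num_features) activations
    gamma:  learned scale, shape (num_features,)
    beta:   learned shift, shape (num_features,)
    """
    mu = x.mean(axis=0)                     # mini-batch mean per feature
    var = x.var(axis=0)                     # mini-batch variance per feature
    x_hat = (x - mu) / np.sqrt(var + eps)   # normalize to zero mean, unit variance
    return gamma * x_hat + beta             # learned scale and shift

# Example: a hypothetical mini-batch of 32 examples with 64 features.
x = np.random.randn(32, 64)
y = batch_norm(x, gamma=np.ones(64), beta=np.zeros(64))
```

Keeping the activations in a stable range at every layer is what makes the much larger learning rates and the removal of dropout reported above feasible; at inference time the mini-batch statistics are replaced by running (population) estimates collected during training.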