<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://marovi.ai/index.php?action=history&amp;feed=atom&amp;title=Overfitting_and_Regularization</id>
	<title>Overfitting and Regularization - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://marovi.ai/index.php?action=history&amp;feed=atom&amp;title=Overfitting_and_Regularization"/>
	<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Overfitting_and_Regularization&amp;action=history"/>
	<updated>2026-04-24T11:32:23Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.39.1</generator>
	<entry>
		<id>https://marovi.ai/index.php?title=Overfitting_and_Regularization&amp;diff=2141&amp;oldid=prev</id>
		<title>DeployBot: [deploy-bot] Deploy from CI (8c92aeb)</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Overfitting_and_Regularization&amp;diff=2141&amp;oldid=prev"/>
		<updated>2026-04-24T07:08:59Z</updated>

		<summary type="html">&lt;p&gt;[deploy-bot] Deploy from CI (8c92aeb)&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 07:08, 24 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l119&quot;&gt;Line 119:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 119:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Machine Learning]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Machine Learning]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Intermediate]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Intermediate]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;!--v1.2.0 cache-bust--&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;!-- pass 2 --&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key mediawiki:diff::1.12:old-2108:rev-2141 --&gt;
&lt;/table&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
	<entry>
		<id>https://marovi.ai/index.php?title=Overfitting_and_Regularization&amp;diff=2108&amp;oldid=prev</id>
		<title>DeployBot: Pass 2 force re-parse</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Overfitting_and_Regularization&amp;diff=2108&amp;oldid=prev"/>
		<updated>2026-04-24T07:01:07Z</updated>

		<summary type="html">&lt;p&gt;Pass 2 force re-parse&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 07:01, 24 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l120&quot;&gt;Line 120:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 120:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Intermediate]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Intermediate]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;!--v1.2.0 cache-bust--&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;!--v1.2.0 cache-bust--&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;!-- pass 2 --&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key mediawiki:diff::1.12:old-2071:rev-2108 --&gt;
&lt;/table&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
	<entry>
		<id>https://marovi.ai/index.php?title=Overfitting_and_Regularization&amp;diff=2071&amp;oldid=prev</id>
		<title>DeployBot: Force re-parse after Math source-mode rollout (v1.2.0)</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Overfitting_and_Regularization&amp;diff=2071&amp;oldid=prev"/>
		<updated>2026-04-24T06:58:30Z</updated>

		<summary type="html">&lt;p&gt;Force re-parse after Math source-mode rollout (v1.2.0)&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 06:58, 24 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l119&quot;&gt;Line 119:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 119:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Machine Learning]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Machine Learning]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Intermediate]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Intermediate]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;!--v1.2.0 cache-bust--&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key mediawiki:diff::1.12:old-1990:rev-2071 --&gt;
&lt;/table&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
	<entry>
		<id>https://marovi.ai/index.php?title=Overfitting_and_Regularization&amp;diff=1990&amp;oldid=prev</id>
		<title>DeployBot: [deploy-bot] Deploy from CI (775ba6e)</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Overfitting_and_Regularization&amp;diff=1990&amp;oldid=prev"/>
		<updated>2026-04-24T04:01:44Z</updated>

		<summary type="html">&lt;p&gt;[deploy-bot] Deploy from CI (775ba6e)&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;{{LanguageBar | page = Overfitting and Regularization}}&lt;br /&gt;
{{ArticleInfobox | topic_area = Machine Learning | difficulty = Intermediate | prerequisites = [[Loss Functions]], [[Neural Networks]]}}&lt;br /&gt;
{{ContentMeta | generated_by = claude-opus | model_used = claude-opus-4-6 | generated_date = 2026-03-13}}&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Overfitting&amp;#039;&amp;#039;&amp;#039; occurs when a machine-learning model learns the training data too well — capturing noise and idiosyncrasies rather than the underlying pattern — and consequently performs poorly on unseen data. &amp;#039;&amp;#039;&amp;#039;Regularization&amp;#039;&amp;#039;&amp;#039; is the family of techniques used to prevent overfitting and improve a model&amp;#039;s ability to generalise.&lt;br /&gt;
&lt;br /&gt;
== The bias–variance tradeoff ==&lt;br /&gt;
&lt;br /&gt;
Prediction error on unseen data can be decomposed into three components:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\text{Error} = \text{Bias}^2 + \text{Variance} + \text{Irreducible noise}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Bias&amp;#039;&amp;#039;&amp;#039; measures how far the model&amp;#039;s average prediction is from the true value. High bias indicates the model is too simple to capture the data&amp;#039;s structure (&amp;#039;&amp;#039;&amp;#039;underfitting&amp;#039;&amp;#039;&amp;#039;).&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Variance&amp;#039;&amp;#039;&amp;#039; measures how much predictions fluctuate across different training sets. High variance indicates the model is too sensitive to the particular training data (&amp;#039;&amp;#039;&amp;#039;overfitting&amp;#039;&amp;#039;&amp;#039;).&lt;br /&gt;
&lt;br /&gt;
The goal is to find the sweet spot that minimises total error. A model with too few parameters underfits (high bias); a model with too many parameters overfits (high variance). Regularization techniques tilt the balance by constraining model complexity, accepting slightly higher bias in exchange for substantially lower variance.&lt;br /&gt;
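&lt;br /&gt;
The tradeoff can be made concrete with a small simulation. The sketch below is a hypothetical illustration, not part of the original presentation: it compares a high-bias model (always predicting the training mean) with a high-variance model (copying the label of the nearest training point), estimating squared bias and variance at a single test point across many resampled training sets.&lt;br /&gt;
&lt;br /&gt;
```python
import random

random.seed(0)

def true_fn(x):
    # the underlying pattern the models should recover
    return x

def sample_train_set(n=20):
    # noisy observations of the true function
    xs = [random.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_fn(x) + random.gauss(0.0, 0.3) for x in xs]
    return xs, ys

def predict_mean(xs, ys, x0):
    # high-bias model: ignores x0 and predicts the training mean
    return sum(ys) / len(ys)

def predict_nearest(xs, ys, x0):
    # high-variance model: copies the label of the nearest training point
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]

def bias_variance(predict, x0=0.8, trials=500):
    # estimate bias squared and variance of the predictor at x0
    preds = []
    for _ in range(trials):
        xs, ys = sample_train_set()
        preds.append(predict(xs, ys, x0))
    mean_pred = sum(preds) / len(preds)
    bias_sq = (mean_pred - true_fn(x0)) ** 2
    variance = sum((p - mean_pred) ** 2 for p in preds) / len(preds)
    return bias_sq, variance

b_mean, v_mean = bias_variance(predict_mean)
b_nn, v_nn = bias_variance(predict_nearest)
```
The mean predictor shows large bias and small variance; the nearest-neighbour predictor shows the opposite, matching the decomposition above.&lt;br /&gt;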
&lt;br /&gt;
== Detecting overfitting ==&lt;br /&gt;
&lt;br /&gt;
The clearest diagnostic is to compare training and validation performance:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Training loss decreasing, validation loss also decreasing&amp;#039;&amp;#039;&amp;#039; — the model is still learning; continue training.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Training loss decreasing, validation loss increasing&amp;#039;&amp;#039;&amp;#039; — the model is overfitting; apply regularization or stop training.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Training loss high, validation loss high&amp;#039;&amp;#039;&amp;#039; — the model is underfitting; increase capacity or train longer.&lt;br /&gt;
&lt;br /&gt;
Plotting these &amp;#039;&amp;#039;&amp;#039;learning curves&amp;#039;&amp;#039;&amp;#039; over training iterations is essential practice. A large gap between training accuracy and validation accuracy is the hallmark of overfitting.&lt;br /&gt;
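&lt;br /&gt;
As a rough illustration, the heuristic below (a hypothetical toy, including the function name diagnose and the window parameter) classifies the three regimes from recent loss history.&lt;br /&gt;
&lt;br /&gt;
```python
def diagnose(train_losses, val_losses, window=3):
    # crude trend over the last few epochs: a positive value means the loss is rising
    def trend(losses):
        recent = losses[-(window + 1):]
        return (recent[-1] - recent[0]) / (len(recent) - 1)

    train_improving = 0 > trend(train_losses)
    val_improving = 0 > trend(val_losses)
    if train_improving and val_improving:
        return "learning"       # keep training
    if train_improving:
        return "overfitting"    # regularize or stop
    return "underfitting"       # add capacity or train longer
```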
&lt;br /&gt;
== L2 regularization (weight decay) ==&lt;br /&gt;
&lt;br /&gt;
L2 regularization adds a penalty proportional to the squared magnitude of the weights:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;J(\theta) = L(\theta) + \frac{\lambda}{2}\|\theta\|_2^2 = L(\theta) + \frac{\lambda}{2}\sum_j \theta_j^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The gradient of the regularization term is &amp;lt;math&amp;gt;\lambda \theta&amp;lt;/math&amp;gt;, so each weight is multiplicatively shrunk toward zero at every update — hence the name &amp;#039;&amp;#039;&amp;#039;weight decay&amp;#039;&amp;#039;&amp;#039;. The hyperparameter &amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt; controls the regularization strength.&lt;br /&gt;
&lt;br /&gt;
L2 regularization is equivalent to placing a Gaussian prior on the weights from a Bayesian perspective. It encourages small, distributed weights and discourages any single weight from becoming excessively large.&lt;br /&gt;
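&lt;br /&gt;
The decay behaviour is visible directly in the update rule. The following minimal sketch (the helper name sgd_step_l2 is an assumption) applies the combined gradient; with a zero data gradient, each step multiplies every weight by (1 - lr * lam).&lt;br /&gt;
&lt;br /&gt;
```python
def sgd_step_l2(theta, grads, lr=0.1, lam=0.01):
    # one gradient step on J(theta); the L2 penalty contributes lam * theta to the gradient
    return [t - lr * (g + lam * t) for t, g in zip(theta, grads)]

# with no data gradient, the weights decay geometrically toward zero:
# each step multiplies theta by (1 - 0.1 * 0.5) = 0.95
theta = [1.0, -2.0]
for _ in range(10):
    theta = sgd_step_l2(theta, [0.0, 0.0], lr=0.1, lam=0.5)
```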
&lt;br /&gt;
== L1 regularization ==&lt;br /&gt;
&lt;br /&gt;
L1 regularization penalises the sum of absolute values:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;J(\theta) = L(\theta) + \lambda \|\theta\|_1 = L(\theta) + \lambda \sum_j |\theta_j|&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Unlike L2, the L1 penalty drives many weights exactly to zero, producing &amp;#039;&amp;#039;&amp;#039;sparse&amp;#039;&amp;#039;&amp;#039; models. This makes L1 regularization useful for feature selection. The LASSO (Least Absolute Shrinkage and Selection Operator) is the classic example of L1-regularized linear regression.&lt;br /&gt;
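&lt;br /&gt;
One standard way to see the sparsity effect is the soft-thresholding operator that arises in proximal-gradient methods for L1 penalties. The sketch below is illustrative; the function name soft_threshold is an assumption, not an established API.&lt;br /&gt;
&lt;br /&gt;
```python
def soft_threshold(t, lam):
    # proximal operator of lam * |t|: shrink toward zero by lam, then clip at zero
    if t > lam:
        return t - lam
    if -lam > t:
        return t + lam
    return 0.0

# weights whose magnitude is below lam are driven exactly to zero
weights = [0.3, -0.05, 1.2, 0.02]
sparse = [soft_threshold(w, 0.1) for w in weights]
```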
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Property !! L1 !! L2&lt;br /&gt;
|-&lt;br /&gt;
| Penalty || &amp;lt;math&amp;gt;\lambda\sum|\theta_j|&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;\frac{\lambda}{2}\sum\theta_j^2&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| Effect on weights || Drives many to exactly zero || Shrinks all toward zero&lt;br /&gt;
|-&lt;br /&gt;
| Sparsity || Yes || No&lt;br /&gt;
|-&lt;br /&gt;
| Bayesian interpretation || Laplace prior || Gaussian prior&lt;br /&gt;
|-&lt;br /&gt;
| Use case || Feature selection, interpretability || General regularization&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
== Dropout ==&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Dropout&amp;#039;&amp;#039;&amp;#039; (Srivastava et al., 2014) is a regularization technique specific to neural networks. During training, each neuron is randomly &amp;quot;dropped&amp;quot; (set to zero) with probability &amp;lt;math&amp;gt;p&amp;lt;/math&amp;gt; at each forward pass. This prevents neurons from co-adapting and forces the network to learn redundant representations.&lt;br /&gt;
&lt;br /&gt;
At test time, all neurons are active but their outputs are scaled by &amp;lt;math&amp;gt;(1 - p)&amp;lt;/math&amp;gt; to compensate for the larger number of active units (or equivalently, outputs are scaled by &amp;lt;math&amp;gt;1/(1-p)&amp;lt;/math&amp;gt; during training — &amp;#039;&amp;#039;&amp;#039;inverted dropout&amp;#039;&amp;#039;&amp;#039;).&lt;br /&gt;
&lt;br /&gt;
Dropout can be interpreted as an approximate ensemble method: each training step uses a different subnetwork, and the final model approximates the average prediction of exponentially many subnetworks.&lt;br /&gt;
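&lt;br /&gt;
The masking and scaling described above can be sketched in a few lines of pure Python. This is an illustrative implementation of inverted dropout, not the reference one; the function name dropout_forward is hypothetical.&lt;br /&gt;
&lt;br /&gt;
```python
import random

def dropout_forward(activations, p=0.5, training=True):
    # inverted dropout: zero each unit with probability p during training,
    # and scale survivors by 1/(1 - p) so the expected activation is unchanged
    if not training:
        return list(activations)    # no scaling needed at test time
    keep = 1.0 - p
    out = []
    for a in activations:
        if random.random() > p:     # unit survives with probability 1 - p
            out.append(a / keep)
        else:
            out.append(0.0)
    return out
```
Because of the scaling, the average activation over many units stays close to its dropout-free value, which is exactly what makes train-time and test-time behaviour consistent.&lt;br /&gt;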
&lt;br /&gt;
== Early stopping ==&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Early stopping&amp;#039;&amp;#039;&amp;#039; monitors the validation loss during training and halts optimisation when the validation loss stops improving. This is one of the simplest and most effective regularization strategies.&lt;br /&gt;
&lt;br /&gt;
In practice, a &amp;#039;&amp;#039;&amp;#039;patience&amp;#039;&amp;#039;&amp;#039; parameter specifies how many epochs to wait after the last improvement before stopping. The model weights are saved at the point of lowest validation loss and restored at the end.&lt;br /&gt;
&lt;br /&gt;
Early stopping acts as an implicit form of regularization: it limits the effective number of training steps, preventing the model from fully memorising the training data.&lt;br /&gt;
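&lt;br /&gt;
A minimal patience loop, assuming the validation losses are already available as a list (a simplification; in practice the loop wraps real training epochs and checkpointing):&lt;br /&gt;
&lt;br /&gt;
```python
def train_with_early_stopping(val_losses, patience=3):
    # walk a validation-loss curve and report where training would stop
    # and which epoch holds the checkpoint to restore
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if best_loss > loss:
            best_loss, best_epoch = loss, epoch   # save checkpoint here
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return best_epoch, epoch          # stop; restore best weights
    return best_epoch, len(val_losses) - 1
```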
&lt;br /&gt;
== Data augmentation ==&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Data augmentation&amp;#039;&amp;#039;&amp;#039; increases the effective size and diversity of the training set by applying label-preserving transformations. For image data, common augmentations include:&lt;br /&gt;
&lt;br /&gt;
* Random horizontal/vertical flips&lt;br /&gt;
* Random crops and resizing&lt;br /&gt;
* Colour jittering (brightness, contrast, saturation)&lt;br /&gt;
* Rotation and affine transformations&lt;br /&gt;
* Mixup (linear interpolation of pairs of images and their labels)&lt;br /&gt;
* Cutout (masking random patches)&lt;br /&gt;
&lt;br /&gt;
For text data, augmentations include synonym replacement, back-translation, and paraphrasing. Data augmentation reduces overfitting by exposing the model to more varied inputs without collecting additional data.&lt;br /&gt;
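&lt;br /&gt;
Mixup is simple enough to sketch in pure Python. The function below is a hypothetical illustration operating on plain lists; real pipelines apply the same blend to batched tensors.&lt;br /&gt;
&lt;br /&gt;
```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2):
    # blend two examples and their one-hot labels by a random Beta-distributed weight;
    # for small alpha, betavariate(alpha, alpha) concentrates lam near 0 or 1
    lam = random.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y
```
The blended label is no longer one-hot, so the model is trained toward soft targets that interpolate between the two classes.&lt;br /&gt;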
&lt;br /&gt;
== Other regularization techniques ==&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Batch normalization&amp;#039;&amp;#039;&amp;#039; — normalising layer inputs stabilises training and has a mild regularizing effect; the original motivation of reducing internal covariate shift is debated.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Label smoothing&amp;#039;&amp;#039;&amp;#039; — replaces one-hot targets with a mixture, e.g. &amp;lt;math&amp;gt;y_{\text{smooth}} = (1 - \epsilon)\, y + \epsilon / C&amp;lt;/math&amp;gt;, preventing overconfidence.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Noise injection&amp;#039;&amp;#039;&amp;#039; — adding Gaussian noise to inputs, weights, or gradients during training.&lt;br /&gt;
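&lt;br /&gt;
Label smoothing in particular is a one-liner. The helper below is an illustrative sketch of the formula above, with eps playing the role of the smoothing parameter and C the number of classes.&lt;br /&gt;
&lt;br /&gt;
```python
def smooth_labels(one_hot, eps=0.1):
    # replace hard 0/1 targets with (1 - eps) * y + eps / C
    c = len(one_hot)
    return [(1 - eps) * y + eps / c for y in one_hot]

smoothed = smooth_labels([0.0, 1.0, 0.0, 0.0], eps=0.1)
```
The smoothed distribution still sums to one, but the correct class now gets 0.925 instead of 1.0, which keeps the logits from growing without bound.&lt;br /&gt;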
&lt;br /&gt;
== Practical guidelines ==&lt;br /&gt;
&lt;br /&gt;
# Start with a model large enough to overfit the training data — this confirms the model has sufficient capacity.&lt;br /&gt;
# Add regularization incrementally (dropout, weight decay, augmentation) and monitor validation performance.&lt;br /&gt;
# Use early stopping as a safety net.&lt;br /&gt;
# Prefer more training data over stronger regularization whenever possible; regularization compensates for limited data but cannot fully replace it.&lt;br /&gt;
# Tune the regularization strength (&amp;lt;math&amp;gt;\lambda&amp;lt;/math&amp;gt;, dropout rate) using a validation set, never the test set.&lt;br /&gt;
&lt;br /&gt;
== See also ==&lt;br /&gt;
&lt;br /&gt;
* [[Loss Functions]]&lt;br /&gt;
* [[Neural Networks]]&lt;br /&gt;
* [[Gradient Descent]]&lt;br /&gt;
* [[Convolutional Neural Networks]]&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
* Srivastava, N. et al. (2014). &amp;quot;Dropout: A Simple Way to Prevent Neural Networks from Overfitting&amp;quot;. &amp;#039;&amp;#039;JMLR&amp;#039;&amp;#039;, 15, 1929–1958.&lt;br /&gt;
* Tibshirani, R. (1996). &amp;quot;Regression Shrinkage and Selection via the Lasso&amp;quot;. &amp;#039;&amp;#039;JRSS Series B&amp;#039;&amp;#039;, 58(1), 267–288.&lt;br /&gt;
* Goodfellow, I., Bengio, Y. and Courville, A. (2016). &amp;#039;&amp;#039;Deep Learning&amp;#039;&amp;#039;, Chapter 7. MIT Press.&lt;br /&gt;
* Zhang, C. et al. (2017). &amp;quot;Understanding deep learning requires rethinking generalization&amp;quot;. &amp;#039;&amp;#039;ICLR&amp;#039;&amp;#039;.&lt;br /&gt;
* Shorten, C. and Khoshgoftaar, T. M. (2019). &amp;quot;A survey on Image Data Augmentation for Deep Learning&amp;quot;. &amp;#039;&amp;#039;Journal of Big Data&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
[[Category:Machine Learning]]&lt;br /&gt;
[[Category:Intermediate]]&lt;/div&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
</feed>