{{LanguageBar | page = Linear Regression}}
{{ArticleInfobox | topic_area = Statistics | difficulty = Introductory | prerequisites = }}
{{ContentMeta | generated_by = claude-opus | model_used = claude-opus-4-6 | generated_date = 2026-03-13}}

'''Linear regression''' is a fundamental statistical method that models the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data. It is one of the oldest and most widely used techniques in statistics and machine learning, serving as both a practical predictive tool and a building block for understanding more complex models.

== Problem Setup ==

Given a dataset of <math>N</math> observations <math>\{(\mathbf{x}_i, y_i)\}_{i=1}^{N}</math>, where <math>\mathbf{x}_i \in \mathbb{R}^d</math> is a feature vector and <math>y_i \in \mathbb{R}</math> is the target, linear regression assumes the relationship:

:<math>y_i = \mathbf{w}^{\!\top} \mathbf{x}_i + b + \epsilon_i</math>

where <math>\mathbf{w} \in \mathbb{R}^d</math> is the weight vector, <math>b</math> is the bias (intercept), and <math>\epsilon_i</math> is the error term. By absorbing the bias into the weight vector (appending a 1 to each <math>\mathbf{x}_i</math>), the model simplifies to <math>y_i = \mathbf{w}^{\!\top} \mathbf{x}_i + \epsilon_i</math>.
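
To make the setup concrete, here is a minimal NumPy sketch that simulates data from this model; all sizes, weights, and noise levels are illustrative choices, not values from the text.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(0)
N, d = 200, 3                          # illustrative dataset sizes
w_true = np.array([1.5, -2.0, 0.5])    # ground-truth weights w
b_true = 0.7                           # ground-truth intercept b

X = rng.normal(size=(N, d))            # feature vectors x_i, stacked row-wise
eps = rng.normal(scale=0.1, size=N)    # noise terms epsilon_i
y = X @ w_true + b_true + eps          # y_i = w^T x_i + b + epsilon_i
</syntaxhighlight>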

== Ordinary Least Squares ==

The '''ordinary least squares''' (OLS) method finds the weights that minimize the sum of squared residuals:

:<math>\mathcal{L}(\mathbf{w}) = \sum_{i=1}^{N} (y_i - \mathbf{w}^{\!\top} \mathbf{x}_i)^2 = \|\mathbf{y} - X\mathbf{w}\|^2</math>

where <math>X \in \mathbb{R}^{N \times d}</math> is the design matrix and <math>\mathbf{y} \in \mathbb{R}^N</math> is the target vector.

=== Closed-Form Solution ===

Setting the gradient to zero yields the '''normal equations''':

:<math>\nabla_{\mathbf{w}} \mathcal{L} = -2 X^{\!\top}(\mathbf{y} - X\mathbf{w}) = 0</math>

:<math>\hat{\mathbf{w}} = (X^{\!\top} X)^{-1} X^{\!\top} \mathbf{y}</math>

This solution exists and is unique when <math>X^{\!\top} X</math> is invertible (i.e., the features are linearly independent). The computational cost is <math>O(Nd^2 + d^3)</math>, which is efficient for moderate <math>d</math> but becomes expensive for high-dimensional problems.
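
Continuing the synthetic example above, a sketch of the closed-form fit. Forming the inverse explicitly is numerically fragile, so the sketch uses <code>np.linalg.solve</code> and, as a more robust alternative, <code>np.linalg.lstsq</code>:

<syntaxhighlight lang="python">
import numpy as np

# Append a column of ones so the intercept is absorbed into w.
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# Normal equations: solve (X^T X) w = X^T y rather than inverting X^T X.
w_hat = np.linalg.solve(X_aug.T @ X_aug, X_aug.T @ y)

# Equivalent least-squares solve, more robust when X^T X is ill-conditioned.
w_lstsq, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
</syntaxhighlight>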

=== Gradient Descent Approach ===

When the closed-form solution is impractical (large <math>d</math> or <math>N</math>), iterative optimization via [[Stochastic Gradient Descent|gradient descent]] is used. Scaling the loss by <math>1/N</math> (minimizing the mean rather than the sum of squared residuals) keeps the gradient magnitude independent of the dataset size; the gradient is then:

:<math>\nabla_{\mathbf{w}} \mathcal{L} = -\frac{2}{N} X^{\!\top}(\mathbf{y} - X\mathbf{w})</math>

The update rule is <math>\mathbf{w} \leftarrow \mathbf{w} - \eta \nabla_{\mathbf{w}} \mathcal{L}</math>, where <math>\eta</math> is the learning rate. Stochastic and mini-batch variants scale to millions of data points.
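
A minimal full-batch implementation of this update; the learning rate and iteration count below are illustrative, not tuned values.

<syntaxhighlight lang="python">
import numpy as np

def fit_gd(X, y, eta=0.1, n_iters=1000):
    """Full-batch gradient descent on the mean squared error."""
    N, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iters):
        grad = -2.0 / N * X.T @ (y - X @ w)  # gradient of the mean squared error
        w -= eta * grad                      # update: w <- w - eta * grad
    return w

w_gd = fit_gd(X_aug, y)  # X_aug from the closed-form example above
</syntaxhighlight>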

== Assumptions of OLS ==

The classical OLS estimator is '''BLUE''' (Best Linear Unbiased Estimator) under the Gauss-Markov conditions:

# '''Linearity''': The true relationship between features and target is linear.
# '''Independence''': Observations are independent of each other.
# '''Homoscedasticity''': The error variance <math>\mathrm{Var}(\epsilon_i) = \sigma^2</math> is constant across observations.
# '''No perfect multicollinearity''': No feature is an exact linear combination of the others.
# '''Exogeneity''': <math>E[\epsilon_i \mid \mathbf{x}_i] = 0</math>; errors are uncorrelated with the features.

Violations of these assumptions do not necessarily make linear regression useless, but they may invalidate confidence intervals and hypothesis tests derived from the model.

== Evaluation Metrics ==

{| class="wikitable"
|-
! Metric !! Formula !! Interpretation
|-
| '''MSE''' || <math>\frac{1}{N}\sum(y_i - \hat{y}_i)^2</math> || Average squared error; penalises large errors
|-
| '''RMSE''' || <math>\sqrt{\mathrm{MSE}}</math> || In the same units as the target
|-
| '''MAE''' || <math>\frac{1}{N}\sum|y_i - \hat{y}_i|</math> || Average absolute error; robust to outliers
|-
| '''R-squared''' || <math>1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}</math> || Proportion of variance explained (0 to 1)
|}

An <math>R^2</math> of 1 indicates perfect prediction, while <math>R^2 = 0</math> means the model does no better than predicting the mean. The '''adjusted R-squared''' penalises for the number of features, preventing artificial inflation from adding irrelevant predictors.
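
All four metrics are a few lines of NumPy; in this sketch <code>y_true</code> and <code>y_pred</code> are assumed to be 1-D arrays of equal length.

<syntaxhighlight lang="python">
import numpy as np

def regression_metrics(y_true, y_pred):
    resid = y_true - y_pred
    mse = np.mean(resid ** 2)                         # average squared error
    rmse = np.sqrt(mse)                               # same units as the target
    mae = np.mean(np.abs(resid))                      # average absolute error
    ss_res = np.sum(resid ** 2)                       # residual sum of squares
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)  # total sum of squares
    r2 = 1.0 - ss_res / ss_tot                        # proportion of variance explained
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "R2": r2}
</syntaxhighlight>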

== Multiple Regression ==

When <math>d > 1</math>, the model is called '''multiple linear regression'''. Each coefficient <math>w_j</math> represents the expected change in <math>y</math> per unit change in <math>x_j</math>, holding all other features constant. Interpreting coefficients requires caution when features are correlated (multicollinearity), as individual coefficients may become unstable even though the overall model fits well.
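
The instability is easy to demonstrate on synthetic data: duplicate a feature with a little noise and the two coefficients trade off almost arbitrarily, while their sum (and the model's predictions) stays stable. A sketch, with all values illustrative:

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = x1 + 1e-6 * rng.normal(size=500)            # nearly collinear with x1
y_c = 3.0 * x1 + rng.normal(scale=0.1, size=500)

X_c = np.column_stack([x1, x2])
w_c, *_ = np.linalg.lstsq(X_c, y_c, rcond=None)
print(w_c, w_c.sum())  # individual coefficients are unstable; their sum stays near 3
</syntaxhighlight>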

== Regularized Variants ==

When the number of features is large relative to the number of observations, or when features are correlated, OLS can overfit. Regularization adds a penalty to the loss function:

=== Ridge Regression (L2) ===

:<math>\mathcal{L}_{\mathrm{ridge}} = \|\mathbf{y} - X\mathbf{w}\|^2 + \lambda \|\mathbf{w}\|_2^2</math>

The closed-form solution becomes <math>\hat{\mathbf{w}} = (X^{\!\top} X + \lambda I)^{-1} X^{\!\top} \mathbf{y}</math>. Ridge shrinks coefficients toward zero but never sets them exactly to zero.
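
The ridge solution is a one-line change from the OLS code: since <math>X^{\!\top} X + \lambda I</math> is invertible for any <math>\lambda > 0</math>, the solve cannot fail even with collinear features. A simplified sketch (standard practice leaves the intercept unpenalised, which this version ignores):

<syntaxhighlight lang="python">
import numpy as np

def fit_ridge(X, y, lam=1.0):
    """Closed-form ridge: w_hat = (X^T X + lam * I)^{-1} X^T y.

    Simplification: penalises every coefficient, including any bias column.
    """
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)
</syntaxhighlight>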

=== Lasso Regression (L1) ===

:<math>\mathcal{L}_{\mathrm{lasso}} = \|\mathbf{y} - X\mathbf{w}\|^2 + \lambda \|\mathbf{w}\|_1</math>

Lasso can drive coefficients to exactly zero, performing automatic '''feature selection'''. It has no closed-form solution and is typically solved via coordinate descent.
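
A bare-bones coordinate-descent sketch for this objective: each coordinate update is a soft-thresholding step, which is what produces exact zeros. There is no convergence check, standardized features are assumed, and all names are illustrative.

<syntaxhighlight lang="python">
import numpy as np

def soft_threshold(a, t):
    """S(a, t) = sign(a) * max(|a| - t, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - t, 0.0)

def fit_lasso_cd(X, y, lam=1.0, n_iters=100):
    """Coordinate descent for ||y - Xw||^2 + lam * ||w||_1."""
    N, d = X.shape
    w = np.zeros(d)
    z = np.sum(X ** 2, axis=0)                 # per-feature squared norms
    for _ in range(n_iters):
        for j in range(d):
            # Partial residual with feature j's contribution removed.
            r_j = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r_j
            w[j] = soft_threshold(rho, lam / 2.0) / z[j]
    return w
</syntaxhighlight>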

=== Elastic Net ===

Elastic Net combines both penalties: <math>\lambda_1 \|\mathbf{w}\|_1 + \lambda_2 \|\mathbf{w}\|_2^2</math>, balancing sparsity and stability.
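
In practice all three regularized variants are available off the shelf, e.g. in scikit-learn; the parameter values below are illustrative, not recommendations.

<syntaxhighlight lang="python">
from sklearn.linear_model import Ridge, Lasso, ElasticNet

ridge = Ridge(alpha=1.0).fit(X, y)
lasso = Lasso(alpha=0.1).fit(X, y)
# l1_ratio interpolates between pure ridge (0.0) and pure lasso (1.0).
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
</syntaxhighlight>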

== Practical Considerations ==

* '''Feature scaling''': Standardizing features (zero mean, unit variance) improves gradient descent convergence and makes regularization fair across features.
* '''Polynomial features''': Adding polynomial terms (e.g., <math>x^2, x_1 x_2</math>) allows linear regression to capture nonlinear relationships; see the sketch after this list.
* '''Outliers''': OLS is sensitive to outliers because of the squared loss. Robust alternatives include Huber regression and RANSAC.
* '''Diagnostic plots''': Residual plots help detect violations of assumptions (non-linearity, heteroscedasticity, non-normality).
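
A minimal scikit-learn pipeline combining polynomial expansion and scaling with an OLS fit (the degree and step ordering here are illustrative choices):

<syntaxhighlight lang="python">
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression

# Degree-2 terms (x_j^2 and x_j * x_k), then standardization, then OLS.
model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    LinearRegression(),
)
model.fit(X, y)  # X, y as in the synthetic example above
</syntaxhighlight>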

== See also ==

* [[Stochastic Gradient Descent]]
* [[Logistic regression]]
* [[Loss Functions]]
* [[Overfitting and Regularization]]
* [[Neural Networks]]

== References ==

* Hastie, T., Tibshirani, R. and Friedman, J. (2009). ''The Elements of Statistical Learning''. Springer, Chapter 3.
* Montgomery, D. C., Peck, E. A. and Vining, G. G. (2012). ''Introduction to Linear Regression Analysis''. Wiley.
* Hoerl, A. E. and Kennard, R. W. (1970). "Ridge Regression: Biased Estimation for Nonorthogonal Problems". ''Technometrics''.
* Tibshirani, R. (1996). "Regression Shrinkage and Selection via the Lasso". ''Journal of the Royal Statistical Society, Series B''.

[[Category:Statistics]]
[[Category:Introductory]]