<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://marovi.ai/index.php?action=history&amp;feed=atom&amp;title=Word_Embeddings</id>
	<title>Word Embeddings - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://marovi.ai/index.php?action=history&amp;feed=atom&amp;title=Word_Embeddings"/>
	<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Word_Embeddings&amp;action=history"/>
	<updated>2026-04-24T11:53:59Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.39.1</generator>
	<entry>
		<id>https://marovi.ai/index.php?title=Word_Embeddings&amp;diff=2146&amp;oldid=prev</id>
		<title>DeployBot: [deploy-bot] Deploy from CI (8c92aeb)</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Word_Embeddings&amp;diff=2146&amp;oldid=prev"/>
		<updated>2026-04-24T07:09:00Z</updated>

		<summary type="html">&lt;p&gt;[deploy-bot] Deploy from CI (8c92aeb)&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 07:09, 24 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l119&quot;&gt;Line 119:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 119:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:NLP]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:NLP]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Intermediate]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Intermediate]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;!--v1.2.0 cache-bust--&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;!-- pass 2 --&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key mediawiki:diff::1.12:old-2118:rev-2146 --&gt;
&lt;/table&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
	<entry>
		<id>https://marovi.ai/index.php?title=Word_Embeddings&amp;diff=2118&amp;oldid=prev</id>
		<title>DeployBot: Pass 2 force re-parse</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Word_Embeddings&amp;diff=2118&amp;oldid=prev"/>
		<updated>2026-04-24T07:01:24Z</updated>

		<summary type="html">&lt;p&gt;Pass 2 force re-parse&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 07:01, 24 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l120&quot;&gt;Line 120:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 120:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Intermediate]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Intermediate]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;!--v1.2.0 cache-bust--&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;!--v1.2.0 cache-bust--&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;!-- pass 2 --&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key mediawiki:diff::1.12:old-2081:rev-2118 --&gt;
&lt;/table&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
	<entry>
		<id>https://marovi.ai/index.php?title=Word_Embeddings&amp;diff=2081&amp;oldid=prev</id>
		<title>DeployBot: Force re-parse after Math source-mode rollout (v1.2.0)</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Word_Embeddings&amp;diff=2081&amp;oldid=prev"/>
		<updated>2026-04-24T06:58:47Z</updated>

		<summary type="html">&lt;p&gt;Force re-parse after Math source-mode rollout (v1.2.0)&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 06:58, 24 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l119&quot;&gt;Line 119:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 119:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:NLP]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:NLP]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Intermediate]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Intermediate]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;!--v1.2.0 cache-bust--&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key mediawiki:diff::1.12:old-1995:rev-2081 --&gt;
&lt;/table&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
	<entry>
		<id>https://marovi.ai/index.php?title=Word_Embeddings&amp;diff=1995&amp;oldid=prev</id>
		<title>DeployBot: [deploy-bot] Deploy from CI (775ba6e)</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Word_Embeddings&amp;diff=1995&amp;oldid=prev"/>
		<updated>2026-04-24T04:01:45Z</updated>

		<summary type="html">&lt;p&gt;[deploy-bot] Deploy from CI (775ba6e)&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;{{LanguageBar | page = Word Embeddings}}&lt;br /&gt;
{{ArticleInfobox | topic_area = NLP | difficulty = Intermediate | prerequisites = [[Neural Networks]]}}&lt;br /&gt;
{{ContentMeta | generated_by = claude-opus | model_used = claude-opus-4-6 | generated_date = 2026-03-13}}&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Word embeddings&amp;#039;&amp;#039;&amp;#039; are dense, low-dimensional vector representations of words in which semantically similar words are mapped to nearby points in the vector space. They are a foundational component of modern natural language processing (NLP), replacing sparse one-hot encodings with representations that capture meaning, analogy, and syntactic relationships.&lt;br /&gt;
&lt;br /&gt;
== The distributional hypothesis ==&lt;br /&gt;
&lt;br /&gt;
Word embeddings are grounded in the &amp;#039;&amp;#039;&amp;#039;distributional hypothesis&amp;#039;&amp;#039;&amp;#039;, famously stated by J. R. Firth (1957): &amp;quot;You shall know a word by the company it keeps.&amp;quot; The idea is that words appearing in similar contexts tend to have similar meanings. For example, &amp;quot;dog&amp;quot; and &amp;quot;cat&amp;quot; frequently appear near words like &amp;quot;pet&amp;quot;, &amp;quot;fur&amp;quot;, and &amp;quot;veterinarian&amp;quot;, so they should have similar representations.&lt;br /&gt;
&lt;br /&gt;
Early approaches to exploiting distributional information include co-occurrence matrices, pointwise mutual information (PMI), and latent semantic analysis (LSA). Modern word embedding methods learn dense vectors directly using neural networks.&lt;br /&gt;
&lt;br /&gt;
== One-hot vs dense representations ==&lt;br /&gt;
&lt;br /&gt;
=== One-hot encoding ===&lt;br /&gt;
&lt;br /&gt;
In a vocabulary of &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt; words, a one-hot vector for the &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;-th word is a &amp;lt;math&amp;gt;V&amp;lt;/math&amp;gt;-dimensional vector with a 1 in position &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt; and 0 elsewhere. This representation has two critical shortcomings:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Dimensionality&amp;#039;&amp;#039;&amp;#039; — vectors are extremely high-dimensional (typically &amp;lt;math&amp;gt;V &amp;gt; 100{,}000&amp;lt;/math&amp;gt;).&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;No similarity&amp;#039;&amp;#039;&amp;#039; — every pair of one-hot vectors is equally distant: &amp;lt;math&amp;gt;\mathbf{e}_i^\top \mathbf{e}_j = 0&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;i \neq j&amp;lt;/math&amp;gt;. &amp;quot;Cat&amp;quot; is as far from &amp;quot;dog&amp;quot; as it is from &amp;quot;democracy.&amp;quot;&lt;br /&gt;
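&lt;br /&gt;
A quick numpy sketch makes the second point concrete (the vocabulary size and word indices below are illustrative):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
V = 100_000                      # illustrative vocabulary size&lt;br /&gt;
cat, dog, democracy = 0, 1, 2    # hypothetical word indices&lt;br /&gt;
&lt;br /&gt;
def one_hot(i, V):&lt;br /&gt;
    e = np.zeros(V)&lt;br /&gt;
    e[i] = 1.0&lt;br /&gt;
    return e&lt;br /&gt;
&lt;br /&gt;
# Every distinct pair is orthogonal: the dot product is always 0.&lt;br /&gt;
print(one_hot(cat, V) @ one_hot(dog, V))        # 0.0&lt;br /&gt;
print(one_hot(cat, V) @ one_hot(democracy, V))  # 0.0&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;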
&lt;br /&gt;
=== Dense embeddings ===&lt;br /&gt;
&lt;br /&gt;
A word embedding maps each word to a real-valued vector of &amp;lt;math&amp;gt;d&amp;lt;/math&amp;gt; dimensions (typically &amp;lt;math&amp;gt;d = 100&amp;lt;/math&amp;gt;–&amp;lt;math&amp;gt;300&amp;lt;/math&amp;gt;):&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\mathbf{w}_i \in \mathbb{R}^d, \quad d \ll V&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
Similar words have high cosine similarity:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\text{sim}(\mathbf{w}_a, \mathbf{w}_b) = \frac{\mathbf{w}_a \cdot \mathbf{w}_b}{\|\mathbf{w}_a\|\;\|\mathbf{w}_b\|}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
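Assuming two hypothetical 4-dimensional embeddings (real models use far more dimensions), the cosine similarity above is a one-liner in numpy:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
# Made-up vectors for illustration only.&lt;br /&gt;
w_cat = np.array([0.8, 0.2, -0.1, 0.5])&lt;br /&gt;
w_dog = np.array([0.7, 0.3, -0.2, 0.4])&lt;br /&gt;
&lt;br /&gt;
def cosine(a, b):&lt;br /&gt;
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))&lt;br /&gt;
&lt;br /&gt;
print(cosine(w_cat, w_dog))  # close to 1 for similar words&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;
&lt;br /&gt;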
== Word2Vec ==&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;Word2Vec&amp;#039;&amp;#039;&amp;#039; (Mikolov et al., 2013) introduced two efficient architectures for learning word embeddings from large corpora.&lt;br /&gt;
&lt;br /&gt;
=== Continuous Bag of Words (CBOW) ===&lt;br /&gt;
&lt;br /&gt;
CBOW predicts a target word from its surrounding context words. Given a window of context words &amp;lt;math&amp;gt;\{w_{t-c}, \ldots, w_{t-1}, w_{t+1}, \ldots, w_{t+c}\}&amp;lt;/math&amp;gt;, the model maximises:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(w_t \mid w_{t-c}, \ldots, w_{t+c})&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The context word vectors are averaged, and the averaged vector is scored against every vocabulary word through a softmax output layer. CBOW is faster to train than Skip-gram and works well for frequent words.&lt;br /&gt;
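&lt;br /&gt;
A minimal numpy sketch of the CBOW forward pass (the matrix names and toy sizes are ours, not from the paper):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
V, d = 10, 4                      # toy vocabulary and embedding sizes&lt;br /&gt;
rng = np.random.default_rng(0)&lt;br /&gt;
W_in = rng.normal(size=(V, d))    # input (context) embeddings&lt;br /&gt;
W_out = rng.normal(size=(V, d))   # output (target) embeddings&lt;br /&gt;
&lt;br /&gt;
context = [2, 5, 7, 9]            # indices of the 2c context words&lt;br /&gt;
h = W_in[context].mean(axis=0)    # average the context vectors&lt;br /&gt;
&lt;br /&gt;
scores = W_out @ h                # one score per vocabulary word&lt;br /&gt;
p = np.exp(scores - scores.max())&lt;br /&gt;
p /= p.sum()                      # softmax: P(w_t | context)&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;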
&lt;br /&gt;
=== Skip-gram ===&lt;br /&gt;
&lt;br /&gt;
Skip-gram reverses the prediction: given a target word, it predicts the surrounding context words. For each pair &amp;lt;math&amp;gt;(w_t, w_{t+j})&amp;lt;/math&amp;gt; where &amp;lt;math&amp;gt;j \in [-c, c] \setminus \{0\}&amp;lt;/math&amp;gt;, the model maximises:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;P(w_{t+j} \mid w_t) = \frac{\exp(\mathbf{v}&amp;#039;_{w_{t+j}}{}^\top \mathbf{v}_{w_t})}{\sum_{w=1}^{V}\exp(\mathbf{v}&amp;#039;_w{}^\top \mathbf{v}_{w_t})}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\mathbf{v}_w&amp;lt;/math&amp;gt; and &amp;lt;math&amp;gt;\mathbf{v}&amp;#039;_w&amp;lt;/math&amp;gt; are the input and output embedding vectors. Computing the full softmax over the vocabulary is expensive, so two approximations are commonly used:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Negative sampling&amp;#039;&amp;#039;&amp;#039; — instead of computing the full softmax, the model contrasts the true context word against &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; randomly sampled &amp;quot;negative&amp;quot; words (see the sketch after this list).&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Hierarchical softmax&amp;#039;&amp;#039;&amp;#039; — organises the vocabulary in a binary tree, reducing the softmax cost from &amp;lt;math&amp;gt;O(V)&amp;lt;/math&amp;gt; to &amp;lt;math&amp;gt;O(\log V)&amp;lt;/math&amp;gt;.&lt;br /&gt;
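&lt;br /&gt;
A single negative-sampling update, sketched in numpy under toy assumptions (random initialisation, made-up word indices, a fixed learning rate):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
V, d, k, lr = 10, 4, 3, 0.025&lt;br /&gt;
rng = np.random.default_rng(1)&lt;br /&gt;
V_in = rng.normal(scale=0.1, size=(V, d))   # input vectors&lt;br /&gt;
V_out = rng.normal(scale=0.1, size=(V, d))  # output (primed) vectors&lt;br /&gt;
&lt;br /&gt;
def sigmoid(x):&lt;br /&gt;
    return 1.0 / (1.0 + np.exp(-x))&lt;br /&gt;
&lt;br /&gt;
target, context = 3, 7                  # a (w_t, w_{t+j}) pair&lt;br /&gt;
negatives = rng.integers(0, V, size=k)  # k sampled negative words&lt;br /&gt;
&lt;br /&gt;
v = V_in[target]&lt;br /&gt;
grad_v = np.zeros(d)&lt;br /&gt;
# True pair gets label 1, sampled negatives get label 0.&lt;br /&gt;
for w, label in [(context, 1.0)] + [(n, 0.0) for n in negatives]:&lt;br /&gt;
    g = sigmoid(V_out[w] @ v) - label   # logistic-loss gradient&lt;br /&gt;
    grad_v += g * V_out[w]&lt;br /&gt;
    V_out[w] -= lr * g * v&lt;br /&gt;
V_in[target] -= lr * grad_v&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;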
&lt;br /&gt;
Skip-gram performs well on rare words and captures subtle relationships. The famous analogy &amp;quot;king − man + woman ≈ queen&amp;quot; emerged from Skip-gram embeddings.&lt;br /&gt;
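&lt;br /&gt;
In practice one rarely implements this by hand. A hedged example using the gensim library and its downloadable pretrained vectors (the download is large; the model name refers to the gensim-data catalogue):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import gensim.downloader as api&lt;br /&gt;
&lt;br /&gt;
# Downloads the pretrained Google News vectors on first use.&lt;br /&gt;
wv = api.load(&amp;quot;word2vec-google-news-300&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
# king - man + woman, via vector arithmetic; expected top hit: queen&lt;br /&gt;
print(wv.most_similar(positive=[&amp;quot;king&amp;quot;, &amp;quot;woman&amp;quot;],&lt;br /&gt;
                      negative=[&amp;quot;man&amp;quot;], topn=1))&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;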
&lt;br /&gt;
== GloVe ==&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;GloVe&amp;#039;&amp;#039;&amp;#039; (Global Vectors, Pennington et al., 2014) combines the strengths of global matrix factorisation and local context window methods. It constructs a word co-occurrence matrix &amp;lt;math&amp;gt;X&amp;lt;/math&amp;gt; from the corpus, where &amp;lt;math&amp;gt;X_{ij}&amp;lt;/math&amp;gt; counts how often word &amp;lt;math&amp;gt;j&amp;lt;/math&amp;gt; appears in the context of word &amp;lt;math&amp;gt;i&amp;lt;/math&amp;gt;, and then optimises:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;J = \sum_{i,j=1}^{V} f(X_{ij})\bigl(\mathbf{w}_i^\top \tilde{\mathbf{w}}_j + b_i + \tilde{b}_j - \log X_{ij}\bigr)^2&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;f&amp;lt;/math&amp;gt; is a weighting function that caps the influence of very frequent co-occurrences. GloVe embeddings often match or exceed Word2Vec quality, and the explicit use of global statistics can improve performance on analogy tasks.&lt;br /&gt;
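&lt;br /&gt;
The weighting function with the defaults published in the GloVe paper (&amp;lt;math&amp;gt;x_{\max} = 100&amp;lt;/math&amp;gt;, &amp;lt;math&amp;gt;\alpha = 3/4&amp;lt;/math&amp;gt;) and one term of the objective, sketched in numpy:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
&lt;br /&gt;
def f(x, x_max=100.0, alpha=0.75):&lt;br /&gt;
    # Caps the influence of very frequent co-occurrences.&lt;br /&gt;
    return np.where(x &amp;lt; x_max, (x / x_max) ** alpha, 1.0)&lt;br /&gt;
&lt;br /&gt;
def pair_loss(w_i, w_j_tilde, b_i, b_j_tilde, x_ij):&lt;br /&gt;
    # One term of J for a single co-occurring pair (i, j).&lt;br /&gt;
    inner = w_i @ w_j_tilde + b_i + b_j_tilde&lt;br /&gt;
    return f(x_ij) * (inner - np.log(x_ij)) ** 2&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;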
&lt;br /&gt;
== fastText ==&lt;br /&gt;
&lt;br /&gt;
&amp;#039;&amp;#039;&amp;#039;fastText&amp;#039;&amp;#039;&amp;#039; (Bojanowski et al., 2017) extends Word2Vec by representing each word as a bag of character n-grams. For example, the word &amp;quot;where&amp;quot; with &amp;lt;math&amp;gt;n = 3&amp;lt;/math&amp;gt; is represented by the n-grams {&amp;quot;&amp;lt;wh&amp;quot;, &amp;quot;whe&amp;quot;, &amp;quot;her&amp;quot;, &amp;quot;ere&amp;quot;, &amp;quot;re&amp;gt;&amp;quot;} plus the whole word &amp;quot;&amp;lt;where&amp;gt;&amp;quot;. The embedding for a word is the sum of its n-gram vectors.&lt;br /&gt;
&lt;br /&gt;
This approach has two key advantages:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Handling rare and unseen words&amp;#039;&amp;#039;&amp;#039; — even words not in the training vocabulary can receive embeddings by summing their character n-gram vectors (see the sketch after this list).&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Morphological awareness&amp;#039;&amp;#039;&amp;#039; — words sharing substrings (e.g. &amp;quot;teach&amp;quot;, &amp;quot;teacher&amp;quot;, &amp;quot;teaching&amp;quot;) automatically share embedding components.&lt;br /&gt;
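&lt;br /&gt;
A sketch of the n-gram extraction described above; the angle brackets are boundary markers, as in the paper:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
def char_ngrams(word, n=3):&lt;br /&gt;
    # Boundary markers let prefixes and suffixes be distinguished.&lt;br /&gt;
    padded = &amp;quot;&amp;lt;&amp;quot; + word + &amp;quot;&amp;gt;&amp;quot;&lt;br /&gt;
    grams = [padded[i:i + n] for i in range(len(padded) - n + 1)]&lt;br /&gt;
    return grams + [padded]  # the n-grams plus the whole word&lt;br /&gt;
&lt;br /&gt;
# For an out-of-vocabulary word, fastText sums the vectors of&lt;br /&gt;
# these n-grams from its trained n-gram table.&lt;br /&gt;
print(char_ngrams(&amp;quot;where&amp;quot;))  # &amp;lt;wh whe her ere re&amp;gt; &amp;lt;where&amp;gt;&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;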
&lt;br /&gt;
== Evaluation of embeddings ==&lt;br /&gt;
&lt;br /&gt;
Word embeddings are evaluated through:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Evaluation type !! Examples !! What it measures&lt;br /&gt;
|-&lt;br /&gt;
| &amp;#039;&amp;#039;&amp;#039;Intrinsic: analogy&amp;#039;&amp;#039;&amp;#039; || &amp;quot;king : queen :: man : ?&amp;quot; || Linear structure of the space&lt;br /&gt;
|-&lt;br /&gt;
| &amp;#039;&amp;#039;&amp;#039;Intrinsic: similarity&amp;#039;&amp;#039;&amp;#039; || Correlation with human similarity judgements (SimLex-999, WS-353) || Semantic quality&lt;br /&gt;
|-&lt;br /&gt;
| &amp;#039;&amp;#039;&amp;#039;Extrinsic: downstream&amp;#039;&amp;#039;&amp;#039; || Named entity recognition, sentiment analysis, parsing || Practical utility&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
Intrinsic evaluations are fast but do not always predict downstream performance. Extrinsic evaluation on the target task is ultimately the most reliable measure.&lt;br /&gt;
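&lt;br /&gt;
A similarity evaluation reduces to a rank correlation between model scores and human judgements. A sketch with scipy, where the word pairs and gold scores are invented stand-ins for a dataset like SimLex-999:&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import numpy as np&lt;br /&gt;
from scipy.stats import spearmanr&lt;br /&gt;
&lt;br /&gt;
def cos(a, b):&lt;br /&gt;
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))&lt;br /&gt;
&lt;br /&gt;
# (word_a, word_b, human_score) triples; the scores are invented.&lt;br /&gt;
gold = [(&amp;quot;cat&amp;quot;, &amp;quot;dog&amp;quot;, 7.3), (&amp;quot;cup&amp;quot;, &amp;quot;mug&amp;quot;, 8.1), (&amp;quot;car&amp;quot;, &amp;quot;tree&amp;quot;, 1.2)]&lt;br /&gt;
&lt;br /&gt;
def evaluate(wv, gold):&lt;br /&gt;
    # wv: any mapping from word to vector, e.g. gensim KeyedVectors.&lt;br /&gt;
    model = [cos(wv[a], wv[b]) for a, b, _ in gold]&lt;br /&gt;
    human = [s for _, _, s in gold]&lt;br /&gt;
    rho, _ = spearmanr(model, human)&lt;br /&gt;
    return rho  # higher rank correlation = closer to human judgements&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;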
&lt;br /&gt;
== Contextual embeddings ==&lt;br /&gt;
&lt;br /&gt;
Traditional word embeddings assign a single vector per word regardless of context — the word &amp;quot;bank&amp;quot; has the same embedding whether it refers to a river bank or a financial institution. &amp;#039;&amp;#039;&amp;#039;Contextual embeddings&amp;#039;&amp;#039;&amp;#039; address this limitation by producing different representations depending on the surrounding text.&lt;br /&gt;
&lt;br /&gt;
Notable contextual embedding models include:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;ELMo&amp;#039;&amp;#039;&amp;#039; (Peters et al., 2018) — uses a bidirectional LSTM to generate context-dependent word representations.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;BERT&amp;#039;&amp;#039;&amp;#039; (Devlin et al., 2019) — uses a Transformer encoder trained with masked language modelling.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;GPT&amp;#039;&amp;#039;&amp;#039; series (Radford et al., 2018–) — uses a Transformer decoder trained autoregressively.&lt;br /&gt;
&lt;br /&gt;
These models have largely superseded static embeddings for most NLP tasks, though static embeddings remain useful for efficiency, interpretability, and low-resource settings.&lt;br /&gt;
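&lt;br /&gt;
A hedged example with the Hugging Face transformers library, showing that &amp;quot;bank&amp;quot; receives a different vector in each sentence (treat the token-lookup details as a sketch):&lt;br /&gt;
&lt;br /&gt;
&amp;lt;syntaxhighlight lang=&amp;quot;python&amp;quot;&amp;gt;&lt;br /&gt;
import torch&lt;br /&gt;
from transformers import AutoModel, AutoTokenizer&lt;br /&gt;
&lt;br /&gt;
tok = AutoTokenizer.from_pretrained(&amp;quot;bert-base-uncased&amp;quot;)&lt;br /&gt;
model = AutoModel.from_pretrained(&amp;quot;bert-base-uncased&amp;quot;)&lt;br /&gt;
&lt;br /&gt;
def vector_for(sentence, word):&lt;br /&gt;
    enc = tok(sentence, return_tensors=&amp;quot;pt&amp;quot;)&lt;br /&gt;
    with torch.no_grad():&lt;br /&gt;
        hidden = model(**enc).last_hidden_state[0]&lt;br /&gt;
    idx = enc.input_ids[0].tolist().index(tok.convert_tokens_to_ids(word))&lt;br /&gt;
    return hidden[idx]&lt;br /&gt;
&lt;br /&gt;
v1 = vector_for(&amp;quot;I sat by the river bank.&amp;quot;, &amp;quot;bank&amp;quot;)&lt;br /&gt;
v2 = vector_for(&amp;quot;The bank approved my loan.&amp;quot;, &amp;quot;bank&amp;quot;)&lt;br /&gt;
print(torch.cosine_similarity(v1, v2, dim=0))  # well below 1.0&lt;br /&gt;
&amp;lt;/syntaxhighlight&amp;gt;&lt;br /&gt;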
&lt;br /&gt;
== See also ==&lt;br /&gt;
&lt;br /&gt;
* [[Neural Networks]]&lt;br /&gt;
* [[Recurrent Neural Networks]]&lt;br /&gt;
* [[Loss Functions]]&lt;br /&gt;
* [[Backpropagation]]&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
* Firth, J. R. (1957). &amp;quot;A synopsis of linguistic theory, 1930–1955&amp;quot;. In &amp;#039;&amp;#039;Studies in Linguistic Analysis&amp;#039;&amp;#039;.&lt;br /&gt;
* Mikolov, T. et al. (2013). &amp;quot;Efficient Estimation of Word Representations in Vector Space&amp;quot;. &amp;#039;&amp;#039;arXiv:1301.3781&amp;#039;&amp;#039;.&lt;br /&gt;
* Pennington, J., Socher, R. and Manning, C. D. (2014). &amp;quot;GloVe: Global Vectors for Word Representation&amp;quot;. &amp;#039;&amp;#039;EMNLP&amp;#039;&amp;#039;.&lt;br /&gt;
* Bojanowski, P. et al. (2017). &amp;quot;Enriching Word Vectors with Subword Information&amp;quot;. &amp;#039;&amp;#039;TACL&amp;#039;&amp;#039;, 5, 135–146.&lt;br /&gt;
* Peters, M. E. et al. (2018). &amp;quot;Deep contextualized word representations&amp;quot;. &amp;#039;&amp;#039;NAACL&amp;#039;&amp;#039;.&lt;br /&gt;
* Devlin, J. et al. (2019). &amp;quot;BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding&amp;quot;. &amp;#039;&amp;#039;NAACL&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
[[Category:NLP]]&lt;br /&gt;
[[Category:Intermediate]]&lt;/div&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
</feed>