<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en">
	<id>https://marovi.ai/index.php?action=history&amp;feed=atom&amp;title=Softmax_Function%2Fes</id>
	<title>Softmax Function/es - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://marovi.ai/index.php?action=history&amp;feed=atom&amp;title=Softmax_Function%2Fes"/>
	<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Softmax_Function/es&amp;action=history"/>
	<updated>2026-04-24T12:42:25Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.39.1</generator>
	<entry>
		<id>https://marovi.ai/index.php?title=Softmax_Function/es&amp;diff=2159&amp;oldid=prev</id>
		<title>DeployBot: [deploy-bot] Deploy from CI (8c92aeb)</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Softmax_Function/es&amp;diff=2159&amp;oldid=prev"/>
		<updated>2026-04-24T07:09:01Z</updated>

		<summary type="html">&lt;p&gt;[deploy-bot] Deploy from CI (8c92aeb)&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 07:09, 24 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l103&quot;&gt;Line 103:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 103:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Machine Learning]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Machine Learning]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Introductory]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Introductory]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;!--v1.2.0 cache-bust--&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;−&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #ffe49c; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;del style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;!-- pass 2 --&gt;&lt;/del&gt;&lt;/div&gt;&lt;/td&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-added&quot;&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key mediawiki:diff::1.12:old-2113:rev-2159 --&gt;
&lt;/table&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
	<entry>
		<id>https://marovi.ai/index.php?title=Softmax_Function/es&amp;diff=2113&amp;oldid=prev</id>
		<title>DeployBot: Pass 2 force re-parse</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Softmax_Function/es&amp;diff=2113&amp;oldid=prev"/>
		<updated>2026-04-24T07:01:17Z</updated>

		<summary type="html">&lt;p&gt;Pass 2 force re-parse&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 07:01, 24 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l104&quot;&gt;Line 104:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 104:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Introductory]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Introductory]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;!--v1.2.0 cache-bust--&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&amp;lt;!--v1.2.0 cache-bust--&amp;gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;!-- pass 2 --&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key mediawiki:diff::1.12:old-2076:rev-2113 --&gt;
&lt;/table&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
	<entry>
		<id>https://marovi.ai/index.php?title=Softmax_Function/es&amp;diff=2076&amp;oldid=prev</id>
		<title>DeployBot: Force re-parse after Math source-mode rollout (v1.2.0)</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Softmax_Function/es&amp;diff=2076&amp;oldid=prev"/>
		<updated>2026-04-24T06:58:40Z</updated>

		<summary type="html">&lt;p&gt;Force re-parse after Math source-mode rollout (v1.2.0)&lt;/p&gt;
&lt;table style=&quot;background-color: #fff; color: #202122;&quot; data-mw=&quot;interface&quot;&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;col class=&quot;diff-marker&quot; /&gt;
				&lt;col class=&quot;diff-content&quot; /&gt;
				&lt;tr class=&quot;diff-title&quot; lang=&quot;en&quot;&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;← Older revision&lt;/td&gt;
				&lt;td colspan=&quot;2&quot; style=&quot;background-color: #fff; color: #202122; text-align: center;&quot;&gt;Revision as of 06:58, 24 April 2026&lt;/td&gt;
				&lt;/tr&gt;&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot; id=&quot;mw-diff-left-l103&quot;&gt;Line 103:&lt;/td&gt;
&lt;td colspan=&quot;2&quot; class=&quot;diff-lineno&quot;&gt;Line 103:&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Machine Learning]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Machine Learning]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Introductory]]&lt;/div&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot;&gt;&lt;/td&gt;&lt;td style=&quot;background-color: #f8f9fa; color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #eaecf0; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;[[Category:Introductory]]&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td colspan=&quot;2&quot; class=&quot;diff-side-deleted&quot;&gt;&lt;/td&gt;&lt;td class=&quot;diff-marker&quot; data-marker=&quot;+&quot;&gt;&lt;/td&gt;&lt;td style=&quot;color: #202122; font-size: 88%; border-style: solid; border-width: 1px 1px 1px 4px; border-radius: 0.33em; border-color: #a3d3ff; vertical-align: top; white-space: pre-wrap;&quot;&gt;&lt;div&gt;&lt;ins style=&quot;font-weight: bold; text-decoration: none;&quot;&gt;&amp;lt;!--v1.2.0 cache-bust--&gt;&lt;/ins&gt;&lt;/div&gt;&lt;/td&gt;&lt;/tr&gt;

&lt;!-- diff cache key mediawiki:diff::1.12:old-2008:rev-2076 --&gt;
&lt;/table&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
	<entry>
		<id>https://marovi.ai/index.php?title=Softmax_Function/es&amp;diff=2008&amp;oldid=prev</id>
		<title>DeployBot: [deploy-bot] Deploy from CI (775ba6e)</title>
		<link rel="alternate" type="text/html" href="https://marovi.ai/index.php?title=Softmax_Function/es&amp;diff=2008&amp;oldid=prev"/>
		<updated>2026-04-24T04:01:50Z</updated>

		<summary type="html">&lt;p&gt;[deploy-bot] Deploy from CI (775ba6e)&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;{{LanguageBar | page = Softmax Function}}&lt;br /&gt;
{{ArticleInfobox | topic_area = Machine Learning | difficulty = Introductory | prerequisites = }}&lt;br /&gt;
{{ContentMeta | generated_by = claude-opus | model_used = claude-opus-4-6 | generated_date = 2026-03-13}}&lt;br /&gt;
&lt;br /&gt;
The &amp;#039;&amp;#039;&amp;#039;softmax function&amp;#039;&amp;#039;&amp;#039; (also called the &amp;#039;&amp;#039;&amp;#039;normalized exponential function&amp;#039;&amp;#039;&amp;#039;) is a mathematical function that converts a vector of real numbers (&amp;#039;&amp;#039;&amp;#039;logits&amp;#039;&amp;#039;&amp;#039;) into a probability distribution. It is the standard output activation for multiclass classification in neural networks and plays a central role in models ranging from logistic regression to large language models.&lt;br /&gt;
&lt;br /&gt;
== Definition ==&lt;br /&gt;
&lt;br /&gt;
Given a vector of logits &amp;lt;math&amp;gt;\mathbf{z} = (z_1, z_2, \dots, z_K)&amp;lt;/math&amp;gt; for &amp;lt;math&amp;gt;K&amp;lt;/math&amp;gt; classes, the softmax function produces:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma(\mathbf{z})_k = \frac{e^{z_k}}{\sum_{j=1}^{K} e^{z_j}}, \qquad k = 1, \dots, K&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
The output satisfies two properties that make it a valid probability distribution:&lt;br /&gt;
&lt;br /&gt;
# &amp;lt;math&amp;gt;\sigma(\mathbf{z})_k &amp;gt; 0&amp;lt;/math&amp;gt; for all &amp;lt;math&amp;gt;k&amp;lt;/math&amp;gt; (since the exponential function is always positive).&lt;br /&gt;
# &amp;lt;math&amp;gt;\sum_{k=1}^{K} \sigma(\mathbf{z})_k = 1&amp;lt;/math&amp;gt; (by construction).&lt;br /&gt;
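&lt;br /&gt;
The definition can be sketched directly in code. This is a minimal illustration using only the Python standard library; the function name softmax and the sample logits are chosen here for illustration, and the sketch ignores overflow for large logits.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def softmax(logits):
    """Map a list of real-valued logits to a probability distribution."""
    exps = [math.exp(z) for z in logits]   # e^{z_k}, always positive
    total = sum(exps)                      # normalizing constant
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print([round(p, 3) for p in probs])        # [0.659, 0.242, 0.099]
print(math.isclose(sum(probs), 1.0))       # True
```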
&lt;br /&gt;
== Intuition ==&lt;br /&gt;
&lt;br /&gt;
The softmax function amplifies differences between logits. A logit that is larger than the others receives a disproportionately large share of the probability mass because the exponential function grows superlinearly. For example:&lt;br /&gt;
&lt;br /&gt;
{| class=&amp;quot;wikitable&amp;quot;&lt;br /&gt;
|-&lt;br /&gt;
! Logits !! Softmax output&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;math&amp;gt;(2.0,\; 1.0,\; 0.1)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(0.659,\; 0.242,\; 0.099)&amp;lt;/math&amp;gt;&lt;br /&gt;
|-&lt;br /&gt;
| &amp;lt;math&amp;gt;(5.0,\; 1.0,\; 0.1)&amp;lt;/math&amp;gt; || &amp;lt;math&amp;gt;(0.975,\; 0.018,\; 0.007)&amp;lt;/math&amp;gt;&lt;br /&gt;
|}&lt;br /&gt;
&lt;br /&gt;
As the gap between the largest logit and the others grows, the output approaches a one-hot vector. This &amp;quot;winner takes most&amp;quot; behavior makes softmax well suited to classification tasks where a single class should dominate.&lt;br /&gt;
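&lt;br /&gt;
The sharpening effect can be observed numerically. The sketch below is illustrative only: it raises the first logit while holding the other two fixed, and the value 10.0 is an extra data point chosen here, not one taken from the article.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def softmax(logits):
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Increase the largest logit and watch the probability mass concentrate on it.
rows = {}
for top in (2.0, 5.0, 10.0):
    rows[top] = softmax([top, 1.0, 0.1])
    print(top, [round(p, 3) for p in rows[top]])
# top = 2.0 gives about (0.659, 0.242, 0.099)
# top = 5.0 gives about (0.975, 0.018, 0.007)
```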
&lt;br /&gt;
== Temperature parameter ==&lt;br /&gt;
&lt;br /&gt;
A &amp;#039;&amp;#039;&amp;#039;temperature&amp;#039;&amp;#039;&amp;#039; parameter &amp;lt;math&amp;gt;T &amp;gt; 0&amp;lt;/math&amp;gt; controls the sharpness of the distribution:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma(\mathbf{z}; T)_k = \frac{e^{z_k / T}}{\sum_{j=1}^{K} e^{z_j / T}}&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
* &amp;lt;math&amp;gt;T \to 0&amp;lt;/math&amp;gt;: The distribution collapses to a one-hot vector that selects the argmax, equivalent to a hard decision.&lt;br /&gt;
* &amp;lt;math&amp;gt;T = 1&amp;lt;/math&amp;gt;: Standard softmax.&lt;br /&gt;
* &amp;lt;math&amp;gt;T \to \infty&amp;lt;/math&amp;gt;: The distribution approaches the uniform distribution, so all classes become equally likely.&lt;br /&gt;
&lt;br /&gt;
Temperature scaling is widely used in knowledge distillation (Hinton et al., 2015), where a &amp;quot;soft&amp;quot; distribution from a teacher model provides a richer training signal than hard labels. It is also used to control randomness when generating text from language models.&lt;br /&gt;
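&lt;br /&gt;
The three temperature regimes can be sketched as follows. The function name softmax_with_temperature and the specific temperatures 0.01 and 1000 are assumptions made for this illustration; the maximum is subtracted before exponentiating so that small temperatures do not overflow.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Softmax of logits / T; T must be strictly positive."""
    if not temperature > 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    m = max(scaled)                              # shift for numerical safety
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(logits, 0.01)      # close to one-hot
standard = softmax_with_temperature(logits, 1.0)    # ordinary softmax
flat = softmax_with_temperature(logits, 1000.0)     # close to uniform
for name, probs in (("sharp", sharp), ("standard", standard), ("flat", flat)):
    print(name, [round(p, 3) for p in probs])
```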
&lt;br /&gt;
== Numerical stability ==&lt;br /&gt;
&lt;br /&gt;
A naive implementation of softmax can overflow when the logits are large (for example, &amp;lt;math&amp;gt;e^{1000}&amp;lt;/math&amp;gt; exceeds the range of floating-point numbers and evaluates to infinity). The standard fix is to subtract the maximum logit:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma(\mathbf{z})_k = \frac{e^{z_k - m}}{\sum_{j=1}^{K} e^{z_j - m}}, \qquad m = \max_j z_j&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is mathematically equivalent (the constant factor &amp;lt;math&amp;gt;e^{-m}&amp;lt;/math&amp;gt; cancels between numerator and denominator) but ensures that the largest exponential term is &amp;lt;math&amp;gt;e^0 = 1&amp;lt;/math&amp;gt;, which prevents overflow. All major deep learning frameworks implement this stabilized version automatically.&lt;br /&gt;
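&lt;br /&gt;
The difference between the two formulations can be sketched in plain Python. The function names naive_softmax and stable_softmax are illustrative. Note that the standard library function math.exp raises OverflowError for very large arguments, whereas array libraries typically return infinity instead.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def naive_softmax(logits):
    exps = [math.exp(z) for z in logits]         # may overflow for large z
    total = sum(exps)
    return [e / total for e in exps]

def stable_softmax(logits):
    m = max(logits)                              # m = max_j z_j
    exps = [math.exp(z - m) for z in logits]     # largest term is e^0 = 1
    total = sum(exps)
    return [e / total for e in exps]

big = [1000.0, 999.0, 998.0]
try:
    naive_softmax(big)
    naive_failed = False
except OverflowError:
    naive_failed = True                          # math.exp(1000.0) is out of range

print(naive_failed)                              # True
print([round(p, 3) for p in stable_softmax(big)])  # about (0.665, 0.245, 0.090)
```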
&lt;br /&gt;
== Relation to the sigmoid ==&lt;br /&gt;
&lt;br /&gt;
For the special case of &amp;lt;math&amp;gt;K = 2&amp;lt;/math&amp;gt; classes, the softmax function reduces to the &amp;#039;&amp;#039;&amp;#039;sigmoid&amp;#039;&amp;#039;&amp;#039; (logistic) function. Defining &amp;lt;math&amp;gt;z = z_1 - z_2&amp;lt;/math&amp;gt; and dividing numerator and denominator by &amp;lt;math&amp;gt;e^{z_1}&amp;lt;/math&amp;gt; gives:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\sigma(\mathbf{z})_1 = \frac{e^{z_1}}{e^{z_1} + e^{z_2}} = \frac{1}{1 + e^{-z}} = \sigma_{\mathrm{sigmoid}}(z)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
This is why binary classifiers typically use a single output neuron with a sigmoid activation rather than two neurons with softmax: the two formulations are mathematically equivalent.&lt;br /&gt;
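&lt;br /&gt;
The two-class identity can be checked numerically. In this sketch the helper names sigmoid and softmax_first_of_two and the sample logits are illustrative assumptions.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax_first_of_two(z1, z2):
    """Probability assigned to class 1 by a two-class softmax."""
    m = max(z1, z2)
    e1, e2 = math.exp(z1 - m), math.exp(z2 - m)
    return e1 / (e1 + e2)

z1, z2 = 1.3, -0.4
p_softmax = softmax_first_of_two(z1, z2)
p_sigmoid = sigmoid(z1 - z2)                 # z = z_1 - z_2
print(math.isclose(p_softmax, p_sigmoid))    # True
```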
&lt;br /&gt;
== Gradient ==&lt;br /&gt;
&lt;br /&gt;
The Jacobian of the softmax function with respect to its input is:&lt;br /&gt;
&lt;br /&gt;
:&amp;lt;math&amp;gt;\frac{\partial \sigma_k}{\partial z_j} = \sigma_k (\delta_{kj} - \sigma_j)&amp;lt;/math&amp;gt;&lt;br /&gt;
&lt;br /&gt;
where &amp;lt;math&amp;gt;\delta_{kj}&amp;lt;/math&amp;gt; is the Kronecker delta. When softmax is combined with the [[Cross-Entropy Loss|cross-entropy loss]], the gradient of the loss with respect to the logit &amp;lt;math&amp;gt;z_k&amp;lt;/math&amp;gt; simplifies to &amp;lt;math&amp;gt;\hat{y}_k - y_k&amp;lt;/math&amp;gt;, where &amp;lt;math&amp;gt;\hat{\mathbf{y}} = \sigma(\mathbf{z})&amp;lt;/math&amp;gt; is the predicted distribution and &amp;lt;math&amp;gt;\mathbf{y}&amp;lt;/math&amp;gt; is the target distribution. This form is computationally efficient and numerically stable.&lt;br /&gt;
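&lt;br /&gt;
Both the Jacobian formula and the simplified cross-entropy gradient can be sketched and cross-checked in a few lines. The function names and the choice of a one-hot target given by target_index are assumptions of this illustration.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_jacobian(logits):
    """J[k][j] = sigma_k * (delta_kj - sigma_j)."""
    s = softmax(logits)
    n = len(s)
    return [[s[k] * ((1.0 if k == j else 0.0) - s[j]) for j in range(n)]
            for k in range(n)]

def cross_entropy_grad(logits, target_index):
    """Gradient of -log(sigma_target) w.r.t. the logits: y_hat - y."""
    s = softmax(logits)
    return [s[k] - (1.0 if k == target_index else 0.0) for k in range(len(s))]

logits, target = [2.0, 1.0, 0.1], 0
s = softmax(logits)
jac = softmax_jacobian(logits)
# Chain rule: dL/dz_j = (-1 / sigma_target) * J[target][j]
via_chain_rule = [-jac[target][j] / s[target] for j in range(len(logits))]
print([round(g, 3) for g in cross_entropy_grad(logits, target)])
print([round(g, 3) for g in via_chain_rule])   # same values as the line above
```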
&lt;br /&gt;
== Use in classification ==&lt;br /&gt;
&lt;br /&gt;
In a typical classification pipeline:&lt;br /&gt;
&lt;br /&gt;
# A neural network produces raw logits &amp;lt;math&amp;gt;\mathbf{z}&amp;lt;/math&amp;gt; from its final linear layer.&lt;br /&gt;
# Softmax converts the logits into probabilities: &amp;lt;math&amp;gt;\hat{\mathbf{y}} = \sigma(\mathbf{z})&amp;lt;/math&amp;gt;.&lt;br /&gt;
# The predicted class is &amp;lt;math&amp;gt;\hat{c} = \arg\max_k \hat{y}_k&amp;lt;/math&amp;gt;.&lt;br /&gt;
# Training uses the [[Cross-Entropy Loss|cross-entropy loss]] applied to the predicted distribution and the true labels.&lt;br /&gt;
&lt;br /&gt;
In practice, softmax and cross-entropy are computed jointly for numerical stability (the &amp;#039;&amp;#039;&amp;#039;log-softmax&amp;#039;&amp;#039;&amp;#039; formulation). Because softmax preserves the ordering of the logits, the argmax at inference time can be applied directly to the logits without computing softmax at all.&lt;br /&gt;
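&lt;br /&gt;
A minimal sketch of the log-softmax formulation follows. It uses the log-sum-exp identity with the maximum logit subtracted; the function names log_softmax, cross_entropy, and predict are illustrative and do not correspond to any particular framework API.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def log_softmax(logits):
    """log sigma(z)_k = z_k - logsumexp(z), computed with the max shifted out."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - log_sum for z in logits]

def cross_entropy(logits, target_index):
    """Loss for a single example whose true class is target_index."""
    return -log_softmax(logits)[target_index]

def predict(logits):
    """Argmax over raw logits; softmax would not change the ordering."""
    return max(range(len(logits)), key=lambda k: logits[k])

logits = [1000.0, 999.0, 998.0]      # large values that break a naive softmax
loss = cross_entropy(logits, 0)
print(round(loss, 3))                # about 0.408
print(predict(logits))               # 0
```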
&lt;br /&gt;
== Beyond classification ==&lt;br /&gt;
&lt;br /&gt;
Softmax appears in many contexts beyond the output layer:&lt;br /&gt;
&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Attention mechanisms&amp;#039;&amp;#039;&amp;#039;: Softmax normalizes alignment scores into attention weights in the [[Attention Mechanisms|Transformer]] architecture.&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Reinforcement learning&amp;#039;&amp;#039;&amp;#039;: Softmax over action-value estimates yields a stochastic policy (Boltzmann exploration).&lt;br /&gt;
* &amp;#039;&amp;#039;&amp;#039;Mixture models&amp;#039;&amp;#039;&amp;#039;: Softmax parameterizes the mixing coefficients in mixture-of-experts architectures.&lt;br /&gt;
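&lt;br /&gt;
As one illustration of these uses, the sketch below turns dot-product alignment scores into attention weights for a single query. The function name attend, the toy vectors, and the scaling by the square root of the vector dimension are assumptions of this example rather than details given in the article.&lt;br /&gt;
&lt;br /&gt;
```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, keys, values):
    """Single-query dot-product attention over lists of plain Python floats."""
    d = len(query)
    # Alignment score of the query with each key, scaled by sqrt(d) (assumed).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)                    # non-negative, sum to 1
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, output

query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
weights, output = attend(query, keys, values)
print([round(w, 3) for w in weights])
print([round(x, 3) for x in output])
```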
&lt;br /&gt;
== See also ==&lt;br /&gt;
&lt;br /&gt;
* [[Cross-Entropy Loss]]&lt;br /&gt;
* [[Loss Functions]]&lt;br /&gt;
* [[Logistic regression]]&lt;br /&gt;
* [[Attention Mechanisms]]&lt;br /&gt;
* [[Neural Networks]]&lt;br /&gt;
&lt;br /&gt;
== References ==&lt;br /&gt;
&lt;br /&gt;
* Bishop, C. M. (2006). &amp;#039;&amp;#039;Pattern Recognition and Machine Learning&amp;#039;&amp;#039;. Springer, Section 4.3.4.&lt;br /&gt;
* Goodfellow, I., Bengio, Y. and Courville, A. (2016). &amp;#039;&amp;#039;Deep Learning&amp;#039;&amp;#039;. MIT Press, Section 6.2.2.3.&lt;br /&gt;
* Hinton, G., Vinyals, O. and Dean, J. (2015). &amp;quot;Distilling the Knowledge in a Neural Network&amp;quot;. &amp;#039;&amp;#039;arXiv:1503.02531&amp;#039;&amp;#039;.&lt;br /&gt;
* Bridle, J. S. (1990). &amp;quot;Probabilistic Interpretation of Feedforward Classification Network Outputs&amp;quot;. &amp;#039;&amp;#039;Neurocomputing&amp;#039;&amp;#039;.&lt;br /&gt;
&lt;br /&gt;
[[Category:Machine Learning]]&lt;br /&gt;
[[Category:Introductory]]&lt;/div&gt;</summary>
		<author><name>DeployBot</name></author>
	</entry>
</feed>