Translations:Diffusion Models Are Real-Time Game Engines/18/es

Latest revision as of 03:15, 7 September 2024

Message definition (Diffusion Models Are Real-Time Game Engines)

Given an input interactive environment <math>\mathcal{E}</math> and an initial state <math>s_{0} \in \mathcal{S}</math>, an ''Interactive World Simulation'' is a ''simulation distribution function'' <math>q \left( o_{n} \,|\, \{o_{< n}, a_{\leq n}\} \right), \; o_{i} \in \mathcal{O}, \; a_{i} \in \mathcal{A}</math>. Given a distance metric between observations <math>D: \mathcal{O} \times \mathcal{O} \rightarrow \mathbb{R}</math>, a ''policy'', i.e., a distribution on agent actions given past actions and observations <math>\pi \left( a_{n} \,|\, o_{< n}, a_{< n} \right)</math>, a distribution <math>S_{0}</math> on initial states, and a distribution <math>N_{0}</math> on episode lengths, the ''Interactive World Simulation'' objective consists of minimizing <math>E \left( D \left( o_{q}^{i}, o_{p}^{i} \right) \right)</math>, where <math>n \sim N_{0}</math>, <math>0 \leq i \leq n</math>, and <math>o_{q}^{i} \sim q, \; o_{p}^{i} \sim V(p)</math> are observations sampled from the simulation and from the environment, respectively, when enacting the agent’s policy <math>\pi</math>. Importantly, the conditioning actions for these samples are always obtained by the agent interacting with the environment <math>\mathcal{E}</math>, while the conditioning observations can be obtained either from <math>\mathcal{E}</math> (the ''teacher forcing objective'') or from the simulation (the ''auto-regressive objective'').

Given an input interactive environment $\mathcal{E}$ and an initial state $s_{0}\in\mathcal{S}$, an Interactive World Simulation is a simulation distribution function $q\left(o_{n}\,|\,\{o_{<n},a_{\leq n}\}\right),\;o_{i}\in\mathcal{O},\;a_{i}\in\mathcal{A}$. Given a distance metric between observations $D:\mathcal{O}\times\mathcal{O}\rightarrow\mathbb{R}$, a policy, i.e., a distribution on agent actions given past actions and observations $\pi\left(a_{n}\,|\,o_{<n},a_{<n}\right)$, a distribution $S_{0}$ on initial states, and a distribution $N_{0}$ on episode lengths, the Interactive World Simulation objective consists of minimizing $E\left(D\left(o_{q}^{i},o_{p}^{i}\right)\right)$, where $n\sim N_{0}$, $0\leq i\leq n$, and $o_{q}^{i}\sim q,\;o_{p}^{i}\sim V(p)$ are observations sampled from the simulation and from the environment, respectively, when enacting the agent's policy $\pi$. Importantly, the conditioning actions for these samples are always obtained by the agent interacting with the environment $\mathcal{E}$, while the conditioning observations can be obtained either from $\mathcal{E}$ (the teacher forcing objective) or from the simulation (the auto-regressive objective).
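
The difference between the two objectives can be made concrete with a small sketch. Everything in it is an illustrative assumption rather than part of the source: a one-dimensional toy environment whose observation equals its state, a deterministic simulator that undershoots each step (gain 0.9), a policy that always moves right, absolute difference as $D$, a fixed initial state and episode length instead of samples from $S_{0}$ and $N_{0}$, and scoring only steps $1..n$ because both sides share the initial observation. The function name `episode_error` is hypothetical.

```python
from typing import Callable, List

def episode_error(
    env_step: Callable[[float, int], float],            # toy transition: (state, action) -> state
    observe: Callable[[float], float],                  # toy projection V: state -> observation
    simulate: Callable[[List[float], List[int]], float],  # toy q: (o_<n, a_<=n) -> o_n
    policy: Callable[[List[float], List[int]], int],    # toy pi: (o_<n, a_<n) -> a_n
    distance: Callable[[float, float], float],          # D(o_q, o_p)
    s0: float,
    n: int,
    autoregressive: bool,
) -> float:
    """Average D(o_q^i, o_p^i) over steps 1..n of a single episode."""
    if n == 0:
        return 0.0
    state = s0
    env_obs = [observe(state)]   # observations from the environment
    sim_obs = [observe(state)]   # assumption: the simulation is given the first observation
    actions: List[int] = []
    total = 0.0
    for _ in range(n):
        # Actions always come from the agent interacting with the real environment.
        a = policy(env_obs, actions)
        actions.append(a)
        state = env_step(state, a)
        # Conditioning observations: environment history (teacher forcing)
        # or the simulation's own past outputs (auto-regressive).
        context = sim_obs if autoregressive else env_obs
        o_q = simulate(context, actions)
        o_p = observe(state)
        env_obs.append(o_p)
        sim_obs.append(o_q)
        total += distance(o_q, o_p)
    return total / n

# Toy components (all hypothetical).
def toy_env_step(s: float, a: int) -> float:
    return s + a

def toy_observe(s: float) -> float:
    return s

def toy_simulate(obs: List[float], acts: List[int]) -> float:
    return obs[-1] + 0.9 * acts[-1]   # slightly wrong dynamics

def toy_policy(obs: List[float], acts: List[int]) -> int:
    return 1                          # always step right

def toy_distance(x: float, y: float) -> float:
    return abs(x - y)

common = (toy_env_step, toy_observe, toy_simulate, toy_policy, toy_distance)
tf_error = episode_error(*common, s0=0.0, n=4, autoregressive=False)
ar_error = episode_error(*common, s0=0.0, n=4, autoregressive=True)
print(tf_error, ar_error)
```

In this toy setting the teacher forcing error stays at about 0.1 per step, because each prediction starts from a true observation, while the auto-regressive error averages about 0.25 over four steps, because each prediction inherits the drift of the previous ones. Averaging `episode_error` over many sampled initial states and episode lengths would give a Monte Carlo estimate of the expectation in the objective.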