Translations:Diffusion Models Are Real-Time Game Engines/18/en

Message definition (Diffusion Models Are Real-Time Game Engines)
Given an input interactive environment <math>\mathcal{E}</math>, and an initial state <math>s_{0} \in \mathcal{S}</math>, an ''Interactive World Simulation'' is a ''simulation distribution function'' <math>q \left( o_{n} \,|\, \{o_{< n}, a_{\leq n}\} \right), \; o_{i} \in \mathcal{O}, \; a_{i} \in \mathcal{A}</math>. Given a distance metric between observations <math>D: \mathcal{O} \times \mathcal{O} \rightarrow \mathbb{R}</math>, a ''policy'', i.e., a distribution on agent actions given past actions and observations <math>\pi \left( a_{n} \,|\, o_{< n}, a_{< n} \right)</math>, a distribution <math>S_{0}</math> on initial states, and a distribution <math>N_{0}</math> on episode lengths, the ''Interactive World Simulation'' objective consists of minimizing <math>E \left( D \left( o_{q}^{i}, o_{p}^{i} \right) \right)</math> where <math>n \sim N_{0}</math>, <math>0 \leq i \leq n</math>, and <math>o_{q}^{i} \sim q, \; o_{p}^{i} \sim V(p)</math> are sampled observations from the environment and the simulation when enacting the agent’s policy <math>\pi</math>. Importantly, the conditioning actions for these samples are always obtained by the agent interacting with the environment <math>\mathcal{E}</math>, while the conditioning observations can either be obtained from <math>\mathcal{E}</math> (the ''teacher forcing objective'') or from the simulation (the ''auto-regressive objective'').

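To make the two conditioning regimes concrete, the following is a minimal Python sketch of how the objective <math>E \left( D \left( o_{q}^{i}, o_{p}^{i} \right) \right)</math> could be estimated over a single episode. Everything here (<code>ToyEnv</code>, <code>ToySimulator</code>, <code>estimate_objective</code>, the scalar observations) is a hypothetical illustration, not code from the paper: it only shows that the actions always come from the agent acting in the real environment, while the simulator's conditioning observations switch between environment outputs (teacher forcing) and its own past samples (auto-regressive).

<syntaxhighlight lang="python">
import random

# Hypothetical toy stand-ins so the sketch runs end to end (not from the paper):
# ToyEnv plays the role of the environment E, ToySimulator the role of the
# simulation distribution q, and `distance` below the role of the metric D.

class ToyEnv:
    """Deterministic scalar environment: the observation is a running sum."""
    def reset(self):
        self.obs = 0.0
        return self.obs

    def step(self, action):
        self.obs += 1.0 + action
        return self.obs

class ToySimulator:
    """Noisy one-step predictor standing in for q(o_n | o_<n, a_<=n)."""
    def sample(self, past_obs, actions):
        return past_obs[-1] + 1.0 + actions[-1] + random.gauss(0.0, 0.1)

def estimate_objective(env, simulator, policy, distance, n, teacher_forcing):
    """Monte Carlo estimate of E[D(o_env^i, o_sim^i)] over one length-n episode.

    Actions are always chosen by the agent interacting with the real
    environment; only the simulator's conditioning observations differ:
    teacher forcing conditions on environment observations, while the
    auto-regressive objective conditions on the simulator's own outputs.
    """
    o_env = [env.reset()]          # o_0 from the initial state s_0
    o_sim = [o_env[0]]             # the simulation starts from the same state
    actions, errors = [], []
    for _ in range(n):
        a = policy(o_env, actions)                # a_i ~ pi(. | o_<i, a_<i), in E
        actions.append(a)
        o_env.append(env.step(a))                 # next observation from E
        history = o_env[:-1] if teacher_forcing else o_sim
        o_sim.append(simulator.sample(history, actions))
        errors.append(distance(o_env[-1], o_sim[-1]))   # D(o^i_env, o^i_sim)
    return sum(errors) / len(errors)

if __name__ == "__main__":
    random.seed(0)
    policy = lambda obs, acts: random.choice([0.0, 1.0])  # a trivial random policy
    distance = lambda a, b: abs(a - b)                    # D: O x O -> R
    n = random.randint(8, 16)                             # n ~ N_0
    for tf in (True, False):
        err = estimate_objective(ToyEnv(), ToySimulator(), policy, distance, n, tf)
        print(f"teacher_forcing={tf}: mean distance over {n} steps = {err:.3f}")
</syntaxhighlight>

Running this toy comparison typically shows the teacher-forcing error staying near the simulator's per-step noise, while the auto-regressive error grows as prediction mistakes compound, which is precisely the gap between the two objectives that the definition distinguishes.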