Translations:Diffusion Models Are Real-Time Game Engines/18/en

Given an input interactive environment ${\mathcal {E}}$ and an initial state $s_{0}\in {\mathcal {S}}$, an Interactive World Simulation is a simulation distribution function $q\left(o_{n}\,|\,\{o_{<n},a_{\leq n}\}\right),\;o_{i}\in {\mathcal {O}},\;a_{i}\in {\mathcal {A}}$. Given a distance metric between observations $D:{\mathcal {O}}\times {\mathcal {O}}\rightarrow \mathbb {R}$, a policy, i.e., a distribution on agent actions given past actions and observations $\pi \left(a_{n}\,|\,o_{<n},a_{<n}\right)$, a distribution $S_{0}$ on initial states, and a distribution $N_{0}$ on episode lengths, the Interactive World Simulation objective consists of minimizing $E\left(D\left(o_{q}^{i},o_{p}^{i}\right)\right)$, where $n\sim N_{0}$, $0\leq i\leq n$, and $o_{q}^{i}\sim q,\;o_{p}^{i}\sim V(p)$ are observations sampled from the simulation and from the environment, respectively, when enacting the agent’s policy $\pi$. Importantly, the conditioning actions for these samples are always obtained by the agent interacting with the environment ${\mathcal {E}}$, while the conditioning observations can be obtained either from ${\mathcal {E}}$ (the teacher forcing objective) or from the simulation (the auto-regressive objective).
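To make the difference between the two objectives concrete, the following is a minimal Monte Carlo sketch of $E\left(D\left(o_{q}^{i},o_{p}^{i}\right)\right)$ on a toy problem. Everything in it is an illustrative assumption rather than part of the definition above: the one-dimensional environment, the random policy, the biased simulator, the absolute-difference metric, the uniform choices for $S_{0}$ and $N_{0}$, and all function names are hypothetical. The sketch also assumes that the first observation $o_{0}$ is shared between the environment and the simulation, so the distance is accumulated from $i=1$.

```python
import random

def env_step(state, action):
    """Toy transition (stand-in for p): the state moves by the action."""
    return state + action

def render(state):
    """Toy rendering (stand-in for V): the observation is the state as a float."""
    return float(state)

def policy(obs_hist, act_hist, rng):
    """Toy policy pi(a_n | o_<n, a_<n): a uniformly random step of -1 or +1."""
    return rng.choice([-1, 1])

def make_simulator(bias):
    """Toy simulation q(o_n | o_<n, a_<=n): last context observation plus the
    latest action, plus a fixed per-step bias that models simulator error."""
    def q(obs_hist, act_hist):
        return obs_hist[-1] + act_hist[-1] + bias
    return q

def distance(o_q, o_p):
    """Toy metric D: absolute difference between two scalar observations."""
    return abs(o_q - o_p)

def estimate_objective(q, autoregressive, episodes=200, max_len=10, seed=0):
    """Average D(o_q^i, o_p^i) over sampled episodes and steps."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(episodes):
        n = rng.randint(1, max_len)      # n ~ N_0 (assumed uniform)
        state = rng.randint(-5, 5)       # s_0 ~ S_0 (assumed uniform)
        env_obs = [render(state)]        # observations from the environment
        sim_obs = [render(state)]        # observations from the simulation
        actions = []
        for _ in range(n):
            # Actions always come from the agent acting in the environment.
            action = policy(env_obs, actions, rng)
            actions.append(action)
            state = env_step(state, action)
            o_p = render(state)
            # Teacher forcing conditions on environment observations;
            # the auto-regressive objective conditions on the simulation's own.
            context = sim_obs if autoregressive else env_obs
            o_q = q(context, actions)
            total += distance(o_q, o_p)
            count += 1
            env_obs.append(o_p)
            sim_obs.append(o_q)
    return total / count

if __name__ == "__main__":
    q = make_simulator(bias=0.5)
    print("teacher forcing:", estimate_objective(q, autoregressive=False))
    print("auto-regressive:", estimate_objective(q, autoregressive=True))
```

In this toy setting the teacher forcing estimate equals the per-step bias, because every prediction starts from a true observation, whereas the auto-regressive estimate is larger, because the simulator's errors accumulate in its own context. The actions are identical in both modes, matching the requirement that conditioning actions come from the agent interacting with the environment.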