We now train a generative diffusion model conditioned on the agent’s trajectories T a g e n t {\displaystyle {\mathcal {T}}_{agent}} (actions and observations) collected during the previous stage.