where $T = \{o_{i\leq n}, a_{i\leq n}\} \sim \mathcal{T}_{agent}$, $x_0 = \phi(o_n)$, $t \sim \mathcal{U}(0,1)$, $\epsilon \sim \mathcal{N}(0,\mathbf{I})$, $x_t = \sqrt{\overline{\alpha}_t}\,x_0 + \sqrt{1-\overline{\alpha}_t}\,\epsilon$, $v(\epsilon, x_0, t) = \sqrt{\overline{\alpha}_t}\,\epsilon - \sqrt{1-\overline{\alpha}_t}\,x_0$, and $v_{\theta'}$ is the v-prediction output of the model $f_\theta$. The noise schedule $\overline{\alpha}_t$ is linear, similar to Rombach et al. (2022).
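The construction of the noisy input $x_t$ and the v-prediction target can be sketched in NumPy as below. This is an illustration under stated assumptions, not the authors' code. The names `make_alpha_bar` and `training_pair` are hypothetical. The schedule interprets "linear" as in the Latent Diffusion code, with betas linear in $\sqrt{\beta}$. The values `beta_start=0.00085`, `beta_end=0.012`, and 1000 steps are illustrative defaults that do not come from the source. The continuous $t \sim \mathcal{U}(0,1)$ is mapped to a discrete schedule index, which is also an assumption. A random array stands in for the encoded frame $\phi(o_n)$.

```python
import numpy as np

def make_alpha_bar(num_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for a discrete schedule.

    Assumption: "linear" follows the Latent Diffusion convention (betas linear
    in sqrt(beta)); the default values are illustrative, not from the source.
    """
    betas = np.linspace(beta_start ** 0.5, beta_end ** 0.5, num_steps) ** 2
    return np.cumprod(1.0 - betas)

def training_pair(x0, alpha_bar, rng):
    """Sample t ~ U(0,1) and eps ~ N(0, I); return (x_t, v target, eps, alpha_bar_t)."""
    t = rng.uniform(0.0, 1.0)
    # Hypothetical discretisation: map continuous t to a schedule index.
    idx = min(int(t * len(alpha_bar)), len(alpha_bar) - 1)
    a_bar = alpha_bar[idx]
    eps = rng.standard_normal(x0.shape)
    # x_t = sqrt(alpha_bar_t) x_0 + sqrt(1 - alpha_bar_t) eps
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * eps
    # v = sqrt(alpha_bar_t) eps - sqrt(1 - alpha_bar_t) x_0
    v = np.sqrt(a_bar) * eps - np.sqrt(1.0 - a_bar) * x0
    return x_t, v, eps, a_bar

rng = np.random.default_rng(0)
alpha_bar = make_alpha_bar()
x0 = rng.standard_normal((4, 8, 8))  # stand-in for the latent phi(o_n)
x_t, v, eps, a_bar = training_pair(x0, alpha_bar, rng)

# Because (sqrt(a_bar), sqrt(1 - a_bar)) is a unit vector, both x_0 and eps
# can be recovered exactly from (x_t, v).
x0_rec = np.sqrt(a_bar) * x_t - np.sqrt(1.0 - a_bar) * v
eps_rec = np.sqrt(1.0 - a_bar) * x_t + np.sqrt(a_bar) * v
```

The last two lines show why the v-parameterization is convenient. A model output $v_{\theta'}$ can be converted back to a prediction of either the clean latent or the noise using only $x_t$ and $\overline{\alpha}_t$.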