Translations:Diffusion Models Are Real-Time Game Engines/33/zh

Latest revision as of 00:21, 9 September 2024

Message definition (Diffusion Models Are Real-Time Game Engines)

The domain shift between training with teacher-forcing and auto-regressive sampling leads to error accumulation and fast degradation in sample quality, as demonstrated in Figure [https://arxiv.org/html/2408.14837v1#S3.F4 4]. To avoid this divergence due to auto-regressive application of the model, we corrupt context frames by adding a varying amount of Gaussian noise to encoded frames at training time, while providing the noise level as input to the model, following Ho et al. ([https://arxiv.org/html/2408.14837v1#bib.bib13 2021]). To that effect, we sample a noise level <math>\alpha</math> uniformly up to a maximal value, discretize it and learn an embedding for each bucket (see Figure [https://arxiv.org/html/2408.14837v1#S3.F3 3]). This allows the network to correct information sampled in previous frames, and is critical for preserving frame quality over time. During inference, the added noise level can be controlled to maximize quality, although we find that even with no added noise at inference the results are significantly improved. We ablate the impact of this method in Section [https://arxiv.org/html/2408.14837v1#S5.SS2.SSS2 5.2.2].
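The noise-augmentation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes NumPy, an additive corruption of the form <math>x + \alpha\,\varepsilon</math> with <math>\varepsilon \sim \mathcal{N}(0, I)</math>, and illustrative values for the maximal noise level, number of buckets, embedding size and latent shape. The names `MAX_NOISE_LEVEL`, `NUM_BUCKETS`, `EMBED_DIM`, `noise_level_to_bucket` and `augment_context` are all hypothetical. The embedding table is randomly initialized here; in a real model it would be learned jointly with the network.

```python
import numpy as np

# Hypothetical hyperparameters, chosen for illustration only (not the paper's values).
MAX_NOISE_LEVEL = 1.0
NUM_BUCKETS = 10
EMBED_DIM = 8


def noise_level_to_bucket(alpha: float) -> int:
    """Discretize a noise level in [0, MAX_NOISE_LEVEL] into a bucket index."""
    idx = int(alpha / MAX_NOISE_LEVEL * NUM_BUCKETS)
    # alpha == MAX_NOISE_LEVEL would map to NUM_BUCKETS; clamp it into the last bucket.
    return min(idx, NUM_BUCKETS - 1)


def augment_context(context_latents, rng, alpha=None):
    """Corrupt encoded context frames with Gaussian noise.

    If alpha is None (training), sample it uniformly in [0, MAX_NOISE_LEVEL];
    otherwise use the given fixed level (e.g. a chosen level at inference).
    Returns the noisy latents and the bucket index that is fed to the model.
    """
    if alpha is None:
        alpha = rng.uniform(0.0, MAX_NOISE_LEVEL)
    noise = rng.standard_normal(context_latents.shape)
    noisy = context_latents + alpha * noise  # assumed additive corruption
    return noisy, noise_level_to_bucket(alpha)


rng = np.random.default_rng(0)

# One embedding vector per noise bucket (random here; learned in practice).
bucket_embeddings = rng.standard_normal((NUM_BUCKETS, EMBED_DIM))

# Hypothetical latent shape: (context frames, height, width, channels).
context = rng.standard_normal((4, 8, 8, 4))

# Training: random noise level, plus the embedding of its bucket as conditioning.
noisy_ctx, bucket = augment_context(context, rng)
cond = bucket_embeddings[bucket]

# Inference with no added noise: the context is unchanged and bucket 0 is used.
clean_ctx, b0 = augment_context(context, rng, alpha=0.0)
```

Passing `alpha=0.0` at inference mirrors the "no added noise" setting mentioned in the text; a small positive value could be passed instead to explore the quality trade-off the paragraph refers to.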

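As shown in Figure 4, the domain shift between teacher-forcing training and auto-regressive sampling leads to error accumulation and rapid degradation of sample quality. To avoid this divergence caused by the auto-regressive application of the model, at training time we corrupt the context frames by adding varying amounts of Gaussian noise to the encoded frames, and provide the noise level as an input to the model, following the approach of Ho et al. (2021). To this end, we sample the noise level <math>\alpha</math> uniformly up to a maximum value, discretize it, and learn an embedding for each bucket (see Figure 3). This enables the network to correct information sampled in previous frames and is crucial for maintaining frame quality over long horizons. During inference, the level of added noise can be controlled to maximize quality, although we find that the results are significantly improved even when no noise is added at inference. We analyze the impact of this method in Section 5.2.2.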