Translations:Diffusion Models Are Real-Time Game Engines/42/en: Difference between revisions

Latest revision as of 03:06, 7 September 2024

Information about message (contribute)

This message has no documentation. If you know where or how this message is used, you can help other translators by adding documentation to this message.

Message definition (Diffusion Models Are Real-Time Game Engines)

During inference, we need to run both the U-Net denoiser (for a number of steps) and the auto-encoder. On our hardware configuration (a TPU-v5), a single denoiser step and an evaluation of the auto-encoder both takes 10ms. If we ran our model with a single denoiser step, the minimum total latency possible in our setup would be 20ms per frame, or 50 frames per second. Usually, generative diffusion models, such as Stable Diffusion, don’t produce high quality results with a single denoising step, and instead require dozens of sampling steps to generate a high-quality image. Surprisingly, we found that we can robustly simulate DOOM, with only 4 DDIM sampling steps (Song et al., [https://arxiv.org/html/2408.14837v1#bib.bib33 2020]). In fact, we observe no degradation in simulation quality when using 4 sampling steps vs 20 steps or more (see Appendix [https://arxiv.org/html/2408.14837v1#A1.SS4 A.4]).

During inference, we need to run both the U-Net denoiser (for a number of steps) and the auto-encoder. On our hardware configuration (a TPU-v5), a single denoiser step and an evaluation of the auto-encoder both takes 10ms. If we ran our model with a single denoiser step, the minimum total latency possible in our setup would be 20ms per frame, or 50 frames per second. Usually, generative diffusion models, such as Stable Diffusion, don’t produce high quality results with a single denoising step, and instead require dozens of sampling steps to generate a high-quality image. Surprisingly, we found that we can robustly simulate DOOM, with only 4 DDIM sampling steps (Song et al., 2020). In fact, we observe no degradation in simulation quality when using 4 sampling steps vs 20 steps or more (see Appendix A.4).

Revision as of 05:28, 1 September 2024 (view source) FuzzyBot (talk \| contribs) (Importing a new version from external source)		Latest revision as of 03:06, 7 September 2024 (view source) FuzzyBot (talk \| contribs) (Importing a new version from external source)
Line 1:		Line 1:
	During inference, we need to run both the U-Net denoiser (for a number of steps) and the auto-encoder. On our hardware configuration (a TPU-v5), a single denoiser step and an evaluation of the auto-encoder both takes 10ms. If we ran our model with a single denoiser step, the minimum total latency possible in our setup would be 20ms per frame, or 50 frames per second. Usually, generative diffusion models, such as Stable Diffusion, don’t produce high quality results with a single denoising step, and instead require dozens of sampling steps to generate a high-quality image. Surprisingly, we found that we can robustly simulate DOOM, with only 4 DDIM sampling steps (Song ~~et al~~., [https://arxiv.org/html/2408.14837v1#bib.bib33 2020]). In fact, we observe no degradation in simulation quality when using 4 sampling steps vs 20 steps or more (see Appendix [https://arxiv.org/html/2408.14837v1#A1.SS4 A.4]).		During inference, we need to run both the U-Net denoiser (for a number of steps) and the auto-encoder. On our hardware configuration (a TPU-v5), a single denoiser step and an evaluation of the auto-encoder both takes 10ms. If we ran our model with a single denoiser step, the minimum total latency possible in our setup would be 20ms per frame, or 50 frames per second. Usually, generative diffusion models, such as Stable Diffusion, don’t produce high quality results with a single denoising step, and instead require dozens of sampling steps to generate a high-quality image. Surprisingly, we found that we can robustly simulate DOOM, with only 4 DDIM sampling steps (Song et al., [https://arxiv.org/html/2408.14837v1#bib.bib33 2020]). In fact, we observe no degradation in simulation quality when using 4 sampling steps vs 20 steps or more (see Appendix [https://arxiv.org/html/2408.14837v1#A1.SS4 A.4]).