Translations:Diffusion Models Are Real-Time Game Engines/83/zh: Difference between revisions


    Latest revision as of 00:29, 9 September 2024

    Message definition (Diffusion Models Are Real-Time Game Engines)
    Diffusion models achieved state-of-the-art results in text-to-image generation (Saharia et al., [https://arxiv.org/html/2408.14837v1#bib.bib27 2022]; Rombach et al., [https://arxiv.org/html/2408.14837v1#bib.bib26 2022]; Ramesh et al., [https://arxiv.org/html/2408.14837v1#bib.bib25 2022]; Podell et al., [https://arxiv.org/html/2408.14837v1#bib.bib23 2023]), a line of work that has also been applied to text-to-video generation tasks (Ho et al., [https://arxiv.org/html/2408.14837v1#bib.bib14 2022]; Blattmann et al., [https://arxiv.org/html/2408.14837v1#bib.bib5 2023b]; [https://arxiv.org/html/2408.14837v1#bib.bib4 a]; Gupta et al., [https://arxiv.org/html/2408.14837v1#bib.bib9 2023]; Girdhar et al., [https://arxiv.org/html/2408.14837v1#bib.bib8 2023]; Bar-Tal et al., [https://arxiv.org/html/2408.14837v1#bib.bib3 2024]). Despite impressive advancements in realism, text adherence, and temporal consistency, video diffusion models remain too slow for real-time applications. Our work extends this line of research and adapts it for real-time generation conditioned autoregressively on a history of past observations and actions.

    扩散模型在文本到图像生成中取得了最先进的成果(Saharia 等人,2022;Rombach 等人,2022;Ramesh 等人,2022;Podell 等人,2023),这一研究领域也被应用于文本到视频生成任务(Ho 等人,2022;Blattmann 等人,2023b;a;Gupta 等人,2023;Girdhar 等人,2023;Bar-Tal 等人,2024)。尽管在逼真性、文本依从性和时间一致性方面取得了显著进展,但视频扩散模型对于实时应用来说仍然过于缓慢。我们的工作扩展了这一研究,并使其适用于基于过去观察和动作历史的自回归条件下的实时生成。