Translations:Diffusion Models Are Real-Time Game Engines/83/en: Difference between revisions

    From Marovi AI
    (Importing a new version from external source)
     
    (Importing a new version from external source)
     
    Line 1: Line 1:
    Diffusion models achieved state-of-the-art results in text-to-image generation (Saharia et al., [https://arxiv.org/html/2408.14837v1#bib.bib27 2022]; Rombach et al., [https://arxiv.org/html/2408.14837v1#bib.bib26 2022]; Ramesh et al., [https://arxiv.org/html/2408.14837v1#bib.bib25 2022]; Podell et al., [https://arxiv.org/html/2408.14837v1#bib.bib23 2023]), a line of work that has also been applied for text-to-video generation tasks (Ho et al., [https://arxiv.org/html/2408.14837v1#bib.bib14 2022]; Blattmann et al., [https://arxiv.org/html/2408.14837v1#bib.bib5 2023b]; [https://arxiv.org/html/2408.14837v1#bib.bib4 a]; Gupta et al., [https://arxiv.org/html/2408.14837v1#bib.bib9 2023]; Girdhar et al., [https://arxiv.org/html/2408.14837v1#bib.bib8 2023]; Bar-Tal et al., [https://arxiv.org/html/2408.14837v1#bib.bib3 2024]). Despite impressive advancements in realism, text adherence, and temporal consistency, video diffusion models remain too slow for real-time applications. Our work extends this line of work and adapts it for real-time generation conditioned autoregressively on a history of past observations and actions.
    Diffusion models achieved state-of-the-art results in text-to-image generation (Saharia et al., [https://arxiv.org/html/2408.14837v1#bib.bib27 2022]; Rombach et al., [https://arxiv.org/html/2408.14837v1#bib.bib26 2022]; Ramesh et al., [https://arxiv.org/html/2408.14837v1#bib.bib25 2022]; Podell et al., [https://arxiv.org/html/2408.14837v1#bib.bib23 2023]), a line of work that has also been applied for text-to-video generation tasks (Ho et al., [https://arxiv.org/html/2408.14837v1#bib.bib14 2022]; Blattmann et al., [https://arxiv.org/html/2408.14837v1#bib.bib5 2023b]; [https://arxiv.org/html/2408.14837v1#bib.bib4 a]; Gupta et al., [https://arxiv.org/html/2408.14837v1#bib.bib9 2023]; Girdhar et al., [https://arxiv.org/html/2408.14837v1#bib.bib8 2023]; Bar-Tal et al., [https://arxiv.org/html/2408.14837v1#bib.bib3 2024]). Despite impressive advancements in realism, text adherence, and temporal consistency, video diffusion models remain too slow for real-time applications. Our work extends this line of work and adapts it for real-time generation conditioned autoregressively on a history of past observations and actions.

    Latest revision as of 03:06, 7 September 2024

    Information about message (contribute)
    This message has no documentation. If you know where or how this message is used, you can help other translators by adding documentation to this message.
    Message definition (Diffusion Models Are Real-Time Game Engines)
    Diffusion models achieved state-of-the-art results in text-to-image generation (Saharia et al., [https://arxiv.org/html/2408.14837v1#bib.bib27 2022]; Rombach et al., [https://arxiv.org/html/2408.14837v1#bib.bib26 2022]; Ramesh et al., [https://arxiv.org/html/2408.14837v1#bib.bib25 2022]; Podell et al., [https://arxiv.org/html/2408.14837v1#bib.bib23 2023]), a line of work that has also been applied for text-to-video generation tasks (Ho et al., [https://arxiv.org/html/2408.14837v1#bib.bib14 2022]; Blattmann et al., [https://arxiv.org/html/2408.14837v1#bib.bib5 2023b]; [https://arxiv.org/html/2408.14837v1#bib.bib4 a]; Gupta et al., [https://arxiv.org/html/2408.14837v1#bib.bib9 2023]; Girdhar et al., [https://arxiv.org/html/2408.14837v1#bib.bib8 2023]; Bar-Tal et al., [https://arxiv.org/html/2408.14837v1#bib.bib3 2024]). Despite impressive advancements in realism, text adherence, and temporal consistency, video diffusion models remain too slow for real-time applications. Our work extends this line of work and adapts it for real-time generation conditioned autoregressively on a history of past observations and actions.

    Diffusion models achieved state-of-the-art results in text-to-image generation (Saharia et al., 2022; Rombach et al., 2022; Ramesh et al., 2022; Podell et al., 2023), a line of work that has also been applied for text-to-video generation tasks (Ho et al., 2022; Blattmann et al., 2023b; a; Gupta et al., 2023; Girdhar et al., 2023; Bar-Tal et al., 2024). Despite impressive advancements in realism, text adherence, and temporal consistency, video diffusion models remain too slow for real-time applications. Our work extends this line of work and adapts it for real-time generation conditioned autoregressively on a history of past observations and actions.