Diffusion models achieved state-of-the-art results in text-to-image generation (Saharia et al., [https://arxiv.org/html/2408.14837v1#bib.bib27 2022]; Rombach et al., [https://arxiv.org/html/2408.14837v1#bib.bib26 2022]; Ramesh et al., [https://arxiv.org/html/2408.14837v1#bib.bib25 2022]; Podell et al., [https://arxiv.org/html/2408.14837v1#bib.bib23 2023]), a line of work that has also been applied to text-to-video generation tasks (Ho et al., [https://arxiv.org/html/2408.14837v1#bib.bib14 2022]; Blattmann et al., [https://arxiv.org/html/2408.14837v1#bib.bib5 2023b]; [https://arxiv.org/html/2408.14837v1#bib.bib4 a]; Gupta et al., [https://arxiv.org/html/2408.14837v1#bib.bib9 2023]; Girdhar et al., [https://arxiv.org/html/2408.14837v1#bib.bib8 2023]; Bar-Tal et al., [https://arxiv.org/html/2408.14837v1#bib.bib3 2024]). Despite impressive advancements in realism, text adherence, and temporal consistency, video diffusion models remain too slow for real-time applications. Our work extends this line of work and adapts it for real-time generation conditioned autoregressively on a history of past observations and actions.
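To make the last sentence concrete, the following is a minimal sketch, not the authors' implementation, of what an autoregressive rollout with a diffusion-style denoiser conditioned on past frames and actions might look like. The frame shape, context length, step count, action encoding, and the `denoise_step` placeholder are all hypothetical assumptions for illustration only.

```python
# Sketch (assumption, not the paper's method): autoregressive frame generation
# where a denoising model is conditioned on a short history of frames and actions.
import numpy as np

rng = np.random.default_rng(0)
FRAME_SHAPE = (64, 64, 3)   # hypothetical low-resolution RGB frame
CONTEXT_LEN = 4             # hypothetical number of past frames/actions used as context

def denoise_step(noisy_frame, context_frames, context_actions, t):
    """Stand-in for a learned denoiser: a real model would predict a cleaner frame
    from the noisy input given the conditioning history and timestep t."""
    # Placeholder behavior: blend toward the most recent context frame.
    return 0.9 * context_frames[-1] + 0.1 * noisy_frame * (t / 10.0)

def generate_next_frame(context_frames, context_actions, num_steps=10):
    """Run a short reverse-diffusion loop, starting from noise, conditioned on history."""
    x = rng.standard_normal(FRAME_SHAPE)      # start from pure Gaussian noise
    for t in range(num_steps, 0, -1):         # iteratively denoise
        x = denoise_step(x, context_frames, context_actions, t)
    return x

# Autoregressive rollout: each generated frame is appended to the history and,
# together with the new action, becomes the conditioning for the next frame.
frames = [rng.standard_normal(FRAME_SHAPE) for _ in range(CONTEXT_LEN)]
actions = [0] * CONTEXT_LEN
for step in range(8):
    new_action = step % 4                     # hypothetical discrete action input
    frame = generate_next_frame(frames[-CONTEXT_LEN:], actions[-CONTEXT_LEN:])
    frames.append(frame)
    actions.append(new_action)

print(len(frames), frames[-1].shape)
```

The key structural point the sketch illustrates is that, unlike text-to-video models that generate a clip from a single prompt, the generator here is re-invoked every step and its conditioning window slides forward over its own outputs and the incoming actions.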