Diffusion Models Are Real-Time Game Engines/zh: Revision history

Legend: (cur) = difference with latest revision, (prev) = difference with preceding revision, m = minor edit.

9 September 2024

  • curprev 03:11, 9 September 2024 Felipefelixarias talk contribs 49,624 bytes +1 No edit summary
  • curprev 03:07, 9 September 2024 Felipefelixarias talk contribs 49,623 bytes −2 No edit summary
  • curprev 03:07, 9 September 2024 Felipefelixarias talk contribs 49,625 bytes −1 No edit summary
  • curprev 03:05, 9 September 2024 Felipefelixarias talk contribs 49,626 bytes −35 No edit summary
  • curprev 00:36, 9 September 2024 Felipefelixarias talk contribs 49,661 bytes −952 Created page with "center|thumb|600x600px|Figure 6: Auto-regressive evaluation. PSNR metrics during 64 auto-regressive steps."
  • curprev 00:34, 9 September 2024 Felipefelixarias talk contribs 50,613 bytes −504 Created page with "* Menapace et al. (2024) Willi Menapace, Aliaksandr Siarohin, Stéphane Lathuilière, Panos Achlioptas, Vladislav Golyanik, Sergey Tulyakov, and Elisa Ricci. Promptable game models: Text-guided game simulation via masked diffusion models. ''ACM Transactions on Graphics'', 43(2):1–16, January 2024. doi: [10.1145/3635705](http://dx.doi.org/10.1145/3635705)."
  • curprev 00:33, 9 September 2024 Felipefelixarias talk contribs 51,117 bytes −616 Created page with "For inquiries, please send email to <code>shlomif@google.com</code> and <code>leviathan@google.com</code>."
  • curprev 00:32, 9 September 2024 Felipefelixarias talk contribs 51,733 bytes −446 Created page with "We note that, similar to NVidia's classic SLI Alternate Frame Rendering (AFR) technique, the image generation rate could be increased substantially by generating several frames in parallel on additional hardware. However, as with AFR, the actual simulation rate would not increase and input latency would not be reduced."
  • curprev 00:31, 9 September 2024 Felipefelixarias talk contribs 52,179 bytes −151 Created page with "We compare training on agent-generated data with training on data generated with a random policy. For the random policy, we sample actions from a uniform categorical distribution that does not depend on the observations. We evaluate the two models and their decoders by..."
  • curprev 00:31, 9 September 2024 Felipefelixarias talk contribs 52,330 bytes −123 Created page with "== Acknowledgements =="
  • curprev 00:30, 9 September 2024 Felipefelixarias talk contribs 52,453 bytes −1,030 Created page with "=== 5.2 Ablations ==="
  • curprev 00:30, 9 September 2024 Felipefelixarias talk contribs 53,483 bytes −329 Created page with "The pre-trained auto-encoder of Stable Diffusion v1.4, which compresses 8x8 pixel patches into 4 latent channels, produces meaningful artifacts when predicting game frames, affecting small details and especially the bottom-bar HUD ("heads-up display"). To leverage the pre-trained knowledge while improving image quality, we train just the decoder of the latent auto-encoder using an MSE loss computed against the target frame pixels. Perceptual losses such as LPIPS (Zhang et al. ([https://arxiv.org/html/2408.14837v1#bib.bib40 2018]))..."
  • curprev 00:30, 9 September 2024 Felipefelixarias talk contribs 53,812 bytes −143 Created page with "Using only 4 denoising steps brings the total U-Net cost to 40ms (50ms total inference including the auto-encoder), i.e. 20 frames per second. We conjecture that the negligible impact of fewer steps on quality in our case stems from a combination of: (1) a constrained image space, and (2) strong conditioning on the previous frames."
  • curprev 00:29, 9 September 2024 Felipefelixarias talk contribs 53,955 bytes −459 Created page with "Neural methods for reconstructing 3D representations have made significant advances over the last years. NeRFs (Mildenhall et al., [https://arxiv.org/html/2408.14837v1#bib.bib20 2020]) parameterize radiance fields with a deep neural network that is specifically optimized for a given scene from a set of images taken from different camera poses. Once trained, novel points of view of the scene can be sampled via volume rendering methods. Gaussian Splatting (Kerbl et al., [https://arxiv.org/html/2408.14837v1#bib.bib15 202..."
  • curprev 00:29, 9 September 2024 Felipefelixarias talk contribs 54,414 bytes −60 Created page with "Video quality. We use the auto-regressive setup described in Section [https://arxiv.org/html/2408.14837v1#S2 2], where we sample frames iteratively following the sequence of actions defined by the ground-truth trajectory, while conditioning on the model's own past predictions. When sampling auto-regressively, the predicted and ground-truth trajectories often diverge after a few steps, mainly because small differences in movement velocity accumulate across frames between the trajectories. For that reason, the per-frame PSNR and LPIPS values shown in Figure [https://arxiv.org/html/2408.14837v1#S5.F6 6]..."
  • curprev 00:28, 9 September 2024 Felipefelixarias talk contribs 54,474 bytes −445 Created page with "To ablate the impact of noise augmentation, we train a model without added noise. We evaluate both the standard noise-augmented model and the model without added noise (after 200,000 training steps) auto-regressively, computing PSNR and LPIPS between the predicted and ground-truth frames over a random holdout of 512 trajectories. We report the average metric values for each auto-regressive step, up to 64 frames, in Figure [https://arxiv.org/html/2408.14837v1#S5.F7 7]."
  • curprev 00:28, 9 September 2024 Felipefelixarias talk contribs 54,919 bytes −980 Created page with "We train the model to minimize the diffusion loss with velocity parameterization (Salimans & Ho, [https://arxiv.org/html/2408.14837v1#bib.bib29 2022b]):"
  • curprev 00:27, 9 September 2024 Felipefelixarias talk contribs 55,899 bytes −281 Created page with "We also experimented with generating 4 samples in parallel and combining the results, hoping to prevent rare extreme predictions from being accepted and to reduce error accumulation. We tried averaging the samples and choosing the sample closest to the median. Averaging performed slightly worse than a single frame, and choosing the sample closest to the median improved results only marginally. Since both methods raise the hardware requirements to 4 TPUs, we opted not to use them, but note that this may be an interesting avenue for future..."
  • curprev 00:27, 9 September 2024 Felipefelixarias talk contribs 56,180 bytes −55 Created page with "GameNGen answers an important question on the road towards a new paradigm for game engines, one where games are automatically generated, similarly to how images and videos have been generated by neural models in recent years. Key questions remain, such as how these neural game engines would be trained, and how games would be effectively created, including how to best leverage human inputs. We are nevertheless extremely excited about the possibilities of this new paradigm."
  • curprev 00:27, 9 September 2024 Felipefelixarias talk contribs 56,235 bytes −156 Created page with "center|thumb|600x600px|Figure 6: Auto-regressive evaluation. LPIPS metrics over 64 auto-regressive steps"
  • curprev 00:26, 9 September 2024 Felipefelixarias talk contribs 56,391 bytes −118 Created page with "'''Image quality.''' We measure LPIPS (Zhang et al., [https://arxiv.org/html/2408.14837v1#bib.bib40 2018]) and PSNR using the teacher-forcing setup described in Section [https://arxiv.org/html/2408.14837v1#S2 2], where we sample an initial state and predict a single frame based on a trajectory of ground-truth past observations. When evaluated over 2048 random trajectories from 5 different levels, our model achieves a PSNR of <math>29.43</math> and an LPIPS of <math>0.249</math>. The PSNR..."
  • curprev 00:26, 9 September 2024 Felipefelixarias talk contribs 56,509 bytes −262 Created page with "We train all simulation models from a pre-trained checkpoint of Stable Diffusion 1.4, unfreezing all U-Net parameters. We use a batch size of 128, a constant learning rate of 2e-5, the Adafactor optimizer without weight decay (Shazeer & Stern, [https://arxiv.org/html/2408.14837v1#bib.bib31 2018]), and gradient clipping of 1.0. We change the diffusion loss parameterization to v-prediction (Salimans & Ho [https://arxiv.org/html/2408.14837v1#bib.bib28 2022a]). With probability 0.1 we drop..."
  • curprev 00:26, 9 September 2024 Felipefelixarias talk contribs 56,771 bytes −295 Created page with "For example, in the case of the game DOOM, <math>\mathcal{S}</math> is the program's dynamic memory contents, <math>\mathcal{O}</math> is the rendered screen pixels, <math>V</math> is the game's rendering logic, <math>\mathcal{A}</math> is the set of key presses and mouse movements, and <math>p</math> is the program's logic given the player's input (including any potential non-determinism)."
  • curprev 00:26, 9 September 2024 Felipefelixarias talk contribs 57,066 bytes −76 Created page with "center|thumb|900x900px|Figure 5: Model predictions vs. ground truth. Only the last 4 frames of the past observations context are shown."
  • curprev 00:26, 9 September 2024 Felipefelixarias talk contribs 57,142 bytes −242 Created page with "=== 5.1 Simulation Quality ==="
  • curprev 00:25, 9 September 2024 Felipefelixarias talk contribs 57,384 bytes −219 Created page with "==== 3.3.2 Denoiser Sampling Steps ===="
  • curprev 00:25, 9 September 2024 Felipefelixarias talk contribs 57,603 bytes −210 Created page with "An ''interactive environment'' <math>\mathcal{E}</math> consists of a latent state space <math>\mathcal{S}</math>, a partial projection space of the latent space <math>\mathcal{O}</math>, a partial projection function <math>V: \mathcal{S} \rightarrow \mathcal{O}</math>, a set of actions <math>\mathcal{A}</math>, and a transition probability function <math>p \left( s^{\prime} \,|\, a, s \right)</math> such that <math>s, s^{\prime} \in \mathcal{S}, a\in \mathcal{A}</math>."
  • curprev 00:25, 9 September 2024 Felipefelixarias talk contribs 57,813 bytes −58 Created page with "=== 4.1 Agent Training ==="
  • curprev 00:25, 9 September 2024 Felipefelixarias talk contribs 57,871 bytes −62 Created page with "We use DDIM sampling (Song et al., [https://arxiv.org/html/2408.14837v1#bib.bib34 2022]). We employ classifier-free guidance (Ho & Salimans, [https://arxiv.org/html/2408.14837v1#bib.bib12 2022]) only for the past-observations condition <math>o_{< n}</math>; we found that guidance for the past-actions condition <math>a_{< n}</math> did not improve quality. We use a relatively small guidance weight (1.5), as larger weights create artifacts that our auto-regressive sampling amplifies."
  • curprev 00:24, 9 September 2024 Felipefelixarias talk contribs 57,933 bytes −213 Created page with "At inference time, we need to run both the U-Net denoiser (for a number of steps) and the auto-encoder. On our hardware configuration (a TPU-v5), a single denoiser step and an evaluation of the auto-encoder each take 10ms. If we ran our model with a single denoiser step, the minimum total latency in our setup would be 20ms per frame, or 50 frames per second. Usually, generative diffusion models such as Stable Diffusion cannot produce high-quality results with a single denoising step, and instead require dozens of sampling steps to gener..."
  • curprev 00:24, 9 September 2024 Felipefelixarias talk contribs 58,146 bytes −73 Created page with "==== 3.2.1 Mitigating Auto-Regressive Drift Using Noise Augmentation ===="
  • curprev 00:24, 9 September 2024 Felipefelixarias talk contribs 58,219 bytes −385 Created page with "Can a neural model running in real-time simulate a complex game at high quality?"
  • curprev 00:23, 9 September 2024 Felipefelixarias talk contribs 58,604 bytes −465 Created page with "Given an input interactive environment <math>\mathcal{E}</math> and an initial state <math>s_{0} \in \mathcal{S}</math>, an ''interactive world simulation'' is a ''simulation distribution function'' <math>q \left( o_{n} \,|\, \{o_{< n}, a_{\leq n}\} \right), \; o_{i} \in \mathcal{O}, \; a_{i} \in \mathcal{A}</math>. Given a distance metric between observations <math>D: \mathcal{O} \times \mathcal{O} \rightarrow \mathbb{R}</math>, a ''policy'', i.e. a distribution of agent actions given past actions and observations ..."
  • curprev 00:22, 9 September 2024 Felipefelixarias talk contribs 59,069 bytes −244 Created page with "center|thumb|900x900px|Figure 3: GameNGen method overview. v-prediction details are omitted for brevity."
  • curprev 00:21, 9 September 2024 Felipefelixarias talk contribs 59,313 bytes −262 Created page with "We record the agent's training trajectories throughout the entire training process, which includes gameplay at different skill levels. This set of recorded trajectories is our <math>\mathcal{T}_{agent}</math> dataset, used for training the generative model (see Section [https://arxiv.org/html/2408.14837v1#S3.SS2 3.2])."
  • curprev 00:21, 9 September 2024 Felipefelixarias talk contribs 59,575 bytes −392 Created page with "We now train a generative diffusion model conditioned on the agent's trajectories <math>\mathcal{T}_{agent}</math> (actions and observations) collected during the previous stage."
  • curprev 00:20, 9 September 2024 Felipefelixarias talk contribs 59,967 bytes −135 Created page with "GameNGen (pronounced "game engine") is a generative diffusion model that learns to simulate the game under the settings of Section [https://arxiv.org/html/2408.14837v1#S2 2]. To collect training data for this model with the teacher forcing objective, we first train a separate model to interact with the environment. The two models (the agent and the generative model) are trained in sequence. The agent's entire corpus of actions and observations <math>\mathcal{T}_{agent}</math> during training is retained and, in the second..."
  • curprev 00:20, 9 September 2024 Felipefelixarias talk contribs 60,102 bytes −56 Created page with "== 2 Interactive World Simulation =="
  • curprev 00:20, 9 September 2024 Felipefelixarias talk contribs 60,158 bytes −574 Created page with "We always train our generative model with a teacher forcing objective. Given a simulation distribution function <math>q</math>, the environment <math>\mathcal{E}</math> can be simulated by auto-regressively sampling observations."
  • curprev 00:19, 9 September 2024 Felipefelixarias talk contribs 60,732 bytes −66 Created page with "center|thumb|800x800px|Figure 2: GameNGen compared to prior state-of-the-art simulations of DOOM"
  • curprev 00:19, 9 September 2024 Felipefelixarias talk contribs 60,798 bytes −229 Created page with "== 1 Introduction =="
  • curprev 00:19, 9 September 2024 Felipefelixarias talk contribs 61,027 bytes −247 Created page with "In this work we demonstrate that the answer is yes. Specifically, we show that a complex video game, the iconic game DOOM, can be run on a neural network (an augmented version of the open Stable Diffusion v1.4 (Rombach et al., [https://arxiv.org/html/2408.14837v1#bib.bib26 2022])) in real-time, while achieving a visual quality comparable to that of the original game. While not an exact simulation, the neural model is able to perform complex game state updates, such as tallying health and am..."
  • curprev 00:18, 9 September 2024 Felipefelixarias talk contribs 61,274 bytes −213 Created page with "In recent years, generative models have made significant progress in producing images and videos conditioned on multi-modal inputs such as text or images. At the forefront of this wave, diffusion models became the de-facto standard for non-language media generation, e.g. Dall-E (Ramesh et al., [https://arxiv.org/html/2408.14837v1#bib.bib25 2022]), Stable Diffusion (Rombach et al., [https://arxiv.org/html/2408.14837v1#bib.bib26 2022]), and Sora (Brooks et al., [https://arxiv.org/html/2408.14837v1#bib.bib6 2024]). At first glance,..."
  • curprev 00:18, 9 September 2024 Felipefelixarias talk contribs 61,487 bytes −259 Created page with "Computer games are manually crafted software systems centered around the following ''game loop'': (1) gather user inputs, (2) update the game state, and (3) render it to screen pixels. This game loop, running at a high frame rate, creates the illusion of an interactive virtual world for the player. Such game loops classically run on standard computers, though there have also been many impressive attempts to run games on bespoke hardware (e.g. the iconic game DOOM has been run on toasters, microwaves, treadmills, cam..."
  • curprev 00:18, 9 September 2024 Felipefelixarias talk contribs 61,746 bytes −200 Created page with "We present ''GameNGen'', the first game engine powered entirely by a neural model that enables real-time interaction with a complex environment over long trajectories at high quality. GameNGen can interactively simulate the classic game DOOM at over 20 frames per second on a single TPU. Next-frame prediction achieves a PSNR of 29.4, comparable to lossy JPEG compression. Human raters are only slightly better than random chance at distinguishing short clips of the game from clips of the simulation. GameNGen is trained in two phases: (1) a reinforcement learn..."
  • curprev 00:18, 9 September 2024 Felipefelixarias talk contribs 61,946 bytes −58 Created page with "====== Abstract ======"
  • curprev 00:04, 9 September 2024 Felipefelixarias talk contribs 62,004 bytes +62,004 Created page with "'''Project website:''' [https://gamengen.github.io/ https://gamengen.github.io]"