Diffusion Models Are Real-Time Game Engines/zh: Revision history

Legend: (cur) = difference with latest revision, (prev) = difference with preceding revision, m = minor edit.

9 September 2024

  • curprev 03:11, 9 September 2024 Felipefelixarias talk contribs 49,624 bytes +1 No edit summary
  • curprev 03:07, 9 September 2024 Felipefelixarias talk contribs 49,623 bytes −2 No edit summary
  • curprev 03:07, 9 September 2024 Felipefelixarias talk contribs 49,625 bytes −1 No edit summary
  • curprev 03:05, 9 September 2024 Felipefelixarias talk contribs 49,626 bytes −35 No edit summary
  • curprev 00:36, 9 September 2024 Felipefelixarias talk contribs 49,661 bytes −952 Created page with "center|thumb|600x600px|Figure 6: Auto-regressive evaluation. PSNR metrics during 64 auto-regressive steps."
  • curprev 00:34, 9 September 2024 Felipefelixarias talk contribs 50,613 bytes −504 Created page with "* Menapace et al. (2024) Willi Menapace, Aliaksandr Siarohin, Stéphane Lathuilière, Panos Achlioptas, Vladislav Golyanik, Sergey Tulyakov, and Elisa Ricci. Promptable game models: Text-guided game simulation via masked diffusion models. ''ACM Transactions on Graphics'', 43(2):1–16, January 2024. doi: [10.1145/3635705](http://dx.doi.org/10.1145/3635705)."
  • curprev 00:33, 9 September 2024 Felipefelixarias talk contribs 51,117 bytes −616 Created page with "For inquiries, please send email to <code>shlomif@google.com</code> and <code>leviathan@google.com</code>."
  • curprev 00:32, 9 September 2024 Felipefelixarias talk contribs 51,733 bytes −446 Created page with "We note that, similar to NVidia's classic SLI Alternate Frame Rendering (AFR) technique, the image generation rate could be increased substantially by generating several frames in parallel on additional hardware. However, as with AFR, the actual simulation rate would not increase and input latency would not be reduced."
  • curprev 00:31, 9 September 2024 Felipefelixarias talk contribs 52,179 bytes −151 Created page with "We compare training on agent-generated data with training on data generated with a random policy. For the random policy, we sample actions from a uniform categorical distribution that does not depend on the observations. We evaluate the two models and their decoders by..."
  • curprev 00:31, 9 September 2024 Felipefelixarias talk contribs 52,330 bytes −123 Created page with "== Acknowledgements =="
  • curprev 00:30, 9 September 2024 Felipefelixarias talk contribs 52,453 bytes −1,030 Created page with "=== 5.2 Ablations ==="
  • curprev 00:30, 9 September 2024 Felipefelixarias talk contribs 53,483 bytes −329 Created page with "The pre-trained auto-encoder of Stable Diffusion v1.4, which compresses 8x8 pixel patches into 4 latent channels, produces meaningful artifacts when predicting game frames, affecting small details and especially the bottom-bar HUD ("heads-up display"). To leverage the pre-trained knowledge while improving image quality, we train just the decoder of the latent auto-encoder using an MSE loss computed against the target frame pixels. Perceptual losses such as LPIPS (Zhang et al. ([https://arxiv.org/html/2408.14837v1#bib.bib40 2018]))..."
  • curprev 00:30, 9 September 2024 Felipefelixarias talk contribs 53,812 bytes −143 Created page with "Using only 4 denoising steps brings the total U-Net cost to 40ms (50ms total inference including the auto-encoder), i.e. 20 frames per second. We conjecture that the negligible impact of fewer steps on quality in our case stems from a combination of: (1) a constrained image space, and (2) strong conditioning on the previous frames."
  • curprev 00:29, 9 September 2024 Felipefelixarias talk contribs 53,955 bytes −459 Created page with "Neural methods for reconstructing 3D representations have made significant advances over the last years. NeRFs (Mildenhall et al., [https://arxiv.org/html/2408.14837v1#bib.bib20 2020]) parameterize radiance fields with a deep neural network that is specifically optimized for a given scene from a set of images taken from different camera poses. Once trained, novel points of view of the scene can be sampled via volume rendering methods. Gaussian Splatting (Kerbl et al., [https://arxiv.org/html/2408.14837v1#bib.bib15 202..."
  • curprev 00:29, 9 September 2024 Felipefelixarias talk contribs 54,414 bytes −60 Created page with "Video quality. We use the auto-regressive setup described in Section [https://arxiv.org/html/2408.14837v1#S2 2], where we sample frames iteratively following the sequence of actions defined by the ground-truth trajectory, while conditioning on the model's own past predictions. When sampling auto-regressively, the predicted and ground-truth trajectories often diverge after a few steps, mainly because small differences in movement velocity accumulate across frames between the trajectories. For that reason, the per-frame PSNR and LPIPS values shown in Figure [https://arxiv.org/html/2408.14837v1#S5.F6 6]..."
  • curprev 00:28, 9 September 2024 Felipefelixarias talk contribs 54,474 bytes −445 Created page with "To ablate the impact of noise augmentation, we train a model without added noise. We evaluate both the standard noise-augmented model and the model without added noise (after 200,000 training steps) auto-regressively, computing PSNR and LPIPS between the predicted and ground-truth frames over a random holdout of 512 trajectories. We report the average metric values for each auto-regressive step, up to 64 frames, in Figure [https://arxiv.org/html/2408.14837v1#S5.F7 7]."
  • curprev 00:28, 9 September 2024 Felipefelixarias talk contribs 54,919 bytes −980 Created page with "We train the model to minimize the diffusion loss with velocity parameterization (Salimans & Ho, [https://arxiv.org/html/2408.14837v1#bib.bib29 2022b]):"
  • curprev 00:27, 9 September 2024 Felipefelixarias talk contribs 55,899 bytes −281 Created page with "We also experimented with generating 4 samples in parallel and combining the results, hoping to prevent rare extreme predictions from being accepted and to reduce error accumulation. We tried averaging the samples and choosing the sample closest to the median. Averaging performed slightly worse than a single frame, and choosing the sample closest to the median improved results only marginally. Since both methods raise the hardware requirements to 4 TPUs, we opted not to use them, but note that this may be an interesting avenue for future..."
  • curprev 00:27, 9 September 2024 Felipefelixarias talk contribs 56,180 bytes −55 Created page with "GameNGen answers an important question on the road towards a new paradigm for game engines, one where games are automatically generated, similarly to how images and videos have been generated by neural models in recent years. Key questions remain, such as how these neural game engines would be trained, and how games would be effectively created, including how to best leverage human inputs. We are nevertheless extremely excited about the possibilities of this new paradigm."
  • curprev 00:27, 9 September 2024 Felipefelixarias talk contribs 56,235 bytes −156 Created page with "center|thumb|600x600px|Figure 6: Auto-regressive evaluation. LPIPS metrics over 64 auto-regressive steps"
  • curprev 00:26, 9 September 2024 Felipefelixarias talk contribs 56,391 bytes −118 Created page with "'''Image quality.''' We measure LPIPS (Zhang et al., [https://arxiv.org/html/2408.14837v1#bib.bib40 2018]) and PSNR using the teacher-forcing setup described in Section [https://arxiv.org/html/2408.14837v1#S2 2], where we sample an initial state and predict a single frame based on a trajectory of ground-truth past observations. When evaluated over 2048 random trajectories from 5 different levels, our model achieves a PSNR of <math>29.43</math> and an LPIPS of <math>0.249</math>. The PSNR..."
  • curprev 00:26, 9 September 2024 Felipefelixarias talk contribs 56,509 bytes −262 Created page with "We train all simulation models from a pre-trained checkpoint of Stable Diffusion 1.4, unfreezing all U-Net parameters. We use a batch size of 128, a constant learning rate of 2e-5, the Adafactor optimizer without weight decay (Shazeer & Stern, [https://arxiv.org/html/2408.14837v1#bib.bib31 2018]), and gradient clipping of 1.0. We change the diffusion loss parameterization to v-prediction (Salimans & Ho [https://arxiv.org/html/2408.14837v1#bib.bib28 2022a]). With probability 0.1 we drop..."
  • curprev 00:26, 9 September 2024 Felipefelixarias talk contribs 56,771 bytes −295 Created page with "For example, in the case of the game DOOM, <math>\mathcal{S}</math> is the program's dynamic memory contents, <math>\mathcal{O}</math> is the rendered screen pixels, <math>V</math> is the game's rendering logic, <math>\mathcal{A}</math> is the set of key presses and mouse movements, and <math>p</math> is the program's logic given the player's input (including any potential non-determinism)."
  • curprev 00:26, 9 September 2024 Felipefelixarias talk contribs 57,066 bytes −76 Created page with "center|thumb|900x900px|Figure 5: Model predictions vs. ground truth. Only the last 4 frames of the past observations context are shown."
  • curprev 00:26, 9 September 2024 Felipefelixarias talk contribs 57,142 bytes −242 Created page with "=== 5.1 Simulation Quality ==="
  • curprev 00:25, 9 September 2024 Felipefelixarias talk contribs 57,384 bytes −219 Created page with "==== 3.3.2 Denoiser Sampling Steps ===="
  • curprev 00:25, 9 September 2024 Felipefelixarias talk contribs 57,603 bytes −210 Created page with "An ''interactive environment'' <math>\mathcal{E}</math> consists of a latent state space <math>\mathcal{S}</math>, a partial projection space of the latent space <math>\mathcal{O}</math>, a partial projection function <math>V: \mathcal{S} \rightarrow \mathcal{O}</math>, a set of actions <math>\mathcal{A}</math>, and a transition probability function <math>p \left( s^{\prime} \,|\, a, s \right)</math> such that <math>s, s^{\prime} \in \mathcal{S}, a\in \mathcal{A}</math>."
  • curprev 00:25, 9 September 2024 Felipefelixarias talk contribs 57,813 bytes −58 Created page with "=== 4.1 Agent Training ==="
  • curprev 00:25, 9 September 2024 Felipefelixarias talk contribs 57,871 bytes −62 Created page with "We use DDIM sampling (Song et al., [https://arxiv.org/html/2408.14837v1#bib.bib34 2022]). We employ classifier-free guidance (Ho & Salimans, [https://arxiv.org/html/2408.14837v1#bib.bib12 2022]) only for the past-observations condition <math>o_{< n}</math>; we found that guidance for the past-actions condition <math>a_{< n}</math> did not improve quality. We use a relatively small guidance weight (1.5), as larger weights create artifacts that our auto-regressive sampling amplifies."
  • curprev 00:24, 9 September 2024 Felipefelixarias talk contribs 57,933 bytes −213 Created page with "At inference time, we need to run both the U-Net denoiser (for a number of steps) and the auto-encoder. On our hardware configuration (a TPU-v5), a single denoiser step and an evaluation of the auto-encoder each take 10ms. If we ran our model with a single denoiser step, the minimum total latency in our setup would be 20ms per frame, or 50 frames per second. Usually, generative diffusion models such as Stable Diffusion cannot produce high-quality results with a single denoising step, and instead require dozens of sampling steps to gener..."
  • curprev 00:24, 9 September 2024 Felipefelixarias talk contribs 58,146 bytes −73 Created page with "==== 3.2.1 Mitigating Auto-Regressive Drift Using Noise Augmentation ===="
  • curprev 00:24, 9 September 2024 Felipefelixarias talk contribs 58,219 bytes −385 Created page with "Can a neural model running in real-time simulate a complex game at high quality?"
  • curprev 00:23, 9 September 2024 Felipefelixarias talk contribs 58,604 bytes −465 Created page with "Given an input interactive environment <math>\mathcal{E}</math> and an initial state <math>s_{0} \in \mathcal{S}</math>, an ''interactive world simulation'' is a ''simulation distribution function'' <math>q \left( o_{n} \,|\, \{o_{< n}, a_{\leq n}\} \right), \; o_{i} \in \mathcal{O}, \; a_{i} \in \mathcal{A}</math>. Given a distance metric between observations <math>D: \mathcal{O} \times \mathcal{O} \rightarrow \mathbb{R}</math>, a ''policy'', i.e. a distribution of agent actions given past actions and observations ..."
  • curprev 00:22, 9 September 2024 Felipefelixarias talk contribs 59,069 bytes −244 Created page with "center|thumb|900x900px|Figure 3: GameNGen method overview. v-prediction details are omitted for brevity."
  • curprev 00:21, 9 September 2024 Felipefelixarias talk contribs 59,313 bytes −262 Created page with "We record the agent's training trajectories throughout the entire training process, which includes gameplay at different skill levels. This set of recorded trajectories is our <math>\mathcal{T}_{agent}</math> dataset, used for training the generative model (see Section [https://arxiv.org/html/2408.14837v1#S3.SS2 3.2])."
  • curprev 00:21, 9 September 2024 Felipefelixarias talk contribs 59,575 bytes −392 Created page with "We now train a generative diffusion model conditioned on the agent's trajectories <math>\mathcal{T}_{agent}</math> (actions and observations) collected during the previous stage."
  • curprev 00:20, 9 September 2024 Felipefelixarias talk contribs 59,967 bytes −135 Created page with "GameNGen (pronounced "game engine") is a generative diffusion model that learns to simulate the game under the settings of Section [https://arxiv.org/html/2408.14837v1#S2 2]. To collect training data for this model with the teacher forcing objective, we first train a separate model to interact with the environment. The two models (the agent and the generative model) are trained in sequence. The agent's entire corpus of actions and observations <math>\mathcal{T}_{agent}</math> during training is retained and, in the second..."
  • curprev 00:20, 9 September 2024 Felipefelixarias talk contribs 60,102 bytes −56 Created page with "== 2 Interactive World Simulation =="
  • curprev 00:20, 9 September 2024 Felipefelixarias talk contribs 60,158 bytes −574 Created page with "We always train our generative model with a teacher forcing objective. Given a simulation distribution function <math>q</math>, the environment <math>\mathcal{E}</math> can be simulated by auto-regressively sampling observations."
  • curprev 00:19, 9 September 2024 Felipefelixarias talk contribs 60,732 bytes −66 Created page with "center|thumb|800x800px|Figure 2: GameNGen compared to prior state-of-the-art simulations of DOOM"
  • curprev 00:19, 9 September 2024 Felipefelixarias talk contribs 60,798 bytes −229 Created page with "== 1 Introduction =="
  • curprev 00:19, 9 September 2024 Felipefelixarias talk contribs 61,027 bytes −247 Created page with "In this work we demonstrate that the answer is yes. Specifically, we show that a complex video game, the iconic game DOOM, can be run on a neural network (an augmented version of the open Stable Diffusion v1.4 (Rombach et al., [https://arxiv.org/html/2408.14837v1#bib.bib26 2022])) in real-time, while achieving a visual quality comparable to that of the original game. While not an exact simulation, the neural model is able to perform complex game state updates, such as tallying health and am..."
  • curprev 00:18, 9 September 2024 Felipefelixarias talk contribs 61,274 bytes −213 Created page with "In recent years, generative models have made significant progress in producing images and videos conditioned on multi-modal inputs such as text or images. At the forefront of this wave, diffusion models became the de-facto standard for non-language media generation, e.g. Dall-E (Ramesh et al., [https://arxiv.org/html/2408.14837v1#bib.bib25 2022]), Stable Diffusion (Rombach et al., [https://arxiv.org/html/2408.14837v1#bib.bib26 2022]), and Sora (Brooks et al., [https://arxiv.org/html/2408.14837v1#bib.bib6 2024]). At first glance,..."
  • curprev 00:18, 9 September 2024 Felipefelixarias talk contribs 61,487 bytes −259 Created page with "Computer games are manually crafted software systems centered around the following ''game loop'': (1) gather user inputs, (2) update the game state, and (3) render it to screen pixels. This game loop, running at a high frame rate, creates the illusion of an interactive virtual world for the player. Such game loops classically run on standard computers, though there have also been many impressive attempts to run games on bespoke hardware (e.g. the iconic game DOOM has been run on toasters, microwaves, treadmills, cam..."
  • curprev 00:18, 9 September 2024 Felipefelixarias talk contribs 61,746 bytes −200 Created page with "We present ''GameNGen'', the first game engine powered entirely by a neural model that enables real-time interaction with a complex environment over long trajectories at high quality. GameNGen can interactively simulate the classic game DOOM at over 20 frames per second on a single TPU. Next-frame prediction achieves a PSNR of 29.4, comparable to lossy JPEG compression. Human raters are only slightly better than random chance at distinguishing short clips of the game from clips of the simulation. GameNGen is trained in two phases: (1) a reinforcement learn..."
  • curprev 00:18, 9 September 2024 Felipefelixarias talk contribs 61,946 bytes −58 Created page with "====== Abstract ======"
  • curprev 00:04, 9 September 2024 Felipefelixarias talk contribs 62,004 bytes +62,004 Created page with "'''Project website:''' [https://gamengen.github.io/ https://gamengen.github.io]"