Translations:Diffusion Models Are Real-Time Game Engines/24/zh: Difference between revisions



    Message definition (Diffusion Models Are Real-Time Game Engines)
    Our end goal is to have human players interact with our simulation. To that end, the policy <math>\pi</math> as in Section [https://arxiv.org/html/2408.14837v1#S2 2] is that of ''human gameplay''. Since we cannot sample from that directly at scale, we start by approximating it via teaching an automatic agent to play. Unlike a typical RL setup which attempts to maximize game score, our goal is to generate training data which resembles human play, or at least contains enough diverse examples, in a variety of scenarios, to maximize training data efficiency. To that end, we design a simple reward function, which is the only part of our method that is environment-specific (see Appendix [https://arxiv.org/html/2408.14837v1#A1.SS3 A.3]).

    我们的最终目标是让人类玩家与我们的仿真进行互动。为此,第[https://arxiv.org/html/2408.14837v1#S2 2]节中的策略<math>\pi</math>即为“人类游戏策略”。由于我们无法直接大规模地从中采样,因此我们首先通过训练一个自动代理来玩游戏,以此来近似人类游戏。与旨在最大化游戏得分的典型强化学习设置不同,我们的目标是生成与人类游戏相似的训练数据,或者至少在各种场景下包含足够多样的示例,以最大化训练数据的效率。为此,我们设计了一个简单的奖励函数,这是我们方法中唯一与环境相关的部分(见附录[https://arxiv.org/html/2408.14837v1#A1.SS3 A.3])。
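
    As a rough illustration only, the sketch below shows what a simple, environment-specific shaping reward could look like, written as a gymnasium-style wrapper. It is not the reward defined in Appendix [https://arxiv.org/html/2408.14837v1#A1.SS3 A.3]; the tracked game variables, weights, and class name are assumptions made for this example.

    <syntaxhighlight lang="python">
import gymnasium as gym


class ShapedRewardWrapper(gym.Wrapper):
    """Hypothetical reward-shaping wrapper (illustrative only).

    Replaces the raw game score with a reward built from changes in a few
    game-state variables, encouraging varied, human-like play rather than
    pure score maximization. The variable names and weights below are
    assumptions, not the reward from Appendix A.3 of the paper.
    """

    def __init__(self, env, weights=None):
        super().__init__(env)
        # Assumed shaping terms; replace with whatever the target game exposes.
        self.weights = weights or {"health": 0.01, "ammo": 0.005, "hits": 1.0}
        self._prev = None

    def reset(self, **kwargs):
        obs, info = self.env.reset(**kwargs)
        self._prev = self._extract(info)
        return obs, info

    def step(self, action):
        obs, _score, terminated, truncated, info = self.env.step(action)
        cur = self._extract(info)
        # Reward the change in each tracked variable since the previous step,
        # instead of the environment's raw score.
        reward = sum(self.weights[k] * (cur[k] - self._prev[k]) for k in self.weights)
        self._prev = cur
        return obs, reward, terminated, truncated, info

    def _extract(self, info):
        # Assumes the underlying environment reports these variables in `info`.
        return {k: float(info.get(k, 0.0)) for k in self.weights}
    </syntaxhighlight>

    An agent trained against such a wrapped environment would then be rolled out to collect the state-action trajectories used as training data for the simulation model.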