Translations:Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer/33/zh

From Marovi AI

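Multi-gate Mixture-of-Experts - a task-conditioned MoE for multi-task learning.
Attention Is All You Need - the Transformer architecture, into which GShard and Switch Transformer later inserted MoE layers.
Dropout: A Simple Way to Prevent Neural Networks from Overfitting - a related form of stochastic conditional computation.
Language Models are Few-Shot Learners - a large dense language model; the goal of MoE scaling work is to surpass such models at lower cost.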