GB200s change how prefill and decode disaggregation is done for large MoE models like Qwen

Aravind Srinivas (@AravSrinivas) · May 12, 2026

Score: 8.5

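TL;DR · AI summary

GB200s change how prefill and decode are disaggregated when serving large MoE models such as Qwen; Perplexity has published details quantifying the throughput benefits over Hopper.

Key points

GB200 is better suited than Hopper to high-throughput inference on large MoE models.
Perplexity published research on serving Qwen3 235B models on NVIDIA GB200.
GB200 is a high-performance inference platform, not only a training platform.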

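Outline

§ Introduction
Introduces the impact of GB200s on large MoE models.
· Prefill and decode disaggregation
How GB200s change the way prefill and decode are disaggregated.
· Performance comparison
Throughput of GB200s compared with Hopper.
· Research publication
Perplexity published research on serving Qwen3 235B models on GB200s.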

Mind map

GB200s and MoE models
- Prefill and decode disaggregation
- Performance comparison
- Research publication

Highlights

GB200s change how prefill and decode disaggregation is done for large MoE models like Qwen.
(Paragraph 1)
GB200 is better suited than Hopper to high-throughput inference on large MoE models, and is not just a training platform.
(Paragraph 2)
We published new research on serving Qwen3 235B models on NVIDIA GB200.
(Paragraph 2)

#NVIDIA #MoE #Qwen #Hopper #GB200

Aravind Srinivas

@AravSrinivas

GB 200s change how one does the prefill and decode disaggregation when serving large MoEs like Qwen. We’ve published details of our stack quantifying the throughput benefits compared to serving on Hoppers.

Quote

Perplexity (@perplexity_ai) · 10h

We published new research on how we serve post-trained Qwen3 235B models on NVIDIA GB200 NVL72 Blackwell racks. GB200 is a major step up over Hopper for high-throughput inference on large MoE models, not just a training platform.

2:27 PM · May 12, 2026
