SGLang is hitting 180 tok/s/GPU on DeepSeek-V4 decode with ~1M context on Blackwell. 

NVIDIA AI (@NVIDIAAI), April 30, 2026

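TL;DR · AI Summary

NVIDIA AI reports that SGLang reaches 180 tok/s/GPU on DeepSeek-V4 decode with roughly 1M context on Blackwell hardware. The result comes from Blackwell-specific optimizations by LMSYS that make better use of the model's hybrid sparse attention.

Key Points

SGLang reaches 180 tok/s/GPU on DeepSeek-V4 decode.
The result comes from Blackwell-specific optimizations by LMSYS that make better use of the model's hybrid sparse attention.
LMSYS also delivered a verified RL training pipeline for V4 in Miles (by @radixark) at launch, with SGLang ready on Day 0.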

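Outline

§ Introduction
NVIDIA AI announces SGLang's performance result on new hardware.
· Performance highlights
The specific throughput figure SGLang reached and the context size.
· Source of the optimizations
LMSYS's Blackwell-specific optimization work.
· Additional updates
A brief look at the companion tooling and training pipeline LMSYS released.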

Mind Map

SGLang & DeepSeek-V4 performance
- Performance
  - 180 tok/s/GPU
- Optimization factors
  - Blackwell hardware
  - LMSYS
- Additional deliverables
  - Miles RL training pipeline

Highlights

SGLang reaches 180 tok/s/GPU on DeepSeek-V4 decode with roughly 1M context.
LMSYS's Blackwell-specific optimizations make better use of the model's hybrid sparse attention.
Alongside the V4 release, LMSYS delivered a verified RL training pipeline in Miles, with SGLang ready on Day 0.

#NVIDIA #DeepSeek-V4 #SGLang #Blackwell #LMSYS

SGLang is hitting 180 tok/s/GPU on DeepSeek-V4 decode with ~1M context on Blackwell. Good to see fast progress in open source DeepSeek-V4 inference on new hardware. This comes from Blackwell-specific optimizations by @lmsysorg that better use the model’s hybrid sparse attention.

Quote

LMSYS Org (@lmsysorg), Apr 24

DeepSeek V4 by @deepseek_ai just dropped! SGLang is ready on Day 0 with a full stack of optimizations from architectures to low-level kernels. We also deliver a verified RL training pipeline in Miles (by @radixark) for V4 at launch:

1️⃣ Native "ShadowRadix" Design: DeepSeek V4's …
