Fast, faster, Qwen. 🚀 Thrilled to see Qwen3.5 reaching a record-breaking 580 tps for agentic workl...

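TL;DR · AI Summary
Qwen3.5 reaches a record-breaking 580 tps (tokens per second) for agentic workloads on the TokenSpeed inference engine, made possible by optimizations from its partners.
Key Points
- Qwen3.5 achieves 580 tps on the TokenSpeed engine.
- The FA4 optimization was contributed by Lightseek, NVIDIA, the Mooncake team, and Tri Dao.
- The achievement pushes the boundaries of open-source LLM inference.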
Fast, faster, Qwen. 🚀 Thrilled to see Qwen3.5 reaching a record-breaking 580 tps for agentic workloads on the TokenSpeed engine! This milestone wouldn't be possible without our incredible partners. Huge thanks to @lightseekorg, @NVIDIAAI, the Mooncake team, and @tri_dao for the pioneering FA4 optimization. Together, we are pushing the boundaries of open-source LLM inference. Dive into the full blog post below!

pytorch.org/blog/up-to-580

#Qwen #Qwen3_5 #TokenSpeed #LLM #Inference #AI #PyTorch #OpenSource #AgenticAI #HighPerformance
Quoted post: PyTorch (@PyTorch), 7h
The speed-of-light optimization for Qwen3.5 on the TokenSpeed inference engine is a significant milestone, achieving a record-breaking 580 tokens per second (tps) for agentic workloads on NVIDIA GPUs. In the PyTorch Foundation's latest community blog post, you can learn all