TokenSpeed is a brand-new inference engine purpose-built for speed-of-light agentic workloads

NVIDIA AI (@NVIDIAAI) · May 6, 2026

TL;DR · AI Summary

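TokenSpeed is a new open-source LLM inference engine optimized for agentic workloads. It features advanced KV cache management, a safe and efficient scheduler, and a pluggable, layered kernel system designed to run on multiple silicon platforms.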

Key Points

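According to its developers, TokenSpeed offers TensorRT-LLM-level performance with vLLM-level usability.
Its pluggable, layered kernel design targets multiple silicon platforms and is built to be extended.
A lean team built the project in two months and released it as open source under the MIT license.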

Outline

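· Introduction
TokenSpeed's positioning as a new inference engine and its target workloads.
· Core Features
KV cache optimization, the safe scheduler, and the layered kernel system.
· Performance
The fastest MLA attention kernel on the NVIDIA Blackwell architecture.
· Open Source and Ecosystem
MIT-licensed and open-source on GitHub, with an emphasis on community-driven development.
· Team and Development Speed
A small team built the high-performance system in two months.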

Mind Map

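TokenSpeed Inference Engine
- Core technology
  - KV cache optimization
  - Safe and efficient scheduler
  - Layered, pluggable kernels
- Performance
  - Fastest MLA attention kernel
  - Optimized for the Blackwell architecture
- Ecosystem and deployment
  - Multi-silicon support
  - MIT open-source license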

Highlights

TokenSpeed is a brand new inference engine purpose built for speed-of-light agentic workloads.
— Post
It has the fastest MLA attention kernel on NVIDIA Blackwell.
— Post
Built by a lean and mission-driven team in two months > MIT license, open-source
— Quote
Advanced KV cache management, safe and efficient scheduler, pluggable layered kernel system.
— Post
TensorRT LLM level performance, vLLM level usability.
— Quote
Pluggable layered kernel system designed for multi-silicon support.
— Post

#LLMInference #NVIDIA #OpenSource #KVCache #Attention

TokenSpeed is a brand new inference engine purpose built for speed-of-light agentic workloads. Read their blog to learn more about its advanced KV cache management, safe and efficient scheduler, and pluggable layered kernel system designed for multi-silicon support. Plus, it also has the fastest MLA attention kernel on NVIDIA Blackwell. Congrats to

on the launch!

Quote

LightSeek Foundation

@lightseekorg

Introducing TokenSpeed, a speed-of-light LLM inference engine. > TensorRT LLM level performance > vLLM level usability > Built by a lean and mission-driven team in two months > MIT license, open-source github.com/lightseekorg/t lightseek.org/blog/lightseek