TokenSpeed is a brand new inference engine purpose-built for speed-of-light agentic workloads

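TL;DR · AI Summary
TokenSpeed is a new open-source LLM inference engine optimized for agentic workloads, with high-performance KV cache management, an efficient scheduler, and a modular kernel architecture that supports multiple chip platforms.
Key points
- According to its announcement, TokenSpeed delivers TensorRT-LLM-level performance with vLLM-level usability.
- Its pluggable, layered kernel design targets multiple silicon platforms and is built to be extensible.
- The project was built by a lean team in two months and is released under the MIT open-source license.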
Outline
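- TokenSpeed inference engine
  - Core technology
    - KV cache optimization
    - Safe and efficient scheduler
    - Layered, pluggable kernels
  - Performance
    - Fastest MLA attention kernel
    - Optimized for the NVIDIA Blackwell architecture
  - Ecosystem and deployment
    - Multi-silicon support
    - MIT open-source license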
Highlights
TokenSpeed is a brand new inference engine purpose built for speed-of-light agentic workloads.
It has the fastest MLA attention kernel on NVIDIA Blackwell.
Built by a lean and mission-driven team in two months. MIT license, open-source.
Advanced KV cache management, safe and efficient scheduler, pluggable layered kernel system.
TensorRT-LLM-level performance, vLLM-level usability.
Designed for multi-silicon support, enabling broad hardware compatibility.

TokenSpeed is a brand new inference engine purpose built for speed-of-light agentic workloads. Read their blog to learn more about its advanced KV cache management, safe and efficient scheduler, and pluggable layered kernel system designed for multi-silicon support. Plus, it also has the fastest MLA attention kernel on NVIDIA Blackwell. Congrats to
on the launch!
Quote

@lightseekorg
7h
Introducing TokenSpeed, a speed-of-light LLM inference engine.
> TensorRT LLM level performance
> vLLM level usability
> Built by a lean and mission-driven team in two months
> MIT license, open-source
github.com/lightseekorg/t lightseek.org/blog/lightseek