This could solve the main issue with context windows

Because this new model has a context window of...

Paul Couvert (@itsPaulAi) · May 5, 2026

Score: 5.2

TL;DR · AI Summary

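The tweet claims that a new model, SubQ, achieves a 12-million-token context window with 98% accuracy, runs 52 times faster, and costs only 5% as much as Opus 4.7, but it provides no technical details, evaluation methodology, or verifiable data.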

Key Points

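SubQ is claimed to support a 12M-token context while still maintaining 98% accuracy
Compared with Opus 4.7, inference is claimed to be 52 times faster at 5% of the cost
It uses a fully sub-quadratic sparse-attention (SSA) architecture and is described as the first frontier model of this kind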
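
To give a sense of why a sub-quadratic architecture matters at this scale, here is a minimal sketch comparing how many query-key pairs dense attention scores against a sparse scheme with a fixed per-query budget. The tweet does not describe how SSA actually works, so the fixed-budget pattern, the function names, and the `KEYS_PER_QUERY` value are all illustrative assumptions, not SubQ's design or figures.

```python
def dense_attention_pairs(n_tokens: int) -> int:
    """Query-key pairs scored by full (dense) self-attention: n^2."""
    return n_tokens * n_tokens


def sparse_attention_pairs(n_tokens: int, keys_per_query: int) -> int:
    """Pairs scored when each query attends to at most a fixed number of keys.

    This grows linearly in n_tokens, which is one simple way to be
    sub-quadratic. It is an assumed pattern, not SubQ's disclosed mechanism.
    """
    return n_tokens * min(keys_per_query, n_tokens)


if __name__ == "__main__":
    KEYS_PER_QUERY = 4096  # hypothetical budget, not a SubQ figure
    for n in (128_000, 1_000_000, 12_000_000):
        dense = dense_attention_pairs(n)
        sparse = sparse_attention_pairs(n, KEYS_PER_QUERY)
        print(
            f"{n:>12,} tokens: dense={dense:.2e} "
            f"sparse={sparse:.2e} ratio={dense / sparse:,.0f}x"
        )
```

Under these assumptions the gap widens as the context grows, because the dense count scales with the square of the token count while the sparse count scales linearly. Real speedups also depend on memory traffic, kernels, and hardware, none of which the tweet specifies.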

Outline

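§ Core claim
The SubQ model addresses the long-context bottleneck with a 12-million-token context window.
· Performance comparison
Versus Opus 4.7, the tweet claims 52× the speed and 95% lower cost, with accuracy held at 98%.
· Architectural innovation
The first frontier large model to use a fully sub-quadratic sparse-attention (SSA) architecture.
› Credibility concerns
No paper, benchmarks, hardware configuration, or reproduction information; it reads as a marketing-style announcement.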

Mind Map

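SubQ: 12M-token context model
- Core metrics
  - 12-million-token context
  - 98% accuracy
  - 52× faster, 5% of the cost
- Technical architecture
  - Fully sub-quadratic sparse attention (SSA)
  - First SSA-based frontier large model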

Highlights

This could solve the main issue with context windows
— Original title

12M tokens (!!) but still maintains 98% accuracy
— Original post body

first model built on a fully sub-quadratic sparse-attention architecture (SSA)
— Quoted from Alexander Whedon

#LLM #context window #sparse attention #SubQ

Original post

This could solve the main issue with context windows

Because this new model has a context window of 12M tokens (!!) but still maintains 98% accuracy

And compared to Opus 4.7, it's:

- 52 times faster
- Costs 5% of the price

That's really impressive.

Quote: Alexander Whedon (@alex_whedon) · 15h

Introducing SubQ - a major breakthrough in LLM intelligence. It is the first model built on a fully sub-quadratic sparse-attention architecture (SSA), And the first frontier model with a 12 million token context window which is: - 52x faster than FlashAttention at 1MM tokens -

Paid partnership

2:14 PM · May 5, 2026 · 9,861 Views
