When was the last time a general-purpose LLM completely blew away all prior models?

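TL;DR · AI Summary
Gary Marcus questions whether GPT-4 was a real breakthrough over GPT-3.5, arguing that it was only an incremental improvement and that the industry is prone to overhype.
Key points
- GPT-4 over GPT-3.5 was an incremental improvement with no real moat
- Anthropic's promotion of its Mythos model is accused of being overhyped
- Current LLM development lacks a genuine paradigm-level breakthrough
Outline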
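Gary Marcus asks whether any general-purpose LLM since GPT-4 has completely surpassed its predecessors.
GPT-4's progress over GPT-3.5 is characterized as incremental change that lacks a real technical moat.
One user points out that Anthropic's description of Mythos is heavily exaggerated: it outperforms Opus but falls short of the advertised level.
Current LLM development shows a gap between high-profile publicity and substantive breakthroughs.
Mind map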
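- Doubts about breakthrough progress in LLMs
  - GPT-4 vs GPT-3.5: incremental evolution
    - No fundamental difference
    - No technical moat
  - Anthropic Mythos controversy
    - Overhyped promotion
    - Actual performance only better than Opus
  - State of the industry: hype disconnected from reality
    - No paradigm-level breakthrough
    - Mostly engineering optimization
Highlights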
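GPT-4 relative to GPT 3.5? That's what incremental change with no real moat looks like.
Anthropic's description of Mythos is clearly exaggerated: it is better than Opus, but far from a "cyberweapon"-level capability.
Current general-purpose LLM development lacks a genuine paradigm-level breakthrough; it is more engineering optimization than innovation in principle.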
Gary Marcus on X:

When is the last time a general purpose LLM (putting aside hybrid systems like Claude Code with special purpose symbolic harnesses) last completely blew away all competing prior models? GPT-4 relative to GPT 3.5? That’s what incremental change with no real moat looks like.
Quote: Haider. (@haider1) · 9h
mythos is pretty on par with gpt-5.5 and while gpt-5.5 is currently SOTA, it's not anything like what anthropic describes mythos as it's pretty obvious that anthropic is overhyping the model -- yes, it is better than opus, but it's not some cyberweapon that they describe it as