Claude Opus 4.7 just implemented an AlphaZero-style self-play pipeline from scratch. It did this on...

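TL;DR · AI summary
Claude Opus 4.7 implemented an AlphaZero-style self-play pipeline from scratch on consumer hardware in three hours, then beat the Pascal Pons Connect Four solver in 7 of 8 games as first mover, showing that at least one frontier model can build a non-trivial ML system end-to-end on its own.
Key points
- Starting with no pre-existing code, Claude Opus 4.7 autonomously implemented a full AlphaZero-style stack: MCTS, neural policy/value networks, self-play, and a training schedule.
- It did this in only three hours on consumer hardware (laptop scale); no other frontier coding agent tested cleared 2 of 8.
- The paper proposes a new evaluation paradigm: give the agent a minimal task description and a tight budget, then test whether it can rebuild a famous ML breakthrough end-to-end.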
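Outline
Claude Opus 4.7 built and ran a complete AlphaZero-style pipeline from scratch on consumer hardware.
Proposes a new benchmark, "rebuild a famous ML system", that moves beyond patch and unit-test evaluations.
Covers the full stack: MCTS, neural policy/value networks, the self-play loop, and the training schedule.
- Performance comparison
Beat the Pascal Pons Connect Four solver in 7 of 8 games as first mover; no other frontier coding agent tested cleared 2 of 8.
Ran entirely on consumer hardware in just three hours, demonstrating practical feasibility.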
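Mind map
- Claude Opus 4.7 self-play breakthrough
  - Technical implementation
    - MCTS
    - Neural policy/value networks
    - Self-play training loop
  - Evaluation paradigm
    - Rebuild a famous ML system
    - Minimal description + tight budget
    - End-to-end system-building ability
  - Empirical results
    - Beat the Pascal Pons solver 7 of 8
    - Completed in 3 hours on consumer hardware
    - Outperformed every other frontier coding agent tested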
Claude Opus 4.7 just implemented an AlphaZero-style self-play pipeline from scratch.

It did this on consumer hardware in three hours, then beat the Pascal Pons solver 7 of 8 as first-mover on Connect Four.

No other frontier coding agent tested cleared 2 of 8.

This paper proposes a new way to evaluate coding agents: hand them a minimal task description, give them a tight budget, and ask them to autonomously rebuild a famous ML breakthrough.

Connect Four + AlphaZero is the first instance. It's small enough to run on a laptop and hard enough to require a real research engineering loop (MCTS, neural value/policy nets, self-play, training schedule).

We've been measuring coding agents on patches and unit tests. This shifts the bar to "can the agent build a non-trivial ML system end-to-end on its own?"

The answer is now yes for at least one frontier model.

Paper: arxiv.org/abs/2604.25067

Learn to build effective AI agents in our academy: academy.dair.ai
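
For readers who have not seen the pieces named in the post (MCTS, neural value/policy nets, self-play), here is a minimal Python sketch of two of them: a Connect Four board and the PUCT rule that AlphaZero-style MCTS uses to pick which move to explore next. This is an illustration only, not the code from the paper. The names `Edge`, `puct_select`, and the constant `c_puct=1.5` are assumptions made for this example, and the priors are supplied by hand where a real pipeline would take them from a trained policy network.

```python
import math
from dataclasses import dataclass

ROWS, COLS = 6, 7  # standard Connect Four board


def new_board():
    """Empty board; 0 = empty, 1 and 2 = players. Row 0 is the top row."""
    return [[0] * COLS for _ in range(ROWS)]


def legal_moves(board):
    """Columns whose top cell is still empty."""
    return [c for c in range(COLS) if board[0][c] == 0]


def drop(board, col, player):
    """Return a new board with the player's piece dropped into col."""
    new = [row[:] for row in board]
    for r in range(ROWS - 1, -1, -1):  # scan from the bottom row upward
        if new[r][col] == 0:
            new[r][col] = player
            return new
    raise ValueError("column is full")


def is_win(board, player):
    """True if player has four in a row in any direction."""
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                end_r, end_c = r + 3 * dr, c + 3 * dc
                if 0 <= end_r < ROWS and 0 <= end_c < COLS:
                    if all(board[r + i * dr][c + i * dc] == player
                           for i in range(4)):
                        return True
    return False


@dataclass
class Edge:
    """Search statistics for one move (hypothetical helper for this sketch)."""
    prior: float          # P(s, a): in a real pipeline, from the policy net
    visits: int = 0       # N(s, a)
    value_sum: float = 0.0

    def q(self):
        """Mean value Q(s, a); 0 for an unvisited move."""
        return self.value_sum / self.visits if self.visits else 0.0


def puct_select(edges, c_puct=1.5):
    """Pick the move maximising Q + c_puct * P * sqrt(N_total + 1) / (1 + N).

    The +1 under the square root is a choice made in this sketch so that
    priors still break ties before any move has been visited.
    """
    total = sum(e.visits for e in edges.values())

    def score(e):
        return e.q() + c_puct * e.prior * math.sqrt(total + 1) / (1 + e.visits)

    return max(edges, key=lambda move: score(edges[move]))
```

In a full self-play loop, each simulation would call `puct_select` repeatedly to walk down the tree, evaluate the resulting position with the value network, and add that value back into `value_sum` along the path; the visit counts at the root then become the training target for the policy network.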