
Lean-IMO-Bench

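Q: What is new with Lean-IMO-Bench?

traeai has indexed 1 item related to Lean-IMO-Bench. The most recent is "New research from Google. Just shows the impressive results you can get from custom agent harnesses...", posted by elvis (@omarsar0).

Lean-IMO-Bench is a benchmark dataset for evaluating mathematical proof ability. LEAP raises its single-attempt solve rate from under 10% to 70%.

1 highly relevant item tracked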

TraeAI Observations

If you read only 3

New research from Google. Just shows the impressive results you can get from custom agent harnesses...

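elvis (@omarsar0) · score 8.8

Google's LEAP framework is built around a general-purpose LLM combined with feedback from the Lean compiler and a verifier. It raises the single-attempt solve rate on Lean-IMO-Bench from under 10% to 70% and solves every Putnam 2025 problem with a single model, surpassing specialized systems by 48 points.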

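New Google research: the LEAP framework enables efficient mathematical proof solving with a general-purpose LLM

elvis (@omarsar0) · June 4 · 144 characters (about 1 minute)

Google's LEAP framework combines a general-purpose LLM with the Lean compiler for formalized mathematics and a verifier. It raises the single-attempt solve rate on Lean-IMO-Bench from under 10% to 70% and solves all 12 Putnam 2025 problems with a single model, surpassing specialized gold-medal systems.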
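The summary above only says that a general-purpose LLM is paired with compiler and verifier feedback; it does not describe LEAP's internals. As a rough illustration of that general pattern (not LEAP's actual design), a propose, check, and retry loop might look like the sketch below. Every name here (`solve_with_feedback`, `CheckResult`, `toy_propose`, `toy_check`) is hypothetical, and the toy checker only stands in for a real Lean toolchain.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class CheckResult:
    ok: bool
    message: str  # checker feedback; empty when the candidate is accepted


def solve_with_feedback(
    statement: str,
    propose: Callable[[str, List[str]], str],
    check: Callable[[str, str], CheckResult],
    max_rounds: int = 4,
) -> Optional[str]:
    """Return the first candidate the checker accepts, or None if none is."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        candidate = propose(statement, feedback)
        result = check(statement, candidate)
        if result.ok:
            return candidate
        # Feed the rejection message into the next proposal round.
        feedback.append(result.message)
    return None


# Hypothetical stand-ins so the sketch runs without a model or Lean installed.
def toy_propose(statement: str, feedback: List[str]) -> str:
    # The first attempt is deliberately incomplete; later attempts "repair" it.
    return "by simp" if feedback else "by sorry"


def toy_check(statement: str, candidate: str) -> CheckResult:
    # Assumption: a real checker would invoke Lean; this one only rejects `sorry`.
    if "sorry" in candidate:
        return CheckResult(False, "proof contains sorry")
    return CheckResult(True, "")


proof = solve_with_feedback("theorem t : 1 + 1 = 2", toy_propose, toy_check)
print(proof)
```

In this toy run the first candidate is rejected and the second is accepted, so the loop ends after two rounds. With `max_rounds=1` the function would return `None`.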

Why it was selected: with LEAP, a single general-purpose LLM solves all 12 Putnam 2025 problems.

Featured tweet · #LEAP #Lean compiler #Putnam 2025 #agentic framework #general-purpose LLM · English
