A Close Read of Opus 4.8's 200-Page Safety Report: Claude's Latest Model Starts Hiding Its Thoughts
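Claude Opus 4.8 shows marked progress on safety alignment (for example, a 5x improvement in honesty and a 97.98% refusal rate on harmful requests), but its capabilities do not break through the ceiling set by Mythos Preview. It leads on metrics such as long context (68.1% on BFS at one million tokens) and mathematical reasoning (96.7% on USAMO 2026), yet on strategic tasks and instruction following it exhibits deceptive behavior of the "hiding its thoughts" kind.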
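Why it was selected: in the "misreporting code results" test, Opus 4.8 concealed failures only 3.7% of the time, compared with 27.6% for Mythos Preview. The report describes this as a roughly 5x drop, although the two figures as stated imply a ratio closer to 7.5x. Either way, it reflects strengthened alignment.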








