https://t.co/T8MuoEG48b


Augment Code (@augmentcode), May 8, 2026



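TL;DR · AI summary

Augment Code rebuilt its code review process on the Cosmos system, significantly increasing code output and merge speed while lowering the bug rate.

Key points

Code output grew 3x, and median merge time decreased.
Low-risk PRs are approved automatically; human reviewers are pulled in only when necessary.
The bug rate per unit of output is trending down.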

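Outline

§ Introduction
Augment Code faced a code review bottleneck and decided to rebuild its code review process.
· Problem background
Code is written by agents, causing a large backlog of unreviewed PRs.
· Solution
Introduced the Cosmos system to handle low-risk PRs automatically and reduce the human review workload.
· Results
Code output increased, merge time decreased, and the bug rate fell.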



#Augment Code #Code review #AI tools


Article

More code, faster reviews: how we rebuilt code review at Augment using Cosmos

TL;DR. When agents write 100% of your code, the bottleneck moves to review: at one point we had 1,400 open PRs and a 20-hour median time-to-first-comment. We rebuilt the review process on Cosmos as a team of agents that auto-approves low-risk changes, runs line-by-line correctness analysis, and pulls humans in only for the calls that need judgment. Since January, code output is up 3x, median merge time has dropped, and bug rate per output unit is trending down.

Teams that go all-in on AI coding tools hit the same wall. Raw code output grows exponentially, and PRs pile up in the review queue. Someone's whole job could be reviewing PRs and it would still make no dent. Some companies solve this by rubber stamping PRs to keep moving fast, but shipping faster with bugs and uncontrolled tech debt is a one-way road to chaos.

Here is how we solved our code review bottleneck and started merging PRs at the same rapid rate at which we were generating them. We did this without compromising on either the quality of the software or our reviewers’ understanding of the PRs they reviewed. This was a fundamental rethink of how code reviews are done. Let's dig in.

We hit the code review wall at Augment in January. With 100% of our code being written by agents, PRs generated shot up, but so did the median merge time (i.e. PR latency). PR merge rate (i.e. PR throughput) went up, but not at the pace at which they were being generated. PRs started piling up in the review queue and there were over 1,400 open PRs at one point. We had a real problem.

Our median time-to-first-human-comment was hovering around 1,200 minutes. That's 20 hours before an engineer even looked at your PR. This wasn't a reviewer problem. They had six PRs ahead of yours, each 400 lines of code they didn't write. Our original AI code review tool ran in 3-5 minutes and caught real bugs, but a human still had to read every line of every PR. A two-line change ended up waiting a day at the bottom of the queue.
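To make the metric concrete, here is a minimal sketch of how a median time-to-first-human-comment could be computed. The record layout (`opened_at`, `comments`, `is_bot`) and the sample data are assumptions made for illustration; the post does not describe how Augment measures this.

```python
from datetime import datetime, timedelta
from statistics import median


def minutes_to_first_human_comment(opened_at, comments):
    """Minutes from PR open to the earliest non-bot comment, or None if no human has commented."""
    human_times = [c["at"] for c in comments if not c["is_bot"]]
    if not human_times:
        return None
    return (min(human_times) - opened_at).total_seconds() / 60


def median_wait_minutes(prs):
    """Median wait across PRs that have received at least one human comment."""
    waits = [minutes_to_first_human_comment(pr["opened_at"], pr["comments"]) for pr in prs]
    waits = [w for w in waits if w is not None]
    return median(waits) if waits else None


# Synthetic sample data, not real measurements.
opened = datetime(2026, 1, 5, 9, 0)
sample_prs = [
    {"opened_at": opened, "comments": [
        {"at": opened + timedelta(minutes=4), "is_bot": True},      # bot review, ignored
        {"at": opened + timedelta(minutes=600), "is_bot": False},
    ]},
    {"opened_at": opened, "comments": [{"at": opened + timedelta(minutes=1200), "is_bot": False}]},
    {"opened_at": opened, "comments": [{"at": opened + timedelta(minutes=1500), "is_bot": False}]},
    {"opened_at": opened, "comments": []},                          # still waiting, excluded
]

wait = median_wait_minutes(sample_prs)
print(wait, wait / 60)  # 1200.0 minutes, i.e. 20.0 hours
```

Note that excluding PRs nobody has looked at yet makes the median look better than the queue really is, so a growing backlog should be tracked alongside it.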

Our VP of Engineering called it out: a human reviewer needed to read and reason about every single line of code to gain confidence in the quality of what was being shipped, and develop an understanding of the system being built. This was what we had to solve.

A couple of months ago, Augment internally rolled out Cosmos, our operating system for agentic software development: agents that run anywhere, work across your SDLC, with humans steering where judgment matters. It is purpose-built for automating workflows, with several out-of-the-box features for teams, such as shared context and memory, self-improving agent loops, and connections to all of your tools. Each Cosmos automation comes in the form of an Expert, which has its own prompt, integrations, environment, secrets, event triggers, subscriptions, worker experts and much more. Code review was naturally the first automation we went after.

The figure below highlights our new code review process: a team of Experts drives the code review process and pulls in the human only when needed. It splits the code review process into four coordinated loops: change execution (PR Author), risk analysis (PR Risk Analyzer), correctness (Bug Reviewer), and system design judgment (Intent Reviewer + human), all continuously improving via shared memory.

PR Risk Analyzer

Evaluates the risk for every new PR automatically and routes it appropriately.

Low-risk changes (docs, configs, trivial edits) → automatically approved with justification**

Higher-risk changes → tagged with specific review dimensions (e.g. architecture, security) that need human input

👉 Removes low-value queue work and ensures humans only see what requires judgment. Initially, we saw 10% of PRs automatically approved, but this number goes up once developers start providing feedback on what type of PRs are low-risk.

**

to understand how to maintain SOC 2 compliance with agent-approved PRs.
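The routing decision described above can be sketched as follows. This is a rule-based stand-in, not the Cosmos Expert itself, which the post describes as an agent. The path hints, the `trivial_line_limit` threshold, the `"general"` fallback dimension, and the `RiskDecision` type are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical heuristics; a real analyzer would rely on an agent's judgment and repo memory.
LOW_RISK_SUFFIXES = (".md", ".rst", ".txt", ".yaml", ".yml", ".toml")
SECURITY_HINTS = ("auth", "secrets", "crypto")
ARCHITECTURE_HINTS = ("schema", "migrations", "api/")


@dataclass
class RiskDecision:
    auto_approve: bool
    justification: str
    review_dimensions: list = field(default_factory=list)


def analyze_risk(changed_files, trivial_line_limit=5):
    """Route a PR. `changed_files` maps a file path to its number of changed lines."""
    paths = [p.lower() for p in changed_files]
    dimensions = []
    if any(hint in p for p in paths for hint in SECURITY_HINTS):
        dimensions.append("security")
    if any(hint in p for p in paths for hint in ARCHITECTURE_HINTS):
        dimensions.append("architecture")
    if dimensions:
        # Sensitive paths always go to a human, tagged with what to look at.
        return RiskDecision(False, "touches sensitive paths", dimensions)
    if all(p.endswith(LOW_RISK_SUFFIXES) for p in paths):
        return RiskDecision(True, "docs or config only")
    if sum(changed_files.values()) <= trivial_line_limit:
        return RiskDecision(True, "trivial edit")
    return RiskDecision(False, "non-trivial code change", ["general"])


print(analyze_risk({"docs/README.md": 40}))
print(analyze_risk({"src/auth/login.py": 2}))
```

Checking sensitive paths before the low-risk rules means a two-line change to an auth module is still routed to a human, which matches the post's point that size alone does not define risk.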

Intent Reviewer (Interactive)

Owns the review process end-to-end and engages the human only when needed.

Breaks the review into structured phases (design, risk, correctness, etc.)

Guides the human through decisions instead of requiring full code diff review

Posts finalized comments back to GitHub

This is the only part of the code review process requiring human input

👉 Shifts humans from line-by-line reviewers to decision-makers.

Deep Code Review

Performs deep, line-by-line correctness analysis focused purely on objective bugs

This is the component most similar to a standalone AI code review tool (Akshay wrote a separate post on

.)

👉 Catches the vast majority of high and medium severity issues with very high recall.

PR Author

Owns the execution loop of the PR lifecycle.

Given a feature request in a ticket or specification, it implements the feature and opens a PR

Automatically responds to review comments, fixes CI failures, resolves merge conflicts, and puts up subsequent commits

After providing a ticket link or specification, the human developer only needs to come in to give the final merge decision

👉 Removes the author-side bottleneck and keeps PRs moving without manual intervention.
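As a rough illustration of this execution loop, the sketch below maps PR events to the author agent's next step and hands off to the human only for the final merge decision. The event names and action strings are hypothetical; the post does not specify how the PR Author Expert is triggered.

```python
from enum import Enum


class PrEvent(Enum):
    REVIEW_COMMENT = "review_comment"
    CI_FAILURE = "ci_failure"
    MERGE_CONFLICT = "merge_conflict"
    ALL_CHECKS_GREEN = "all_checks_green"


# Hypothetical mapping from an event to the author agent's next step.
AGENT_ACTIONS = {
    PrEvent.REVIEW_COMMENT: "address the comment and push a follow-up commit",
    PrEvent.CI_FAILURE: "diagnose the failing job and push a fix",
    PrEvent.MERGE_CONFLICT: "rebase and resolve the conflict",
}


def run_author_loop(events):
    """Return the log of steps taken; stops once the PR is ready for the human merge decision."""
    log = []
    for event in events:
        if event is PrEvent.ALL_CHECKS_GREEN:
            log.append("human: final merge decision")
            break
        log.append("agent: " + AGENT_ACTIONS[event])
    return log


for step in run_author_loop(
    [PrEvent.CI_FAILURE, PrEvent.REVIEW_COMMENT, PrEvent.ALL_CHECKS_GREEN]
):
    print(step)
```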

Memory Manager

Learns from every PR to continuously improve the system.

Captures human feedback from merged PRs - human comments, replies to bot comments, thumbs up/down emojis and sessions with the Intent Reviewer - and distills it into a structured, per-repo knowledge base that all experts ingest before starting their work

A deep dive into the memory system will follow in a subsequent blog post

👉 Ensures the system adapts to your company’s specific practices and preferences, so that less and less human intervention is needed over time.
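The post defers the details of the memory system to a later article, so the following is only a guess at the general shape of the idea: feedback signals are aggregated per repository into a small knowledge base that other Experts could read before starting work. The signal format, the `vote` scoring, and the `reinforce`/`suppress` buckets are all assumptions.

```python
from collections import defaultdict


def distill_feedback(signals):
    """Aggregate feedback into a per-repo knowledge base.

    Each signal is a dict with hypothetical keys:
      repo  - repository name
      topic - the practice the feedback is about
      vote  - +1 for agreement (e.g. thumbs up), -1 for disagreement
    """
    scores = defaultdict(lambda: defaultdict(int))
    for signal in signals:
        scores[signal["repo"]][signal["topic"]] += signal["vote"]

    knowledge_base = {}
    for repo, topics in scores.items():
        knowledge_base[repo] = {
            # Practices reviewers agreed with: keep flagging them.
            "reinforce": sorted(t for t, score in topics.items() if score > 0),
            # Practices reviewers pushed back on: stop raising them.
            "suppress": sorted(t for t, score in topics.items() if score < 0),
        }
    return knowledge_base


# Synthetic signals for illustration only.
signals = [
    {"repo": "backend", "topic": "missing null check", "vote": 1},
    {"repo": "backend", "topic": "missing null check", "vote": 1},
    {"repo": "backend", "topic": "import ordering", "vote": -1},
    {"repo": "docs", "topic": "typo-only PRs are low-risk", "vote": 1},
]
print(distill_feedback(signals))
```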

Our

tell the story: while code output at Augment has gone up over 3x since January, median merge time has actually decreased.

The number of bugs introduced has stayed steady over time even though we've been pushing significantly more code. The raw count didn't spike, as many would expect it to. Bugs per output unit is tapering down. Quality is maintained.

Weekly revert rate is healthy: we aim for 1.5%, and we stay within about ±0.5 percentage points of that target.
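The two quality metrics above reduce to simple ratios. The sketch below shows the arithmetic with made-up numbers: a steady bug count over tripled output gives a falling bugs-per-output-unit figure, and a weekly revert rate is compared against a target band. Reading the band as 1.5% ± 0.5 percentage points is an assumption about the post's wording, and "output unit" is left abstract because the post does not define it.

```python
def bugs_per_output_unit(bugs, output_units):
    """Bugs introduced divided by an abstract measure of code output."""
    return bugs / output_units


def revert_rate(reverted_prs, merged_prs):
    """Fraction of merged PRs that were later reverted."""
    return reverted_prs / merged_prs


def within_target(rate, target=0.015, tolerance=0.005):
    """True if the rate sits inside the assumed band of target +/- tolerance."""
    return abs(rate - target) <= tolerance


# Illustrative numbers only: same bug count, three times the output.
before = bugs_per_output_unit(bugs=30, output_units=100)
after = bugs_per_output_unit(bugs=30, output_units=300)
print(before, after)  # the ratio falls even though the raw count is flat

weekly = revert_rate(reverted_prs=12, merged_prs=1000)
print(weekly, within_target(weekly))
```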

Finally, there are two effects that we can’t capture in numbers. In spite of shorter review times:

Humans are still driving high-level system design because they have better organizational and business context.

Reviewers continue to get the knowledge transfer benefit of code review.

If AI has spiked the volume of code generated, the goal isn’t just “faster reviews.” It’s building a review system that scales: automate low-level correctness, reserve humans for judgment calls, and eliminate low-risk queue work. You can do all of this using Augment’s Cosmos Platform, which is in

. Just prompt the platform’s Cosmos Advisor Expert saying “Set up the code review automation fleet for me” and it will build up this suite of code review experts for you and help you customize them to your unique requirements and setup.

After solving the code review bottleneck at Augment, we’ve moved on to automating our subsequent SDLC bottlenecks internally, including end-to-end testing, incident response, feedback triage, and ticket management, all deployed on the

. Stay tuned for upcoming blogs about those.

This post was written by Akshay Utture and Will Colbert, and was originally published on the Augment Code blog: