Research · 2026
The State of Rework 2026
Half of pull requests merge clean. The other half enter a fix-and-review loop that consumes most review capacity. This report tracks that split: where the loop starts, what it costs, and why the feedback driving it is usually right, just late.
Aggregated from twelve weeks of pull request activity across engineering organizations on Baz, spring 2026. Organizations stay anonymous throughout. Findings are reported as shares and multiples of a clean-PR baseline, never as raw or per-customer numbers.
The split
Half of pull requests pay for the other half.
Rework is not a rare event. Half of PRs complete without a single rework cycle, and they need little review effort. The other half absorbs roughly four fifths of review activity, and the deepest tiers concentrate it further. The severe band is the smallest group on the chart, yet it consumes the most review executions.
Most of that spend is repeat work. Once a PR gets reviewed, every later execution re-reviews code the team already paid to review once. Across the dataset, roughly two thirds of review compute happens after a first review. For the typical organization, repeat reviews make up most of its review load.
The cost curve
Each tier deeper is a step change, not an increment.
A clean PR merges in minutes at the median. One rework cycle multiplies that several times over, and the deepest tier runs hundreds of times slower, which means days on the calendar. Change size climbs the same staircase: even a single cycle comes with a change several times the clean median.
None of this shows up as failed delivery. Merge rates stay level across tiers, and reworked PRs get abandoned no more often than clean ones. The cost hides inside cycle time and review load, not in dropped work. That is how a tax this large stays unnoticed.
The developer loop
Rework behaves like an interruption, not a queue.
The gap between review feedback landing and the next fix starting is short. The median sits around half an hour, most cycles begin within the hour, and the large majority start within a working day. Developers are not calmly returning to this work on their own schedule. The loop pulls them back.
So the tax is not only elapsed time. Every cycle breaks focus: write, get pulled back, fix, get re-reviewed, repeat. The calendar cost shows up on delivery dashboards. The attention cost shows up in the work that never happened while the loop ran.
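The feedback-to-fix gap described above can be measured from two timestamps per rework cycle. The following is a minimal sketch, not Baz's method: the function name, the input shape (pairs of feedback time and next fix time), and the eight-hour definition of a working day are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from statistics import median

# Assumption: a "working day" is eight hours. Adjust to your own definition.
WORKING_DAY_MINUTES = 8 * 60


def fix_gap_summary(cycles):
    """Summarize how quickly fixes start after review feedback lands.

    cycles: list of (feedback_at, next_fix_at) datetime pairs, one per
    rework cycle. Pairs where the fix precedes the feedback are ignored.
    """
    minutes = [
        (fix_at - feedback_at).total_seconds() / 60
        for feedback_at, fix_at in cycles
        if fix_at >= feedback_at
    ]
    if not minutes:
        return None
    return {
        "median_minutes": median(minutes),
        "share_within_hour": sum(m <= 60 for m in minutes) / len(minutes),
        "share_within_working_day": sum(
            m <= WORKING_DAY_MINUTES for m in minutes
        ) / len(minutes),
    }


# Illustrative data only, not taken from the report's dataset.
start = datetime(2026, 3, 2, 9, 0)
sample_cycles = [
    (start, start + timedelta(minutes=20)),
    (start, start + timedelta(minutes=30)),
    (start, start + timedelta(minutes=45)),
    (start, start + timedelta(hours=5)),
    (start, start + timedelta(hours=30)),
]
summary = fix_gap_summary(sample_cycles)
```

A short median combined with a high within-the-hour share is the interruption pattern; a long, flat distribution would look more like a queue.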
The signal
The feedback is right. The timing is late.
Here is the nuance that makes the tax hard to remove with blunt policy: rework works. The share of review comments developers act on climbs steadily as rework deepens, from a few percent on clean PRs to a clear majority at the deep end. The pattern holds across every major finding category.
Correctness findings climb steepest. Logical bugs, breaking changes, and type inconsistencies all reach their highest acceptance in the deepest tiers, with security findings close behind. Those are the strongest candidates for earlier, blocking, pre-PR feedback. Deduplication rises too, but more slowly, so it works better as advisory guidance than as another loop.
The baseline
This is the industry’s resting state, not a bad quarter.
Week over week, the split barely moves. The share of PRs entering rework holds near half, and the share of review activity attached to them holds near four fifths, no matter how much volume swings. Most organizations sit in a majority-rework band, and scale does not help: the largest organizations carry the longest severe tails in the data.
Because the rate stays stable, the burden scales with volume. Growing teams do not grow out of rework; they just buy more of it. Repeat reviews become the majority of review activity well before rework reaches a majority of an organization's PRs, so even the moderate band already pays a structural tax.
Implications
Measure the loop, then move the feedback upstream.
The data does not argue for less review feedback. It argues for the same feedback, prioritized better and delivered earlier, so it stops creating repeated PR-phase loops. Five operating changes follow directly from these trends.
Measure the loop directly
PR counts, merge rates, and average cycle time can all look healthy while the loop runs. Track rework rate, repeat-review share, and cycle time by rework tier instead, and the tax becomes visible.
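As a rough sketch of what tracking these three numbers could look like, the example below computes rework rate, repeat-review share, and median cycle time by tier from per-PR records. The record fields, the tier names, and the band edges are hypothetical; the report does not publish its tier definitions, so substitute your own.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import median


@dataclass
class PullRequest:
    review_runs: int       # total review executions on the PR
    rework_cycles: int     # fix-and-review cycles after the first review
    hours_to_merge: float  # open-to-merge time in hours


def rework_tier(cycles):
    # Hypothetical band edges, chosen only for illustration.
    if cycles == 0:
        return "clean"
    if cycles == 1:
        return "light"
    if cycles <= 3:
        return "moderate"
    return "severe"


def loop_metrics(prs):
    """Return rework rate, repeat-review share, and median hours per tier."""
    if not prs:
        return None
    total_runs = sum(p.review_runs for p in prs)
    # Every review execution after the first one on a PR counts as repeat work.
    repeat_runs = sum(max(p.review_runs - 1, 0) for p in prs)
    hours_by_tier = defaultdict(list)
    for p in prs:
        hours_by_tier[rework_tier(p.rework_cycles)].append(p.hours_to_merge)
    return {
        "rework_rate": sum(p.rework_cycles > 0 for p in prs) / len(prs),
        "repeat_review_share": repeat_runs / total_runs if total_runs else 0.0,
        "median_hours_by_tier": {
            tier: median(hours) for tier, hours in hours_by_tier.items()
        },
    }


# Illustrative data only, not taken from the report's dataset.
sample_prs = [
    PullRequest(review_runs=1, rework_cycles=0, hours_to_merge=0.5),
    PullRequest(review_runs=1, rework_cycles=0, hours_to_merge=0.25),
    PullRequest(review_runs=2, rework_cycles=1, hours_to_merge=3.0),
    PullRequest(review_runs=6, rework_cycles=5, hours_to_merge=72.0),
]
metrics = loop_metrics(sample_prs)
```

Reporting cycle time per tier rather than as one average is the point of the design: a single mean or median over all PRs blends the clean half with the severe tail and hides both.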
Treat size as a risk signal
Median change size climbs with every tier. A change that grows from tens of lines into the hundreds deserves decomposition, a draft review, or earlier feedback before the PR opens.
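One way to act on size before the PR opens is to compare a change against the median size of recent clean PRs. This is a minimal sketch under stated assumptions: the function name and the threshold multiple of 3 are hypothetical and should be tuned against your own tier data.

```python
from statistics import median


def size_risk(changed_lines, clean_pr_sizes, multiple=3.0):
    """Flag a change whose size is a large multiple of the clean-PR median.

    changed_lines: lines changed in the candidate change.
    clean_pr_sizes: line counts of recent PRs that merged with no rework.
    multiple: hypothetical threshold, not a figure from the report.
    """
    baseline = median(clean_pr_sizes)
    ratio = changed_lines / baseline if baseline else float("inf")
    return {"ratio_to_clean_median": ratio, "flag": ratio >= multiple}


# Illustrative data only.
recent_clean_sizes = [20, 30, 40]
large_change = size_risk(240, recent_clean_sizes)
small_change = size_risk(45, recent_clean_sizes)
```

A flagged change is a prompt for decomposition or a draft review, not an automatic rejection.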
Route findings by acceptance
Correctness, compatibility, type safety, and security findings deserve blocking, pre-PR delivery. Maintainability suggestions get lower acceptance and work better as advisory guidance than as another loop.
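Routing by acceptance can be sketched as two steps: compute the share of comments acted on per finding category, then send high-acceptance categories to blocking, pre-PR delivery and the rest to advisory guidance. The category labels, the input shape, and the 0.5 threshold below are assumptions for illustration, not values from the report.

```python
from collections import Counter


def acceptance_by_category(comments):
    """comments: iterable of (category, acted_on) pairs, acted_on a bool."""
    seen, accepted = Counter(), Counter()
    for category, acted_on in comments:
        seen[category] += 1
        if acted_on:
            accepted[category] += 1
    return {category: accepted[category] / seen[category] for category in seen}


def route_findings(rates, blocking_threshold=0.5):
    # Hypothetical rule: categories developers act on at least half the time
    # are delivered as blocking, pre-PR feedback; the rest stay advisory.
    return {
        category: "blocking_pre_pr" if rate >= blocking_threshold else "advisory"
        for category, rate in rates.items()
    }


# Illustrative data only, not taken from the report's dataset.
sample_comments = [
    ("logical_bug", True),
    ("logical_bug", True),
    ("logical_bug", False),
    ("maintainability", False),
    ("maintainability", True),
    ("maintainability", False),
    ("maintainability", False),
]
rates = acceptance_by_category(sample_comments)
routes = route_findings(rates)
```

Deriving the routing from measured acceptance, rather than hard-coding a category list, lets the split shift as review quality or team habits change.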
Escalate deep rework
A PR entering its fourth cycle is an operational problem, not routine review. Split it, assign a domain owner, or settle the architecture question outside the loop.
Count attention, not just calendar
Most cycles start within an hour of feedback landing. That interruption cost is real, even when every merge-rate dashboard looks fine.
This is the model behind Baz. Planner reviews the plan before the code exists, and purpose-built agents give the PR a final contextual check, so review confirms the approach instead of correcting it.
Put your own numbers on the loop.
Benchmark your organization against these trends. Model the recovery with the rework calculator, or bring us your PR volume and we will measure your actual rework distribution and what moving feedback upstream would return.