The productivity numbers stop making sense past the diff

I lead a backend team of seven. Last quarter our PR throughput jumped 40 percent after we standardized on agentic tooling. Management loved the dashboard. Then I actually looked at what was shipping. Median PR size dropped, but median review time per line went up. Two of my mid-level engineers started opening PRs I could tell they hadn't read end-to-end. The agent had factored a service into four files where one would do, and nobody pushed back because the tests passed.

We ate that debt in Q1. A migration that should have taken a week took three because the abstractions were load-bearing in ways the original authors couldn't explain. I had to sit with one engineer and walk through his own code.

The productivity gain is real at the diff layer and imaginary at the system layer. What we actually sped up is the part that was already cheap: typing. What we slowed down is the part that was always expensive: understanding. I don't think the dashboards are going to catch this for another two quarters.

▲ 42·camila_torres·4w

17 comments

0CamilaTorres·4w

Same pattern I see with clients: shipped PR count doubles, but rework, review cycles, and the "what does this thing actually do" meetings grow faster. The number that matters is time from brief to a customer noticing something changed, and almost nobody measures it.

0thabo_mokoena·4w

Same pattern on my end. I can churn out three times the discovery summaries I used to, but the senior associate still bottlenecks on review, and half my "saved" time goes into fixing citations the model invented. The throughput chart looks great until you ask what actually got filed.

0meiwong·4w

Same pattern in our 7th grade pilot. The "lessons planned per hour" metric tripled, but I now spend the saved time re-reading slop for factual errors and tone, which never shows up in any dashboard.

0AishaKapoor·4w

Solo here too. I can churn out four PRs a day but customer support, billing edge cases, and deciding what to actually build are still the bottleneck, and none of that shows up in commit counts.

0xiaolin·4w

We ran a small study with 12 engineers over six weeks and the LOC and PR-throughput deltas were huge, but cycle time to merge barely moved because review and rework absorbed almost everything. The interesting variance was downstream: revert rate and time-to-first-bug-report were where the real signal lived, not anything you can pull from the diff itself.
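
For anyone who wants to pull those two downstream numbers, here is a rough sketch, not the pipeline from the study. It assumes you can export merge timestamps per PR, a list of reverted PR ids, and bug reports already attributed to a PR. The function names and input shapes are made up for illustration.

```python
from datetime import datetime
from statistics import median
from typing import Dict, Iterable, Optional, Tuple


def revert_rate(merged_ids: Iterable[str], reverted_ids: Iterable[str]) -> Optional[float]:
    """Share of merged PRs that were later reverted (None if nothing merged)."""
    merged = set(merged_ids)
    if not merged:
        return None
    return len(merged & set(reverted_ids)) / len(merged)


def median_hours_to_first_bug(
    merged_at: Dict[str, datetime],
    bug_reports: Iterable[Tuple[str, datetime]],
) -> Optional[float]:
    """Median hours from merge to the first bug report, over PRs that got one.

    Assumption: each bug report is already attributed to a PR id.
    Reports for unknown PRs, or dated before the merge, are ignored.
    """
    first_report: Dict[str, datetime] = {}
    for pr_id, reported_at in bug_reports:
        if pr_id not in merged_at or reported_at < merged_at[pr_id]:
            continue
        if pr_id not in first_report or reported_at < first_report[pr_id]:
            first_report[pr_id] = reported_at
    hours = [
        (first_report[pr_id] - merged_at[pr_id]).total_seconds() / 3600
        for pr_id in first_report
    ]
    return median(hours) if hours else None
```

The median only covers PRs that received a bug report at all, so it is worth reading next to the count of PRs with no report.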

0andres_mejia·4w

We track diff velocity religiously and it correlates with almost nothing useful past the team level. My seniors who ship the least code unblock the four people around them, and that never shows up in the dashboard.

0alexChen·3w

Same pattern on my side. Our eng leads are shipping 3x the PRs but the roadmap hasn't moved any faster because review, QA, and stakeholder alignment are still gated on humans. The bottleneck just relocated, and now I'm the one holding it up.

0aminataDiallo·3w

Same pattern across three clients this quarter. Output per designer is up 4x, but the brand team is drowning trying to QA volume that used to take a week to produce, and the client-side approvals haven't gotten any faster. The bottleneck just moved, and nobody's measuring the new one.

0arjunsharma_ml·3w

Same pattern at every client I touch. The PRs ship faster but the review queue, the QA cycles, and the "why does this exist" conversations all balloon, so the org-level throughput barely budges. The number that actually moved for us was time from idea to a thing a customer used, not lines merged per week.

0chinedu_eze·3w

Same on my end. I can generate a 40-page discovery response in an afternoon, but the partner still spends two days verifying citations and catching the one hallucinated case that would tank us in front of a judge. The "10x faster" framing collapses the moment you count review time honestly.

0linh_nguyen·3w

We track diffs merged per dev and it looks great until you notice review queue depth tripled and on-call pages from new code are up 40% this quarter. The interesting metric is what fraction of those merged diffs nobody touches again within 30 days, and that one has gotten worse for us, not better.
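
A minimal sketch of that "nobody touches it again within 30 days" fraction, under stated assumptions: "touched" here means a later merged diff modified at least one of the same files, which is only one possible definition. The `MergedDiff` record and `untouched_fraction` are hypothetical names, not anything from our tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class MergedDiff:
    diff_id: str
    merged_at: datetime
    files: FrozenSet[str]  # paths modified by this diff


def untouched_fraction(
    diffs: Iterable[MergedDiff],
    as_of: datetime,
    window: timedelta = timedelta(days=30),
) -> Optional[float]:
    """Fraction of merged diffs whose files no later diff modified within `window`.

    Only diffs whose full window has elapsed by `as_of` are counted, so
    recent merges do not look artificially stable.
    """
    ordered = sorted(diffs, key=lambda d: d.merged_at)
    eligible = 0
    untouched = 0
    for i, diff in enumerate(ordered):
        deadline = diff.merged_at + window
        if deadline > as_of:
            continue  # window still open, skip
        eligible += 1
        retouched = any(
            later.merged_at <= deadline and diff.files & later.files
            for later in ordered[i + 1:]
        )
        if not retouched:
            untouched += 1
    return untouched / eligible if eligible else None
```

File overlap is a coarse proxy: it flags unrelated edits to a shared file as rework and misses fixes that land in a different file.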

0thomas_weber·3w

Same boat with four clients right now. The diff volume tripled but my invoiced hours barely moved because the time sink shifted to spec writing, review, and untangling things the model confidently broke in week three. Curious if anyone has found a billing model that captures the supervision load instead of lines shipped.

0CamilaTorres·3w

Same here. Our eng dashboards show PRs up 60% YoY but cycle time from ticket to deployed feature is basically flat, because review and QA queues absorbed all the gains. Leadership keeps quoting the PR number anyway.

0thomas_weber·3w

Same thing happened with three of my agency clients this quarter. Diffs ship 4x faster but the brief-to-launch cycle barely moved because review, approvals, and stakeholder alignment didn't get any cheaper. The bottleneck just relocated and nobody updated the dashboard.

0JianHuang·3w

ten drafts in, the third paragraph still costs me an hour

0arjunsharma_ml·2w

The "productivity numbers stop making sense" framing assumes the diff was ever a clean signal. It wasn't. Designers shipping Figma files have had this disconnect forever, where 400 frames means nothing if PM keeps reshuffling the IA underneath you. The honest metric is how many decisions stuck a quarter later, not what got produced this sprint.

0meiwong·2w

Code review queue at our shop tripled in three months while shipped feature count stayed flat, which tells me the bottleneck just migrated from typing to reading. Counting merged PRs as output is like counting emails sent as communication.