Software Engineering
The Truth About Developer Productivity in the AI Age
- Activity is not connected to system-level outcomes.
- Goodhart’s law — when a measure becomes a target, it ceases to be a good measure
| Metric type | Measures | Ease of measurement | Value |
| --- | --- | --- | --- |
| Activity | Behavior | Easy | Low |
| Output | Deliverables | Somewhat easy | Some |
| Outcome | System changes | Hard | High |
- Activity and output metrics aren’t tied to value stream outcomes, offer limited data, and are easily gamed.
- In the 2010s, Nicole Forsgren's State of DevOps research marked a shift toward outcome metrics: deployment frequency, deployment lead time, rework rate, change failure rate, and time to restore service. This research grew into DORA (DevOps Research and Assessment), and its findings were published in Accelerate.
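The DORA-style metrics above can be computed straight from deployment records. A minimal sketch with invented data (the record shape and every number are assumptions for illustration, not from any real system; in practice this comes from your CI/CD tooling):

```python
from datetime import datetime

# Hypothetical deployment records: (commit_time, deploy_time, failed, restore_minutes).
# All values are invented for illustration.
deploys = [
    (datetime(2024, 1, 1, 9),  datetime(2024, 1, 1, 11), False, 0),
    (datetime(2024, 1, 2, 10), datetime(2024, 1, 2, 15), True, 45),
    (datetime(2024, 1, 3, 8),  datetime(2024, 1, 3, 9),  False, 0),
    (datetime(2024, 1, 5, 14), datetime(2024, 1, 5, 16), False, 0),
]
days_observed = 5

# Deployment frequency: deploys per day over the observation window.
deployment_frequency = len(deploys) / days_observed

# Lead time: hours from commit to deploy, averaged.
lead_times_h = [(d - c).total_seconds() / 3600 for c, d, _, _ in deploys]
mean_lead_time_h = sum(lead_times_h) / len(lead_times_h)

# Change failure rate: fraction of deploys that caused a failure.
change_failure_rate = sum(f for _, _, f, _ in deploys) / len(deploys)

# Time to restore: mean minutes to recover from failed deploys.
restores = [m for _, _, f, m in deploys if f]
mean_time_to_restore = sum(restores) / len(restores) if restores else 0.0

print(deployment_frequency, mean_lead_time_h, change_failure_rate, mean_time_to_restore)
```

Note these are outcome-oriented system measures: none of them count individual keystrokes, commits, or tool sessions.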
- GenAI comes with privacy risks, security vulnerabilities, environmental costs, and market concentration among a handful of tech companies. It is transforming the industry and has accelerated development, and in response companies are reverting to activity and output metrics.
- DX’s AI-assisted engineering: Q4 impact report…
- Good: Has a large sample, includes different companies, explains methodologies
- Issues: Unclear whether the data is randomly sampled, self-selected, stratified, or weighted, so we don’t know whether it is representative of the industry.
- Tool interaction… Does not equate to strategic integration of AI.
- Time saved per week with AI tools… Activity metric, and self-reported (relies on memory, uses counterfactual reasoning, and has subjective attribution). Easily gamed.
- % code AI-authored… Activity metric; hard to measure accurately, subjectively defined, and self-reported.
- PR throughput… Output metric. The report claims a correlation between frequency of AI usage and PR throughput, but never says whether the correlation is statistically significant. PR throughput is not a measure of delivery or productivity: a team can merge 100 PRs a month yet deploy once because of manual testing, while a trunk-based team can merge 0 PRs and deploy many times a day. This one metric proves nothing about deployments, service reliability, or business results.
- Now what?
- Recognize that AI-driven hyper-measurement of productivity is Taylorism all over again.
- The team is the unit of delivery, not an individual.
- Do your own critical thinking.
- Tool-accelerated activity != real business value
- Read Accelerate and Modern Software Engineering and implement them in your org.
- Deployment / service reliability / technical quality numbers improving? Good technology outcomes!
- Time to value / total cost of ownership across all teams improving? Good!
- Business outcomes that matter most to your org improving? Good!
- Tech capabilities are leading indicators of success. Look for more TDD, more loosely coupled architectures, and more stable CI. Add AI activity metrics and then see if there’s a correlation between AI usage and capability improvement.