Differ

One thing I’ve noticed as I’ve been using AI more and more to build projects and features: there’s often just a lot of code. And not only source code, but tests, specs, docs, diagrams, etc. It can make reviewing PRs…intimidating. A couple of examples.

Blackbird

When adding a recent feature that allows parent tasks to serve as a review for all child tasks in blackbird (PR here), I had to iterate quite a bit. It turned into a pretty large changeset, encompassing 80+ files and ~15k LOC 😅 Now, I’m treating blackbird as an experiment in letting AI do all of the actual implementation. Maybe 80% of blackbird has been built directly from specs -> plans -> blackbird implementation. The other 20% has been me working in Codex to fix bugs, add small features that don’t call for a full spec, etc. But I still like to at least review the code. And man, it’s not easy to approach a PR like that. I admittedly ended up giving it only a cursory look, but one thing I kept thinking about was the breakdown of the changes. I know from having worked with blackbird for a bit now that I usually end up with a massive number of tests, for better or worse. There are also changes in docs, specs, and my AGENT_LOG.md, which my AGENTS.md instructs Codex to append to as it works. So of those ~15k LOC, how many did I actually need to review? I’m still not sure, but I would really have liked to know that going in.

$DAY_JOB

A similar situation came up the other day. One of our engineers opened a PR that was several thousand lines. All of the responsible engineers reading this will be glad to hear that in the workplace we’re more serious about reviewing code than I am with blackbird. But that means that a several-thousand-line PR kind of just sucks for anyone who needs to review it. But! The engineer who opened it then posted in Slack something to the effect of, “Don’t worry, it’s only ~500 lines of actual code changes.” It turns out that the artifacts from OpenSpec (spec, design, plan, etc.) were just really large. Since those are treated as part of the code base, they go in the PR. But they’re generally not something that reviewers need to look at. They can be helpful if you want to compare the implementation to the spec, but generally you can review the implementation independently.

Differ

In both of these scenarios it would have been nice to know the breakdown of these changes up front. How many LOC were changed in total? Tests? Source? Docs? Other? It’s just nice info to have when thinking about starting a PR review. So, I built differ. It does just that:

> differ
Documentation: + 359 -  0 ( 359) [2 files]
Tests:         + 324 -  9 ( 333) [6 files]
Source:        + 642 -213 ( 855) [15 files]
Total:         +1325 -222 (1547) [23 files]

We can see that, of the ~1,500 LOC changed, a bit under half (692) are tests and docs, which don’t need to be reviewed quite as thoroughly. In my mind, this can make approaching a review easier.
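To make the idea concrete, here is a rough Python sketch of how a breakdown like this could be computed from the output of `git diff --numstat`. This is not how differ is actually implemented. The function names and the path rules (what counts as a test or a doc) are assumptions I made up for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical rules: real projects will want their own lists.
DOC_SUFFIXES = {".md", ".rst", ".txt", ".adoc"}
DOC_DIRS = {"docs", "specs"}
TEST_DIRS = {"test", "tests", "__tests__"}


def categorize(path: str) -> str:
    """Guess a category for a changed file from its path alone."""
    p = PurePosixPath(path)
    parts = {part.lower() for part in p.parts}
    name = p.name.lower()
    if parts & TEST_DIRS or name.startswith("test_") or "_test." in name or ".test." in name:
        return "Tests"
    if p.suffix.lower() in DOC_SUFFIXES or parts & DOC_DIRS:
        return "Documentation"
    return "Source"


def summarize(numstat: str) -> dict:
    """Sum added/deleted lines and file counts per category.

    `numstat` is text in the `git diff --numstat` format:
    one "<added>\\t<deleted>\\t<path>" record per line.
    """
    totals = defaultdict(lambda: {"added": 0, "deleted": 0, "files": 0})
    for line in numstat.splitlines():
        fields = line.split("\t")
        if len(fields) != 3:
            continue  # skip anything that is not a numstat record
        added, deleted, path = fields
        if added == "-" or deleted == "-":
            continue  # git reports "-" for binary files; no line counts
        bucket = totals[categorize(path)]
        bucket["added"] += int(added)
        bucket["deleted"] += int(deleted)
        bucket["files"] += 1
    return dict(totals)
```

You could feed it real data by capturing the output of something like `git diff --numstat main...HEAD` and passing the text to `summarize`. Keeping the parsing separate from the git call makes the categorization rules easy to test on their own.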

differ can also output JSON, so it can be integrated into CI to post a detailed breakdown as a comment on GitHub. You can see it here at work in the differ repo.
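As a rough idea of what the CI side might look like, here is a small Python sketch that turns a JSON report into a Markdown table suitable for a PR comment. Big caveat: I am not showing differ's real JSON schema here. The `categories` list and its `name`, `added`, `deleted`, and `files` fields are hypothetical, as is the `render_comment` function, so adjust them to whatever the tool actually emits.

```python
import json


def render_comment(report_json: str) -> str:
    """Render a Markdown table from a (hypothetical) JSON breakdown."""
    report = json.loads(report_json)
    rows = [
        "| Category | Added | Deleted | Files |",
        "| --- | ---: | ---: | ---: |",
    ]
    total = {"added": 0, "deleted": 0, "files": 0}
    for cat in report["categories"]:  # assumed field name
        rows.append(
            f"| {cat['name']} | +{cat['added']} | -{cat['deleted']} | {cat['files']} |"
        )
        for key in total:
            total[key] += cat[key]
    rows.append(
        f"| **Total** | +{total['added']} | -{total['deleted']} | {total['files']} |"
    )
    return "\n".join(rows)
```

A CI job could then hand the returned string to whatever step posts PR comments in your setup.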