I recently wrote about how I work with AI coding agents and about code review in AI-augmented development. I meant every word of both. But parts of them are already not quite where my thinking is now.

This is not a retraction. The ground keeps moving under our feet. The only irresponsible position right now is certainty. We have to be open to changing our minds as the AI models and harnesses improve, and as we discover how best to work with this technology.

The four steps

Dan Shapiro recently wrote about what StrongDM's CTO Justin McCarthy learned building a software factory. The progression is simple:

  1. Recognise you're not the best person to write the code any more. The AI writes the code.
  2. Accept that if you're not writing the code, but you're still reading every line, you are the bottleneck. Stop reading the code too.
  3. Recognise that this creates an enormous pile of terrifying problems.
  4. Realise that solving those problems is now your actual job.

I think this describes the trajectory we're on. Shapiro describes a destination; what I'm trying to describe is being mid-journey, somewhere on that path. And exactly where I am depends entirely on context.

Where I actually am

Side projects: I'm experimenting freely. Steps 1 and 2 feel natural. I let the AI generate, I don't read every line, and I'm building verification instead. I focus on carefully reviewing the plans and on developing AI-assisted code review. The cost of failure is low. The learning is high.

At work: I'm closer to traditional review. SOC 2 and ISO 27001 compliance requirements mean I need evidence that a human understood what shipped. "An AI agent healed it" is not an answer our compliance team can work with yet. Nor should it be. I'm thinking about how AI can help scale this, but I work in a team, and other people's context and comfort have to be weighed too.

I can see the destination Shapiro describes. I'm not fully there yet. And that's fine. The interesting question isn't "have you arrived?" but "what has to be true before you can move further along the path?"

Why letting go is less scary than it sounds

Human code review was never very good at finding bugs. Empirical studies consistently find that reviewers catch only a fraction of defects, and mostly shallow ones.

What's more interesting is what code review actually delivered as side effects: shared understanding of the codebase, consistency across the team, accountability for what shipped, knowledge transfer between engineers. Those are real and valuable.

But they're not what most engineers think they're defending when they resist the idea of not reading every line. When you realise you're grieving familiarity and shared understanding rather than bug-catching capability, it reframes the problem. Those are solvable problems. They just have different solutions from line-by-line review.

From reviewer to feedback loop designer

If you're not writing the code, and you're not reading every line, what is your job?

Not: "Did the AI write good code?"

But: "Have I built an environment where bad code can't survive?"

This is closer to SRE thinking than traditional code review. You're designing systems that keep AI-generated output on track: verification pipelines, observability, feedback loops, automated gates. The discipline doesn't disappear when you let go of reading every line. It moves. From inspecting output to designing the systems that inspect output for you.
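
To make "an environment where bad code can't survive" concrete, here is a toy merge gate: it runs a set of checks and writes an auditable evidence trail rather than a bare pass/fail. This is a sketch under assumptions, not a real pipeline; the check functions and the evidence file name are illustrative placeholders.

```python
import json
from datetime import datetime, timezone

def run_gate(checks, evidence_path="gate-evidence.json"):
    """Run every check, record an auditable evidence trail,
    and return True only if all checks pass."""
    results = []
    for name, check in checks.items():
        try:
            passed, detail = check()
        except Exception as exc:  # a crashing check counts as a failure
            passed, detail = False, repr(exc)
        results.append({"check": name, "passed": passed, "detail": detail})
    evidence = {
        "ran_at": datetime.now(timezone.utc).isoformat(),
        "results": results,
        "verdict": all(r["passed"] for r in results),
    }
    with open(evidence_path, "w") as f:
        json.dump(evidence, f, indent=2)
    return evidence["verdict"]

# Toy checks standing in for a type checker and a test suite.
checks = {
    "types": lambda: (True, "0 type errors"),
    "tests": lambda: (True, "42 tests passed"),
}
print(run_gate(checks))  # True, and gate-evidence.json holds the trail
```

The point is the shape, not the checks: every verdict carries the evidence that produced it, which is exactly what a reviewer (or an auditor) needs when no human read every line.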

I wrote about mechanical sympathy recently, the idea that every generation of engineers needs to understand the layer beneath their abstraction. The same principle applies here. You need to understand how AI-generated code fails (quietly, confidently, locally-coherent-but-globally-inconsistent) to design feedback loops that catch those specific failure modes.

Verifiable over deterministic

My earlier thinking drew a hard line: use deterministic tools (linters, type checkers, compilers, tests) for everything you can, and only use AI for the rest. I still believe that. But it's incomplete. The real requirement isn't determinism. It's verifiability.

There's a spectrum:

Best: verifiable and deterministic. Linters, type systems, compilers, test suites. Same input, same output. You can prove correctness. This is the gold standard and you should push as much as possible into this category.

Useful: verifiable but non-deterministic. AI code review that flags concerns with evidence. Human review. Property-based testing with AI-generated cases. The process isn't repeatable, but you can assess whether the output is right. You can show your working.

Dangerous: unverifiable and non-deterministic. Trusting AI output with no mechanism to assess correctness. No tests, no review, no evidence trail. This is where things go wrong, and it's where most "vibe coding" sits when done carelessly.

The question isn't "is this check deterministic?" It's "can I verify the result, and can I show evidence of that verification?"
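
The middle band of the spectrum can be made concrete with property-based testing: the input generation is random (the process isn't repeatable), but every outcome is checked against properties you can verify. A minimal sketch in pure Python, with my_sort standing in for an AI-generated implementation under test:

```python
import random

def my_sort(xs):
    # Stand-in for an AI-generated implementation under test.
    return sorted(xs)

def check_sort_properties(trials=200, seed=None):
    """Non-deterministic process, verifiable result: random inputs,
    but each output is checked against explicit properties."""
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 50))]
        out = my_sort(xs)
        # Property 1: the output is ordered.
        assert all(a <= b for a, b in zip(out, out[1:]))
        # Property 2: the output is a permutation of the input.
        assert sorted(xs) == sorted(out)
    return trials

print(check_sort_properties())  # 200
```

You can't replay the exact run, but you can show your working: the properties, the trial count, and the seed if you pin one.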

This is also where compliance frameworks might eventually meet AI-augmented workflows. The intent of SOC 2 and ISO 27001 isn't "a human read every line." It's "you can demonstrate control and correctness." Auditable, evidenced verification could satisfy that intent even as the mechanism shifts. Not today, necessarily. But that's the direction.

What needs to be true

Before organisations can move further along Shapiro's four steps, several things need to happen.

Verification tooling needs to mature. Not just linters and tests, but AI-assisted review that produces auditable evidence. We need tools that don't just say "this looks fine" but show why, with traces that an auditor could follow.

Compliance frameworks need to catch up. Or at least be interpreted in ways that recognise systematic verification as a valid control. The current assumption in most audit frameworks is that a human reviewed the change. That assumption will need to evolve, but it won't evolve until the alternative demonstrably works.

The specification layer needs proper tooling. If intent documents and specs become the durable artefact (and I think they will), they need consistency checking, dead requirement detection, contradiction detection. Right now, a repo full of markdown specs is just files. No compiler tells you when two specs contradict each other. No linter catches a requirement that's been superseded but never removed.
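
No such linter exists yet, as far as I know. As a sketch of what one could check, assuming a hypothetical convention where requirements carry IDs like REQ-001 and superseded ones are tagged [superseded]:

```python
import re

def lint_specs(spec_texts):
    """Toy spec linter (hypothetical convention): requirements are lines
    tagged 'REQ-<n>'; a superseded requirement carries '[superseded]'.
    Flags an ID defined in two files, and superseded requirements
    that were never removed."""
    seen, problems = {}, []
    for filename, text in spec_texts.items():
        for req_id in re.findall(r"\bREQ-\d+\b", text):
            if req_id in seen and seen[req_id] != filename:
                problems.append(
                    f"{req_id} defined in both {seen[req_id]} and {filename}"
                )
            seen.setdefault(req_id, filename)
        for req_id in re.findall(r"\b(REQ-\d+)\b.*\[superseded\]", text):
            problems.append(f"{req_id} is superseded but still present in {filename}")
    return problems

specs = {
    "auth.md": "REQ-001: users must log in.\nREQ-002: sessions expire. [superseded]",
    "billing.md": "REQ-001: invoices are monthly.",
}
for problem in lint_specs(specs):
    print(problem)
```

A real tool would need semantics, not regexes, to catch genuine contradictions. But even this level of mechanical checking, an ID collision, a stale requirement, is more than a folder of markdown files gives you today.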

Teams need new ways to maintain shared understanding. Code review served a knowledge-sharing function that had nothing to do with finding bugs. If that goes away, something else needs to replace it. AI-generated explanations of what changed and why, targeted at humans rather than machines, might serve that purpose. But the tooling isn't there yet.

Trust needs to be built incrementally. Side projects first. Low-stakes features. Gradually expanding the boundary as confidence in verification systems grows. This is how every new practice earns legitimacy in engineering organisations, and AI-augmented workflows shouldn't be an exception.

This post has a shelf life too

My previous posts described how I work and how I think about code review. This one describes how both of those are shifting and why.

I expect to write another one when the ground moves again. It will.

That's not a failure of thinking. It's the appropriate response to a situation that is genuinely shifting under us. The only irresponsible position right now is certainty.

The discipline is the same as it's always been in engineering: understand the layer beneath the one you're working at. The layer has changed. The discipline hasn't.

If you're on this path too, wherever you are on it, I'd love to hear where you've landed. Drop me a line.