Back to Blog
Essay · · 9 min read

Verifiable, Not Just Done

AI-led migrations are a genuine power tool. They sweep wide, mechanical changes across a codebase at a speed a solo developer can't touch. They also migrate the happy path and leave the rest to rot quietly, unless you design the migration to prove itself rather than just finish.


A while ago I needed to change the type of an identifier across an entire system: every data model that carried it, every repository that queried it, every service that passed it along, every handler that parsed it off a request. The old me would have looked at that and budgeted a grim couple of weeks. Not hard, just wide, the kind of change where the work is in the surface area and the danger is in the one place you forget.

It took a fraction of that. I fanned a set of agents across the codebase, one bounded slice each, and the mechanical transform rippled through layer after layer faster than I could have typed a single file of it. That is the real, unhyped power of AI-led migration: breadth. Repetitive, structurally identical changes spread across dozens of files and services are the shape of work a fleet is best at, and the shape that used to eat a solo developer alive.

And then I spent longer than the migration took finding the places it had quietly gotten wrong. That asymmetry is the whole subject of this post. The migration was done in an afternoon. Making it true took considerably longer, and every hour of it taught me the same lesson from a different angle. An agent will migrate the happy path with superhuman thoroughness and leave the shadow of it to rot, and the only defense is to design the migration so that “done” and “correct” are the same claim instead of two.

The shadow that doesn’t get migrated

The first thing I learned is that agents migrate what is load-bearing to the running program and neglect what is load-bearing to your confidence in it.

The production code converted cleanly. The test scaffolding did not. Mocks, fixtures, the little fake implementations that stand in for real ones, all of it was a parallel world the agents treated as scenery rather than substance, and a good chunk of it stayed quietly on the old type. What should have tripped an alarm and didn’t was this: a strict type checker ought to reject a fixture still built on the old type. Refusing it is the entire point of the type. The reason it didn’t is the quiet tell of the whole failure. Somewhere in the sweep, to keep everything compiling, the fixtures had been loosened. The precise type was softened to something permissive enough to swallow both the old shape and the new. The test layer went green by surrendering the exact checking it existed to perform. The rot was in the places that only ran when something was being verified, which is to say the places whose whole job is to tell you the truth later.

This is insidious because it inverts the normal signal. Usually passing tests mean the system works. After a half-migrated test layer, passing tests mean the tests no longer test what you think they test. You have not just left work undone. You have degraded the instrument you’d use to detect that the work is undone. The fix is unglamorous and non-negotiable: the test scaffolding is a first-class migration target, not cleanup. It moves in the same sweep, or the sweep isn’t finished.

Shape migrates; meaning doesn’t

The second lesson was worse, because it wasn’t about something left behind. It was about something carried forward faithfully that should never have been carried at all.

In one corner of the system, an identifier had quietly come to mean two different things in two different places. The same field name stood for two distinct concepts that a long-ago shortcut had let collapse together. A backfill had made the two values equal in practice, so nothing complained. The migration preserved that shape perfectly. The agents had no reason not to. The code said carry this field through, and they carried it through, with the same diligence they applied to everything else. What they could not do was notice that the thing being migrated was already wrong, that the faithful preservation of a conflation is just a conflation with a newer timestamp.

This is the boundary of what mechanical thoroughness buys you. An agent migrates structure. It does not, on its own, audit meaning, because meaning lives in intent and history and the half-remembered reason a field exists, context that is nowhere in the local code it’s transforming. Preserving shape is the job most of the time. In the dangerous fraction of the time, the shape was a lie, and preserving it laundered the lie into the new system looking freshly correct. Worse, a migration is often the exact event that unmasks such a conflation rather than merely carrying it. Once the two meanings ride a new type or a new path, they are free to drift apart, and the value that the old backfill quietly held equal can finally diverge. At that point the bug you preserved becomes the bug you shipped.

The call-path you forgot is where the invariant dies

The third lesson is the one I think about most, because it is the one that survives every test you’d normally write.

Some properties of a system are not local. An isolation guarantee, the promise that this tenant’s data cannot touch that tenant’s, is not a line of code. It is a promise that has to hold across every path that reaches the data. In the request path, the context that carries that promise was assembled correctly, and every test that exercised the request path passed, because on that path the guarantee held. But there was another path. An asynchronous worker rebuilt the same context from scratch, off the main line, and the rebuild dropped a single field: the one that decided which tenant’s store the work was routed to.

So the guarantee held everywhere I looked and broke on the one path I didn’t, silently, on the production seam where there was no request to test and no obvious place for the assertion to live. The agent that built the worker did its job and reconstructed the object. It simply could not know that one of the fields it left at its default was load-bearing for an invariant that is written down in no single file, because cross-cutting promises are the things that live between the files. Miss the concern in one call-path and the whole guarantee is gone, and the system will not tell you, because from the inside a dropped invariant looks identical to one that was never threatened.

What caught it was not a test. It was a deliberate audit that distrusted the green, walking path by path through everything that could reach the data, not just the request path every existing test already exercised, until the worker seam surfaced. The fix was to assert the isolation invariant at runtime on the worker’s own rebuilt context, so the same break can never again pass quietly. And I’ll own the uncomfortable half of this. The migration did not create that gap; it exposed one I already had. There was no isolation test on the async path before any agent went near it, and the dropped field merely industrialized a hole my slower, narrower process had been lucky enough to keep stepping around. None of these three failures are new. They are old failure modes, run at a new speed.

Make the migration prove itself

The thread through all three failures is the same. Each one passed the check that looked like verification and failed the verification that mattered. The defense, then, is not “be more careful.” It is to design migrations whose definition of done is a property you can assert, not a diff you can eyeball.

A few things became rules for me after this.

Additive before destructive. Add the new column, the new path, the new type alongside the old, migrate onto it, then remove the old thing in a separate, later step guarded on a real check. Never rename-and-pray in a single sweep. The window where both exist is where you get to verify before you can no longer go back.

Assert the invariant at runtime, not in the text. It is not enough to grep the diff and confirm the policy string is present. Write the test that inserts data as one tenant and proves another tenant’s query returns nothing. Exercise the promise itself, on the real store, and on the async path through the worker’s own rebuilt context, not a second trip down the request path that already passes. String-matching a migration confirms it was typed. Running the invariant confirms it is true.

Treat the shadow as load-bearing. Tests, fixtures, mocks, seed data, the demo: everything whose job is to stand in for or describe the real system migrates in the same motion, because those are the surfaces that lie most convincingly when half-converted.

Keep the timing calls human. The decision of when to land a breaking change, doing it now while nothing downstream depends on the old shape rather than later, is judgment, not mechanism. An agent will execute a breaking migration whenever asked. Knowing that the cheapest possible moment to break compatibility is before anyone relies on it is the kind of call you make by understanding the blast radius, and it stays yours.

The power tool that can’t tell you it cut wrong

I am not arguing against AI-led migration. I do it constantly, and the breadth is genuinely transformative, one of the clearest wins in the whole shift.

But a power tool cuts fast and straight and doesn’t get bored, and it also cannot tell you that you fed it the wrong board. What the speed does not do is relocate the verification. If anything it raises the stakes, because now the wrong thing arrives across forty files in an afternoon instead of one file at a time over two weeks, with all the early-warning friction of the slow way stripped out.

The synthesis I keep landing on is the one that almost rescues the whole problem. The same fleet that outran my verification is also the fastest tool I have for building it. The cross-tenant assertion, the fixture migration, the property test that pins the invariant are themselves bounded, mechanical, fan-out-able work, the shape an agent does well. The discipline is not resisting the speed. It is spending a deliberate share of it on the checks instead of pouring all of it into the change. The migration being done was never the question. The migration being true is the whole job, and “true” is something you have to make checkable, because the tool will cheerfully tell you “done” either way.