Jun 18, 2026 · 6 min read

Automating a Legacy Migration with LLMs: The Model Was Never the Hard Part

If you have ever stared at a half-million-line legacy application and wondered whether an LLM could
just migrate it for you, this post is for you. The short version: yes, it can do a lot — but the
model turns out to be the easy part. The hard part is everything around the model. After spending real
time pointing language models at a large legacy codebase (think an older .NET Framework / WebForms app
heading toward modern .NET / Blazor), here are the lessons that actually mattered. None of them are
about prompt engineering.

The model is not the bottleneck — the plumbing is

The intuition everyone starts with is "a smarter model will migrate better code." In practice, a
mid-tier model already writes perfectly reasonable migrated code most of the time. Where the time
goes is the machinery that feeds the model context, applies its output, decides whether the result is
correct, and recovers from failure. You will spend 80% of your effort there. Plan for it.

The diff trap

The obvious design is: ask the model for a unified diff, then git apply it. It feels clean and
auditable. It is also where most of your failures will come from.

Models are good at writing code. They are surprisingly bad at writing a diff that applies — the
line numbers drift, the context lines get subtly reformatted, and every so often the model "helpfully"
edits a second file that the patch tooling can't find. On one batch I watched a 75% apply-failure rate
that had nothing to do with the quality of the migration — the code was fine, the diff just wouldn't
land.

Two things help enormously:

  • Constrain the patch to a single file. If the task is "fix this one file," strip any hunk that
    touches anything else before applying. Multi-file over-reach is a top failure mode.
  • Apply in tiers, not all-or-nothing. Try a clean apply, then a 3-way merge, then a fuzzy apply
    with relaxed context. A patch that fails strict matching often lands fine with a little slack.
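
As a rough illustration of both points, here is a minimal Python sketch. It assumes the model emits
git-style diffs with "diff --git" headers and that the harness shells out to git; the helper names
(keep_only, apply_in_tiers) are made up for this example, and the exact flags in the last tier are
one plausible choice, not a recommendation from any particular tool.

```python
import re
import subprocess

# Per-file header emitted by `git diff`, e.g. "diff --git a/Foo.cs b/Foo.cs".
_FILE_HEADER = re.compile(r"^diff --git a/(.+?) b/(.+)$")


def keep_only(patch_text, target_path):
    """Drop every per-file section of a git-style diff except target_path's."""
    kept, keeping = [], False
    for line in patch_text.splitlines(keepends=True):
        header = _FILE_HEADER.match(line.rstrip("\r\n"))
        if header:
            # A new per-file section starts; keep it only if it is our file.
            keeping = header.group(2) == target_path
        if keeping:
            kept.append(line)
    return "".join(kept)


# Strictest tier first. All flags are documented `git apply` options.
TIERS = [
    ["git", "apply"],                                              # clean apply
    ["git", "apply", "--3way"],                                    # 3-way merge
    ["git", "apply", "--recount", "-C1", "--ignore-whitespace"],   # relaxed context
]


def apply_in_tiers(patch_file, run=subprocess.run):
    """Return the command line of the first tier that succeeds, else None."""
    for args in TIERS:
        if run(args + [patch_file]).returncode == 0:
            return " ".join(args)
        # A real harness would restore the working tree here: a failed
        # --3way attempt can leave conflict markers behind.
    return None
```

Recording which tier landed the patch is worth the extra line: if most patches only apply at the
relaxed tier, that tells you something about the diffs the model is producing. Note that --3way needs
blob identities ("index" lines) in the patch, which model-written diffs often lack, so that tier may
simply fall through.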

But the deeper lesson is this: the entire class of "diff won't apply" problems is an artifact of the
diff-based design. Tools that let the model edit the file directly (the agentic approach) make this
problem simply disappear. More on that at the end.

When the model says "I can't," it's doing you a favor

My favorite failure wasn't a failure. The model kept returning prose instead of code — refusing to
produce a patch. It would have been easy to treat that as a malfunction and crank up the retries.

When I actually read the responses, the model was blocking on purpose: it didn't have the source of
a helper method it needed to call, and it correctly refused to hallucinate a migration it couldn't
verify. It even named exactly which types and methods it was missing.

That reframes the fix. The answer isn't "force it to output something" — it's "go fetch the context it
asked for." A migration agent that can't locate the method it depends on can't work. So teach the
harness to resolve those symbols (a quick repo search for the named definitions) and feed them back.
Suddenly the "broken" model produces a clean, building patch. The refusal was good engineering on the
model's part; the bug was in my pipeline starving it of context.
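
The "quick repo search" can be very crude and still help. The sketch below is one way to do it in
Python, assuming the missing names have already been pulled out of the model's response and that the
sources are C# files; find_definitions and its regex heuristic are illustrative, not a real parser,
and will miss or over-match some declarations.

```python
import re
from pathlib import Path


def _declaration_pattern(name):
    n = re.escape(name)
    return re.compile(
        # Type declaration: "class Foo", "interface IFoo", ...
        rf"\b(?:class|interface|struct|enum|record)\s+{n}\b"
        # Member declaration: a modifier, then the name and "(",
        # with no ";" or "=" in between (which filters out most call sites).
        rf"|\b(?:public|private|protected|internal|static)\b[^;=]*\b{n}\s*\("
    )


def find_definitions(repo_root, names, pattern="*.cs"):
    """Return {name: [(relative_path, line_no, line_text), ...]} for likely declarations."""
    root = Path(repo_root)
    regexes = {name: _declaration_pattern(name) for name in names}
    hits = {name: [] for name in names}
    for path in sorted(root.rglob(pattern)):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for line_no, line in enumerate(text.splitlines(), start=1):
            for name, rx in regexes.items():
                if rx.search(line):
                    hits[name].append((str(path.relative_to(root)), line_no, line.strip()))
    return hits
```

Names that come back with an empty list are useful too: they tell the harness the definition lives
outside the repo (a package, a generated file) and that retrying with the same context is unlikely
to help.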

Know when to stop retrying

Repair loops are powerful — feed the build errors back, let the model try again, iterate. But they
have a failure mode that wastes hours: the model rewrites the same file over and over while the error
count sits flat.

A flat error count across two rounds is a signal. It means the fix is not in the code; it's at a
different layer. In one case the model kept writing Microsoft.Data.SqlClient while the project
referenced System.Data.SqlClient. No amount of rewriting the file would ever fix that; it needed a
project-level change (add or pin a package). A model confined to rewriting one source file cannot
touch the csproj.

So bake a stop-rule into your loop: if retrying stops moving the needle, the problem is at a higher
tier. Stop rewriting and fix the project. Matching the fix to the right layer is half the battle.
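
A stop-rule like this fits in a few lines. The Python sketch below assumes the harness records the
build-error count after every round, starting with the initial build; next_action and the patience
parameter are names invented for this example, and "two rounds" is read here as two consecutive
repair rounds that fail to beat the count they started from.

```python
def next_action(error_counts, patience=2):
    """Decide what the repair loop should do next.

    error_counts: build-error count after each build, oldest first
                  (the first entry is the build before any repair round).
    patience:     how many consecutive non-improving rounds to tolerate.
    """
    if error_counts and error_counts[-1] == 0:
        return "done"
    # The count before the last `patience` rounds, followed by those rounds.
    recent = error_counts[-(patience + 1):]
    if len(recent) == patience + 1 and min(recent[1:]) >= recent[0]:
        # No round in the window improved on where it started:
        # the fix is probably at another layer (project file, packages).
        return "escalate"
    return "retry"
```

For example, next_action([12, 9, 5]) keeps retrying, while next_action([12, 12, 12]) escalates.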

The framework wall is real, and it's a category error to ignore it

This one is specific to "uplift" migrations (old framework → new framework), and it's brutal if you
miss it. A migration guide will happily tell the model to use modern APIs — say, the dependency-
injection or interop primitives of the target framework. But if you are migrating in place, inside
a module that still targets the old framework, those APIs don't exist yet. The model writes
beautiful, idiomatic target-framework code… that cannot compile where it lives.

The fix is to recognize the tier you're working in and decouple instead of port: extract a small,
framework-neutral seam (an interface the module owns) and leave the actual target-framework
implementation for later, when the host project is on the new framework. The model's job in the old
module is to remove the coupling, not to write code the old framework can't host. Knowing this up
front saves you from a pile of "confident, approved, uncompilable" output.
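
One way to catch this mechanically is to have the harness check which tier a module is in before it
accepts the model's output. The Python sketch below reads the target framework from a csproj and
flags using-directives from namespaces the caller has declared off-limits for that tier. The function
names are made up for this example, and the list of forbidden namespaces is an assumption you would
supply per project; nothing here knows which APIs a given framework version actually provides.

```python
import re


def target_frameworks(csproj_text):
    """Return the framework monikers declared in a csproj, e.g. ["v4.8"] or ["net8.0"]."""
    m = re.search(
        r"<(TargetFrameworks?|TargetFrameworkVersion)>([^<]+)</\1>", csproj_text
    )
    if not m:
        return []
    return [t.strip() for t in m.group(2).split(";") if t.strip()]


def is_legacy_tier(moniker):
    """True for .NET Framework monikers: "v4.8" (old-style csproj) or "net48" (SDK-style)."""
    return bool(re.fullmatch(r"v\d+(\.\d+)*|net\d{2,3}", moniker))


def forbidden_usings(source_text, forbidden_prefixes):
    """Return using-directives in C# source that fall under a forbidden namespace prefix."""
    usings = re.findall(
        r"^\s*using\s+(?:static\s+)?([\w.]+)\s*;", source_text, flags=re.M
    )
    return [
        u for u in usings
        if any(u == p or u.startswith(p + ".") for p in forbidden_prefixes)
    ]
```

If a module is still on the legacy tier and forbidden_usings returns anything, the harness can reject
the attempt before building and re-prompt with the instruction to extract a seam instead.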

Measure the right thing, and go leaf-first

Two operational notes that paid off:

  • Port bottom-up. Migrate the leaf dependencies first so every layer above compiles against
    already-migrated code. Trying to migrate a high-level component while its foundations are still on
    the old stack is just thrashing.
  • Pick a metric that actually moves. In a big interconnected module, the total error count
    barely budges per file (everything cross-references everything) and only collapses near the end.
    That's demoralizing and misleading. Track a subset you can move — e.g., "files still referencing
    the old framework" — and gate the full build at the end.
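
Both notes are easy to automate. The sketch below, in Python, assumes you already have a map from
each project to the projects it references and the text of each source file; leaf_first and
files_on_old_stack are names invented for this example, and treating "using System.Web" as the
marker of the old stack is an assumption that fits a WebForms app, not a general rule.

```python
import re
from graphlib import TopologicalSorter

# Assumed marker of the old stack: any using-directive under System.Web.
_OLD_STACK = re.compile(r"^\s*using\s+System\.Web\b", re.M)


def leaf_first(references):
    """Order projects so each one comes after everything it references.

    references: {project: set of projects it depends on}.
    Raises graphlib.CycleError if the references contain a cycle.
    """
    return list(TopologicalSorter(references).static_order())


def files_on_old_stack(sources):
    """Return the sorted paths whose text still imports the old framework.

    sources: {path: file text}.
    """
    return sorted(path for path, text in sources.items() if _OLD_STACK.search(text))
```

With references = {"Web": {"Services"}, "Services": {"Data"}, "Data": set()}, leaf_first returns
["Data", "Services", "Web"]. The length of files_on_old_stack(...) is the number to plot per
iteration; it drops by one each time a file is migrated, regardless of what the full build says.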

Structured pipeline vs. agentic tool — and why I landed on a hybrid

There are two philosophies for driving the model:

  • A structured pipeline: one prompt in, one diff out, your code applies/builds/scores it. This is
    fantastic for measuring — you can score each attempt against a known-good result, build a
    regression suite, and actually tell whether a model or a prompt change made things better.
  • An agentic tool: give the model real tools (read, search, edit, build) and let it drive a
    multi-turn loop. This natively solves context-gathering, multi-file edits, and the whole diff-apply
    problem — because there's no diff, the agent just edits the file.

For throughput — getting the migration done — the agentic approach wins, precisely because most of
the pipeline cleverness above exists to compensate for the diff design's weaknesses. For measurement
— knowing whether your system is improving — the structured pipeline is irreplaceable.

So don't choose. Wrap the agentic tool inside the measurable shell. Let the agent do the edit;
let your harness build-gate and score the result against your known-good criteria. You get agentic
reliability and a number you can trust.
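
The shape of that shell is small. In the Python sketch below, agent_edit, build, and score are
callables you would supply (an agent invocation, a build command, a comparison against your
known-good result); they and the Attempt record are hypothetical names for this example, not part of
any specific tool.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    task_id: str
    built: bool
    score: float


def run_measured(task_id, agent_edit, build, score):
    """Let the agent edit freely, then gate and score the result in the harness."""
    agent_edit(task_id)   # the agent edits files in place; there is no diff to apply
    built = build()       # the build gate belongs to the harness, not the agent
    # Only score attempts that build; a non-building attempt scores zero.
    return Attempt(task_id, built, score(task_id) if built else 0.0)


def pass_rate(attempts):
    """Fraction of attempts that passed the build gate."""
    return sum(a.built for a in attempts) / len(attempts) if attempts else 0.0
```

Because every attempt ends up as the same record, you can swap the agent, the model, or the prompt
and compare pass rates and scores across runs.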

Takeaway

If you're about to point an LLM at a legacy migration, internalize this: the model will mostly write
fine code. Your real engineering is the harness — feeding it the context it asks for, applying its
work without fighting diffs, knowing when retrying is pointless, respecting the framework boundaries,
and measuring progress with a metric that actually moves. Get those right and "can an LLM migrate my
legacy app?" stops being a gamble and becomes a pipeline you can run, measure, and trust.