Snippet-based patching: applying AI fixes safely
The trickiest part of an automated fix pipeline isn't generating a plausible fix — it's applying the fix to the file without breaking anything. LLMs rarely reproduce multi-line code with byte-perfect whitespace, and a brittle substring match silently rejects real, correct suggestions because the model trimmed a blank line. This post walks through the patcher that replaced our naive source.includes(original) check and why it has the layers it does.
The starting point
Every model change comes as two strings: original (the code to replace) and fixed (the replacement). The patcher's job is to locate original in the real file and swap it for fixed. It must reject ambiguous matches (the snippet appears more than once) because patching the wrong occurrence would be worse than patching none.
Strategy one: exact match
We start with the simplest thing that could work: source.indexOf(original), counting occurrences to confirm uniqueness. When the model quotes the source verbatim, this strategy fires instantly and we're done. On well-behaved model output, this covers the large majority of applications.
When it's ambiguous — the snippet appears more than once — we bail with a clear reason rather than falling back. Ambiguity usually means the model didn't include enough surrounding context to disambiguate, and silently guessing would be unsafe.
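The exact-match strategy can be sketched as follows. This is a minimal illustration, not the pipeline's actual code; the helper names `findExact` and `applyExact` are hypothetical.

```typescript
// Exact-match strategy: locate the snippet verbatim, but refuse to
// patch unless the occurrence is unique.
function findExact(source: string, original: string): number | null {
  const first = source.indexOf(original);
  if (first === -1) return null;                       // not found at all
  const second = source.indexOf(original, first + 1);  // probe for a second hit
  if (second !== -1) return null;                      // ambiguous: refuse to guess
  return first;
}

function applyExact(source: string, original: string, fixed: string): string | null {
  const at = findExact(source, original);
  if (at === null) return null;
  return source.slice(0, at) + fixed + source.slice(at + original.length);
}
```

The second `indexOf` with a `fromIndex` just past the first hit is all the uniqueness check needs: if it finds anything, the snippet is ambiguous and the strategy bails.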
Strategy two: EOL and trailing-whitespace normalization
Files may have CRLF line endings, trailing spaces, or other purely cosmetic whitespace that the model normalizes away. Strategy two strips \r from both the source and the snippet, removes trailing whitespace from each line, and retries the exact match.
If that hits a unique match, we apply the fix against the normalized source. This is the rare case where we modify the source's normalization as a side effect — but only to the extent of trailing whitespace, which is almost always fine in the repos we see.
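The normalization pass amounts to two rewrites applied to both sides before retrying strategy one. A minimal sketch, with a hypothetical helper name:

```typescript
// Normalize line endings and trailing whitespace so purely cosmetic
// differences between the model's snippet and the file don't block a match.
function normalizeEolAndTrailing(text: string): string {
  return text
    .replace(/\r\n?/g, "\n")                     // CRLF or lone CR -> LF
    .split("\n")
    .map((line) => line.replace(/[ \t]+$/, ""))  // drop trailing spaces/tabs
    .join("\n");
}
```

Because the match runs against the normalized source, a successful apply writes the normalized text back — which is exactly the trailing-whitespace side effect the post describes.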
Strategy three: line-trimmed matching
The more common model deviation is reflowed indentation: the code is the same, but the leading whitespace differs because the model "cleaned up" tabs or adjusted indent levels. Strategy three splits both sides into lines, trims each line, and looks for a contiguous run of source lines whose trimmed form matches every needle line in order.
Matches are rejected if they're not unique. When a unique match is found, we re-indent the fixed snippet to match the original block's leading indentation so the patched region parses correctly. Without the re-indent step, the patched file would have inconsistent indentation that linters and reviewers would flag.
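The line-trimmed search and the re-indent step can be sketched like this. The helper names are hypothetical, and the re-indent shown shifts the whole block by the difference in leading indentation rather than flattening nested lines:

```typescript
// Find the unique contiguous run of source lines whose trimmed forms
// equal the trimmed needle lines, in order.
function findTrimmedRun(sourceLines: string[], needle: string[]): number | null {
  const want = needle.map((l) => l.trim());
  const hits: number[] = [];
  for (let i = 0; i + want.length <= sourceLines.length; i++) {
    if (want.every((w, j) => sourceLines[i + j].trim() === w)) hits.push(i);
  }
  return hits.length === 1 ? hits[0] : null;  // unique match or nothing
}

// Re-indent the fixed snippet to the matched block's leading indentation,
// preserving relative nesting inside the snippet.
function reindent(fixedLines: string[], targetIndent: string): string[] {
  const currentIndent = fixedLines[0].match(/^[ \t]*/)![0];
  return fixedLines.map((l) =>
    l.startsWith(currentIndent) ? targetIndent + l.slice(currentIndent.length) : l
  );
}
```

Taking `targetIndent` from the first line of the matched source block keeps the patched region at the indentation the surrounding file expects.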
Strategy four: noise-skipping line match
Because we send a compressed source view to the model (comment-only lines and blank runs removed), the model's quoted snippet can look contiguous in the compressed view but be interleaved with noise in the pristine file. Strategy four handles this: a run of source lines matches the needle if they match line-by-line while skipping blank and comment-only source lines between needle lines.
We classify source lines once per file — a single pass that marks each line as "real" or "noise" (including multi-line block comments) — and then slide the needle against the real lines. This is the strategy the diagnostic report labels trimmed_lines_noise_skipped. It's the last line of defense before we give up.
Guard rails
A few checks run before any strategy is tried. If either side contains an ellipsis marker ("// ...", "// rest of file", "// unchanged", "…"), the patch is rejected with a specific reason. The model uses those as escape hatches when it doesn't want to quote the full snippet — and we refuse to guess at what the unquoted portion was. If the "original" is empty after trimming, we reject.
These guards exist because a single badly-applied patch is worse than a hundred rejected ones. The pipeline records every rejection with a reason so the diagnostic data is useful for tuning the prompt — if we see a rise in ellipsis rejections, that's a prompt problem to fix, not a patcher problem.
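The pre-flight guards reduce to a short check. The marker list and reason strings below are illustrative, not the pipeline's actual values:

```typescript
// Illustrative marker list; the real pipeline's list may differ.
const ELLIPSIS_MARKERS = ["// ...", "// rest of file", "// unchanged"];

// Returns a rejection reason, or null when it's safe to try the strategies.
function guardReason(original: string, fixed: string): string | null {
  for (const marker of ELLIPSIS_MARKERS) {
    if (original.includes(marker) || fixed.includes(marker)) {
      return "ellipsis_marker";   // refuse to guess the elided portion
    }
  }
  if (original.trim() === "") return "empty_original";
  return null;
}
```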
Why not use a full diff/AST patcher?
We experimented with AST-based patching early on. Two problems: it requires per-language infrastructure (Tree-sitter or equivalent) that bloats the deployment surface, and it forces the model to emit structured patches, which degraded suggestion quality materially. The four-strategy text patcher covers the practical cases cleanly with far less complexity.
A middle ground — parse both sides and align by token stream — is on the roadmap for a future iteration. The gains aren't large enough to justify the investment today.
Observability
Every patch attempt records which strategy succeeded or which reason it failed. That data lives on the PR record and the error record. When you see a PR with many patches attempted and few applied, the report tells you exactly where each failed. It's the fastest way to turn a "model keeps generating bad patches" hunch into a tunable signal.
Net behaviour
With the four-strategy patcher, the overwhelming majority of confident analyses produce applicable patches. The ones that still fail are mostly cases where the model needs to be told to quote more context — which means a prompt fix is the right answer, not a patcher fix. The patcher's job is to tolerate whitespace drift, not to invent fixes on the model's behalf.