
Handling false-positive fixes

Every automation produces wrong output occasionally. The thing that determines whether an automated fix pipeline is livable in the long run isn't avoiding false positives — it's making them cheap to reject. This post describes the false-positive recovery paths nreactive is designed around and the practices that keep them painless.

What "false positive" means here

False positives split into three kinds:

  • A misidentified root cause: the model reasoned correctly about the stack but picked the wrong underlying explanation.
  • A correct root cause with a wrong fix: the diagnosis is right but the change misses the intent of the surrounding code.
  • A correct diagnosis and fix applied to the wrong location: the patcher matched a similar snippet elsewhere.

Each kind has a different recovery path.

Making rejection cheap

The first-order question isn't "how do we prevent false positives?" — that's impossible in the general case — but "how cheap is it to close one?" The PR body is structured so a reviewer can reject quickly:

  • The root cause explanation is one paragraph, so disagreeing with the diagnosis takes ten seconds of reading.
  • The fix description is one paragraph, so disagreeing with the change takes another ten seconds.
  • The confidence score is visible up top, so a 0.61 PR telegraphs "treat this skeptically" before you even read the diff.
  • The before/after snippets are verbatim, so there's no ambiguity about what's changing.

A reviewer should be able to close a false positive in under a minute. If your team finds it takes longer, the PR format has failed and that's worth reporting via contact.
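As an illustration, the four properties above can be captured in a small renderer. Everything here is a sketch under assumptions: the `FixProposal` type, its field names, and the exact layout are hypothetical, not nreactive's actual PR format.

```python
from dataclasses import dataclass

@dataclass
class FixProposal:
    """Hypothetical container for a generated fix; field names are illustrative."""
    confidence: float
    root_cause: str       # one paragraph
    fix_description: str  # one paragraph
    before: str           # verbatim snippet being replaced
    after: str            # verbatim replacement

def render_pr_body(p: FixProposal) -> str:
    # Confidence goes first so a skeptical reviewer sees it before anything else.
    sections = [
        f"**Confidence:** {p.confidence:.2f}",
        f"**Root cause:** {p.root_cause}",
        f"**Proposed fix:** {p.fix_description}",
        "**Before:**\n    " + p.before.replace("\n", "\n    "),
        "**After:**\n    " + p.after.replace("\n", "\n    "),
    ]
    return "\n\n".join(sections)
```

The ordering is the point: the cheapest rejection signal (confidence) leads, and each later section costs a little more reading than the one before it.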

What happens after rejection

Closing a generated PR doesn't poison the error record permanently. The pipeline remembers that a fix was attempted and that the PR was closed without merging; when the same error fingerprint fires again, the next analysis runs fresh. In practice the second attempt, starting from a clean slate, often avoids the mistake of the first and produces a better fix.
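The record-keeping described above can be sketched as follows. The storage shape, the `"merged"`/`"closed"` outcome labels, and the function names are all assumptions for illustration; the real pipeline's states aren't documented here.

```python
# Hypothetical per-fingerprint history. The pipeline remembers attempts,
# but a closed-without-merging PR does not block a fresh analysis.
history: dict[str, list[str]] = {}  # fingerprint -> list of PR outcomes

def record_outcome(fingerprint: str, outcome: str) -> None:
    """Record what happened to a generated PR ("merged" or "closed")."""
    history.setdefault(fingerprint, []).append(outcome)

def should_reanalyze(fingerprint: str) -> bool:
    # A merged fix means the bug is presumably gone. Anything else,
    # including previously closed PRs, gets a fresh analysis next time.
    return "merged" not in history.get(fingerprint, [])
```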

If you want to block re-analysis entirely — the error is genuinely not a bug, just noise — the ignore-error patterns list is the right tool. Add the matching pattern and future occurrences stop before they enter the pipeline.
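A minimal sketch of pattern-based suppression, assuming glob-style patterns matched against the error message. The pattern syntax nreactive actually uses isn't specified here, and the example patterns are invented.

```python
import fnmatch

IGNORE_PATTERNS = [
    "TimeoutError: healthcheck *",  # hypothetical: known-noisy probe timeout
    "*ECONNRESET*",                 # hypothetical: transient network noise
]

def is_ignored(error_message: str) -> bool:
    # Matching happens before the error enters the pipeline at all,
    # so ignored occurrences never trigger analysis.
    return any(fnmatch.fnmatchcase(error_message, p) for p in IGNORE_PATTERNS)
```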

When false positives cluster

Occasional false positives are fine. Repeated false positives in a pattern are signal. A few things to check when you see a cluster:

  • Is the confidence threshold too low for your codebase? Raising it from 0.6 to 0.75 eliminates the low-confidence tail, which is where clusters usually live.
  • Is a specific file family producing consistent misfires? Deny-list that family.
  • Is a specific error class producing misfires regardless of file? Ignore-pattern the error class.
  • Is the cluster tied to a specific model? Switching the app's AI model from the default to a reasoning-focused option sometimes resolves it for subtle-logic bugs.

Each of those is a tunable. The activity log and PR dashboard give you the data to make the call.
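Gathered together, the four tunables above might look like the config sketch below. The names, defaults, and gating logic are assumptions for illustration, not nreactive's actual schema.

```python
import fnmatch
from dataclasses import dataclass, field

@dataclass
class PipelineTuning:
    """Hypothetical tuning knobs; all names and defaults are illustrative."""
    confidence_threshold: float = 0.6           # raise to 0.75 to cut the low-confidence tail
    deny_listed_files: list[str] = field(default_factory=list)  # e.g. a misfiring file family
    ignored_error_patterns: list[str] = field(default_factory=list)
    model: str = "default"                      # or a reasoning-focused option

def passes_gate(tuning: PipelineTuning, confidence: float, path: str) -> bool:
    # A proposed fix must clear the confidence threshold and must not
    # touch a deny-listed file family.
    if confidence < tuning.confidence_threshold:
        return False
    return not any(fnmatch.fnmatchcase(path, p) for p in tuning.deny_listed_files)
```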

Wrong root cause

When the diagnosis itself is wrong, the fix is usually wrong by construction. Close the PR with a short comment on the real cause; it's useful data for anyone investigating the error later. Don't spend time patching a bad PR into a good one — the pipeline will try again on the next occurrence.

If the wrong-root-cause pattern repeats for a specific error, it often means the analyzer's context is insufficient. Check whether the files referenced by the stack actually cover the true cause, or whether a deep async chain is hiding the real frame beyond the eight-frame cap we walk. If the latter, report the example via contact — widening the cap selectively is a tunable.
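The failure mode above can be sketched in a few lines, assuming frames are walked top-down with the eight-frame cap the text mentions. The structures and function names are hypothetical.

```python
FRAME_CAP = 8  # the cap mentioned above

def frames_in_scope(stack: list[str]) -> list[str]:
    """Hypothetical analyzer view: only the top FRAME_CAP frames are walked."""
    return stack[:FRAME_CAP]

def true_cause_visible(stack: list[str], cause_frame: str) -> bool:
    # A deep async chain can push the real frame past the cap, which is
    # exactly the blind spot the paragraph above describes.
    return cause_frame in frames_in_scope(stack)
```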

Correct cause, wrong fix

This is the most common false-positive shape. The model diagnoses correctly but proposes a change that misses a subtle constraint of the surrounding code. A few common sub-patterns:

  • The fix adds a guard where the guard should actually be earlier in the call chain.
  • The fix handles the error by swallowing it rather than propagating, which changes the API contract.
  • The fix accidentally restructures a function in a way the rest of the codebase doesn't match.

For these, reject the PR; it's often quicker to hand-write the fix with the correct constraint in place. The reviewer's read of the PR already did most of the diagnosis work, so the correct fix is frequently a two-line edit.

Wrong location

The patcher rejects ambiguous matches, so wrong-location bugs are rare. When they do happen, it's because the before-snippet the model quoted is genuinely unique but the fix should have been applied elsewhere in the file. The PR will look correct in the diff, but the reviewer spots that the edit is surgically accurate and semantically misplaced. Close and let the pipeline retry.

The long-run perspective

A false positive feels expensive in the moment — a reviewer spent two minutes on a PR that didn't land. But over the scale of a team-month, the false-positive rate matters far less than the true-positive rate: how many real bugs did the pipeline catch and fix with low reviewer friction? If that number is healthy, a few false positives are a tolerable cost. If it isn't, no amount of false-positive tuning compensates.

Focus on the merge rate, not the reject rate. That's the health metric that matters.