How nreactive scans your repo for bugs

The scheduled scan inside nreactive is a small, boring pipeline — and that's the point. It has to be predictable, cheap to run, and quiet when there's nothing meaningful to report. This post walks through what happens when a scan fires, in the order it happens, so you can reason about what ends up in the model's context and what doesn't.

Step one: listing source files

A scan starts by pulling the file tree at the default branch. We walk the tree recursively, keep only the source extensions we care about (TypeScript, JavaScript, Python, Go, Rust, Ruby, Java, C#, PHP), and drop anything in node_modules, .next, dist, build, out, coverage, vendor, or files matching .min., .test., .spec., or .d.ts. Files smaller than 400 bytes are treated as re-export barrels and skipped; files above 60 KB are dropped because a single huge generated client would starve the rest of the budget.
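The filtering step above can be sketched as a single predicate. This is an illustrative reconstruction, not nreactive's actual code: the extension list, excluded directories, and size bounds come from the description, while the function name `should_scan` is invented.

```python
# Hypothetical sketch of the file-filtering step. Lists and thresholds follow
# the post; the function and constant names are illustrative.
SOURCE_EXTENSIONS = (".ts", ".tsx", ".js", ".jsx", ".py", ".go", ".rs",
                     ".rb", ".java", ".cs", ".php")
EXCLUDED_DIRS = {"node_modules", ".next", "dist", "build", "out",
                 "coverage", "vendor"}
EXCLUDED_MARKERS = (".min.", ".test.", ".spec.", ".d.ts")
MIN_BYTES, MAX_BYTES = 400, 60 * 1024

def should_scan(path: str, size: int) -> bool:
    parts = path.split("/")
    name = parts[-1]
    if any(p in EXCLUDED_DIRS for p in parts[:-1]):
        return False
    if any(m in name for m in EXCLUDED_MARKERS):
        return False
    if not name.endswith(SOURCE_EXTENSIONS):
        return False
    # tiny files are likely re-export barrels; huge ones would starve the budget
    return MIN_BYTES <= size <= MAX_BYTES
```

Note that `.d.ts` files would otherwise pass the `.ts` extension check, which is why the marker filter runs first.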

Everything else gets a priority weight: API routes and framework entrypoints sit at the top, middleware and server files right behind, services and utilities in the middle, UI components toward the bottom. Depth and raw size apply small penalties. The top forty paths after weighting are what the scan actually considers.
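As a rough sketch of that weighting, assuming simple path-keyword heuristics and made-up base weights (the post gives only the ordering, not the numbers):

```python
# Illustrative priority weighting: entrypoints/API routes highest, then
# middleware/server code, then services/utilities, then UI components,
# with small penalties for depth and raw size. Weights are assumptions.
def priority(path: str, size: int) -> float:
    p = path.lower()
    if "/api/" in p or "route" in p:
        base = 100           # API routes and framework entrypoints
    elif "middleware" in p or "server" in p:
        base = 80            # middleware and server files
    elif "service" in p or "util" in p:
        base = 60            # services and utilities
    elif "component" in p:
        base = 40            # UI components
    else:
        base = 50
    depth_penalty = path.count("/")   # deeper files rank slightly lower
    size_penalty = size / 10_000      # raw size applies a small penalty
    return base - depth_penalty - size_penalty

candidates = [("src/components/Button.tsx", 2_000),
              ("src/api/users/route.ts", 4_000),
              ("src/middleware.ts", 3_000)]
ranked = sorted(candidates, key=lambda c: priority(*c), reverse=True)
```

The top N paths of `ranked` would then feed the budget in step two.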

Step two: budgeting

Forty paths is still too many for a single model call. We cap the per-run budget: fifteen files for free-plan scans, twenty-five for Pro. The prioritized list feeds that budget in order, which means the scan ends up reading the files most likely to matter first — and if the budget runs out, the low-priority tail is simply skipped for that run.

On long-running repositories we add one more signal: which files actually changed in the last thirty days. Those are moved to the front of the queue regardless of their base weight, because new bugs overwhelmingly live in new code. A component that hasn't been touched in two years is probably fine; the route added last week deserves a fresh set of eyes.
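Both budgeting rules combine into a short selection step. A minimal sketch, assuming a function name and plan keys that are purely illustrative:

```python
# Per-plan budgets from the post: 15 files for free scans, 25 for Pro.
BUDGETS = {"free": 15, "pro": 25}

def select_files(prioritized: list[str],
                 recently_changed: set[str],
                 plan: str) -> list[str]:
    # Files changed in the last thirty days jump the queue regardless of
    # their base weight; the low-priority tail past the budget is skipped.
    recent = [p for p in prioritized if p in recently_changed]
    rest = [p for p in prioritized if p not in recently_changed]
    return (recent + rest)[:BUDGETS[plan]]
```

The stable list comprehensions preserve the priority ordering within each group, so recency only reorders between groups, not inside them.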

Step three: compression

Once the file contents are fetched, we pass them through a comment- and blank-line compressor before handing them to the model. Comment-only lines and runs of blank lines are removed; whitespace inside code is preserved. On typical JavaScript and TypeScript projects with JSDoc this saves 20% to 35% of the tokens, which is meaningful across a multi-file scan.
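A minimal sketch of such a compressor, handling only `//` line comments and blank lines (the real one presumably also handles block comments, JSDoc, and the other supported languages):

```python
# Drop comment-only lines and blank lines; leave code lines untouched so
# indentation and inline whitespace survive for later snippet matching.
def compress(source: str) -> str:
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped == "" or stripped.startswith("//"):
            continue
        kept.append(line)  # code whitespace is preserved verbatim
    return "\n".join(kept)
```

Inline comments at the end of a code line are deliberately left alone here; only lines that carry nothing but a comment are dropped.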

The original files are kept — we only send the compressed view to the model. When the model quotes a snippet back and we need to patch the file, we match against the pristine source with a multi-strategy locator that tolerates the whitespace drift the compression introduced.

Step four: the review prompt

The system prompt frames the model as a senior reviewer producing only high-signal improvements a thoughtful colleague would approve on first read. We explicitly disallow stylistic nits, rename-only refactors, "add comments" suggestions, and any behaviour-preserving change. The categories we want, ordered by priority, are: real bugs, security or correctness issues, performance issues, and type-safety gaps. Everything is constrained to a JSON schema so parsing is deterministic.

The output is capped at five suggestions per run and ordered by severity. If nothing clears the bar, the model returns an empty list — and the pipeline logs "no suggestions found" and moves on. It is explicitly allowed to produce nothing.
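To make the parsing side concrete, here is a hypothetical shape for that constrained output. The field names are invented; the post only says the output is schema-constrained, capped at five suggestions, ordered by severity, and allowed to be empty.

```python
import json

# Hypothetical model response; field names are illustrative, not nreactive's
# actual schema.
raw = """{
  "suggestions": [
    {"severity": "high",
     "category": "bug",
     "file": "src/api/route.ts",
     "title": "Unhandled promise rejection",
     "before": "await db.query(q);",
     "after": "const result = await db.query(q).catch(handleDbError);"}
  ]
}"""

parsed = json.loads(raw)
suggestions = parsed["suggestions"][:5]  # hard cap at five per run
if not suggestions:
    print("no suggestions found")        # an empty list is a valid outcome
```

Because the schema makes parsing deterministic, the empty-list case needs no special handling beyond the log line.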

Step five: patch and PR

Every suggestion comes with a verbatim before/after snippet. Those land in our snippet patcher, which tries exact match first and falls back through progressively looser strategies (EOL normalization, per-line trim, noise-skipping trim) while rejecting non-unique matches to avoid patching the wrong occurrence. Suggestions that apply cleanly are committed to a new branch; a single PR bundles them with a structured description.

The overall shape is deliberately unambitious: small surface, small prompt, small output. If you want to scan your own repo, the docs cover setup, and the pricing page lays out the difference between the free and Pro scan budgets.

A note on what the scan is not

The scan is not a full-repository analysis. It's not an embedding index, not a call-graph walk, not a static analyser. It's a short, predictable pipeline that reads a bounded set of high-priority files, compresses them, and asks a capable model to produce a small number of high-signal suggestions. That shape is what keeps the cost, the latency, and the noise profile all under control at once. Complexity can always be added later; what's hard to recover is the trust teams build when the automation consistently refuses to produce junk.