Writing deterministic tests alongside fixes
A patch without a test is an assertion. A patch with a failing-before / passing-after test is a proof. nreactive tries to attach that proof to every generated PR, and this post explains how — and, more importantly, when we refuse to.
The test contract
Every generated analysis includes an optional testFile field. When populated, it contains a path, a framework hint (Jest, Vitest, Mocha, Pytest, etc.), and the full contents of the file. The contract is simple: the test must fail against the pre-fix source and pass against the post-fix source. That's it. The model is free to choose the shape of the test as long as it satisfies the contract.
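The shape described above can be sketched as a TypeScript interface. This is a hypothetical rendering (the field and type names here are illustrative, not nreactive's actual schema):

```typescript
// Hypothetical shape of the testFile contract; names are illustrative,
// not nreactive's actual schema.
interface GeneratedTestFile {
  path: string;                                      // where the file should be written
  framework: "jest" | "vitest" | "mocha" | "pytest"; // framework hint for the reviewer
  contents: string;                                  // full text of the test file
}

interface GeneratedAnalysis {
  patch: string;
  testFile?: GeneratedTestFile; // optional: absent when the model skips the test
}

// Type guard: narrows an analysis to one that carries a runnable test.
function hasAttachedTest(
  a: GeneratedAnalysis,
): a is GeneratedAnalysis & { testFile: GeneratedTestFile } {
  return a.testFile !== undefined;
}
```

The optionality is the point of the contract: downstream code has to handle the no-test case explicitly rather than assume every fix arrives with a proof attached.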
The PR body calls the test out explicitly when it's present, with the framework and the path. Reviewers can run it locally against the pre-fix code to confirm the failure mode, then against the post-fix code to confirm the fix.
Framework inference
The model infers the test framework from what it sees in the provided source. If it spots a vitest.config reference or vitest imports, it uses Vitest idioms. If it sees jest.config or describe/it/expect, Jest. If the repo is Python-flavoured, Pytest. The inference is shallow by design — we don't want the model guessing at complex runner configurations. If the signal is ambiguous, it skips the test rather than writing one that doesn't fit.
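The shallow inference described above might look something like this sketch. The function name and exact signals are assumptions for illustration; the real heuristics may differ:

```typescript
// Illustrative sketch of shallow framework inference from file names and
// source text; not nreactive's actual implementation.
type Framework = "vitest" | "jest" | "pytest" | null;

function inferFramework(fileNames: string[], sourceText: string): Framework {
  // Vitest signals: a vitest.config file or explicit vitest imports.
  if (
    fileNames.some((f) => f.startsWith("vitest.config")) ||
    /from ["']vitest["']/.test(sourceText)
  ) {
    return "vitest";
  }
  // Jest signals: jest.config or the describe/it/expect idiom.
  if (
    fileNames.some((f) => f.startsWith("jest.config")) ||
    /\bdescribe\(|\bexpect\(/.test(sourceText)
  ) {
    return "jest";
  }
  // Python-flavoured repo: default to Pytest.
  if (fileNames.some((f) => f.endsWith(".py"))) {
    return "pytest";
  }
  return null; // ambiguous signal: skip the test rather than guess
}
```

Note the deliberate `null` fall-through: an ambiguous repo produces no test at all, which matches the "skip rather than misfit" policy.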
File placement
Placement follows the convention the model sees. If existing tests sit next to sources (foo.test.ts next to foo.ts), new tests go there. If there's a dedicated tests directory (tests/, __tests__/, spec/), the new file goes inside. If neither convention is visible, the test gets placed next to the source as a safe default.
The deny list still applies — if the inferred location is deny-listed, the test is dropped. Better no test than a test in a place you've explicitly said not to write to.
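The placement rules plus the deny-list veto can be sketched as a single function. Everything here (names, signature, the deny list as a predicate) is invented for illustration:

```typescript
// Sketch of the placement heuristic described above; names and signature
// are hypothetical, not nreactive's actual code.
function placeTest(
  sourcePath: string,                // e.g. "src/foo.ts"
  existingTests: string[],           // test file paths already in the repo
  isDenyListed: (path: string) => boolean,
): string | null {
  const base = sourcePath.replace(/\.ts$/, "");
  let candidate: string;
  if (existingTests.some((t) => /\.test\.ts$/.test(t) && !t.includes("__tests__"))) {
    candidate = `${base}.test.ts`; // co-located convention: next to the source
  } else if (existingTests.some((t) => /(^|\/)(tests|__tests__|spec)\//.test(t))) {
    candidate = `__tests__/${base.split("/").pop()}.test.ts`; // dedicated test dir
  } else {
    candidate = `${base}.test.ts`; // no visible convention: safe default
  }
  // Deny list wins over everything: drop the test rather than write there.
  return isDenyListed(candidate) ? null : candidate;
}
```

Usage: `placeTest("src/foo.ts", [], deny)` yields `"src/foo.test.ts"` under the default rule, while a deny-listed candidate yields `null` and the PR ships without a test.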
When we don't ship a test
The system prompt explicitly allows the model to skip the test. We'd rather ship a PR with a clear fix and no test than a PR with a shaky test written to meet a quota. Common cases where a test gets skipped:
- The fix is inside framework-owned code where writing a meaningful unit test would require a full integration harness.
- The bug depends on runtime state (database rows, external APIs) that isn't present in the provided source.
- The root cause is a pure refactor-level change (fixing a reference to a renamed identifier, for instance) where a test would effectively just assert that the rename happened.
In each of those cases, the PR body notes that no test was added, so the reviewer knows to verify the fix manually.
Deterministic by default
We try hard to keep generated tests deterministic. That means no reliance on the current date or time, no network calls, no sleeps, no flaky ordering assertions. If the bug is genuinely time-dependent, the test either freezes time with the framework's clock mocking primitives or the test is skipped. A flaky test attached to an AI-generated fix is a permanently poisoned signal — every future failure looks like an AI problem, even when it isn't.
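To make the time-dependence point concrete, here is a minimal sketch. `isExpired` and its signature are hypothetical, but the pattern is the general one: the test pins "now" instead of reading the wall clock:

```typescript
// Hypothetical time-dependent check, written so a test can pin "now"
// instead of reading the real clock.
function isExpired(expiresAtMs: number, nowMs: number = Date.now()): boolean {
  return nowMs >= expiresAtMs;
}

// Deterministic: freeze "now" to a fixed instant, independent of when
// the suite runs.
const frozenNow = Date.parse("2024-01-01T00:00:00Z");
const expired = isExpired(frozenNow - 1, frozenNow); // deadline already passed → true
```

When the code under test calls `Date.now()` directly rather than accepting a clock, generated tests lean on the framework's primitives instead, e.g. Vitest's `vi.useFakeTimers()` and `vi.setSystemTime(...)`, with the real timers restored afterwards.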
Integration with CI
Tests added to a PR run in whatever CI your repo has configured. If CI fails, the PR doesn't auto-merge even if confidence is high. This is the integration point that ties the quality of the fix to your existing safety nets. A confident fix with a failing test is still a failing PR — the confidence score doesn't override green checks, it supplements them.
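The gate logic is simple enough to state in a few lines. This is an illustrative reduction (the function name and threshold are invented), but it captures the rule that CI status always wins:

```typescript
// Illustrative merge gate: confidence supplements green checks, never
// overrides them. Names and threshold are invented for illustration.
function shouldAutoMerge(confidence: number, ciGreen: boolean, threshold = 0.9): boolean {
  return ciGreen && confidence >= threshold;
}
```

A red build short-circuits the decision regardless of how confident the model was, which is exactly the "failing test means failing PR" behaviour described above.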
Reviewing a generated test
A few things to check before merging:
- Does the test actually exercise the root cause, or does it assert an implementation detail?
- Does it reference the right file paths and module names? LLMs occasionally invent symbol names that don't exist.
- Does it clean up after itself (mocks unset, timers restored)?
- Does it fit stylistically with the rest of the test suite?
If any of those checks fails, reject the test but keep the fix — or tweak the test and re-request review. The test is a nice-to-have companion to the fix, not a hard prerequisite.
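The cleanup item on that checklist is the one most often missed in review. A framework-agnostic sketch of what "cleans up after itself" means in practice (the stubbed global here is just an example):

```typescript
// Sketch of the "cleans up after itself" check: a test that stubs a global
// must restore it in a finally (or afterEach) block, or the stub leaks into
// every test that runs afterwards.
const realNow = Date.now;
try {
  Date.now = () => 0; // stub the clock for this test only
  if (Date.now() !== 0) throw new Error("stub not in effect");
} finally {
  Date.now = realNow; // restore even if the assertion above throws
}
```

A generated test that stubs without the restore step is exactly the kind of permanently poisoned signal the determinism section warns about.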
Net value
Over time, generated tests end up contributing measurably to coverage. They tend to cluster around exception paths and edge cases, which are the spots hand-written tests frequently miss because developers think about the happy path first. The tests aren't a replacement for a human test plan, but they do plug real holes. Treat them as a nice side effect of the fix pipeline rather than the reason you run it.