Reducing noise: transient error suppression

Not every error in production is a bug. Some are weather. A flaky mobile network, a CDN blip, an aborted fetch because the user navigated away, a deploy-time chunk-load failure while browsers are still holding the old hash. If the pipeline treats each of these as a fix candidate, you end up with dozens of low-value PRs that collectively erode trust in the automation. This post explains the transient filter we added.

What counts as transient

We maintain a conservative pattern list for transient-looking messages and error types:

  • Connection-level failures: ECONNRESET, ETIMEDOUT, ENOTFOUND, EAI_AGAIN, "socket hang up".
  • Network idioms: "network error", "fetch failed", "request timeout", "timed out".
  • Client-side cancellation: AbortError, "aborted", "Load failed".
  • Deploy-time chunk failures: "chunk load", "loading chunk N failed".

If the error message or error type matches any of these, the error is treated as potentially transient. The list is intentionally short and only matches well-known mechanical failures — not semantic errors that happen to mention networking.
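As a rough illustration, the matching step could be sketched as a single case-insensitive regular expression built from the categories above. The names `TRANSIENT_PATTERNS` and `looks_transient` are hypothetical, and the exact patterns and matching rules in the real pipeline may differ.

```python
import re

# Hypothetical pattern list assembled from the categories listed above.
# This is an illustrative sketch, not the pipeline's actual list.
TRANSIENT_PATTERNS = [
    # Connection-level failures
    r"\bECONNRESET\b", r"\bETIMEDOUT\b", r"\bENOTFOUND\b", r"\bEAI_AGAIN\b",
    r"socket hang up",
    # Network idioms
    r"network error", r"fetch failed", r"request timeout", r"timed out",
    # Client-side cancellation
    r"\bAbortError\b", r"\baborted\b", r"load failed",
    # Deploy-time chunk failures
    r"chunk load", r"loading chunk \S+ failed",
]

_TRANSIENT_RE = re.compile("|".join(TRANSIENT_PATTERNS), re.IGNORECASE)


def looks_transient(error_type: str, message: str) -> bool:
    """Return True if the error type or the message matches any pattern."""
    return any(
        _TRANSIENT_RE.search(text) for text in (error_type, message) if text
    )
```

A match here only means "potentially transient"; a message such as "connect ETIMEDOUT 10.0.0.1:443" would match, while "Cannot read properties of undefined" would not.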

Suppression logic

Matching the pattern doesn't automatically suppress anything. Transient errors can still be real bugs: a misconfigured DNS entry, a consistently broken CDN edge. The suppression kicks in only when the occurrence count for the current day is low enough that the error looks episodic.

Specifically: if the error matches the transient pattern and has occurred fewer than five times in the current day, the pipeline marks it as transient and skips PR generation. The error still shows on the dashboard with a transient badge, so you can see it happened — it just doesn't wake the analyzer up.

Once the count reaches five for the day, the suppression lifts. An ETIMEDOUT that fires a hundred times in an hour isn't weather; it's a real outage or a broken configuration, and the pipeline analyzes it normally.
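The rule above reduces to a two-part check. The sketch below assumes a hypothetical `ErrorRecord` shape with a per-day occurrence counter; only the threshold of five and the "fewer than" comparison come from this post.

```python
from dataclasses import dataclass

# Threshold stated in this post: fewer than five occurrences in the current day.
TRANSIENT_THRESHOLD = 5


@dataclass
class ErrorRecord:
    # Hypothetical record shape, for illustration only.
    fingerprint: str
    matches_transient_pattern: bool
    occurrences_today: int


def should_suppress(record: ErrorRecord, threshold: int = TRANSIENT_THRESHOLD) -> bool:
    """Suppress only when the error looks transient AND is still episodic."""
    return record.matches_transient_pattern and record.occurrences_today < threshold
```

With this check, the fourth occurrence of a matching error in a day is still suppressed, and the fifth is analyzed normally.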

Why the threshold is five

Five is a compromise. A lower threshold (say two) lets too much noise through: a transient that blips twice in a day would already trigger analysis. A higher one (say twenty) can delay the response to genuine network misconfigurations by hours. Five clears the bulk of noise while keeping reaction time tight for real issues.

The number isn't exposed as a per-app knob today. If you have a use case where five is clearly wrong for your context, reach out via contact.

Where suppression happens in the pipeline

Suppression runs after dedup and fingerprinting — so a transient's occurrence counter and dashboard visibility stay intact — but before any downstream work. The record transitions to "fix_generated" with a transient: true flag, and an activity log entry records that it was suppressed. No model call is made, no branch is created, no PR opens.

That ordering matters for two reasons. First, you can still see transient activity on the dashboard; the sparkline climbs, the occurrence count rises. Second, when a transient turns persistent, the transition is captured cleanly because the fingerprint was already tracked.
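The ordering can be sketched as a single processing function. Everything here is an assumption made for illustration: the `Record` fields, the in-memory `store` keyed by fingerprint, the precomputed `event["fingerprint"]`, the injected `is_transient` and `analyze` callables, and the choice to log once per transition. Only the sequence (count first, then suppress, then downstream work) and the "fix_generated" status with a transient flag come from the text above.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    # Hypothetical record shape, for illustration only.
    fingerprint: str
    occurrences_today: int = 0
    status: str = "new"
    transient: bool = False
    activity_log: list = field(default_factory=list)


def process(event, store, is_transient, analyze, threshold=5):
    """Count first, then decide whether to suppress, then do downstream work."""
    # 1. Dedup and fingerprinting always run, so the counter stays accurate.
    fp = event["fingerprint"]
    record = store.setdefault(fp, Record(fingerprint=fp))
    record.occurrences_today += 1

    # 2. Suppression check runs before any model call, branch, or PR.
    if is_transient(event) and record.occurrences_today < threshold:
        if not record.transient:
            # Log once when the record first becomes suppressed (an assumption).
            record.activity_log.append("suppressed as transient")
        record.status = "fix_generated"
        record.transient = True
        return record

    # 3. Suppression lifted (or never applied): hand off to normal analysis.
    record.transient = False
    analyze(record)
    return record
```

Because the counter is incremented before the suppression check, the same record that was suppressed four times is the one handed to analysis on the fifth occurrence.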

What suppression looks like on the dashboard

Errors marked transient show with a small badge and a muted status. The occurrence sparkline still updates, the last-seen time still refreshes, but the "Fix" button doesn't appear. If you want to force analysis on one — for instance, if you think it's a real bug misclassified as weather — you can manually trigger a fix from the error details view, which bypasses the filter.
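One way to picture the manual override is as a flag that short-circuits the filter. This is a minimal sketch with hypothetical names (`should_run_analysis`, `manual_trigger`); how the details view actually signals a forced fix is not described in this post.

```python
def should_run_analysis(
    matches_transient: bool,
    occurrences_today: int,
    manual_trigger: bool = False,
    threshold: int = 5,
) -> bool:
    """Decide whether analysis runs, letting a manual trigger bypass the filter."""
    if manual_trigger:
        # A manually requested fix skips the transient filter entirely.
        return True
    suppressed = matches_transient and occurrences_today < threshold
    return not suppressed
```

Under these assumptions, a matching error seen once today is skipped by default but analyzed when `manual_trigger=True`.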

Failure modes to watch for

The filter has two failure modes worth naming. The first is a false suppression: a real bug whose message happens to contain "timeout". The visible signal is an occurrence count that climbs on the dashboard without any PR opening; the fix is either to trigger a fix manually or to adjust the ignored-errors list. The second is a false non-suppression: a truly transient error that fires five or more times in a day for environmental reasons. The pipeline will analyze it and, typically, generate a low-confidence analysis that opens a ticket instead of a PR, so the damage is bounded.

Philosophy

The transient filter is the smallest sensible step in a larger direction: teaching the pipeline when not to do anything. A mature error pipeline spends a lot of its cycles on the question "does this deserve attention?" and the cheapest answer is "not yet." Suppression is how that answer gets encoded. It's not a replacement for good ignore patterns or a thoughtful deny list — it's a floor below which the pipeline refuses to escalate noise into work.