Flaky Test Detection

Flaky tests — tests that pass and fail without code changes — erode trust in your test suite. The Intelligence dashboard identifies and classifies them automatically.

Quality Matrix Heatmap

The Quality Matrix is a grid where each cell represents one test. Color indicates stability over the selected run window:

| Color | Classification | Criteria |
| --- | --- | --- |
| Green | Stable | Passed in 100% of runs within the window |
| Yellow | Flaky | Mixed pass/fail results with no corresponding code change |
| Red | Critical | Failed in the most recent run and in >50% of the window |
| Gray | Unknown | Fewer than 3 runs recorded |
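
The criteria in the table can be read as a small decision procedure. The sketch below is one possible reading, not xyva's actual implementation: the function name `classify_cell`, the order in which the rules are checked, and the choice to ignore code changes entirely are all assumptions made for illustration.

```python
from typing import Sequence

MIN_RUNS = 3  # fewer recorded runs than this means Unknown (Gray)


def classify_cell(results: Sequence[bool]) -> str:
    """Classify one test from its chronological results (True = pass).

    Hypothetical helper: the rule order is an assumption, and the
    "no corresponding code change" condition is not modeled here.
    """
    if len(results) < MIN_RUNS:
        return "Unknown"
    if all(results):
        return "Stable"
    failures = sum(1 for passed in results if not passed)
    failed_last_run = not results[-1]
    if failed_last_run and failures / len(results) > 0.5:
        return "Critical"
    # Any remaining mix of passes and failures is treated as Flaky.
    return "Flaky"


print(classify_cell([True, False, False, False]))
print(classify_cell([True, False, True, True]))
```

Under these assumptions the first call yields Critical (the last run failed and 3 of 4 runs failed) and the second yields Flaky.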

Reading the Matrix

Hover over any cell to see the test name, pass/fail counts, and a mini sparkline of recent results. Click a cell to open the test in the Flaky Analysis Lab.

Flaky Score

Each test receives a Flaky Score between 0 and 1:

```text
Flaky Score = state_transitions / (total_runs - 1)

Example: a test that went Pass-Fail-Pass-Fail-Pass over 5 runs:
  state_transitions = 4
  Flaky Score = 4 / 4 = 1.0 (maximally flaky)

Example: a test that went Pass-Pass-Pass-Fail-Pass:
  state_transitions = 2
  Flaky Score = 2 / 4 = 0.5 (moderately flaky)
```

Tests with a Flaky Score above 0.3 are classified as Flaky. Above 0.7, they are flagged for urgent investigation.
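
As a minimal sketch of the formula and thresholds above, the score can be computed by counting adjacent result pairs that differ. The names `flaky_score` and `score_label` are hypothetical and not part of xyva. Because the formula divides by `total_runs - 1`, the sketch rejects histories with fewer than two runs.

```python
from typing import Sequence

FLAKY_THRESHOLD = 0.3   # above this, the test is classified as Flaky
URGENT_THRESHOLD = 0.7  # above this, the test is flagged as urgent


def flaky_score(results: Sequence[bool]) -> float:
    """Return state_transitions / (total_runs - 1) for chronological results."""
    if len(results) < 2:
        raise ValueError("at least two runs are needed to count transitions")
    # A transition is any adjacent pair of runs with different outcomes.
    transitions = sum(1 for prev, cur in zip(results, results[1:]) if prev != cur)
    return transitions / (len(results) - 1)


def score_label(score: float) -> str:
    """Map a score to the thresholds described above (labels are illustrative)."""
    if score > URGENT_THRESHOLD:
        return "urgent"
    if score > FLAKY_THRESHOLD:
        return "flaky"
    return "below threshold"


print(flaky_score([True, False, True, False, True]))  # Pass-Fail-Pass-Fail-Pass
print(flaky_score([True, True, True, False, True]))   # Pass-Pass-Pass-Fail-Pass
```

The two calls reproduce the worked examples: 1.0 and 0.5 respectively.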

Infrastructure Alerts

Not every intermittent failure is caused by the test itself. xyva correlates each failure with system metrics and checks for the following conditions:

  • CPU load > 80% during the test run
  • Memory pressure causing browser process OOM
  • Network latency spikes affecting API-dependent tests

When a failure correlates with infrastructure stress, the cell in the Quality Matrix shows a lightning icon. These tests are excluded from the Flaky Score calculation.
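
The correlation step can be pictured as a check on each failed run. This is a sketch under stated assumptions: the `RunMetrics` record, its field names, and both helper functions are hypothetical, and only the 80% CPU limit comes from the list above. The OOM and latency signals are modeled as plain flags because no thresholds are given for them.

```python
from dataclasses import dataclass
from typing import Sequence

CPU_LOAD_LIMIT = 80.0  # percent; the only numeric threshold stated above


@dataclass
class RunMetrics:
    """Hypothetical per-run record; field names are assumptions."""
    passed: bool
    cpu_load_percent: float
    browser_oom: bool = False
    latency_spike: bool = False


def under_infra_stress(run: RunMetrics) -> bool:
    """True if any of the three listed stress conditions held during the run."""
    return (
        run.cpu_load_percent > CPU_LOAD_LIMIT
        or run.browser_oom
        or run.latency_spike
    )


def has_infra_correlated_failure(runs: Sequence[RunMetrics]) -> bool:
    """True if at least one failed run coincided with infrastructure stress."""
    return any(not run.passed and under_infra_stress(run) for run in runs)


history = [
    RunMetrics(passed=True, cpu_load_percent=35.0),
    RunMetrics(passed=False, cpu_load_percent=92.5),
]
print(has_infra_correlated_failure(history))
```

In this sketch, a test whose history returns `True` would be the kind that gets the lightning icon and is left out of the Flaky Score calculation.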

System Metrics Source

Infrastructure data comes from OS metrics that xyva captures on the host machine during each run. In VPS or CI environments, additional data can be ingested from monitoring APIs (see Settings).

Filtering and Sorting

The Quality Matrix supports several filter modes:

  • Show Flaky Only — hides Stable and Unknown tests
  • Show Critical Only — focuses on currently failing tests
  • Sort by Flaky Score — brings the most erratic tests to the top
  • Group by File — clusters tests from the same spec file together
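
The filter and sort modes above behave like ordinary list operations. The following sketch assumes a hypothetical `MatrixCell` record and helper names that are not part of xyva. It follows the description literally, so the Flaky Only filter hides Stable and Unknown cells; whether Critical cells remain visible in that mode is not stated, and this sketch keeps them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MatrixCell:
    """Hypothetical view of one Quality Matrix cell."""
    name: str
    spec_file: str
    classification: str  # "Stable", "Flaky", "Critical", or "Unknown"
    flaky_score: float


def show_flaky_only(cells: List[MatrixCell]) -> List[MatrixCell]:
    """Hide Stable and Unknown cells, as the filter description says."""
    return [c for c in cells if c.classification not in ("Stable", "Unknown")]


def show_critical_only(cells: List[MatrixCell]) -> List[MatrixCell]:
    """Keep only cells classified as Critical."""
    return [c for c in cells if c.classification == "Critical"]


def sort_by_flaky_score(cells: List[MatrixCell]) -> List[MatrixCell]:
    """Most erratic tests first."""
    return sorted(cells, key=lambda c: c.flaky_score, reverse=True)


def group_by_file(cells: List[MatrixCell]) -> Dict[str, List[MatrixCell]]:
    """Cluster cells by spec file, keeping first-seen file order."""
    groups: Dict[str, List[MatrixCell]] = {}
    for cell in cells:
        groups.setdefault(cell.spec_file, []).append(cell)
    return groups


cells = [
    MatrixCell("login works", "auth.spec.ts", "Stable", 0.0),
    MatrixCell("cart updates", "cart.spec.ts", "Flaky", 0.5),
    MatrixCell("checkout pays", "cart.spec.ts", "Flaky", 0.8),
]
top = sort_by_flaky_score(show_flaky_only(cells))
print([c.name for c in top])
```

Chaining the filter and the sort, as in the last lines, mirrors filtering to Flaky Only and then sorting by Flaky Score descending.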

Actionable Workflow

  1. Open Intelligence and switch to the Quality Matrix tab
  2. Filter to Flaky Only and sort by Flaky Score descending
  3. Click the top flaky test to open the Root Cause Analysis
  4. Review the AI diagnosis and apply the suggested fix
  5. Re-run the test to verify the fix resolves the flakiness

False Positives

Tests that depend on external services (third-party APIs, email providers) may appear flaky due to upstream instability. Consider mocking these dependencies before investigating the test code.
