Flaky Test Detection

Flaky tests, which pass and fail without any code change, erode trust in your test suite. The Intelligence dashboard identifies and classifies them automatically.

Quality Matrix Heatmap

The Quality Matrix is a grid where each cell represents one test. Color indicates stability over the selected run window:

Color	Classification	Criteria
Green	Stable	Passed in 100% of runs within the window
Yellow	Flaky	Mixed pass/fail results with no corresponding code change
Red	Critical	Failed in the most recent run and in more than 50% of runs in the window
Gray	Unknown	Fewer than 3 runs recorded
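
The criteria in the table can be sketched as a small Python function. This is only an illustration, not xyva's actual implementation: the name classify, the order in which the rules are checked, and the choice to treat every remaining mixed result as Yellow are assumptions. It also ignores the "no corresponding code change" condition and the Flaky Score, since neither can be derived from pass/fail results alone.

```python
from typing import Sequence

MIN_RUNS = 3  # fewer recorded runs than this is Unknown (Gray)


def classify(results: Sequence[bool]) -> str:
    """Map a run window (oldest first, True = passed) to a matrix color."""
    if len(results) < MIN_RUNS:
        return "Gray"
    failures = sum(1 for passed in results if not passed)
    if failures == 0:
        return "Green"  # passed in 100% of runs within the window
    if not results[-1] and failures / len(results) > 0.5:
        return "Red"  # failed most recently and in more than 50% of the window
    # Assumption: any other mixed result is shown as Flaky (Yellow).
    return "Yellow"
```

For example, classify([True, False, True, True]) falls through to "Yellow", while classify([True, False, False, False]) meets both Red conditions.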

Reading the Matrix

Hover over any cell to see the test name, pass/fail counts, and a mini sparkline of recent results. Click a cell to open the test in the Flaky Analysis Lab.

Flaky Score

Each test receives a Flaky Score between 0 and 1, where state_transitions counts how often the result changes between consecutive runs:

Flaky Score = state_transitions / (total_runs - 1)

Example: a test that went Pass-Fail-Pass-Fail-Pass over 5 runs:
  state_transitions = 4
  Flaky Score = 4 / 4 = 1.0 (maximally flaky)

Example: a test that went Pass-Pass-Pass-Fail-Pass:
  state_transitions = 2
  Flaky Score = 2 / 4 = 0.5 (moderately flaky)

Tests with a Flaky Score above 0.3 are classified as Flaky. Above 0.7, they are flagged for urgent investigation.
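
The formula and thresholds above can be reproduced in a few lines of Python. This is a minimal sketch under stated assumptions: the function names, the boolean encoding of results, and the decision to return None for fewer than two runs (where the denominator would be zero) are illustrative and not taken from xyva.

```python
from typing import Optional, Sequence

FLAKY_THRESHOLD = 0.3   # above this, the test is classified as Flaky
URGENT_THRESHOLD = 0.7  # above this, the test is flagged for urgent investigation


def flaky_score(results: Sequence[bool]) -> Optional[float]:
    """Return state_transitions / (total_runs - 1), or None when undefined.

    `results` is ordered oldest to newest; True means the run passed.
    """
    if len(results) < 2:
        return None  # assumption: a single run has no transitions to count
    transitions = sum(1 for prev, cur in zip(results, results[1:]) if prev != cur)
    return transitions / (len(results) - 1)


def flaky_label(score: Optional[float]) -> str:
    """Apply the 0.3 and 0.7 thresholds; the label strings are hypothetical."""
    if score is None:
        return "unscored"
    if score > URGENT_THRESHOLD:
        return "urgent"
    if score > FLAKY_THRESHOLD:
        return "flaky"
    return "below threshold"
```

With the two examples above, flaky_score([True, False, True, False, True]) gives 1.0 and flaky_score([True, True, True, False, True]) gives 0.5.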

Infrastructure Alerts

Not every intermittent failure is a test problem. xyva correlates failures with system metrics:

CPU load > 80% during the test run
Memory pressure causing browser process OOM
Network latency spikes affecting API-dependent tests

When a failure correlates with infrastructure stress, the cell in the Quality Matrix shows a lightning icon. These tests are excluded from the Flaky Score calculation.
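
As a rough illustration of this correlation step, the sketch below marks failures that coincide with infrastructure stress. Only the 80% CPU threshold comes from the text. The Run record, the browser_oom and latency_spike flags, and the rule that a single correlated failure excludes the test from scoring are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

CPU_LOAD_LIMIT = 80.0  # percent; the only numeric threshold stated in the text


@dataclass
class Run:
    passed: bool
    cpu_load_percent: float
    browser_oom: bool = False    # hypothetical flag: browser process hit OOM
    latency_spike: bool = False  # hypothetical flag: network latency spike seen


def under_infra_stress(run: Run) -> bool:
    """True when any of the three stress signals was present during the run."""
    return (
        run.cpu_load_percent > CPU_LOAD_LIMIT
        or run.browser_oom
        or run.latency_spike
    )


def infra_correlated_failures(runs: List[Run]) -> List[Run]:
    """Failed runs that coincided with infrastructure stress."""
    return [r for r in runs if not r.passed and under_infra_stress(r)]


def include_in_flaky_score(runs: List[Run]) -> bool:
    """Assumption: one infrastructure-correlated failure excludes the test."""
    return not infra_correlated_failures(runs)
```

A test whose only failure happened at 95% CPU load would be left out of scoring here, while a failure at 30% load with no other stress signal would still count.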

System Metrics Source

Infrastructure data comes from the host machine's OS metrics, which xyva captures during each run. In VPS or CI environments, additional data can be ingested from monitoring APIs (see Settings).

Filtering and Sorting

The Quality Matrix supports several filter modes:

Show Flaky Only — hides Stable and Unknown tests
Show Critical Only — focuses on currently failing tests
Sort by Flaky Score — brings the most erratic tests to the top
Group by File — clusters tests from the same spec file together
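
The four modes behave like ordinary filter, sort, and group operations over the matrix cells. The following sketch shows one possible reading; the MatrixCell record and the function names are hypothetical, and "Show Flaky Only" is implemented literally as hiding Stable and Unknown cells, so how Critical cells are treated in that mode is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MatrixCell:
    name: str
    spec_file: str
    classification: str  # "Stable", "Flaky", "Critical" or "Unknown"
    flaky_score: float


def show_flaky_only(cells: List[MatrixCell]) -> List[MatrixCell]:
    # Literal reading of the description: hide Stable and Unknown cells.
    return [c for c in cells if c.classification not in ("Stable", "Unknown")]


def show_critical_only(cells: List[MatrixCell]) -> List[MatrixCell]:
    return [c for c in cells if c.classification == "Critical"]


def sort_by_flaky_score(cells: List[MatrixCell]) -> List[MatrixCell]:
    # Highest score first, so the most erratic tests come to the top.
    return sorted(cells, key=lambda c: c.flaky_score, reverse=True)


def group_by_file(cells: List[MatrixCell]) -> Dict[str, List[MatrixCell]]:
    groups: Dict[str, List[MatrixCell]] = defaultdict(list)
    for cell in cells:
        groups[cell.spec_file].append(cell)
    return dict(groups)
```

Chaining sort_by_flaky_score(show_flaky_only(cells)) mirrors the "filter, then sort descending" combination used in the workflow that follows.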

Actionable Workflow

1. Open Intelligence and switch to the Quality Matrix tab
2. Filter to Flaky Only and sort by Flaky Score descending
3. Click the top flaky test to open the Root Cause Analysis
4. Review the AI diagnosis and apply the suggested fix
5. Re-run the test to verify the fix resolves the flakiness

False Positives

Tests that depend on external services (third-party APIs, email providers) may appear flaky due to upstream instability. Consider mocking these dependencies before investigating the test code.
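
One way to take an upstream service out of the picture is to replace its client with a deterministic stand-in. The sketch below uses Python's standard unittest.mock for this; send_welcome_email and the shape of the provider response are hypothetical and only stand in for whatever code your test exercises.

```python
from unittest.mock import Mock


def send_welcome_email(client, address: str) -> bool:
    """Hypothetical code under test: True when the provider accepts the message."""
    response = client.send(to=address, subject="Welcome")
    return response["status"] == "accepted"


def test_send_welcome_email_with_mocked_provider() -> None:
    # Replace the real email provider with a stand-in that always answers
    # the same way, so upstream instability cannot affect the result.
    client = Mock()
    client.send.return_value = {"status": "accepted"}

    assert send_welcome_email(client, "user@example.com") is True
    client.send.assert_called_once_with(to="user@example.com", subject="Welcome")
```

If the test stays stable once the dependency is mocked, the intermittent failures most likely came from the upstream service rather than from the test code.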
