Flaky Test Detection
Flaky tests — tests that pass and fail without code changes — erode trust in your test suite. The Intelligence dashboard identifies and classifies them automatically.
Quality Matrix Heatmap
The Quality Matrix is a grid where each cell represents one test. Color indicates stability over the selected run window:
| Color | Classification | Criteria |
|---|---|---|
| Green | Stable | Passed in 100% of runs within the window |
| Yellow | Flaky | Mixed pass/fail results with no corresponding code change |
| Red | Critical | Failed in the most recent run and in more than 50% of the runs in the window |
| Gray | Unknown | Fewer than 3 runs recorded |
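The table's criteria can be sketched as a small classifier. This is an illustrative sketch, not xyva's implementation: the function name `classify`, the order in which the rules are checked, and the choice to treat every remaining mixed window as Flaky are assumptions. It also does not check whether a code change occurred between runs.

```python
from typing import Sequence

MIN_RUNS = 3  # from the table: fewer than 3 recorded runs is Unknown


def classify(results: Sequence[bool]) -> str:
    """Classify one test from its run window (True = pass), oldest run first."""
    if len(results) < MIN_RUNS:
        return "Unknown"
    failures = sum(1 for passed in results if not passed)
    if failures == 0:
        return "Stable"
    # Critical: the most recent run failed and more than half the window failed.
    if not results[-1] and failures / len(results) > 0.5:
        return "Critical"
    # Assumption: any other mix of passes and failures is treated as Flaky.
    return "Flaky"
```

For example, `classify([True, False, False, False])` returns `"Critical"`, while `classify([True, False, True, True])` returns `"Flaky"`.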
Reading the Matrix
Hover over any cell to see the test name, pass/fail counts, and a mini sparkline of recent results. Click a cell to open the test in the Flaky Analysis Lab.
Flaky Score
Each test receives a Flaky Score between 0 and 1, based on how often its result flips between pass and fail in consecutive runs:
Flaky Score = state_transitions / (total_runs - 1)
Example: a test that went Pass-Fail-Pass-Fail-Pass over 5 runs:
state_transitions = 4
Flaky Score = 4 / 4 = 1.0 (maximally flaky)
Example: a test that went Pass-Pass-Pass-Fail-Pass:
state_transitions = 2
Flaky Score = 2 / 4 = 0.5 (moderately flaky)
Tests with a Flaky Score above 0.3 are classified as Flaky. Above 0.7, they are flagged for urgent investigation.
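The formula and thresholds can be reproduced in a few lines of Python. This is a minimal sketch for checking the arithmetic by hand: the function names `flaky_score` and `triage` and the label strings are hypothetical, and raising an error for fewer than two runs is an assumption, since the formula is undefined in that case.

```python
from typing import Sequence


def flaky_score(results: Sequence[bool]) -> float:
    """Return state_transitions / (total_runs - 1) for one test (True = pass)."""
    if len(results) < 2:
        # Assumption: with fewer than two runs there is nothing to compare.
        raise ValueError("need at least two runs to compute a Flaky Score")
    # Count each change between consecutive results (pass to fail or fail to pass).
    transitions = sum(1 for prev, cur in zip(results, results[1:]) if prev != cur)
    return transitions / (len(results) - 1)


def triage(score: float) -> str:
    """Map a score to the thresholds described above (labels are illustrative)."""
    if score > 0.7:
        return "urgent"
    if score > 0.3:
        return "flaky"
    return "below threshold"
```

Calling `flaky_score([True, False, True, False, True])` gives 1.0 and `flaky_score([True, True, True, False, True])` gives 0.5, matching the two worked examples.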
Infrastructure Alerts
Not every intermittent failure is a test problem. xyva correlates failures with system metrics:
- CPU load > 80% during the test run
- Memory pressure causing browser process OOM
- Network latency spikes affecting API-dependent tests
When a failure correlates with infrastructure stress, the cell in the Quality Matrix shows a lightning icon. These tests are excluded from the Flaky Score calculation.
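The exclusion rule can be illustrated with a short sketch. Only the 80% CPU threshold comes from the list above. The `RunRecord` fields, the boolean flags for OOM and latency spikes, and the rule that a single correlated failure is enough to flag a test are all assumptions made for illustration, not xyva's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, List

CPU_LOAD_THRESHOLD = 80.0  # percent, from the list above


@dataclass
class RunRecord:
    """One run of one test with hypothetical metric fields."""
    passed: bool
    cpu_load_percent: float
    browser_oom: bool = False    # assumed flag for a browser process OOM
    latency_spike: bool = False  # assumed flag for a network latency spike


def is_infra_correlated(run: RunRecord) -> bool:
    """True when a failed run coincides with any infrastructure stress signal."""
    if run.passed:
        return False
    return (
        run.cpu_load_percent > CPU_LOAD_THRESHOLD
        or run.browser_oom
        or run.latency_spike
    )


def has_infra_alert(runs: List[RunRecord]) -> bool:
    """Assumption: one correlated failure is enough to flag the test."""
    return any(is_infra_correlated(run) for run in runs)


def scorable_tests(tests: Dict[str, List[RunRecord]]) -> Dict[str, List[RunRecord]]:
    """Keep only tests without an infrastructure alert for Flaky Score calculation."""
    return {name: runs for name, runs in tests.items() if not has_infra_alert(runs)}
```

In this sketch, a passing run under high CPU load does not trigger an alert. Only failures that coincide with stress do.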
System Metrics Source
Infrastructure data comes from the host machine's OS metrics, which xyva captures during each run. In VPS/CI environments, additional data can be ingested from monitoring APIs (see Settings).
Filtering and Sorting
The Quality Matrix supports several filter modes:
- Show Flaky Only — hides Stable and Unknown tests
- Show Critical Only — focuses on currently failing tests
- Sort by Flaky Score — brings the most erratic tests to the top
- Group by File — clusters tests from the same spec file together
Actionable Workflow
1. Open Intelligence and switch to the Quality Matrix tab
2. Filter to Flaky Only and sort by Flaky Score descending
3. Click the top flaky test to open the Root Cause Analysis
4. Review the AI diagnosis and apply the suggested fix
5. Re-run the test to verify the fix resolves the flakiness
False Positives
Tests that depend on external services (third-party APIs, email providers) may appear flaky due to upstream instability. Consider mocking these dependencies before investigating the test code.
