tests/extmod/select_poll_eintr: Skip unreliable test in Github CI. #17745

AJMansfield · 2025-07-22T18:47:46Z

Summary

extmod/select_poll_eintr.py is a constant source of spurious failures in Github CI.

This PR adds it to the list of tests skipped when running on Github CI, to reduce the overall false positive rate and improve the predictive value of a failed CI run.
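As a rough illustration of the idea (not the actual change in this PR), a test runner can detect GitHub Actions through the `GITHUB_ACTIONS` environment variable, which GitHub sets to `true` inside workflows, and filter its test list accordingly. The names `CI_UNRELIABLE_TESTS`, `running_on_github_ci` and `select_tests` are hypothetical and exist only for this sketch.

```python
import os

# Hypothetical skip list for illustration; the real runner keeps its own set.
CI_UNRELIABLE_TESTS = {"extmod/select_poll_eintr.py"}


def running_on_github_ci(environ=None):
    """Return True when GITHUB_ACTIONS is set to "true" in the environment."""
    environ = os.environ if environ is None else environ
    return environ.get("GITHUB_ACTIONS") == "true"


def select_tests(all_tests, environ=None):
    """Drop the tests in CI_UNRELIABLE_TESTS, but only when on GitHub CI."""
    if not running_on_github_ci(environ):
        return list(all_tests)
    return [t for t in all_tests if t not in CI_UNRELIABLE_TESTS]


tests = ["thread/stress_aes.py", "extmod/select_poll_eintr.py"]
print(select_tests(tests, {"GITHUB_ACTIONS": "true"}))  # eintr test filtered out
print(select_tests(tests, {}))                          # local run keeps both
```

Passing the environment in as a parameter keeps the sketch testable; local runs are unaffected, so the test still gets exercised outside CI.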

Testing

I examined a sample of the last 25 failed Github Actions runs, tabulated their causes, and calculated the relevant confusion matrix statistics over the results. These provide adequate statistical evidence to support my original anecdotal experience that extmod/select_poll_eintr.py is problematic.

Action Run	Failed Job(s)	Cause
16447411965	stackless_clang	thread/stress_aes.py
16446157516	qemu_riscv64	thread/stress_aes.py
16445640721	qemu_riscv64	thread/stress_aes.py
16445092499	standard_v2	extmod/select_poll_eintr.py
16442539782	settrace_stackless	extmod/select_poll_eintr.py
16439460414	standard_v2 stackless_clang	extmod/select_poll_eintr.py extmod/select_poll_eintr.py
16439339413	settrace_stackless	extmod/select_poll_eintr.py
16438892781	standard_v2	extmod/select_poll_eintr.py
16438838082	standard	extmod/select_poll_eintr.py
16438686105	standard_v2 settrace_stackless	extmod/select_poll_eintr.py extmod/select_poll_eintr.py
16437062166	float_clang settrace_stackless	extmod/select_poll_eintr.py extmod/select_poll_eintr.py
16435694536	settrace_stackless	extmod/select_poll_eintr.py
16435294140	standard_v2	extmod/select_poll_eintr.py
16435084663	settrace_stackless	extmod/select_poll_eintr.py
16434901639	float	extmod/select_poll_eintr.py
16433931194	standard	extmod/select_poll_eintr.py
16433726206	standard_v2 stackless_clang macos 10 other jobs	extmod/select_poll_eintr.py extmod/select_poll_eintr.py extmod/select_poll_eintr.py basics/slice_optimse.py basics/slice_optimse.py
16433010322	standard standard_v2 longlong	extmod/select_poll_eintr.py extmod/select_poll_eintr.py (many failures)
16432556955	14 jobs	build failure
16432475831	settrace_stackless	extmod/select_poll_eintr.py
16432121694	longlong	extmod/vfs_rom.py import/import_broken.py
16421543831	standard	extmod/select_poll_eintr.py
16420969407	standard standard_v2	extmod/select_poll_eintr.py extmod/select_poll_eintr.py
16420440397	standard_v2	extmod/select_poll_eintr.py
16418722881	standard standard_v2 settrace_stackless	extmod/select_poll_eintr.py extmod/select_poll_eintr.py extmod/select_poll_eintr.py

(Note that the reproducible job was excluded from the tabulation, as it doesn't run extmod/select_poll_eintr.py.)

20 of these 25 examined runs include extmod/select_poll_eintr.py as a failure, compared to only 6 runs that include any other kind of failure.
As far as I can tell, none of these failures have anything to do with changes made to the select module in the triggering branch, so all of them are false positives except the one run that also included another failure.
Over the same sample period, there were a total of 9 passing unix runs. Under the assumption that all 6 non-extmod/select_poll_eintr.py failed runs are true positives and that all 9 of these passing runs are true negatives, that gives the test suite with extmod/select_poll_eintr.py included a false positive rate of 67.9%, a positive predictive value of only 24%, and an F1 score of 0.387. These values support the conclusion that the rate of spurious failures is excessive, and that the usefulness of the CI failure indicator is diluted as a result.
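Spelled out, the run-level figures follow from the counts used in the statistics code (TP = 6, FP = 19, TN = 9, FN = 0), under the same assumption that every failure not caused by extmod/select_poll_eintr.py is a true positive:

```latex
\begin{align*}
\mathrm{FPR} &= \frac{FP}{FP + TN} = \frac{19}{19 + 9} = \frac{19}{28} \approx 67.86\% \\
\mathrm{PPV} &= \frac{TP}{TP + FP} = \frac{6}{6 + 19} = \frac{6}{25} = 24\% \\
F_1 &= \frac{2\,TP}{2\,TP + FP + FN} = \frac{12}{12 + 19 + 0} = \frac{12}{31} \approx 0.387
\end{align*}
```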

Considering extmod/select_poll_eintr.py individually, this test has a per-job false positive rate of 5.5% and a per-run false positive rate of 60.6%. This supports the conclusion that the weak predictive value of the test suite is largely attributable to this test.
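For this test alone, every failure in the sample is counted as a false positive (TP = 0), so the two rates reduce to the following, using the per-job and per-run counts from the statistics code:

```latex
\begin{align*}
\mathrm{FPR}_{\text{job}} &= \frac{FP}{FP + TN} = \frac{29}{29 + 499} = \frac{29}{528} \approx 5.49\% \\
\mathrm{FPR}_{\text{run}} &= \frac{FP}{FP + TN} = \frac{20}{20 + 13} = \frac{20}{33} \approx 60.61\%
\end{align*}
```

The per-run population is 33 rather than 34 because the one run that failed at the build stage never got as far as running the test.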

Overall, the sample I examined supports the conclusion that extmod/select_poll_eintr.py is problematic and should be excluded from Github CI runs going forward.

Statistics Code, for anyone who cares to check my math:

from dataclasses import dataclass

@dataclass
class ConfusionMatrix:
    tp: int
    tn: int
    fp: int
    fn: int

    @property
    def p(self):
        return self.tp + self.fn

    @property
    def n(self):
        return self.fp + self.tn

    @property
    def pp(self):
        return self.tp + self.fp

    @property
    def pn(self):
        return self.fn + self.tn
    
    @property
    def pop(self):
        return self.tp + self.fp + self.tn + self.fn

    @property
    def tpr(self):
        return self.tp / self.p
    
    @property
    def fnr(self):
        return self.fn / self.p
    
    @property
    def fpr(self):
        return self.fp / self.n
    
    @property
    def tnr(self):
        return self.tn / self.n
    
    @property
    def ppv(self):
        return self.tp / self.pp
    
    @property
    def npv(self):
        return self.tn / self.pn
    
    @property
    def fdr(self):
        return self.fp / self.pp
    
    @property
    def fOr(self):
        return self.fn / self.pn
    
    @property
    def f1(self):
        return 2*self.tp / (2*self.tp + self.fp + self.fn)

    def report(self, title):
        return f"""\
{title}
  Population: {self.pop}
  Confusion Matrix:
              PPos PNeg
    Positive: {self.tp: 4} {self.fn: 4}
    Negative: {self.fp: 4} {self.tn: 4}
  Positive Predictive Value: {self.ppv:%}
  False Positive Rate: {self.fpr:%}
  F1 Score: {self.f1}
"""

# 19 fail runs with only eintr
# 5 fail runs with only other failures (1 of them precluded eintr)
# 1 fail run with both
# 9 pass runs
print(ConfusionMatrix(
    tp = 5 + 1,
    fp = 19,
    tn = 9,
    fn = 0,
).report("Overall, by runs:"))

print(ConfusionMatrix(
    tp = 0,
    fp = 19 + 1,
    tn = 9 + 4,
    fn = 0,
).report("eintr, by runs:"))

# 28 fail jobs with only eintr
# 28 fail jobs with only other failures
# 1 fail job with both
# 327 pass jobs from fail runs
# 144 pass jobs from pass runs
print(ConfusionMatrix(
    tp = 28 + 1,
    fp = 28,
    tn = 327 + 144,
    fn = 0,
).report("Overall, by jobs:"))

print(ConfusionMatrix(
    tp = 0,
    fp = 28 + 1,
    tn = 327 + 144 + 28,
    fn = 0,
).report("eintr, by jobs:"))

Output:

Overall, by runs:
  Population: 34
  Confusion Matrix:
              PPos PNeg
    Positive:    6    0
    Negative:   19    9
  Positive Predictive Value: 24.000000%
  False Positive Rate: 67.857143%
  F1 Score: 0.3870967741935484

eintr, by runs:
  Population: 33
  Confusion Matrix:
              PPos PNeg
    Positive:    0    0
    Negative:   20   13
  Positive Predictive Value: 0.000000%
  False Positive Rate: 60.606061%
  F1 Score: 0.0

Overall, by jobs:
  Population: 528
  Confusion Matrix:
              PPos PNeg
    Positive:   29    0
    Negative:   28  471
  Positive Predictive Value: 50.877193%
  False Positive Rate: 5.611222%
  F1 Score: 0.6744186046511628

eintr, by jobs:
  Population: 528
  Confusion Matrix:
              PPos PNeg
    Positive:    0    0
    Negative:   29  499
  Positive Predictive Value: 0.000000%
  False Positive Rate: 5.492424%
  F1 Score: 0.0

codecov · 2025-07-22T18:51:48Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.41%. Comparing base (e993f53) to head (26d9bf2).
Report is 8 commits behind head on master.

Additional details and impacted files

@@           Coverage Diff           @@
##           master   #17745   +/-   ##
=======================================
  Coverage   98.41%   98.41%           
=======================================
  Files         171      171           
  Lines       22210    22210           
=======================================
  Hits        21857    21857           
  Misses        353      353


extmod/select_poll_eintr.py is a constant source of spurious failures in Github CI. This PR adds it to the list of tests skipped in that environment in order to improve the test suite's false positive rate and positive predictive value in detecting defects.

Signed-off-by: Anson Mansfield <amansfield@mantaro.com>

dpgeorge · 2025-07-23T00:13:15Z

Thanks for the very detailed analysis!

I should have been clearer that this is intended to be fixed (with a workaround for the true bug) by #17655.

AJMansfield · 2025-07-23T01:45:42Z

> I should have been clearer that this is intended to be fixed (with a workaround for the true bug) by #17655.

Oh, I think that did actually come up in the search I did, guess I should've read further.

dpgeorge · 2025-07-23T01:47:29Z

And that PR has just been merged, so CI should be a lot happier now.

AJMansfield · 2025-07-23T01:57:04Z

> And that PR has just been merged, so CI should be a lot happier now.

Ty! This test is the main reason I didn't notice the other rv32 test failures when I was originally reviewing #17716: I was getting so conditioned to it that I expected one or two failures in every CI run.

tpwrules · 2025-08-10T23:22:14Z

This still seems to be flaky on macOS in my experience.

dpgeorge · 2025-08-11T00:47:40Z

> This still seems to be flaky on macOS in my experience.

Can you point to a few recent runs where it fails? Or is it only locally on your machine?

tpwrules · 2025-08-11T00:49:35Z

Sorry, this is locally on my machine building through the Nix package manager. Just wanted to comment as an FYI as I was looking for other reports of similar failures. I simply disabled the test on our end.

dpgeorge · 2025-08-11T01:02:48Z

OK, thanks, good to know.

The actual bug that leads to this unreliability is described in #11604. Hopefully it will be fixed one day!

AJMansfield changed the title ~~tests/extmod/select_poll_eintr: Skip unreliable test on ci/cd.~~ tests/extmod/select_poll_eintr: Skip unreliable test in Github CI. Jul 22, 2025

AJMansfield force-pushed the cicd-ignore-broken-eintr branch from 28ea4a6 to 26d9bf2 Compare July 22, 2025 18:54

AJMansfield closed this Jul 23, 2025

AJMansfield deleted the cicd-ignore-broken-eintr branch July 23, 2025 01:46
