BUG: MacOS matmul FPE heisenbug seen on CI #28227
Comments
See e.g. this CI run as well as this PR from December.
The MacOS failures are newish; occasional failures on Windows have been seen for years. That said, I have the impression that the Windows failures are less frequent now.
Hmmm, I feel there was a time when I had occasional errors similar to this locally. But I don't think there were in ... I doubt it originates in NumPy. More likely it is something in the BLAS layer (and possibly an FPE-invalid SIMD optimization).
It may be related to #28687. In that issue,
Used to be mostly on Microsoft; the Mac issue is fairly new. I've sometimes wondered if matmul was picking up the FPE from another test, but that raises the question: why is it always matmul?
A loose idea, but maybe it's because a random runner happened to have an M4 processor?
I can't find an open issue about this, but for a while now we have had infrequent failures on MacOS CI due to a heisenbug:
As far as I know, no one has been able to reproduce this outside of CI and no one has been able to reliably trigger it besides running the MacOS CI repeatedly.
Unfortunately, adding free-threaded CI is making this happen more often because we run CI on Mac runners more often.
I'm opening this issue to have a place to link to when I merge PRs with CI failing on this one test and hopefully to attract someone who can figure this out and fix it!
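For anyone who wants to try triggering this outside of CI, here is a minimal sketch of a repeat-until-it-fails harness. It is an assumption about how one might hunt for the failure, not the actual failing test: `count_failures` and `matmul_check` are hypothetical names, and the matrix size and random inputs are arbitrary. The check turns floating-point error flags into exceptions with `np.errstate(all="raise")`, so a spurious FPE during `np.matmul` would surface as a `FloatingPointError`.

```python
def count_failures(check, runs):
    """Call check(seed) for seeds 0..runs-1 and return the seeds that raised FloatingPointError."""
    failing_seeds = []
    for seed in range(runs):
        try:
            check(seed)
        except FloatingPointError:
            # Record the seed so a failing case can be replayed later.
            failing_seeds.append(seed)
    return failing_seeds


def matmul_check(seed, size=64):
    """Hypothetical check: multiply two random matrices with FP errors raised as exceptions."""
    import numpy as np  # imported lazily so the harness itself needs only the stdlib

    rng = np.random.default_rng(seed)
    a = rng.standard_normal((size, size))
    b = rng.standard_normal((size, size))
    # Well-conditioned finite inputs should not set any FP error flag,
    # so an exception here would point at a spurious FPE.
    with np.errstate(all="raise"):
        np.matmul(a, b)
```

Calling `count_failures(matmul_check, 10_000)` on a Mac would return the list of seeds that hit a `FloatingPointError`. An empty list proves nothing given how rare the failure is, and if the flag really leaks in from another test, an isolated loop like this may never see it.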