GCC 15.1.1 compiles NumPy, but the tests segfault. #28991
Comments
The problem seems to be in boolean indexing. |
Can you share a gdb traceback? |
I was going to wait on the next GCC 15 version. Fedora (and Linus!) jumped the gun in using 15 in a release. |
I can reproduce segfaults on a small chunk of the testsuite (and, well, in production, which is what led me to double-check ;p) with GCC 15.1.0 over here, too, FWIW (Gentoo Linux, x64, on a zen4 CPU). Everything passes w/ GCC 14.x (right now, 14.3.0). Will see about getting some traces in the next few days. |
Fedora only shipped it briefly before the final release. The next GCC 15 release will be in a couple of months, then it goes to yearly (same schedule as usual). It's best if it turns out to be a GCC bug to get it reported before that next one. Otherwise, relying on the fix being out there takes way longer. |
I tried to reproduce this in the following environments:
@NiLuJe Feel free to file a bug on the Gentoo side so we can dig into it a bit there and see if we can come up with something to share on this side if there's nothing obvious we find here. If you do, please include the testsuite results when run via the ebuild (inc. build.log) and emerge --info. Thanks! |
That was an interesting observation! I'm using LTO doesn't affect the result, but, unsurprisingly (somewhat? I was under the impression that the buildsystem enforced O3 anyway?), I'll file that on the Gentoo side w/ the logs. |
Here it is: https://bugs.gentoo.org/956770 |
I've found using I'm suspecting a conflict with either meson or the dispatcher. The lesson might be: don't use |
Summarising findings from the Gentoo side: it's a numpy bug.
Looking at npy_memchr, I see UBSAN gets suppressed at numpy/numpy/_core/src/multiarray/common.h, line 233 (at 5ab56f6).
(It actually looks quite similar to what subversion was doing: https://bugs.gentoo.org/950271). Changing https://github.com/numpy/numpy/blob/main/numpy/_core/include/numpy/npy_cpu.h#L128 to set I recommend ripping all of that out and always assuming |
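To make the concern concrete: dereferencing a `char *` that has been cast to a wider integer type is undefined behaviour in C when the address is not suitably aligned, even on CPUs that tolerate unaligned loads. A minimal sketch of a well-defined alternative follows. The helper name `count_leading_zero_bytes` is hypothetical and this is not NumPy's `npy_memchr`; it only illustrates loading a word through `memcpy` so that no alignment assumption is made.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not NumPy code): number of zero bytes at the
 * start of buf, scanning at most size bytes. */
static size_t
count_leading_zero_bytes(const char *buf, size_t size)
{
    size_t i = 0;

    /* Word-at-a-time fast path. Copying into a local with memcpy is
     * well defined for any address, unlike *(unsigned int *)(buf + i). */
    while (size - i >= sizeof(unsigned int)) {
        unsigned int word;
        memcpy(&word, buf + i, sizeof(word));
        if (word != 0) {
            break;
        }
        i += sizeof(word);
    }

    /* Byte-wise pass over the tail, or over the word that contained
     * the first nonzero byte. */
    while (i < size && buf[i] == 0) {
        i++;
    }
    return i;
}
```

Calling it on a deliberately offset pointer, for example `count_leading_zero_bytes(raw + 1, n)`, stays within defined behaviour because no wide load is ever issued through a cast pointer.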
Thanks for the deep dive! Ping @seberg - it looks like you added the suppression in 2022. |
🤷 this is ancient code, I just added the ignore because the variable is supposed to be set when we know unaligned access is supported and also it clearly was supported on all systems UBSAN ever ran on. I don't care for keeping While I doubt it is important, I do think that this did/does make some sense here (no idea how much), in subversion, and I am very sure I also saw it in linux kernel code. |
Could be that gcc is using SIMD instructions that expect alignment. |
It's precisely that, yeah. GCC will even peel for alignment now to handle misaligned workloads.
Right, it was absolutely a reasonable thing to do and it would've given a nice boost in the past. On the other hand, it's always a pain when you start debugging something and realise it's an It's just at odds with expecting compilers to autovectorise things these days. I recommend just dropping all of it, and if there's performance regressions, we can look into those if-and-when. |
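As an illustration of what "dropping all of it" can look like, here is a sketch of a plain byte-wise loop with no hand-written wide loads. The function name `count_zero_bytes` is hypothetical and not taken from NumPy. Whether a particular compiler vectorises such a loop depends on its version and optimisation flags, so treat this as an example of the style rather than a performance claim.

```c
#include <stddef.h>

/* Hypothetical illustration (not NumPy code): count the zero bytes in
 * buf one byte at a time. There are no pointer casts, so the code makes
 * no alignment assumptions and leaves any vectorisation, including
 * alignment handling, to the compiler. */
static size_t
count_zero_bytes(const unsigned char *buf, size_t size)
{
    size_t count = 0;
    for (size_t i = 0; i < size; i++) {
        count += (buf[i] == 0);
    }
    return count;
}
```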
* This machinery requires strict-aliasing UB and isn't needed anymore with any GCC from the last 15 years. Fixes: numpy#28991
* This machinery requires strict-aliasing UB and isn't needed anymore with any GCC from the last 15 years. This might also fix numpy#25004. Fixes: numpy#28991
GCC 15 generates code that segfaults on newer hardware when running tests. See gh-28991 for details. A fix is to always require alignment, which can be done by setting `NPY_ALIGNMENT_REQUIRED = 1`. We retain `NPY_ALIGNMENT_REQUIRED` here for downstream compatibility in the 2.3.x release series, but do remove its use in our own code. The associated macro `NPY_USE_UNALIGNED_ACCESS` could also be removed as it is always 0, but that is left for another PR.
GCC 15 generates code that segfaults on newer hardware when running tests. See gh-28991 for details. A fix is to always require alignment, which can be done by setting `NPY_ALIGNMENT_REQUIRED = 1`. We completely remove `NPY_ALIGNMENT_REQUIRED` here in order to test downstream compatibility before the 2.4 release. It is not expected to cause problems, but one never knows. The associated macro `NPY_USE_UNALIGNED_ACCESS` could also be removed as it is always 0, but that is left for another PR.
The failure is at `_core/tests/test_half.py`, line 54. I assume that is a problem with GCC 15.1.1, as it has had a rocky start. Has anyone else had problems? The problem doesn't seem to be in NumPy, it was compiling and running fine before I upgraded Fedora to 42.