-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
BUG: float64 nanmin() returns NaN on little-endian MIPS (mipsel, mips64el) #23158
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Is there a GCC compile farm machine with the appropriate configuration to debug this? I checked https://cfarm.tetaneutral.net/machines/list/ and tried host
|
I'm not aware of any public machines, I'm afraid. I can test things on Debian's mips porterbox, if that helps. I haven't tried to debug this, at all, beyond pinpointing the issue in numpy. I believe that QEMU's support for mips64el and mipsel are both usable. |
I tried to have a bit of a look, but need to try to get emulation running. It does seem like this this should just end up using |
@stefanor any chance you could make sure that
Compile with |
On both mips64el and mipsel:
|
Whenever I return to this, I don't really see how this can happen (sorry I didn't get to a working qemu/native setup, yet). Do you have the NumPy build setup/flags ready or are they the defaults? Maybe there is something simple like someone being a bit overzealous on optimization and enabling |
Looks like it's all defaults. You can see them in a build log: https://buildd.debian.org/status/fetch.php?pkg=numpy&arch=mips64el&ver=1%3A1.24.2-1&stamp=1675927314&raw=0 |
I see gcc12 and these flags:
Do we have any known successful little-endian systems? |
We don't see the issue on any other Debian platforms: https://buildd.debian.org/status/logs.php?pkg=silx&ver=1.1.0%2Bdfsg-4 (there is a test failure on s390x, but that's unrelated) |
Thanks. Looking for instance at the sucessful ppc64el build log, I see gcc12 but not the specific flags used. With that, I do not see any flag in the mips64el build that should cause problems. |
Maybe disabling all kinds of optimization including disabling optimization level 3 can be used as a workaround for this issue through the following command: python setup.py build --disable-optimization install --user through pip: pip install --no-use-pep517 --global-option=build \
--global-option="--disable-optimization" ./ This issue may be related to the following manual unroll which can be disabled by build option numpy/numpy/core/src/umath/loops_minmax.dispatch.c.src Lines 368 to 436 in bc8cf11
|
note that we are using C99 fmin/fmax under flag level 3 which gives a strong hint to the compiler to generate SIMD paths: numpy/numpy/core/src/umath/loops_minmax.dispatch.c.src Lines 35 to 40 in bc8cf11
|
I tried a build with From the end of the build log:
|
I am not sure that it makes sense to try to spend much time on this TBH, but just to see... one more question. Given:
as
give you? On x86 (without optimizatoin) it gives for example:
Which calls ( |
FWIW, a bisect pointed to #20131 as the origin point of this bug (which may not be that surprising, it was a massive rewrite of the relevant code). The disassemble (from 1.24.2):
And yes, only one
|
Honestly, I see nothing weird. I still suspect it is a compiler issue somehow. But I have no idea, I guess it jumps to We could put a work-around in place if necessary, but right now I don't even know how that would look like, since the looks perfectly fine. |
Is it about the encoding of nan? If so, I guess it is due to the NaN-legacy vs NaN2008 problem? For NaN-legacy hardware, the encoding of qNaN is different with other hardwares. I will try to dig it out. |
I don't really see why/how, but considering that there is nothing obvious here, maybe? If I.e. maybe code using |
It seems to work, and yes it's different:
|
Fine, so I guess you get: https://sourceware.org/binutils/docs/as/MIPS-NaN-Encodings.html So @wzssyqa is right and MIPS is weird, which supposedly can be specified what NaNs you want, but no idea what helps but maybe you should just compile everything with There are layers of problems here I think:
Do you/we care enough to push fixes into Python for this? Maybe we should just add that MIPS Or, you fix the definition of NAN to use C99 |
Interestingly, using that instead of But yeah, I think I'm happy to call this the MIPS port's problem, and let them drive a solution. I can't even compile with Thanks for your time :( |
I may have misread the Python code. In the dark dark depths, there is this function: _Py_dg_stdnan. Which is used if and only if This code was recently touched here: python/cpython#31171 but neither before nor after can I figure out if/why that path might actually be taken on MIPS (I guess due to a configure mistake?). I suppose a But the thing about all of this is. Below the nice neat comment (one place where it is used):
is an unguarded function that also uses Py_NAN. So, maybe do have a look at the sysconfig, and then maybe we should just delete that function from cPython because deleting 100 lines of dead code is always nice :). |
I misunderstood the The minimal Python fix would be to just modify |
Closing this issue, it clearly isn't NumPy here. Its MIPS, and Python, and the compiler. And potentially all three working together with each doing something wrong. |
It seems to me code all around relies on both being correct anyway. The actual value for Py_NAN is subtly incorrect on MIPS (depending on settings) or at least nonstandard, which seems to confuse some builtin functions. (Probably it is signalling, but NumPy saw this with fmin, which probably should also ignore signalling NaNs, see also numpy/numpy#23158). The guards about `_PY_SHORT_FLOAT_REPR` making sense are relatively unrelated to NAN and INF being available. Nevertheless, I currently hide the Py_NAN definition if that is not set, since I am not sure what good alternative there is to be certain that Py_NAN is well defined. OTOH, I do suspect there is no platform where it is not and it should probably be changed?!
Describe the issue:
Build failures of silx on debian logs point to a problem with
numpy.nanmin()
forfloat64
on little endian MIPS platforms.Reproduce the code example:
Error message:
Runtime information:
Context for the issue:
This is a fundamental bug in a data type, breaking higher level verification tests in other packages.
The text was updated successfully, but these errors were encountered: