sum gives wrong result for structured arrays #9876
Comments
Looks like an alignment problem to me. Do you get a problem with the simpler
>>> a = np.zeros(1000, dtype=[('a', np.int32), ('b', np.float64)])
>>> a['a'] = np.arange(1, 1+len(a))
>>> a['b'].sum()
And in your original, is |
>>> a = np.zeros(1000, dtype=[('a', np.int32), ('b', np.float64)])
>>> a['a'] = np.arange(1, 1+len(a))
>>> a['b'].sum()
2.3671924051608192e-309
Oops, I think I meant |
And can you try with |
>>> a['a'] = -1
>>> a['b'].sum()
nan
>>> np.all(a['b'].view((np.uint8, 8)) == 0)
True
And on much smaller arrays? Is length |
>>> a = np.zeros(1, dtype=[('a', np.int32), ('b', np.float64)])
>>> a['a'] = -1
>>> a['b'].sum()
0.0
>>> a = np.zeros(2, dtype=[('a', np.int32), ('b', np.float64)])
>>> a['a'] = -1
>>> a['b'].sum()
2.1219957904712067e-314
>>> a.tobytes()
b'\xff\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00\xff\xff\xff\xff\x00\x00\x00\x00\x00\x00\x00\x00'
Definitely an alignment bug. Perhaps not surprisingly, the underlying bits of the lower half of the float are all set.
>>> hex(np.float64(2.1219957904712067e-314).view(np.int64))
'0xffffffff'
Unfortunately, I can't reproduce this on my machine. You can almost certainly work around it with
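The bit-pattern observation above can be checked without NumPy. The following is a minimal sketch using only the standard-library `struct` module; the variable names are illustrative and not taken from the thread. It reinterprets a 64-bit word whose low 32 bits are all set as a little-endian double, which is the same reinterpretation the `.view(np.int64)` call performs in the other direction.

```python
import struct

# Bit pattern reported above: only the low 32 bits of the 64-bit word are set.
bits = 0xFFFFFFFF

# Reinterpret the 64-bit integer as a little-endian IEEE 754 double.
value = struct.unpack("<d", struct.pack("<Q", bits))[0]
print(value)

# Going the other way recovers the same bit pattern.
roundtrip = struct.unpack("<Q", struct.pack("<d", value))[0]
print(hex(roundtrip))
```

The result is a tiny subnormal number, which is what one would expect to see if the four `0xff` bytes of an `int32` field set to -1 were picked up as the low half of a double.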
Did this work in previous numpy versions?
I don't know about previous versions. Which version should I try?
>>> a = np.zeros(2, dtype=np.dtype([('a', np.int32), ('b', np.float64)], align=True))
>>> a['a'] = -1
>>> a['b'].sum()
2.1219957904712067e-314
I can reproduce this bug with Debian's prepackaged numpy and python on another 32-bit machine:
>>> sys.version
'3.5.3 (default, Jan 19 2017, 14:11:04) \n[GCC 6.3.0 20170118]'
>>> np.__version__
'1.12.1'
But not on 64-bit Fedora:
>>> sys.version
'3.6.2 (default, Oct 2 2017, 16:51:32) \n[GCC 7.2.1 20170915 (Red Hat 7.2.1-2)]'
>>> np.__version__
'1.12.1'
I think it is a 32-bit issue.
I just managed to reproduce this bug in an i686 chroot. It's not due to an error in subscripting, since the subscripted array shows the right values:
>>> a = np.zeros(2, dtype=[('a', np.int32), ('b', np.float64)])
>>> a['a'] = -1
>>> y = a['b']
>>> y
array([ 0., 0.])
>>> y.sum()
2.1219957904712067e-314
since
Yeah, as far as I can see we simply never accounted for alignment in the reduction code (in
It's time for me to press my cheeks against the sweet cheeks of the pillow, so I'll leave this for today. Anyone feel free to pick this up in the meantime!
After another look, I think it is the implementation and optimizations in the
Here's the issue: On x86 (i.e. 32-bit) Linux, a double (8 bytes) counts as 4-byte aligned (by gcc, see wikipedia (link), which I confirmed in my chroot). Therefore in the examples above,
But then, in
Unfortunately, for the array in the examples above the stride is not a multiple of the itemsize, so it is not possible to access the array values by pointer arithmetic, at least with a pointer of the type. Probably the fix is to use void-pointer arithmetic with the actual stride. I'm not sure whether that will kill the optimizations.
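To make the stride argument concrete, here is a rough simulation in plain Python using the bytes from the `a.tobytes()` output earlier in the thread. It is only a sketch under stated assumptions: the helper `read_doubles` and the constant names are hypothetical, and the "typed pointer" path is a guess at one failure mode that happens to reproduce the reported value, not a confirmed description of NumPy's reduction code.

```python
import struct

# Raw bytes of the two-record array, copied from a.tobytes() in the thread:
# each 12-byte record is an int32 set to -1 followed by a float64 set to 0.0.
buf = bytes.fromhex("ffffffff0000000000000000" * 2)

FIELD_OFFSET = 4   # byte offset of field 'b' inside a record
ITEMSIZE = 8       # size of one float64
STRIDE = 12        # distance in bytes between consecutive 'b' values


def read_doubles(data, start, step, count):
    """Read `count` little-endian doubles, advancing `step` bytes each time.

    Hypothetical helper for illustration only.
    """
    return [struct.unpack_from("<d", data, start + i * step)[0]
            for i in range(count)]


# Byte-wise (void-pointer style) arithmetic with the actual stride.
correct = read_doubles(buf, FIELD_OFFSET, STRIDE, 2)

# Typed-pointer arithmetic can only step in whole items, so a stride of
# 12 bytes degrades to (12 // 8) * 8 = 8 bytes (assumed failure mode).
typed_step = (STRIDE // ITEMSIZE) * ITEMSIZE
typed = read_doubles(buf, FIELD_OFFSET, typed_step, 2)

print(sum(correct))
print(sum(typed))
```

With the byte-wise stride both reads land on the zeroed `b` fields. With the 8-byte step the second read starts at byte 12, which is the `int32` of the next record, so its four `0xff` bytes become the low half of a double.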
Everything works fine if I change the last field to int64.
Debian stable, 32-bit