sum result different when slicing before sum #13734
Yes, summing along the fast axis gives additional numerical precision because numpy uses pairwise summation. But if you sum along the slow axis (in a multidimensional array), doing the pairwise summation would be much slower, so it doesn't happen. It is a strange beast, but since you basically get bonus precision, in the end we have it...
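For readers unfamiliar with the technique: pairwise summation splits the array in half, sums each half recursively, and adds the two results, so rounding error grows roughly like O(log n) rather than the O(n) of a naive left-to-right loop. Below is a minimal sketch with a hypothetical `pairwise_sum` helper; it is illustrative only, since NumPy's actual C implementation stops recursing at a larger block size and unrolls the base case for speed.

```python
import numpy as np

def pairwise_sum(x):
    # Recursively split the array and add the sums of the two halves.
    # Rounding error grows roughly like O(log n) instead of the O(n)
    # of naive left-to-right accumulation.
    if x.size <= 8:  # small base case; the real implementation uses a larger block
        total = x.dtype.type(0)
        for v in x:
            total += v
        return total
    mid = x.size // 2
    return pairwise_sum(x[:mid]) + pairwise_sum(x[mid:])

rng = np.random.default_rng(0)
a = rng.random(1_000_000).astype(np.float32)
print(pairwise_sum(a))          # stays close to the float64 reference
print(sum(a))                   # naive float32 accumulation drifts further
print(a.sum(dtype=np.float64))  # higher-precision reference
```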
There should be an open issue about it somewhere, that it should be documented better (although I doubt it would be easy to discover even if documented).
Thanks for the explanation - I didn't know about pairwise summation. If it helps with the documentation, I went looking for notes relating to this in the docs for numpy.sum. It looks like #9393 is the issue to document this. Feel free to close this issue. Now that I have a search term, I see that #11331 is also likely very similar to this.
Hmmm, I tried to write a bit:
but I'm not sure it isn't too technical... Will probably open a PR with it in any case.
Note that this behaviour is of course inherited into `np.add.reduce` and many other reductions such as `mean` or users of this reduction, such as `cov`. This is ignored here. Closes numpygh-11331, numpygh-9393, numpygh-13734
Um... what approach does Python's builtin sum() use? Is it a language feature or an implementation detail?
@bbbbbbbbba I have no idea, I guess an implementation detail, see also: https://en.wikipedia.org/wiki/Kahan_summation_algorithm
Maybe I should link that site...
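For reference, here is a sketch of compensated (Kahan) summation as described in that article. Note that this is not what NumPy's `sum` uses (NumPy uses pairwise summation, as discussed above); it is just the technique the link describes.

```python
def kahan_sum(values):
    total = 0.0
    c = 0.0                  # running compensation for lost low-order bits
    for v in values:
        y = v - c            # apply the correction carried from the last step
        t = total + y        # low-order bits of y may be lost in this add...
        c = (t - total) - y  # ...and are recovered here into the compensation
        total = t
    return total
```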
Which means it's incorrect though, …
Actually, isn't it just that builtin …
@bbbbbbbbba I am not sure I parse... Just try to add lots of float64, and you will see differences, especially comparing to …
Oh, I think we are actually on the same page now. Apparently you were thinking of …
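Trying the earlier suggestion to "add lots of float64" is easy; `math.fsum` is a natural comparison point, though the truncated comments above don't say which one was meant.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(1_000_000)  # float64 values

print(sum(x))        # builtin sum: naive left-to-right accumulation
print(np.sum(x))     # NumPy: pairwise summation
print(math.fsum(x))  # Shewchuk's algorithm: correctly rounded result
```

With float64 the three results typically differ only in the last few digits; with float32 data the gap is far more visible.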
Closed by #13737
I've run into a case where these two expressions give substantially different results:
I imagine I'm running into the limits of floating point precision in some form - the data are float32. But I'm puzzled why they're different - I'd expect these two to sum up exactly the same values and so be affected by finite precision in the same way.
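The exact expressions from the report aren't reproduced above, but going by the issue title and the explanation that was later accepted, the effect can be demonstrated along these lines. This is a hypothetical reconstruction with random data (the gist's actual `.npy` data isn't shown here): reducing with `a.sum(axis=0)` runs one naive accumulation per output element, while slicing a column first turns it into a 1-D reduction that can use pairwise summation.

```python
import numpy as np

rng = np.random.default_rng(42)
a = rng.random((1_000_000, 4)).astype(np.float32)

# Reducing over the slow axis: naive accumulation per output element.
via_axis = a.sum(axis=0)[0]

# Slicing the column first gives a 1-D reduction, which can use pairwise
# summation and typically stays much closer to the true value.
via_slice = a[:, 0].sum()

reference = a[:, 0].astype(np.float64).sum()
print(via_axis, via_slice, reference)
```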
Other details:
- The difference is large enough to fail `np.testing.assert_allclose`.
- The data doesn't contain `NaN` or infinity.

I can see a difference with random data, but the data I'm using seems to produce a particularly big difference, so I've made an example in a gist including a `.npy` array with this data:

```
git clone https://gist.github.com/07c7922f90749ae30d7d45fa02ab6851.git numpy_sum_behaviour
cd numpy_sum_behaviour/
python3 numpy_slice_sum.py
```
Numpy/Python version information:
I originally saw this on 1.16.4 on another machine, and cut the example down on my laptop with 1.16.3.