sum result different when slicing before sum #13734

Closed
takluyver opened this issue Jun 7, 2019 · 11 comments
Labels
57 - Close? Issues which may be closable unless discussion continued

Comments

@takluyver
Contributor

I've run into a case where these two expressions give substantially different results:

a.sum(axis=0)[0]  # 3223672.5
a[:, 0].sum(axis=0)  # 3223679.2

I imagine I'm running into the limits of floating point precision in some form - the data are float32. But I'm puzzled why they're different - I'd expect these two to sum up exactly the same values and so be affected by finite precision in the same way.

Other details:

  • The difference is large enough to fail np.testing.assert_allclose
  • As far as I can find, all values are 'normal' numbers - no NaN or infinity
  • They don't span a particularly large range of values - less than two orders of magnitude.
  • All values are positive (I vaguely recall that floating point error can be magnified if large positive and negative numbers cancel each other out).

I can see a difference with random data, but the data I'm using seems to produce a particularly big difference, so I've made an example in a gist including a .npy array with this data.

git clone https://gist.github.com/07c7922f90749ae30d7d45fa02ab6851.git numpy_sum_behaviour
cd numpy_sum_behaviour/
python3 numpy_slice_sum.py
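Without the gist's `.npy` file, the same effect can be sketched with random data (the shape, seed, and tolerances below are illustrative choices, not the original data):

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((1_000_000, 2)).astype(np.float32)

via_2d = a.sum(axis=0)[0]   # reduction over the slow axis: naive accumulation
via_slice = a[:, 0].sum()   # 1-D reduction: pairwise summation
ref = a[:, 0].astype(np.float64).sum()  # float64 reference

print(via_2d, via_slice, ref)
```

The two float32 results differ visibly, and the sliced version lands closer to the float64 reference.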

Numpy/Python version information:

>>> import sys, numpy; print(numpy.__version__, sys.version)
1.16.3 3.7.3 (default, May 11 2019, 00:38:04) 
[GCC 9.1.1 20190503 (Red Hat 9.1.1-1)]

I originally saw this on 1.16.4 on another machine, and cut the example down on my laptop with 1.16.3.

@seberg
Member

seberg commented Jun 7, 2019

Yes, summing along the fast axis gives additional numerical precision because numpy uses pairwise summation. But if you sum along the slow axis of a multi-dimensional array, pairwise summation would be much slower, so it isn't used there.

It is a strange beast, but since you basically get bonus precision, in the end we have it...
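For illustration, a minimal Python sketch of the two strategies (NumPy's actual implementation is in C with a larger, unrolled base case; this only shows the idea):

```python
import numpy as np

def naive_sum(x):
    # plain left-to-right accumulation: one rounding error per element, O(n) error growth
    acc = np.float32(0.0)
    for v in x:
        acc += v
    return acc

def pairwise_sum(x):
    # recursively sum the two halves: error grows like O(log n) instead of O(n)
    n = len(x)
    if n <= 8:  # small base case for illustration; NumPy's real one is larger
        return naive_sum(x)
    return pairwise_sum(x[:n // 2]) + pairwise_sum(x[n // 2:])

x = np.full(100_000, np.float32(0.1))
naive = naive_sum(x)
pair = pairwise_sum(x)
print(naive, pair)  # the naive float32 sum drifts further from the exact value
```

With 100,000 copies of float32 `0.1`, the pairwise result stays much closer to the exact sum than the naive loop.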

@seberg seberg added the 57 - Close? Issues which may be closable unless discussion continued label Jun 7, 2019
@seberg
Member

seberg commented Jun 7, 2019

There should be an open issue about this somewhere, saying it should be documented better (although I doubt it would be easy to discover even if documented).

@takluyver
Contributor Author

Thanks for the explanation - I didn't know about pairwise summation.

If it helps with the documentation, I went looking for notes relating to this in the docs for numpy.sum.

It looks like #9393 is the issue to document this. Feel free to close this issue. Now that I have a search term, I see that #11331 is also likely very similar to this.

@seberg
Member

seberg commented Jun 7, 2019

Hmmm, I tried to write a bit:

    The numerical precision of sum (and ``np.add.reduce``) is in general
    limited by directly adding each number individually to the result
    causing rounding errors in every step.
    However, often numpy will use a numerically better approach
    (pairwise summation) leading to improved precision in many use cases.
    This improved precision is always provided when no ``axis`` is given.
    When ``axis`` is given, it will depend on which axis is summed.
    Technically, to provide the best speed possible, the improved precision
    is only used when the summation is along the fast axis in memory.
    Note that the exact precision may vary depending on other parameters.
    In contrast to NumPy, Python's ``sum`` function uses a slower but more
    precise approach to summation.

but not sure it isn't too technical... Will probably open a PR with it in any case.

seberg added a commit to seberg/numpy that referenced this issue Jun 7, 2019
Note that this behaviour is of course inherited into `np.add.reduce` and
many other reductions such as `mean` or users of this reduction, such
as `cov`. This is ignored here.

Closes numpygh-11331, numpygh-9393, numpygh-13734
@bbbbbbbbba

Um... what approach does Python's builtin sum() use? Is it a language feature or an implementation detail?

@seberg
Member

seberg commented Jun 8, 2019

@bbbbbbbbba I have no idea, I guess implementation detail, see also:

https://en.wikipedia.org/wiki/Kahan_summation_algorithm

The standard library of the Python computer language specifies an fsum function for exactly rounded summation, using the Shewchuk algorithm[9] to track multiple partial sums.

Maybe I should link that site...

@seberg
Member

seberg commented Jun 8, 2019

Which means it's incorrect though; fsum uses Shewchuk, sum does not (which makes sense, since it would have to special-case floats).
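The distinction is easy to see with the standard library's `math.fsum`, which performs exactly rounded summation, versus the builtin `sum`, which just adds left to right:

```python
import math

xs = [0.1] * 10
print(sum(xs))        # 0.9999999999999999 -- plain left-to-right float addition
print(math.fsum(xs))  # 1.0 -- Shewchuk exactly rounded summation
```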

@bbbbbbbbba

Actually, isn't it just that builtin sum is starting from 0 by default?

>>> sum(np.ones((10 ** 8,), dtype = np.float32))
100000000.0
>>> type(0 + np.float32(1))
<class 'numpy.float64'>
>>>
>>> sum(np.ones((10 ** 8,), dtype = np.float32), np.float32(0))
16777216.0
>>> type(np.float32(0) + np.float32(1))
<class 'numpy.float32'>
>>>
>>> sum(np.ones((10 ** 8, 1), dtype = np.float32))
array([16777216.], dtype=float32)
>>> type(0 + np.float32([1]))
<class 'numpy.ndarray'>
>>> (0 + np.float32([1])).dtype
dtype('float32')

@seberg
Member

seberg commented Jun 9, 2019

@bbbbbbbbba I am not sure I parse... Just try to add lots of float64, and you will see differences, especially comparing to fsum.

@bbbbbbbbba

Oh, I think we are actually on the same page now. Apparently you were thinking of fsum when you talked about the "slower but more precise approach to summation". I noticed that sum sometimes also seemed to be more precise than the naive approach, but it turned out that it was only because it used float64.
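That promotion also suggests a practical mitigation when slow-axis precision matters: `np.sum` accepts a `dtype` argument for the accumulator, so float64 accumulation can be requested explicitly (the shape and seed here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.random((100_000, 4)).astype(np.float32)

narrow = a.sum(axis=0)                  # float32 accumulation along the slow axis
wide = a.sum(axis=0, dtype=np.float64)  # widen the accumulator instead

ref = a[:, 0].astype(np.float64).sum()  # float64 reference for column 0
print(narrow[0] - ref, wide[0] - ref)
```

The float64 accumulator recovers the lost precision at the cost of a wider result dtype.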

@mattip
Member

mattip commented Jun 11, 2019

Closed by #13737
