Reducing functions (such as np.mean) return different result when axis keyword argument is specified #11748

Closed
Dominik1123 opened this issue Aug 16, 2018 · 1 comment


This issue concerns several reducing functions such as np.mean and np.std. They appear to return a different result when given a 2D array together with the axis argument than when given a manually sliced 1D column of the same array. Formally, the two approaches describe exactly the same operation, so they should return the same result. For example:

a = np.random.uniform(size=(100, 2))
assert np.mean(a, axis=0)[0] == np.mean(a[:, 0])

However, the above assertion fails for various sizes of the 0th axis.

Reproducing code example:

import numpy as np


def check(f, a):
    axis = f(a, axis=0)
    manual = f(a[:, 0]), f(a[:, 1])
    print(axis[0] == manual[0])
    print(axis[1] == manual[1])
    print('0: {:e}'.format(axis[0] - manual[0]))
    print('1: {:e}'.format(axis[1] - manual[1]))
    print()

a1 = np.empty(shape=(5, 2), dtype=float)
a1[:, 0] = 0.69646919
a1[:, 1] = 0.28613933

a2 = np.empty(shape=(100, 2), dtype=float)
a2[:, 0] = 0.69646919
a2[:, 1] = 0.28613933

print('=== Checking a1 ===', end='\n\n')
check(np.mean, a1)  # Pass.
check(np.std, a1)  # Pass.
check(np.var, a1)  # Pass.
check(np.sum, a1)  # Pass.

print('=== Checking a2 ===', end='\n\n')
check(np.mean, a2)  # Fail.
check(np.std, a2)  # Fail.
check(np.var, a2)  # Fail.
check(np.sum, a2)  # Fail.

Numpy/Python version information:

>>> import sys, numpy; print(numpy.__version__, sys.version)
1.15.0 3.5.2 (default, Nov 23 2017, 16:37:01) 
[GCC 5.4.0 20160609]

Further investigation

Checking various sizes for the 0th-axis for various methods:

import numpy as np
import pandas as pd


def generate(n):
    a = np.empty(shape=(n, 2), dtype=float)
    a[:, 0] = 0.69646919
    a[:, 1] = 0.28613933
    return a


def check(f, a):
    axis = f(a, axis=0)
    manual = f(a[:, 0]), f(a[:, 1])
    return axis[0] == manual[0] and axis[1] == manual[1]


funcs = ['mean', 'std', 'var', 'sum', 'prod']
results = []
for f in map(lambda x: getattr(np, x), funcs):
    results.append([check(f, generate(i)) for i in range(2, 101)])

with pd.option_context('display.max_rows', None):
    print(pd.DataFrame(
        columns=funcs,
        index=range(2, 101),
        data=np.asarray(results).T
    ))

This yields the following results:

      mean    std    var    sum  prod
2     True   True   True   True  True
3     True   True   True   True  True
4     True   True   True   True  True
5     True   True   True   True  True
6     True   True   True   True  True
7     True   True   True   True  True
8    False  False  False  False  True
9    False  False  False  False  True
10   False  False  False  False  True
11    True   True   True  False  True
12    True   True   True  False  True
13    True   True   True  False  True
14    True   True   True   True  True
15    True   True   True   True  True
16   False  False  False  False  True
17   False  False  False  False  True
18   False  False  False  False  True
19   False  False  False  False  True
20   False  False  False  False  True
21   False  False  False  False  True
22    True   True   True  False  True
23   False  False  False  False  True
24   False  False  False  False  True
25   False  False  False  False  True
26   False  False  False  False  True
27   False  False  False  False  True
28   False  False  False  False  True
29   False  False  False  False  True
30   False  False  False  False  True
31   False  False  False  False  True
32   False  False  False  False  True
33   False  False  False  False  True
34   False  False  False  False  True
35   False  False  False  False  True
36   False  False  False  False  True
37   False  False  False  False  True
38   False  False  False  False  True
39   False  False  False  False  True
40   False  False  False  False  True
41   False  False  False  False  True
42   False  False  False  False  True
43   False  False  False  False  True
44   False  False  False  False  True
45   False  False  False  False  True
46   False  False  False  False  True
47   False  False  False  False  True
48   False  False  False  False  True
49   False  False  False  False  True
50   False  False  False  False  True
51   False  False  False  False  True
52   False  False  False  False  True
53   False  False  False  False  True
54   False  False  False  False  True
55   False  False  False  False  True
56   False  False  False  False  True
57   False  False  False  False  True
58   False  False  False  False  True
59   False  False  False  False  True
60   False  False  False  False  True
61   False  False  False  False  True
62   False  False  False  False  True
63   False  False  False  False  True
64   False  False  False  False  True
65   False  False  False  False  True
66   False   True   True  False  True
67   False   True   True  False  True
68   False   True   True  False  True
69   False  False  False  False  True
70   False  False  False  False  True
71   False  False  False  False  True
72   False  False  False  False  True
73   False  False  False  False  True
74   False  False  False  False  True
75   False  False  False  False  True
76   False  False  False  False  True
77   False  False  False  False  True
78   False  False  False  False  True
79   False  False  False  False  True
80   False  False  False  False  True
81   False  False  False  False  True
82   False  False  False  False  True
83   False  False  False  False  True
84   False  False  False  False  True
85   False  False  False  False  True
86   False  False  False  False  True
87   False  False  False  False  True
88   False  False  False  False  True
89   False  False  False  False  True
90   False  False  False  False  True
91   False  False  False  False  True
92   False  False  False  False  True
93   False  False  False  False  True
94   False  False  False  False  True
95   False  False  False  False  True
96   False  False  False  False  True
97   False  False  False  False  True
98   False  False  False  False  True
99   False  False  False  False  True
100  False  False  False  False  True

seberg commented Aug 16, 2018

Duplicate of #11331 and others.

NumPy takes the liberty of making computations as fast as possible. Since numerical precision is always limited, doing so changes the order of operations and thus the result. There is a second effect at work with summing specifically, which can in some cases create truly large differences (at first sight, there is probably no serious numerical precision issue here, mostly because of the array sizes).
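To illustrate the point about operation order, here is a minimal pure-Python sketch. It is not NumPy's actual implementation: `sequential_sum` and `pairwise_sum` are hypothetical helpers written only for this example, standing in for two different orders in which the same values could be added. The value 0.69646919 is taken from the report above.

```python
import math


def sequential_sum(values):
    """Add the values left to right, one at a time."""
    total = 0.0
    for v in values:
        total += v
    return total


def pairwise_sum(values):
    """Add the values by recursively splitting the list in half."""
    n = len(values)
    if n == 0:
        return 0.0
    if n == 1:
        return values[0]
    mid = n // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])


# Floating-point addition is not associative: grouping changes the result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# The same 100 values, added in two different orders.
values = [0.69646919] * 100
seq = sequential_sum(values)
pair = pairwise_sum(values)
print('difference: {:e}'.format(seq - pair))

# Both orders agree to within rounding error, even if not bit for bit.
print(math.isclose(seq, pair, rel_tol=1e-12))
```

Either order is a legitimate way to compute the sum, so neither result is "the" correct one; both are approximations of the exact value with slightly different rounding.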

It would be nice to document this more clearly, though (especially the large differences that are possible in some cases). If you know of a place where we should improve the documentation, please say so, or propose a change!

Feel free to keep discussing, but there are open duplicates of this and there is probably not much to do, so closing.
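Given that explanation, a comparison with a tolerance is probably the more robust way to write the check from the report. The sketch below rebuilds the 100-row array from the reproducing example and compares the two computations with np.allclose instead of ==. The tolerances `rtol=1e-10` and `atol=1e-12` are assumptions chosen for this data, not recommended values.

```python
import numpy as np

# Same data as the a2 array in the report.
a = np.empty(shape=(100, 2), dtype=float)
a[:, 0] = 0.69646919
a[:, 1] = 0.28613933

agreement = {}
for f in (np.mean, np.std, np.var, np.sum):
    by_axis = f(a, axis=0)
    manual = np.array([f(a[:, 0]), f(a[:, 1])])
    # Compare up to rounding error rather than bit for bit.
    agreement[f.__name__] = bool(
        np.allclose(by_axis, manual, rtol=1e-10, atol=1e-12)
    )

for name, ok in agreement.items():
    print(name, ok)
```

The absolute tolerance matters for np.std and np.var here, because both results are at or near zero for constant columns and a purely relative comparison would be meaningless.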

seberg closed this as completed Aug 16, 2018