Unstable pairwise_distances #11711
Comments
I'm having the same problem
Same NumPy, SciPy, sklearn, Windows 10 build 17134. |
Thanks for the report! See #9354. There is a known issue with the precision of I think it's due to the above linked issue, but I also can't reproduce the results of your code snippet with a comparable configuration on Linux, running it multiple times with 0.19.1 and master. If you could define your random datasets with an RNG (e.g. |
I modified the snippet to use a seed. Note that when using float64, the error is -7.815970093361102e-14, so this is not exactly the same as #9354. The problem is stranger than #9354 because it means that calling the same function on the same data doesn't give the same result twice. The context of this code is that I compute the distance matrix of a high-dimensional dataset, so I proceed by chunks and fill small squares of the matrix. I noticed that the result depends on the chunk size, which should not happen. |
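To make the chunked computation described above concrete, here is a minimal sketch. It is not the reporter's original snippet: the helper names `sq_euclidean` and `chunked_sq_euclidean`, the data shape, and the chunk sizes are all assumptions chosen for illustration. It uses the expansion |x|^2 - 2 x.y + |y|^2 with plain NumPy, fills the matrix square by square, and compares two chunk sizes. On some machines the printed difference may be exactly zero, on others a small nonzero value.

```python
import numpy as np


def sq_euclidean(x, y):
    """Squared Euclidean distances via |x|^2 - 2 x.y + |y|^2 (hypothetical helper)."""
    xx = np.einsum("ij,ij->i", x, x)[:, None]  # squared row norms of x
    yy = np.einsum("ij,ij->i", y, y)[None, :]  # squared row norms of y
    d = xx - 2.0 * np.dot(x, y.T) + yy
    np.maximum(d, 0, out=d)  # clip small negative values caused by rounding
    return d


def chunked_sq_euclidean(x, chunk_size):
    """Fill the full distance matrix one (chunk_size x chunk_size) square at a time."""
    n = x.shape[0]
    out = np.empty((n, n), dtype=x.dtype)
    for i in range(0, n, chunk_size):
        for j in range(0, n, chunk_size):
            out[i:i + chunk_size, j:j + chunk_size] = sq_euclidean(
                x[i:i + chunk_size], x[j:j + chunk_size]
            )
    return out


rng = np.random.RandomState(42)
x = rng.rand(9, 10000).astype(np.float32)

full = chunked_sq_euclidean(x, chunk_size=9)   # a single square: the whole matrix
small = chunked_sq_euclidean(x, chunk_size=4)  # several smaller squares

# Largest entrywise disagreement between the two chunkings (machine dependent).
max_abs_diff = float(np.abs(full - small).max())
print(max_abs_diff)
```

Mathematically both calls compute the same matrix, so any nonzero difference comes from how the underlying dot products are accumulated for different input shapes.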
I think this is a standard numerical precision issue. Do you have reason to
believe we can correctly compute a sum of this dimensionality? Or should we
be spending time on an all loss check to explicitly zero it?
|
@jnothman I'm not complaining about a numerical precision issue. I'm complaining about the fact that the result depends on the other elements of the matrix. |
I see. Sorry, haven't had my coffee. Interesting. Deserves investigation. |
I can reproduce. It seems to come from NumPy:
import numpy as np
np.random.seed(42)
a = np.array(np.random.rand(9, 10000), dtype=np.float32)
m1 = np.dot(a, a.T)
m2 = np.dot(a[:4], a[:4].T)
print(m2[0, 1] - m1[0, 1])
# 0.0012207
Note that this can probably be considered as a normal numerical error (see e.g. numpy/numpy#11419 (comment)). |
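One way to see why such a difference can count as ordinary numerical error: floating-point addition is not associative, so a library that groups the terms of a dot product differently for different matrix shapes can legitimately return different sums. The sketch below is only an illustration with hand-picked float32 values. It does not show what NumPy or its BLAS backend actually does internally.

```python
import numpy as np

# Hand-picked float32 values; near 1e8 the spacing between float32 numbers is 8.
a = np.float32(1e8)
b = np.float32(-1e8)
c = np.float32(1.0)

left = (a + b) + c   # a + b is exactly 0, so adding c gives 1.0
right = a + (b + c)  # b + c rounds back to -1e8, so the sum is 0.0

print(left, right)
```

The same three numbers summed in two different orders give 1.0 and 0.0, which is the mechanism, at a much smaller scale, behind a result that depends on how the inputs are blocked.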
Thanks for the code snippet @TomDLT ! Reported the issue upstream in numpy/numpy#11655 with additional details. It might be linked to MKL, or it might also be the expected ~7 significant digits of precision of 32-bit floats. In any case, this is a fairly low-level issue, and numpy would be a better place to discuss it. Should we close this in favor of the numpy issue? |
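To put the "~7 significant digits" remark in numbers, the following sketch compares a float32 Gram matrix with a float64 reference computed from the same values and reports the largest relative error in units of the float32 machine epsilon. The data shape and seed mirror the snippet above. The variable names are assumptions, and the exact figures printed will vary with the machine and the BLAS library in use.

```python
import numpy as np

rng = np.random.RandomState(42)
a32 = rng.rand(9, 10000).astype(np.float32)

# Gram matrix accumulated in float32.
gram32 = np.dot(a32, a32.T)

# Reference: the same float32 values, upcast so accumulation happens in float64.
a64 = a32.astype(np.float64)
gram64 = np.dot(a64, a64.T)

# Relative error of each float32 entry against the float64 reference.
rel_err = np.abs(gram32.astype(np.float64) - gram64) / gram64
eps32 = np.finfo(np.float32).eps  # spacing between 1.0 and the next float32

print("float32 machine epsilon:", eps32)
print("largest relative error:", rel_err.max())
print("in units of epsilon:", rel_err.max() / eps32)
```

If the largest relative error is within a modest multiple of epsilon, the discrepancy is consistent with float32 rounding rather than with a logic bug.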
I can reproduce the snippet of @TomDLT, but it prints What is strange is that some machines don't have this problem. I agree to close this here! |
Thanks for the detailed report in any case @louisabraham ! |
Description
On some machines, the code below can output a nonzero value.
Steps/Code to Reproduce
The bug also exists with the cosine distance but not the cityblock.
Expected Results
0.0
Actual Results
1.1444092e-05
Versions
On this config, the bug doesn't exist:
The problem might come from another component.