Description
Hey guys,
I recently updated scikit-learn to version 0.21, and the results of one of my projects suddenly became very bad. After more than a day of investigation, I found that the error occurs in pairwise_distances: the distance matrix becomes 0 after some index n, in my case 2028.
Essentially, in this line of code:
distances = sklearn.metrics.pairwise.pairwise_distances(X)
where X is a feature matrix of size (8131, 1024).
The results of distances are correct for the first 2028 elements, while for the elements starting from 2029, everything becomes 0.
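For anyone who wants to check whether they are hit by the same symptom, here is a minimal sketch that finds where the zeros start. It assumes that "elements" above refers to rows of the distance matrix; the helper name first_all_zero_row is hypothetical, and the small matrix is a synthetic stand-in for the real data.

```python
import numpy as np

def first_all_zero_row(distances):
    """Return the index of the first row that is entirely zero, or None.

    Hypothetical helper, not part of scikit-learn.
    """
    distances = np.asarray(distances)
    # A row is "all zero" when none of its entries is non-zero.
    zero_rows = np.flatnonzero(~distances.any(axis=1))
    return int(zero_rows[0]) if zero_rows.size else None

# Synthetic stand-in for the real matrix: rows 3 and 4 are zeroed out.
demo = np.ones((5, 5)) - np.eye(5)
demo[3:] = 0.0
print(first_all_zero_row(demo))
```

A healthy distance matrix for distinct points should return None here, since only the diagonal is zero.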
This error doesn't happen if you use version 0.20.2 or 0.20.3.
I also saw that if we use scipy.spatial.distance.pdist, the results match those of sklearn 0.20.
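The comparison against pdist can be sketched roughly as follows. This assumes the default Euclidean metric; the helper name max_abs_diff_vs_pdist is hypothetical, and squareform is used only to expand the condensed pdist output into a full square matrix.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def max_abs_diff_vs_pdist(X, distances):
    """Largest absolute difference between `distances` and a pdist reference.

    Hypothetical helper; assumes `distances` is the square Euclidean
    distance matrix of the rows of X.
    """
    # Compute the reference in double precision.
    reference = squareform(pdist(np.asarray(X, dtype=np.float64)))
    diff = np.asarray(distances, dtype=np.float64) - reference
    return float(np.max(np.abs(diff)))
```

Passing in the matrix returned by pairwise_distances(X) should give a value near zero when the two agree, and a value on the order of the typical distance when part of the matrix has been zeroed.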
The error looks like an overflow, and it can be mitigated by simply casting X to float64. I saw that there have been similar errors in the past. However, since this works well in 0.20 but fails in 0.21, it would probably be a good idea to mention in the documentation that the feature matrix (X) should be cast from float32 to float64 (which I confirmed solves the issue).
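The workaround amounts to something like the sketch below. The random float32 matrix is a small synthetic stand-in for the real X, so it only illustrates the cast and does not reproduce the bug.

```python
import numpy as np
from sklearn.metrics import pairwise_distances

# Synthetic float32 feature matrix (assumption: stands in for the real X).
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 16)).astype(np.float32)

# Workaround: cast to float64 before computing the distances.
distances = pairwise_distances(X.astype(np.float64))
print(distances.dtype, distances.shape)
```

Note that for the real X of size (8131, 1024), the resulting 8131 x 8131 float64 matrix takes roughly 0.5 GB of memory.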
I can provide X, distances (sklearn 0.21), distances (sklearn 0.20) and distances (pdist) if someone wants to investigate further and/or replicate these results.
Thanks!