Description
When using KMeans with precompute_distances set to True, the same data set gets different cluster assignments on different machines.
Steps/Code to Reproduce
Example:
kmeans = KMeans(init="k-means++", precompute_distances = True, n_clusters = num_clusters, random_state=get_prime(), n_jobs=-2)
get_prime() returns a prime number deterministically (it iterates over a fixed array of primes).
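To make the call above easier to run on both machines, here is a minimal self-contained sketch. It is not the original script: the synthetic data, `num_clusters=3`, the fixed seed `7` standing in for `get_prime()`, and the explicit `n_init=10` are all assumptions made for illustration. Newer scikit-learn releases no longer accept `precompute_distances` or `n_jobs`, so the sketch only passes the arguments the installed version supports.

```python
import hashlib
import inspect

import numpy as np
from sklearn.cluster import KMeans


def fit_and_fingerprint(num_clusters=3, seed=7):
    """Fit KMeans on fixed synthetic data; return (labels, sha256 hex digest)."""
    # Synthetic stand-in for the real data set (assumption): 200 points, 5 features.
    X = np.random.RandomState(0).rand(200, 5)

    # Arguments from the report; `seed` stands in for get_prime() (assumption).
    wanted = dict(init="k-means++", precompute_distances=True,
                  n_clusters=num_clusters, random_state=seed,
                  n_init=10, n_jobs=-2)

    # Keep only the keyword arguments this scikit-learn version accepts.
    accepted = inspect.signature(KMeans.__init__).parameters
    kwargs = {k: v for k, v in wanted.items() if k in accepted}

    labels = KMeans(**kwargs).fit(X).labels_
    # Hash the label array so two machines can compare one short string.
    digest = hashlib.sha256(labels.astype(np.int64).tobytes()).hexdigest()
    return labels, digest


if __name__ == "__main__":
    labels, digest = fit_and_fingerprint()
    print("labels sha256:", digest)
```

Running this on Machine A and Machine B and comparing the printed digests would show whether the label arrays differ. A differing digest does not by itself say whether the partition changed or only the cluster numbering did.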
Expected Results
The same cluster assignment on different machines.
Actual Results
Different cluster assignments on different machines. I noticed this by comparing the silhouette scores. Note that repeating the clustering on the same machine gives identical results.
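The silhouette score is only an indirect signal. A more direct check is to save the label arrays from both machines and compare the partitions themselves. The sketch below is one possible way to do that, under the assumption that the labels are available as plain sequences of integers; the helper names `canonical_labels` and `same_partition` are made up for this example. It renumbers cluster ids in order of first appearance, so two assignments that differ only in how the clusters are numbered still compare equal.

```python
def canonical_labels(labels):
    """Renumber cluster ids in order of first appearance."""
    mapping = {}
    out = []
    for lab in labels:
        if lab not in mapping:
            # First time this id is seen: give it the next free number.
            mapping[lab] = len(mapping)
        out.append(mapping[lab])
    return out


def same_partition(labels_a, labels_b):
    """True if both label sequences group the samples identically."""
    return canonical_labels(labels_a) == canonical_labels(labels_b)


if __name__ == "__main__":
    # Same grouping, different cluster numbering.
    print(same_partition([0, 0, 1, 2], [2, 2, 0, 1]))
    # One sample moved to another cluster.
    print(same_partition([0, 0, 1, 1], [0, 1, 1, 1]))
```

If `same_partition` returns False for the labels from Machine A and Machine B, the clusterings genuinely differ rather than just being numbered differently.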
Versions
Machine A:
import platform; print(platform.platform())
Linux-3.13.0-91-generic-x86_64-with-Ubuntu-14.04-trusty
import sys; print("Python", sys.version)
Python 3.4.3 (default, Oct 14 2015, 20:28:29)
[GCC 4.8.4]
import numpy; print("NumPy", numpy.__version__)
NumPy 1.8.2
import scipy; print("SciPy", scipy.__version__)
SciPy 0.13.3
import sklearn; print("Scikit-Learn", sklearn.__version__)
Scikit-Learn 0.17.1
Machine B:
import platform; print(platform.platform())
Linux-3.16.0-34-generic-x86_64-with-Ubuntu-14.04-trusty
import sys; print("Python", sys.version)
Python 3.4.3 (default, Oct 14 2015, 20:28:29)
[GCC 4.8.4]
import numpy; print("NumPy", numpy.__version__)
NumPy 1.10.4
import scipy; print("SciPy", scipy.__version__)
SciPy 0.13.3
import sklearn; print("Scikit-Learn", sklearn.__version__)
Scikit-Learn 0.17.1
The NumPy and kernel versions differ between the two machines.