Closed
Description
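There is an inconsistency between the "full" and "elkan" algorithms in K-Means: started from the same initial centers, they converge to different solutions. I believe the discrepancy arises when empty centers are reset, in which case the bound update in the elkan code path might need modification.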
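To make the suspected failure mode concrete, here is a minimal NumPy sketch of the textbook Elkan bound update. It is not scikit-learn's actual code: the helper names (`pairwise_dist`, `update_bounds`, `bounds_valid`) and the three-point toy data are hypothetical, and the relocation step is a simplification that moves the empty center without recomputing the other center. It only illustrates that the bounds stay valid when the shifts match the final center positions, and that they can break if a center is relocated after the shifts were measured.

```python
import numpy as np

def pairwise_dist(X, C):
    # Euclidean distances between samples and centers, shape (n_samples, n_centers).
    return np.sqrt(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2))

def update_bounds(upper, lower, labels, shift):
    # Textbook Elkan update: a center that moved by shift[c] can be at most
    # shift[c] farther from (upper bound) or closer to (lower bound) any point.
    new_upper = upper + shift[labels]
    new_lower = np.maximum(lower - shift[None, :], 0.0)
    return new_upper, new_lower

def bounds_valid(X, C, labels, upper, lower, atol=1e-9):
    # Check upper >= d(x, assigned center) and lower <= d(x, c) for every center c.
    D = pairwise_dist(X, C)
    d_assigned = D[np.arange(len(X)), labels]
    return bool(np.all(upper >= d_assigned - atol) and np.all(lower <= D + atol))

# Toy data (hypothetical): every point is closest to center 0, so center 1 is empty.
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]])
C_old = np.array([[0.5, 0.0], [20.0, 0.0]])
D = pairwise_dist(X, C_old)
labels = D.argmin(axis=1)
upper = D[np.arange(len(X)), labels]  # tight bounds after a full assignment
lower = D.copy()

# Ordinary mean update: center 0 moves to the mean, the empty center 1 stays put.
C_new = C_old.copy()
C_new[0] = X[labels == 0].mean(axis=0)
shift = np.sqrt(((C_new - C_old) ** 2).sum(axis=1))
u1, l1 = update_bounds(upper, lower, labels, shift)
ok_consistent = bounds_valid(X, C_new, labels, u1, l1)

# Suspected problem: the empty center is relocated onto a far point,
# but the bounds were updated with a shift that does not include that move.
C_reloc = C_new.copy()
C_reloc[1] = X[2]
ok_stale = bounds_valid(X, C_reloc, labels, u1, l1)

# Measuring the shift against the relocated center restores valid bounds.
shift_fixed = np.sqrt(((C_reloc - C_old) ** 2).sum(axis=1))
u2, l2 = update_bounds(upper, lower, labels, shift_fixed)
ok_fixed = bounds_valid(X, C_reloc, labels, u2, l2)

print(ok_consistent, ok_stale, ok_fixed)
```

In this toy case the stale lower bound still claims the third point is at distance 10 from center 1 even though that center now sits on the point, so an Elkan-style loop would skip a reassignment that the "full" algorithm would make.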
Steps/Code to Reproduce
import numpy as np
from sklearn.cluster import KMeans

seed = 51220
rng = np.random.RandomState(seed)
N = 1200
K = 100
X = rng.randn(N, 2)**7
indices_init = np.arange(K, dtype=np.uint64)
C_init = X[indices_init]

for alg in ["elkan", "full"]:
    sklc = KMeans(n_clusters=K, init=C_init, max_iter=int(1e6),
                  tol=1e-20, verbose=0, n_init=1, algorithm=alg)
    sklc.fit(X)
    result = (np.sum(np.min(np.sum((
        np.expand_dims(X, axis=1) -
        np.expand_dims(sklc.cluster_centers_, axis=0))**2, axis=2), axis=1)) /
        X.shape[0])
    print("final E with algorithm ", alg, "\t : \t", result)
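For readers parsing the `result` expression above: it is the mean, over samples, of the squared distance to the nearest center. A more readable equivalent might look like the sketch below, where `mean_min_sq_dist` and the toy arrays are hypothetical names introduced only for illustration.

```python
import numpy as np

def mean_min_sq_dist(X, centers):
    # Squared distance from every sample to every center, shape (n_samples, n_centers).
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    # Keep the nearest center per sample, then average over samples.
    return sq_dist.min(axis=1).mean()

# Tiny hand-checkable example: nearest squared distances are 1, 1 and 0.
X_demo = np.array([[0.0, 0.0], [2.0, 0.0], [10.0, 0.0]])
centers_demo = np.array([[1.0, 0.0], [10.0, 0.0]])
energy = mean_min_sq_dist(X_demo, centers_demo)
print(energy)
```

Called as `mean_min_sq_dist(X, sklc.cluster_centers_)`, it should give the same quantity as `result` in the reproduction script.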
Expected
final E with algorithm elkan : 1177.85544176
final E with algorithm full : 1177.85544176
Obtained
final E with algorithm elkan : 1375.74074883
final E with algorithm full : 1177.85544176
Versions
In [11]: import sys; print("Python", sys.version)
('Python', '2.7.12 (default, Nov 19 2016, 06:48:10) \n[GCC 5.4.0 20160609]')
In [12]: import numpy; print("NumPy", numpy.__version__)
('NumPy', '1.11.0')
In [13]: import scipy; print("SciPy", scipy.__version__)
('SciPy', '0.19.1')
In [14]: import sklearn; print("Scikit-Learn", sklearn.__version__)
('Scikit-Learn', '0.19.0')