KMeans: "elkan" vs "full" discrepancy #9795

Closed
@newling

Description

There is an inconsistency between the "full" and "elkan" algorithms in K-Means: started from the same initial centers, they end at different final energies. I believe the discrepancy arises when empty centers are reset, in which case the bound update here might need modification.
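To make the suspected mechanism concrete, here is a minimal sketch of a triangle-inequality bound update of the kind Elkan's algorithm relies on. It is not scikit-learn's implementation; the helper names (`pairwise_dist`, `shift_bounds`, `bounds_are_valid`) and the toy data are assumptions made for illustration, and the empty-center relocation is simulated by hand. The point is only that if a center is relocated after its shift has been measured, the lower bounds for that center can stop being valid.

```python
import numpy as np


def pairwise_dist(X, centers):
    """Euclidean distance from every sample to every center."""
    return np.linalg.norm(X[:, np.newaxis, :] - centers[np.newaxis, :, :], axis=2)


def shift_bounds(upper, lower, labels, shift):
    """Triangle-inequality update after center k has moved by shift[k]."""
    # The assigned center can be at most shift[label] farther away.
    new_upper = upper + shift[labels]
    # Every center can be at most shift[k] closer (never below zero).
    new_lower = np.maximum(lower - shift[np.newaxis, :], 0.0)
    return new_upper, new_lower


def bounds_are_valid(X, centers, labels, upper, lower, eps=1e-9):
    """Check upper >= d(x, assigned center) and lower <= d(x, every center)."""
    dist = pairwise_dist(X, centers)
    assigned = dist[np.arange(len(X)), labels]
    return bool(np.all(upper >= assigned - eps) and np.all(lower <= dist + eps))


# Toy data (hypothetical): two tight clusters and a third center with no samples.
X = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
old_centers = np.array([[0.5, 0.0], [10.5, 0.0], [100.0, 0.0]])

dist = pairwise_dist(X, old_centers)
labels = dist.argmin(axis=1)
upper = dist[np.arange(len(X)), labels]  # tight upper bounds
lower = dist.copy()                      # tight lower bounds

# Mean step: centers 0 and 1 are already their cluster means; center 2 is empty.
mean_centers = old_centers.copy()
# Simulated reset: the empty center is relocated onto a sample.
final_centers = mean_centers.copy()
final_centers[2] = X[3]

# Shift measured before the relocation versus shift to the final position.
stale_shift = np.linalg.norm(mean_centers - old_centers, axis=1)
full_shift = np.linalg.norm(final_centers - old_centers, axis=1)

stale_ok = bounds_are_valid(
    X, final_centers, labels, *shift_bounds(upper, lower, labels, stale_shift))
full_ok = bounds_are_valid(
    X, final_centers, labels, *shift_bounds(upper, lower, labels, full_shift))
print("bounds valid with stale shift:", stale_ok)
print("bounds valid with full shift: ", full_ok)
```

In this toy case the stale shift leaves a lower bound of 89 for a center that now sits at distance 0 from the last sample, so a bound-based skip would never reconsider that center; measuring the shift against the final, relocated position keeps every bound valid. Whether this is exactly what happens in the reported run is a hypothesis, not something the sketch proves.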

Steps/Code to Reproduce

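import numpy as np

from sklearn.cluster import KMeans

seed = 51220
rng = np.random.RandomState(seed)
N = 1200  # number of samples
K = 100   # number of clusters
# Heavy-tailed 2-D data: the 7th power spreads samples far from the origin.
X = rng.randn(N, 2)**7
# Use the first K samples as the initial centers for both algorithms.
indices_init = np.arange(K, dtype=np.uint64)
C_init = X[indices_init]
for alg in ["elkan", "full"]:
    # Tiny tol and large max_iter so both runs continue until convergence.
    sklc = KMeans(n_clusters=K, init=C_init, max_iter=int(1e6),
                  tol=1e-20, verbose=0, n_init=1, algorithm=alg)
    sklc.fit(X)
    # E: mean over samples of the squared distance to the nearest center.
    result = (np.sum(np.min(np.sum((
        np.expand_dims(X, axis=1) -
        np.expand_dims(sklc.cluster_centers_, axis=0))**2, axis=2), axis=1)) /
              X.shape[0])
    print("final E with algorithm ", alg, "\t : \t", result)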
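The energy E printed above is the mean, over samples, of the squared distance to the nearest center. For anyone checking the numbers independently, the same quantity can be written as a small standalone helper. The name `mean_min_sq_dist` is an assumption introduced here, not part of scikit-learn.

```python
import numpy as np


def mean_min_sq_dist(X, centers):
    """Mean over samples of the squared distance to the nearest center."""
    # diffs has shape (n_samples, n_centers, n_features) via broadcasting.
    diffs = X[:, np.newaxis, :] - centers[np.newaxis, :, :]
    sq_dists = np.sum(diffs ** 2, axis=2)
    return sq_dists.min(axis=1).mean()


# Tiny hand-checkable example: squared distances to nearest center are 0 and 1.
toy_X = np.array([[0.0, 0.0], [2.0, 0.0]])
toy_centers = np.array([[0.0, 0.0], [3.0, 0.0]])
toy_energy = mean_min_sq_dist(toy_X, toy_centers)
print(toy_energy)
```

Calling `mean_min_sq_dist(X, sklc.cluster_centers_)` after each fit should give the same value as `result` in the loop, up to floating-point rounding.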

Expected

final E with algorithm elkan : 1177.85544176
final E with algorithm full : 1177.85544176

Obtained

final E with algorithm elkan : 1375.74074883
final E with algorithm full : 1177.85544176

Versions

In [11]: import sys; print("Python", sys.version)
('Python', '2.7.12 (default, Nov 19 2016, 06:48:10) \n[GCC 5.4.0 20160609]')

In [12]: import numpy; print("NumPy", numpy.__version__)
('NumPy', '1.11.0')

In [13]: import scipy; print("SciPy", scipy.__version__)
('SciPy', '0.19.1')

In [14]: import sklearn; print("Scikit-Learn", sklearn.__version__)
('Scikit-Learn', '0.19.0')
