FIX Elkan k-means does not stop if tol=0 #16075

kno10 · 2020-01-09T16:16:07Z

K-means convergence is still different between "full" k-means and "elkan" k-means.

The fix in #15831 is incomplete. Compare:

Lines 441 to 442 in 4fe4d27

    center_shift_total = squared_norm(centers_old - centers)
    if center_shift_total <= tol:

and

scikit-learn/sklearn/cluster/_k_means_elkan.pyx

Line 233 in 4fe4d27

    center_shift = np.sqrt(np.sum((centers_ - new_centers) ** 2, axis=1))

scikit-learn/sklearn/cluster/_k_means_elkan.pyx

Lines 249 to 250 in 4fe4d27

    center_shift_total = np.sum(center_shift ** 2)
    if center_shift_total < tol:

Note that in the second version it would likely make sense to compute the squared norms first and take the square root separately, rather than squaring the rooted values afterwards.
But this PR only changes a single character: one version tests <= and the other tests <.
With tol=0, "full" can stop as soon as the cluster centers stop moving, whereas "elkan" never stops at that point and always runs the maximum number of iterations.
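To make the one-character difference concrete, here is a minimal plain-Python sketch. It is not the scikit-learn code: the helper names `center_shifts`, `converged_inclusive` and `converged_strict` are made up for illustration. It also follows the suggestion of computing the squared shifts once and taking the square root separately.

```python
import math


def center_shifts(old_centers, new_centers):
    """Return (per-center Euclidean shifts, total squared shift).

    Hypothetical helper for illustration only.
    """
    # Squared distance each center moved, computed once.
    squared = [
        sum((o - n) ** 2 for o, n in zip(old, new))
        for old, new in zip(old_centers, new_centers)
    ]
    # Square root taken separately, for code that needs true distances.
    shifts = [math.sqrt(s) for s in squared]
    # The total uses the squared values directly instead of re-squaring roots.
    return shifts, sum(squared)


def converged_inclusive(total, tol):
    # Comparison used by the "full" code path quoted above.
    return total <= tol


def converged_strict(total, tol):
    # Comparison used by the "elkan" code path before this fix.
    return total < tol


# Centers that did not move at all: the total squared shift is exactly 0.
centers = [[0.0, 0.0], [1.0, 2.0]]
shifts, total = center_shifts(centers, centers)

print(converged_inclusive(total, 0.0))  # True: the loop can stop
print(converged_strict(total, 0.0))     # False: 0 < 0 never holds
```

Since a sum of squares cannot be negative, the strict test can never succeed when tol=0, so only the iteration limit ends the loop.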

I do not think this is the best stopping criterion. If a numerical issue arises in computing the center shifts, this may cause the algorithm to always take the maximum number of iterations. The classic termination criterion for k-means is different: stop if no object is relabeled. That is more reliable.
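As a rough illustration of the label-based criterion, here is a small plain-Python sketch of Lloyd-style k-means that stops when no point changes cluster. It is an assumption-laden toy, not what scikit-learn implements: `assign`, `update` and `lloyd` are hypothetical names, and keeping the old center for an empty cluster is just one possible choice.

```python
def assign(points, centers):
    """Index of the nearest center (squared Euclidean distance) per point."""
    return [
        min(
            range(len(centers)),
            key=lambda j: sum((p - c) ** 2 for p, c in zip(point, centers[j])),
        )
        for point in points
    ]


def update(points, labels, centers):
    """Mean of each cluster; an empty cluster keeps its previous center."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centers.append(list(old))
    return new_centers


def lloyd(points, centers, max_iter=300):
    """Iterate until no point is relabeled or max_iter is reached."""
    labels = None
    for n_iter in range(1, max_iter + 1):
        new_labels = assign(points, centers)
        if new_labels == labels:
            # No object was relabeled, so the centers cannot change either.
            return centers, labels, n_iter
        labels = new_labels
        centers = update(points, labels, centers)
    return centers, labels, max_iter


points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centers, labels, n_iter = lloyd(points, [[0.0, 0.0], [1.0, 1.0]])
print(labels, centers, n_iter)
```

The check compares integer labels, so it does not depend on floating-point rounding in the center shifts.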

jeremiedbb

lgtm. Please add a what's new entry.

kno10 · 2020-01-14T12:22:30Z

Extended a unit test so that it failed before my change (300 vs. 7 iterations), simply by adding tol=0 to the existing test case, and added a changelog entry.

rth

Could you please add an empty commit to re-trigger CI (git commit --allow-empty)?
The existing failing build fails to load, so I don't know whether the failure is related. Otherwise LGTM.

kno10 · 2020-02-05T22:00:27Z

Re-triggered CI, checks passed.

rth · 2020-02-05T23:27:49Z

Thanks @kno10!

jeremiedbb approved these changes Jan 14, 2020


kno10 added 2 commits January 14, 2020 13:17

add test for tol=0

35e344b

FIX: inconsistency in Elkan termination criterion

7c22fdf

kno10 force-pushed the patch-kmeans-convergence branch from 981e933 to 7c22fdf Compare January 14, 2020 12:20

rth approved these changes Feb 5, 2020


re-trigger CI

9e956e6

rth changed the title ~~FIX: Elkan k-means does not stop if tol=0 ("full" does)~~ FIX Elkan k-means does not stop if tol=0 Feb 5, 2020

rth merged commit 91261c2 into scikit-learn:master Feb 5, 2020

kno10 deleted the patch-kmeans-convergence branch February 6, 2020 00:03

NicolasHug mentioned this pull request Feb 13, 2020

[MRG] new K-means implementation for improved performances #11950

Merged

6 tasks

thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Feb 22, 2020

FIX Elkan k-means does not stop if tol=0 (scikit-learn#16075)

a262d49

panpiort8 pushed a commit to panpiort8/scikit-learn that referenced this pull request Mar 3, 2020

FIX Elkan k-means does not stop if tol=0 (scikit-learn#16075)

bd8cc24
