[Undocumented?] KMeans behavior change between v1.2.2 and v1.3.0 #30643

Closed as not planned
@stu-blair

Description

Describe the bug

When upgrading scikit-learn from 1.1.1 to 1.6.0, I noticed that my script's results changed completely, despite using the same code, data, and random seeds. I narrowed the change down to v1.3.0. I tried explicitly setting every parameter I could find, but the results still differ. KMeans returns the clusters in a different order before and after the 1.3.0 release. With a fixed random seed the results are deterministic within both 1.2.2 and 1.3.0, but the ordering changes on upgrade.

Can someone please help me identify which change causes this, and whether there is any way to make the post-1.3.0 behavior consistent with the prior behavior?

Steps/Code to Reproduce

I ran the following code in both v1.2.2 and v1.3.0:

import sklearn
from sklearn.cluster import KMeans


def do_kmeans_test():
    # Nine one-dimensional samples.
    data = [[2.614], [2.614], [-0.493], [-0.489], [-0.489], [4.965], [2.518], [0.518], [0.201]]
    # Every relevant parameter is set explicitly, including the random seed.
    kmeans_pca = KMeans(
        n_clusters=3, init='k-means++', random_state=1234, n_init=10,
        max_iter=300, verbose=False, tol=1e-4, copy_x=True, algorithm="lloyd",
    )
    kmeans_pca.fit(data)
    print('sklearn version:', sklearn.__version__)
    print('cluster centers:', kmeans_pca.cluster_centers_)


do_kmeans_test()

Expected Results

In v1.2.2 (and earlier versions 1.1.1 and 1.2.0) the results are:

sklearn version: 1.2.2
cluster centers: [[-0.1504]
 [ 4.965 ]
 [ 2.582 ]]

Actual Results

But in v1.3.0 (and later versions) the results are:

sklearn version: 1.3.0
cluster centers: [[ 2.582 ]
 [-0.1504]
 [ 4.965 ]]
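
Both outputs contain the same three centers; only the row order differs. As a possible workaround (a minimal sketch, not something scikit-learn provides), downstream code could avoid depending on the order by sorting the centers and remapping the labels accordingly. The helper name `canonicalize_clusters` is hypothetical, and the arrays below are copied from the two outputs above rather than recomputed.

```python
import numpy as np


def canonicalize_clusters(centers, labels=None):
    """Sort cluster centers lexicographically and remap labels to match.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    centers = np.asarray(centers, dtype=float)
    # np.lexsort treats the LAST key as primary, so reverse the feature
    # order to make the first feature the primary sort key.
    order = np.lexsort(centers.T[::-1])
    sorted_centers = centers[order]
    if labels is None:
        return sorted_centers, None
    # Build the inverse permutation: old cluster index -> new cluster index.
    remap = np.empty_like(order)
    remap[order] = np.arange(len(order))
    return sorted_centers, remap[np.asarray(labels)]


# Centers as printed by v1.2.2 and v1.3.0 in this report.
centers_v122 = np.array([[-0.1504], [4.965], [2.582]])
centers_v130 = np.array([[2.582], [-0.1504], [4.965]])

canon_v122, _ = canonicalize_clusters(centers_v122)
canon_v130, _ = canonicalize_clusters(centers_v130)

# After canonicalization the two versions agree.
print(np.allclose(canon_v122, canon_v130))
```

In real use, the fitted estimator's `cluster_centers_` and `labels_` would be passed in, so that the remapped labels stay comparable across versions as long as the clustering itself is unchanged.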

Versions

In the 1.2.2 test:

>>> import sklearn; sklearn.show_versions()

System:
    python: 3.8.12 (default, Jan 10 2025, 11:02:07)  [Clang 16.0.0 (clang-1600.0.26.4)]
executable: /Users/sblair/.pyenv/versions/3.8.12/bin/python
   machine: macOS-15.1-x86_64-i386-64bit

Python dependencies:
      sklearn: 1.2.2
          pip: 21.1.1
   setuptools: 56.0.0
        numpy: 1.19.5
        scipy: 1.10.1
       Cython: None
       pandas: 1.4.0
   matplotlib: 3.7.5
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
    num_threads: 16
         prefix: libomp
       filepath: /Users/sblair/.pyenv/versions/3.8.12/lib/python3.8/site-packages/sklearn/.dylibs/libomp.dylib
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/sblair/.pyenv/versions/3.8.12/lib/python3.8/site-packages/numpy/.dylibs/libopenblas.0.dylib
        version: 0.3.13
threading_layer: pthreads
   architecture: Haswell

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/sblair/.pyenv/versions/3.8.12/lib/python3.8/site-packages/scipy/.dylibs/libopenblas.0.dylib
        version: 0.3.18
threading_layer: pthreads
   architecture: Haswell

In the 1.3.0 test:

>>> import sklearn; sklearn.show_versions()

System:
    python: 3.8.12 (default, Jan 10 2025, 11:02:07)  [Clang 16.0.0 (clang-1600.0.26.4)]
executable: /Users/sblair/.pyenv/versions/3.8.12/bin/python
   machine: macOS-15.1-x86_64-i386-64bit

Python dependencies:
      sklearn: 1.3.0
          pip: 21.1.1
   setuptools: 56.0.0
        numpy: 1.19.5
        scipy: 1.10.1
       Cython: None
       pandas: 1.4.0
   matplotlib: 3.7.5
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
    num_threads: 16
         prefix: libomp
       filepath: /Users/sblair/.pyenv/versions/3.8.12/lib/python3.8/site-packages/sklearn/.dylibs/libomp.dylib
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/sblair/.pyenv/versions/3.8.12/lib/python3.8/site-packages/numpy/.dylibs/libopenblas.0.dylib
        version: 0.3.13
threading_layer: pthreads
   architecture: Haswell

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/sblair/.pyenv/versions/3.8.12/lib/python3.8/site-packages/scipy/.dylibs/libopenblas.0.dylib
        version: 0.3.18
threading_layer: pthreads
   architecture: Haswell
