Describe the bug
When upgrading scikit-learn from 1.1.1 to 1.6.0, I noticed that my scripts' results changed completely, despite using the same code, data, and random seeds. I narrowed the change down to v1.3.0. I explicitly set every parameter I could find, but the results still differ: KMeans produces the clusters in a different order before and after the 1.3.0 release. In both 1.2.2 and 1.3.0 the results are deterministic when the random seed is set, but the ordering changes on upgrade.
Can someone please help me identify which change causes this, and whether there is any way to make the post-1.3.0 behavior consistent with the prior behavior?
Steps/Code to Reproduce
I ran the following code in both v1.2.2 and v1.3.0:
```python
import sklearn
from sklearn.cluster import KMeans


def do_kmeans_test():
    # 1-D data: 9 samples with a single feature each.
    data = [[ 2.614], [ 2.614], [-0.493], [-0.489], [-0.489], [ 4.965], [ 2.518], [ 0.518], [ 0.201]]
    # Every parameter is set explicitly (including the seed) so that changed
    # defaults between versions cannot explain the difference.
    kmeans_pca = KMeans(n_clusters=3, init='k-means++', random_state=1234, n_init=10,
                        max_iter=300, verbose=False, tol=1e-4, copy_x=True, algorithm="lloyd")
    kmeans_pca.fit(data)
    print('sklearn version:', sklearn.__version__)
    print('cluster centers:', kmeans_pca.cluster_centers_)


do_kmeans_test()
```
Expected Results
In v1.2.2 (and in the earlier versions 1.1.1 and 1.2.0), the results are:
```
sklearn version: 1.2.2
cluster centers: [[-0.1504]
 [ 4.965 ]
 [ 2.582 ]]
```
Actual Results
But in v1.3.0 (and later versions) the results are:
```
sklearn version: 1.3.0
cluster centers: [[ 2.582 ]
 [-0.1504]
 [ 4.965 ]]
```
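The two outputs can be checked to contain the same centers, differing only by a permutation of the cluster indices. A minimal NumPy sketch, with the center values copied from the outputs above; the helpers `same_centers_up_to_order` and `label_permutation` are hypothetical, and matching by nearest center assumes the two runs found the same clusters.

```python
import numpy as np

# Cluster centers as printed by each version (copied from the outputs above).
centers_122 = np.array([[-0.1504], [4.965], [2.582]])
centers_130 = np.array([[2.582], [-0.1504], [4.965]])


def same_centers_up_to_order(a, b, atol=1e-8):
    """Hypothetical helper: True if a and b hold the same rows in any order."""
    # Sort rows lexicographically (feature 0 as primary key) before comparing.
    a_sorted = a[np.lexsort(a.T[::-1])]
    b_sorted = b[np.lexsort(b.T[::-1])]
    return a.shape == b.shape and np.allclose(a_sorted, b_sorted, atol=atol)


def label_permutation(a, b):
    """Hypothetical helper: perm[k] is the row of b closest to row k of a."""
    # Pairwise L1 distances between every center in a and every center in b.
    dists = np.abs(a[:, None, :] - b[None, :, :]).sum(axis=2)
    return dists.argmin(axis=1)


print(same_centers_up_to_order(centers_122, centers_130))  # True
print(label_permutation(centers_122, centers_130))  # [1 2 0]
```

Cluster `k` in the 1.2.2 output corresponds to cluster `perm[k]` in the 1.3.0 output, so `labels_130 == perm[labels_122]`. Conversely, with `inv = np.argsort(perm)`, `inv[labels_130]` maps 1.3.0 labels back to the 1.2.2 ordering. This only works if the old centers were saved as a reference.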
Versions
In the 1.2.2 test:
```
>>> import sklearn; sklearn.show_versions()

System:
    python: 3.8.12 (default, Jan 10 2025, 11:02:07) [Clang 16.0.0 (clang-1600.0.26.4)]
executable: /Users/sblair/.pyenv/versions/3.8.12/bin/python
   machine: macOS-15.1-x86_64-i386-64bit

Python dependencies:
      sklearn: 1.2.2
          pip: 21.1.1
   setuptools: 56.0.0
        numpy: 1.19.5
        scipy: 1.10.1
       Cython: None
       pandas: 1.4.0
   matplotlib: 3.7.5
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
    num_threads: 16
         prefix: libomp
       filepath: /Users/sblair/.pyenv/versions/3.8.12/lib/python3.8/site-packages/sklearn/.dylibs/libomp.dylib
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/sblair/.pyenv/versions/3.8.12/lib/python3.8/site-packages/numpy/.dylibs/libopenblas.0.dylib
        version: 0.3.13
threading_layer: pthreads
   architecture: Haswell

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/sblair/.pyenv/versions/3.8.12/lib/python3.8/site-packages/scipy/.dylibs/libopenblas.0.dylib
        version: 0.3.18
threading_layer: pthreads
   architecture: Haswell
```
In the 1.3.0 test:
```
>>> import sklearn; sklearn.show_versions()

System:
    python: 3.8.12 (default, Jan 10 2025, 11:02:07) [Clang 16.0.0 (clang-1600.0.26.4)]
executable: /Users/sblair/.pyenv/versions/3.8.12/bin/python
   machine: macOS-15.1-x86_64-i386-64bit

Python dependencies:
      sklearn: 1.3.0
          pip: 21.1.1
   setuptools: 56.0.0
        numpy: 1.19.5
        scipy: 1.10.1
       Cython: None
       pandas: 1.4.0
   matplotlib: 3.7.5
       joblib: 1.4.2
threadpoolctl: 3.5.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
    num_threads: 16
         prefix: libomp
       filepath: /Users/sblair/.pyenv/versions/3.8.12/lib/python3.8/site-packages/sklearn/.dylibs/libomp.dylib
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/sblair/.pyenv/versions/3.8.12/lib/python3.8/site-packages/numpy/.dylibs/libopenblas.0.dylib
        version: 0.3.13
threading_layer: pthreads
   architecture: Haswell

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/sblair/.pyenv/versions/3.8.12/lib/python3.8/site-packages/scipy/.dylibs/libopenblas.0.dylib
        version: 0.3.18
threading_layer: pthreads
   architecture: Haswell
```