We're a team working on a clustering problem. To make the results reproducible, we set random_state in KMeans() to 0. However, we got a different result when we ran the same code on the same data a week later. We're not sure why this happened, and we still can't reproduce the earlier result.
Are you using the same version of scikit-learn for both runs?
We fixed a couple of things in 1.0.1 that solve some reproducibility issues: #21195
But it means that results can change between versions.
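For anyone following along, here is a minimal sketch of the check implied above: within a single environment, two fits with the same random_state should give the same labels. The synthetic data, the explicit centers, and the helper name fit_labels are assumptions for illustration only; this is not the reporter's code or data.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in data (an assumption; the reporter's data is not shown).
# Four well-separated centers are chosen by hand for this sketch.
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
X, _ = make_blobs(n_samples=300, centers=centers, cluster_std=0.5, random_state=42)


def fit_labels(data):
    """Fit KMeans with a fixed seed and return the cluster label of each row."""
    model = KMeans(n_clusters=4, n_init=10, random_state=0)
    return model.fit_predict(data)


# Two independent fits in the same process and environment.
labels_a = fit_labels(X)
labels_b = fit_labels(X)

same_labels = bool(np.array_equal(labels_a, labels_b))
print("identical labels across two fits:", same_labels)
```

If this prints True in one environment while the results differ from an earlier run, the difference more likely comes from the environment (library versions) or the input data than from the seed.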
I didn't pay much attention to the version of sklearn in the first run, but I've run the code again using version 1.0 and it outputs the same result as version 1.0.1.
The issue occurred around Nov 17, which I believe was after version 1.0.1 was released (correct me if I'm wrong).
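Since the version used in the first run was not recorded, one option for future runs is to save a version snapshot next to each result. The sketch below uses only the standard library; the helper name snapshot_versions and the package list are assumptions for illustration. Note that importlib.metadata requires Python 3.8 or later, so on Python 3.7 (as in this report) the importlib_metadata backport would be needed instead.

```python
import json
import platform
from importlib import metadata


def snapshot_versions(packages):
    """Return a dict of installed versions; None marks a missing package."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Hypothetical package list; adjust to whatever the pipeline imports.
snapshot = snapshot_versions(["scikit-learn", "numpy", "scipy", "threadpoolctl"])
print(json.dumps(snapshot, indent=2))
```

Storing this JSON alongside the cluster assignments makes it possible to tell later whether two runs used the same library versions.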
Describe the bug
Hello,
We're a team working on a clustering problem. To make the results reproducible, we set random_state in KMeans() to 0. However, we got a different result when we ran the same code on the same data a week later. We're not sure why this happened, and we still can't reproduce the earlier result.
Steps/Code to Reproduce
Expected Results
Expected number of observations in each cluster:
Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4
--- | --- | --- | ---
212 | 65 | 242 | 312
Actual Results
Now:
Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4
--- | --- | --- | ---
321 | 67 | 192 | 251
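Because KMeans cluster indices are arbitrary, a quick sanity check is to compare the cluster sizes with the labels ignored. The sketch below uses only the counts reported above; the helper name same_partition_sizes is made up for illustration.

```python
# Cluster sizes copied from the Expected Results and Actual Results tables.
expected = [212, 65, 242, 312]
actual = [321, 67, 192, 251]


def same_partition_sizes(a, b):
    """True if both runs have the same cluster sizes up to relabeling."""
    return sorted(a) == sorted(b)


total_expected = sum(expected)
total_actual = sum(actual)
sizes_match = same_partition_sizes(expected, actual)

print("total observations:", total_expected, total_actual)
print("same sizes up to relabeling:", sizes_match)
```

Both runs cover the same total of 831 observations, but the sorted sizes differ (65, 212, 242, 312 versus 67, 192, 251, 321), so the change is a different partition and not just a reordering of cluster labels.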
Versions
System:
python: 3.7.12 (default, Sep 10 2021, 00:21:48) [GCC 7.5.0]
executable: /usr/bin/python3
machine: Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic
Python dependencies:
pip: 21.1.3
setuptools: 57.4.0
sklearn: 1.0.1
numpy: 1.19.5
scipy: 1.4.1
Cython: 0.29.24
pandas: 1.1.5
matplotlib: 3.2.2
joblib: 1.1.0
threadpoolctl: 3.0.0
Built with OpenMP: True