Skip to content

TST Relax test_minibatch_sensible_reassign to avoid CI failures with single global random seed #29278

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Jun 20, 2024

Conversation

lesteve
Copy link
Member

@lesteve lesteve commented Jun 17, 2024

Close #29253.

Summarising my comments below, relaxing the check to be > 9 instead of > 10 makes the CI pass on all random seeds, see Azure logs on PR commit 162684a. I have not been able to reproduce the issue locally.

This issue has been seen in multiple CI builds from time to time e.g. #27967 (comment) or #26802 (comment)

@lesteve lesteve marked this pull request as draft June 17, 2024 11:56
Copy link

github-actions bot commented Jun 17, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 3abcd1c. Link to the linter CI: here

@lesteve
Copy link
Member Author

lesteve commented Jun 17, 2024

Some observations

  • for one build there is the info about the number of clusters which is right on the boundary so using > 9 may be a quick fix
E       AssertionError: assert 10 > 10
E        +  where 10 = <built-in method sum of numpy.ndarray object at 0x7fda357b5090>()
E        +    where <built-in method sum of numpy.ndarray object at 0x7fda357b5090> = array([ True, False,  True,  True, False, False, False,  True,  True,\n        True, False, False,  True, False, False, False, False,  True,\n        True,  True]).sum
E        +      where array([ True, False,  True,  True, False, False, False,  True,  True,\n        True, False, False,  True, False, False, False, False,  True,\n        True,  True]) = <built-in method any of numpy.ndarray object at 0x7fda488d9e10>(axis=1)
E        +        where <built-in method any of numpy.ndarray object at 0x7fda488d9e10> = array([[ -0.08411287,  -1.35341192],\n       [  0.        ,   0.        ],\n       [-10.04085907,  10.04083876],\n       ...     ],\n       [ -9.1038364 ,   6.86739558],\n       [ -9.66539459,   5.01011406],\n       [ -6.894048  ,  -1.75933572]]).any
E        +          where array([[ -0.08411287,  -1.35341192],\n       [  0.        ,   0.        ],\n       [-10.04085907,  10.04083876],\n       ...     ],\n       [ -9.1038364 ,   6.86739558],\n       [ -9.66539459,   5.01011406],\n       [ -6.894048  ,  -1.75933572]]) = MiniBatchKMeans(batch_size=10, init='random', n_clusters=20, random_state=34).cluster_centers_
  • for some builds sometimes it passes sometimes it does not, for example Ubuntu_Jammy_Jellyfish pymin_conda_forge_openblas_ubuntu_2204, Haswell passes build log, SkylakeX fails build log

@lesteve
Copy link
Member Author

lesteve commented Jun 17, 2024

For all the failing builds the number of clusters is 10, so using > 9 may be a quick fix:

ValueError: Number of non-zero clusters is too small num_non_zero_clusters=10
  • Linux_Runs pylatest_conda_forge_mkl Details
  • Linux_free_threaded pylatest_pip_free_threaded Details
  • Ubuntu_Jammy_Jellyfish pymin_conda_forge_openblas_ubuntu_2204 Details

lesteve added 2 commits June 18, 2024 06:09
test_minibatch_sensible_reassign
@lesteve
Copy link
Member Author

lesteve commented Jun 18, 2024

I tested all the random seeds in the CI and the only problematic seed on all four failing builds is global_random_seed=34 with num_non_zero_clusters=10, see Azure logs

free-threaded fails with Haskell (still can't reproduce locally though with same CPU architecture ...), the two other failing builds using OpenBLAS are using SkyLakeX, the MKL one we don't know (sklearn.show_versions() does not give the CPU architecture info)

@lesteve lesteve marked this pull request as ready for review June 18, 2024 05:13
@lesteve
Copy link
Member Author

lesteve commented Jun 18, 2024

Relaxing the number of clusters check to be > 9 rather than > 10, all random seed CI pass see Azure logs, I am going to trigger the normal CI and make this as ready for review.

@lesteve lesteve changed the title NOMRG Test minibatch sensible reassign TST Relax test_minibatch_sensible_reassign to avoid CI failures with single global random seed Jun 18, 2024
@@ -437,21 +437,24 @@ def test_minibatch_sensible_reassign(global_random_seed):
n_clusters=20, batch_size=10, random_state=global_random_seed, init="random"
).fit(zeroed_X)
# there should not be too many exact zero cluster centers
assert km.cluster_centers_.any(axis=1).sum() > 10
num_non_zero_clusters = km.cluster_centers_.any(axis=1).sum()
assert num_non_zero_clusters > 9, f"{num_non_zero_clusters=} is too small"
Copy link
Member Author

@lesteve lesteve Jun 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added assertion string (i.e. second argument in the assert, not sure if there is a more exact name) because it seems like pytest assertion rewriting is a bit broken (needs a bit of investigation as to why). If this ever fails again, at least the message will tell how far we are from the threshold

@lesteve lesteve added No Changelog Needed Quick Review For PRs that are quick to review labels Jun 18, 2024
Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Ideally the test should be reworked with assertion where we know in advance what we really expect, but let's go with this quick tweak for now.

@jeremiedbb jeremiedbb merged commit a4f4efe into scikit-learn:main Jun 20, 2024
41 of 47 checks passed
@lesteve lesteve deleted the test_minibatch_sensible_reassign branch June 20, 2024 12:57
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 2, 2024
@jeremiedbb jeremiedbb mentioned this pull request Jul 2, 2024
11 tasks
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 2, 2024
jeremiedbb pushed a commit that referenced this pull request Jul 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

⚠️ CI failed on Linux_free_threaded.pylatest_pip_free_threaded (last failure: Jun 14, 2024) ⚠️
2 participants