Make check_sample_weights_invariance cv-aware #29796

antoinebaker · 2024-09-06T08:46:45Z

Handle CV estimators in check_sample_weights_invariance, following #16298 (comment).

github-actions · 2024-09-06T08:48:02Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

_Generated for commit: a447c70. Link to the linter CI: here_

ogrisel

Thanks @antoinebaker for the PR. I believe this is already a net improvement. Could you please remove the "Draft" status?

I am +1 for merging as is and then following up with extra PRs to:

- Split check_sample_weights_invariance into two methods:
  - a simple check_unit_sample_weight (basically the kind="ones" case of the current implementation);
  - a more thorough check_sample_weights_invariance that implements the generic case with weight vs repetition equivalence with random integer weights between 0 and 3, similarly to the testing strategy implemented in #29419 (and discussed in #15657 (comment)). This test will need to be made cv-aware.
- Investigate whether we can fix the behavior of GridSearchCV and friends even when metadata routing is disabled.
- Find a way to run the check_unit_sample_weight and check_sample_weights_invariance checks on all meta-estimators, including GridSearchCV (& *SearchCV) and some canonical pipelines such as the ones defined in test_metaestimators.py. We might wait for the refactoring by @adrinjalali and @glemaitre to progress before embarking on those though.
- Think about how we can run those checks with and without metadata routing enabled.
- Make sure that the proper behavior works when metadata routing is enabled (related to #26179).
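As a rough illustration of the "weight vs repetition equivalence" idea mentioned in the list above, here is a minimal pure-Python sketch. It is not the scikit-learn check itself: `fit_weighted_line` is a hypothetical toy estimator (closed-form weighted least squares on one feature) introduced only for this example. The sketch draws random integer weights between 0 and 3 and compares fitting with those weights against fitting on a dataset where each sample is repeated as many times as its weight. It also covers the unit-weight case, where passing all-ones weights should match passing no weights.

```python
import random


def fit_weighted_line(x, y, sample_weight=None):
    """Toy estimator (hypothetical): weighted least squares for y ~ slope * x + intercept."""
    if sample_weight is None:
        sample_weight = [1.0] * len(x)
    total = sum(sample_weight)
    mean_x = sum(w * xi for w, xi in zip(sample_weight, x)) / total
    mean_y = sum(w * yi for w, yi in zip(sample_weight, y)) / total
    sxx = sum(w * (xi - mean_x) ** 2 for w, xi in zip(sample_weight, x))
    sxy = sum(
        w * (xi - mean_x) * (yi - mean_y)
        for w, xi, yi in zip(sample_weight, x, y)
    )
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rng = random.Random(0)
n_samples = 30
x = [rng.uniform(-1, 1) for _ in range(n_samples)]
y = [2.0 * xi + 0.5 + rng.gauss(0, 0.1) for xi in x]

# Random integer weights between 0 and 3 (inclusive); a weight of 0 drops the sample.
weights = [rng.randint(0, 3) for _ in range(n_samples)]

# Build the "repeated" dataset: each sample appears as many times as its weight.
x_repeated = [xi for xi, w in zip(x, weights) for _ in range(w)]
y_repeated = [yi for yi, w in zip(y, weights) for _ in range(w)]

coef_weighted = fit_weighted_line(x, y, sample_weight=weights)
coef_repeated = fit_weighted_line(x_repeated, y_repeated)

# Unit-weight case: all-ones weights should behave like no weights at all.
coef_ones = fit_weighted_line(x, y, sample_weight=[1.0] * n_samples)
coef_none = fit_weighted_line(x, y)
```

For an estimator that handles sample_weight correctly, `coef_weighted` and `coef_repeated` should agree up to floating-point rounding, and so should `coef_ones` and `coef_none`.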

ogrisel · 2024-09-06T12:39:15Z

/cc @jeremiedbb.

jeremiedbb · 2024-09-06T13:08:18Z

I think we can remove the specific test for CalibratedClassifierCV

scikit-learn/sklearn/tests/test_calibration.py

Line 946 in 7baa11e

    
           def test_calibrated_classifier_cv_zeros_sample_weights_equivalence(method, ensemble):

now that it's properly tested in the common tests.

> a more thorough check_sample_weights_invariance that implements the generic case with weight vs repetition equivalence with random integer weights between 0 and 3

When we do that, we will also be able to remove this test

scikit-learn/sklearn/tests/test_calibration.py

Line 837 in 7baa11e

    
           def test_calibrated_classifier_cv_double_sample_weights_equivalence(method, ensemble):

ogrisel · 2024-09-06T13:26:58Z

> I think we can remove the specific test for CalibratedClassifierCV

Indeed. I checked test_calibrated_classifier_cv_zeros_sample_weights_equivalence and it only checks that the underlying coef_ attributes match and that the outputs of a call to predict_proba match. Our common tests only test the latter, but since there is a very direct relationship between the two, I think it's ok to remove test_calibrated_classifier_cv_zeros_sample_weights_equivalence.
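To make the zero-weight equivalence for CV estimators more concrete, here is a small pure-Python sketch. It is an assumption-laden toy, not the code from this PR or from scikit-learn: `cv_error`, `weighted_mean`, and the `fold_ids` layout are hypothetical names chosen for illustration. The point it tries to show is that giving some samples a weight of zero should match removing them, provided the remaining samples stay in the same folds. If the reduced dataset were simply re-split, the fold membership would change and the comparison would no longer be meaningful.

```python
def weighted_mean(values, weights):
    """Weighted average; assumes the weights do not sum to zero."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def cv_error(y, sample_weight, fold_ids):
    """Toy CV score (hypothetical): weighted squared error of a mean predictor.

    For each fold, the predictor is the weighted mean of the training targets,
    and the validation error is weighted by the held-out sample weights.
    """
    errors = []
    for fold in sorted(set(fold_ids)):
        train = [i for i, f in enumerate(fold_ids) if f != fold]
        test = [i for i, f in enumerate(fold_ids) if f == fold]
        pred = weighted_mean(
            [y[i] for i in train], [sample_weight[i] for i in train]
        )
        squared_errors = [(y[i] - pred) ** 2 for i in test]
        errors.append(
            weighted_mean(squared_errors, [sample_weight[i] for i in test])
        )
    return sum(errors) / len(errors)


y = [0.3, 1.2, -0.7, 2.5, 0.1, 1.9, -1.4, 0.8, 3.0, -0.2, 1.1, 0.6]
fold_ids = [i % 3 for i in range(len(y))]  # explicit fold assignment
weights = [0.0 if i % 4 == 0 else 1.0 for i in range(len(y))]

# Remove the zero-weight samples but keep each remaining sample in its fold.
keep = [i for i, w in enumerate(weights) if w > 0]
y_kept = [y[i] for i in keep]
folds_kept = [fold_ids[i] for i in keep]

err_zero_weighted = cv_error(y, weights, fold_ids)
err_removed = cv_error(y_kept, [1.0] * len(keep), folds_kept)
```

Here `err_zero_weighted` and `err_removed` should agree because the fold assignment is carried over explicitly rather than recomputed on the reduced data.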

For LogisticRegressionCV, however, the new/updated test in #29419 is stronger because it uses a non-default value for the Cs hyperparameter (Cs=100 instead of the default 10), which makes it possible to detect sample_weight-related bugs more finely. Furthermore, it also makes assertions on estimator-specific attributes, which would make it easier to debug problems in case of a regression.

jeremiedbb

LGTM, thanks

jeremiedbb · 2024-09-06T14:12:41Z

Do we wait to merge #29442 before this one, or do we trust the quick check we did during the live debugging session?

ogrisel · 2024-09-06T14:26:15Z

Let's merge as is and iterate to limit dependencies between PRs.

adrinjalali · 2024-09-07T12:39:02Z

> Find a way to run the check_unit_sample_weight and check_sample_weights_invariance checks on all meta-estimators, including GridSearchCV (& *SearchCV) and some canonical pipelines such as the ones defined in test_metaestimators.py. We might wait for the refactoring by @adrinjalali and @glemaitre to progress before embarking on those though.

@ogrisel, with the new instance_generator.py already merged and improved, I think the only estimator not tested is SparseCoder.

antoinebaker added 2 commits September 6, 2024 10:30

add cv handling

269f212

Merge remote-tracking branch 'upstream/main' into check_sample_weight

d96d7a9

github-actions bot added the module:utils label Sep 6, 2024

antoinebaker marked this pull request as draft September 6, 2024 08:49

antoinebaker added 2 commits September 6, 2024 11:47

change sklearn tags

cab1e93

fix linter

6ab0301

ogrisel changed the title ~~Check sample weight~~ Make check_sample_weights_invariance cv-aware Sep 6, 2024

ogrisel approved these changes Sep 6, 2024

antoinebaker marked this pull request as ready for review September 6, 2024 12:54

remove test_calibrated_classifier_cv_zeros_sample_weights_equivalence

a447c70

ogrisel added Waiting for Reviewer, Waiting for Second Reviewer (First reviewer is done, need a second one!) and removed Waiting for Reviewer labels Sep 6, 2024

jeremiedbb added the No Changelog Needed label Sep 6, 2024

jeremiedbb approved these changes Sep 6, 2024

ogrisel enabled auto-merge (squash) September 6, 2024 14:25

ogrisel merged commit 0b0b90b into scikit-learn:main Sep 6, 2024
37 of 41 checks passed

antoinebaker mentioned this pull request Sep 9, 2024

Refactor check_sample_weights_invariance into a more general repetition/reweighting equivalence check #29818

Merged

1 task

jeremiedbb mentioned this pull request Sep 13, 2024

List of estimators with known incorrect handling of sample_weight #16298

Open

54 tasks
