Skip to content

Make check_sample_weights_invariance cv-aware #29796

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Sep 6, 2024

Conversation

antoinebaker
Copy link
Contributor

Handling of CV estimators in check_sample_weight_invariance following
#16298 (comment)

Copy link

github-actions bot commented Sep 6, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: a447c70. Link to the linter CI: here

@antoinebaker antoinebaker marked this pull request as draft September 6, 2024 08:49
@ogrisel ogrisel changed the title Check sample weight Make check_sample_weights_invariance cv-aware Sep 6, 2024
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @antoinebaker for the PR. I believe this is already a net improvement. Could you please remove the "Draft" status?

I am +1 for merging as is and then follow up with extra PRs to:

  • split check_sample_weights_invariance into two methods:
  • Investigate if we can fix the behavior of GridSearchCV and friends even when metadata-routing is disabled;
  • Find a way to run the check_unit_sample_weight and check_sample_weights_invariance checks on all meta-estimators, including GridSearchCV (& *SearchCV) and some canonical pipelines such as the ones defined in test_metaestimators.py. We might wait for the refactoring by @adrinjalali and @glemaitre to progress before embarking on those though.
  • Think about how we can run those checks with and without metadata routing enabled;
  • Make sure that the proper behavior works when metadata routing is enabled (related to SLEP006: default routing #26179).

@ogrisel
Copy link
Member

ogrisel commented Sep 6, 2024

/cc @jeremiedbb.

@antoinebaker antoinebaker marked this pull request as ready for review September 6, 2024 12:54
@jeremiedbb
Copy link
Member

I think we can remove the specific test for CalibratedClassifierCV

def test_calibrated_classifier_cv_zeros_sample_weights_equivalence(method, ensemble):

now that it's properly tested in the common tests.

a more thorough check_sample_weights_invariance that implements the generic case with weight vs repetition equivalence with random integer weights between 0 and 3

When we do that, we will also be able to remove this test

def test_calibrated_classifier_cv_double_sample_weights_equivalence(method, ensemble):

@ogrisel
Copy link
Member

ogrisel commented Sep 6, 2024

I think we can remove the specific test for CalibratedClassifierCV

Indeed. I checked test_calibrated_classifier_cv_zeros_sample_weights_equivalence and it only checks the match of the underlying coef_ attribute and the outputs of a call to predict_proba. Our common tests only tests the latter but since there is a very direct relationship between the two, I think it's ok to remove test_calibrated_classifier_cv_zeros_sample_weights_equivalence.

For LogisticRegressionCV however, the new/updated test in #29419 is stronger because it tests with a non-default value for the Cs=100 hyperparameter (instead of 10 by default) which makes it possible to more finely detect sample_weight related bugs. Furthermore it also make assertions on estimator specific attributes that would make it easier to debug problems in case of regression.

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks

@jeremiedbb
Copy link
Member

do we wait to merge #29442 before this one or we trust the quick check we did during the live debbuging session ?

@ogrisel ogrisel enabled auto-merge (squash) September 6, 2024 14:25
@ogrisel
Copy link
Member

ogrisel commented Sep 6, 2024

Let's merge as is and iterate to limit dependencies between PRs.

@ogrisel ogrisel merged commit 0b0b90b into scikit-learn:main Sep 6, 2024
37 of 41 checks passed
@adrinjalali
Copy link
Member

Find a way to run the check_unit_sample_weight and check_sample_weights_invariance checks on all meta-estimators, including GridSearchCV (& *SearchCV) and some canonical pipelines such as the ones defined in test_metaestimators.py. We might wait for the refactoring by @adrinjalali and @glemaitre to progress before embarking on those though.

@ogrisel with the new instance_generator.py already merged and improved, I think the only estimator not tested is SparseCoder

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants