-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
FIX add metadata routing to CV splitters in RidgeCV
and RidgeClassifierCV
#29634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@glemaitre @OmarManzoor I think we can finalize / merge this one, and then open a PR to fix the issues we've been discussing on this thread: #27560 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @StefanieSenger
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @OmarManzoor and @ogrisel. I have adressed your suggestions in my newest commits.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @StefanieSenger
@glemaitre Since you self requested a review I'll let you merge 😄 |
Thanks @OmarManzoor I'll quickly check. We might end-up with a working |
if self.scoring is None: | ||
scorer = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should be missing something. Why we would not combine the above call in the same statement?
if self.scoring is not None:
scorer = self._get_scorer()
else:
scorer = None
Or is it because we want still to set the score_request but in which case I would expect
scorer = self._get_scorer()
if self.scoring is None:
scorer = None
I would probably add a small comment on the top to explain we do this procedure to avoid someone reverting to the first version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into this. I am not sure if I understand what you mean, @glemaitre.
I will try to explain what I meant:
We have to call self._get_scorer()
in any case (even if scorer=None
, because in this case we need to use RidgeCV's score()
method via Passthrougscorer
.
So the code needs to be like this:
scorer = self._get_scorer()
...
if self.scoring is None:
scorer = None
The scorer needs to be defined (scorer = self._get_scorer()
) pretty early in fit()
, because it is used in process_routing()
already if metadata routing is enabled and cv is None.
But then below we need to reset scorer
to None if the original user-inted was to not pass a scoring:
if self.scoring is None:
scorer = None
If we would not do that, we would get an AttributeError: '_IdentityRegressor' object has no attribute 'score'
because the scorer is already something else than None and we would pass something else than None into _RidgeGCV()
and we go down the wrong path of the code.
Basically, we are misusing the variable scorer
by having it have too many meanings at the same time. But maybe that's better than messing with the conventions?
I have added a comment to explain what I mean. Does it all make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes completely. I missed that we were going to get the combination of _PassthroughScorer
+ _IdentityRegressor
that end-up in a non-compatible scikit-learn estimator.
I think we can merge as-is. Maybe one clean-up could be to check that we get an _Identity...
estimator that follow the scikit-learn API but I'm not sure that we can end-up with something working.
Thanks @StefanieSenger I think that we get closer to have a working |
…ifierCV` (scikit-learn#29634) Co-authored-by: adrinjalali <adrin.jalali@gmail.com>
What does this implement/fix? Explain your changes.
This PR fixes a bug by adding metadata routing to CV splitters in
RidgeCV
andRidgeClassifierCV
.It was not discovered before by the tests, because
a) it only happens with grouped splitters such as
GroupKFold
b) there are some automatisms at play when routing
groups
(for instance we don't need to request it for grouped splitters) as long as it's not crossing several layers of nestedness and the bug appears when we useRidgeCV
andRidgeClassifierCV
nested in another router (likecross_validate
).Edit: While working on the test to this (in
test_metaestimators_metadata_routing.py
), a second bug became visible and it was fixed by addingscorer.set_score_request(sample_weight=True)
infit()
for the case the default scoring was used (because here the user doesn't have the chance to set it themselves). The second bug-fix also got a test added intest_ridge.py
.----- rest of this text describes a situation that is outdated now ----
I have tried to add a (somewhat clumsy) test, that tests that the fix works that also tests all the other estimators with internal splitting/CV for the same issue.
But alas, the test also uncovers an unrelated issue with
RidgeClassifierCV
, since theset_score_request
syntax doesn't seem to work here andRidgeClassifierCV
still raises and wants me to set a request that I have already set.minimal reproducible for the original bug:
ValueError: The 'groups' parameter should not be None.
Traceback
The new bug you can see in the CI.
CC @adrinjalali, your expertise will be of great help. :)