FEA Add metadata routing to RidgeCV and RidgeClassifierCV #27560
Conversation
I need to have a closer look at this.
@adrinjalali I did some refactoring. Could you kindly have a look?
@adrinjalali Any updates on this PR?
Sorry, this was lost in my notifications. LGTM other than the nits.
```python
def _get_scorer(self):
    return check_scoring(self, scoring=self.scoring, allow_none=True)

def _score_without_scorer(self, squared_errors):
```
a very small docstring for these two methods would be nice.
Added.
I am wondering why we don't pass score_params to it?
So in this case, we can only compute the MSE, so there are no score_params that can be passed. However, I'm concerned about sample_weight, which is the only parameter that we could take into account for the GCV scheme:

scikit-learn/sklearn/linear_model/_ridge.py, lines 2151 to 2156 in 1d22a48:

```python
G_inverse_diag, c = solve(float(alpha), y, sqrt_sw, X_mean, *decomposition)
if scorer is None:
    squared_errors = (c / G_inverse_diag) ** 2
    alpha_score = self._score_without_scorer(squared_errors=squared_errors)
    if self.store_cv_results:
        self.cv_results_[:, i] = squared_errors.ravel()
```

So I see that we use sample_weight in the solve step. However, the squared errors are not weighted with sample_weight. In the case where we have a scorer, I'm under the impression that we would reapply the weights to the metric. Do I miss something?
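To illustrate the mismatch I mean with plain NumPy (a minimal sketch, not the actual GCV code path; the arrays are made up):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
y_true = rng.normal(size=6)
y_pred = y_true + rng.normal(scale=0.5, size=6)
# the last sample is zero-weighted
sample_weight = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 0.0])

squared_errors = (y_true - y_pred) ** 2

# scorer-less branch: plain, unweighted mean of the squared errors
score_no_scorer = -squared_errors.mean()

# what a neg-MSE scorer computes when sample_weight is routed to it
score_with_scorer = -mean_squared_error(y_true, y_pred, sample_weight=sample_weight)

print(score_no_scorer, score_with_scorer)  # the two branches disagree
```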
I think this was the default case, present when we don't have any scorer. Here we calculate the squared errors with a single line of code, as shown above. Do we require any params to correctly calculate the alpha_score from the squared errors?
So here would be such an example:
```python
import numpy as np
import sklearn
from sklearn.linear_model import RidgeCV
from sklearn.metrics import get_scorer
from sklearn.pipeline import make_pipeline

sklearn.set_config(enable_metadata_routing=True)

rng = np.random.default_rng(0)
coef = rng.uniform(size=(10,))
X = rng.normal(size=(1_000, 10))
y = X @ coef
y[-1] = 1_000  # add outlier
sample_weight = np.ones_like(y)
sample_weight[-1] = 0.0

ridge_no_scoring = make_pipeline(
    RidgeCV(scoring=None)
    .set_score_request(sample_weight=True)
    .set_fit_request(sample_weight=True)
)
ridge_no_scoring.fit(X, y, sample_weight=sample_weight)

scorer = get_scorer("neg_mean_squared_error").set_score_request(
    sample_weight=True
)
ridge_with_scorer = make_pipeline(
    RidgeCV(scoring=scorer)
    .set_score_request(sample_weight=True)
    .set_fit_request(sample_weight=True)
)
ridge_with_scorer.fit(X, y, sample_weight=sample_weight)

print(
    f"Ridge with no scoring: {ridge_no_scoring[-1].best_score_}\n"
    f"Ridge with a scorer: {ridge_with_scorer[-1].best_score_}"
)
np.testing.assert_allclose(
    ridge_no_scoring[-1].coef_, ridge_with_scorer[-1].coef_
)
```

which outputs:

```
Ridge with no scoring: -4.0321032019694904e-08
Ridge with a scorer: -4.036139341310801e-08
```
I would have expected a much bigger difference if we had not taken the weights into account. So I'm wondering whether this is not something even more subtle in the end. Here, we can increase the value of the outlier and see that the error does not increase at all.
So looking closer, the following line is used to generate the predictions (as described in http://cbcl.mit.edu/publications/ps/MIT-CSAIL-TR-2007-025.pdf):

```python
predictions = y - (c / G_inverse_diag)
```

However, if we use the prediction function from RidgeCV, we would have something equivalent to:

```python
X @ safe_sparse_dot(c.T, X)
```

And the results are slightly different. I have to check in more detail what the reason is.
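For what it's worth, the LOO shortcut behind that line can be checked numerically against brute-force refitting (a sketch assuming plain ridge without intercept or sample weights; the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, alpha = 30, 5, 1.0
X = rng.normal(size=(n, p))
y = X @ rng.uniform(size=p) + 0.1 * rng.normal(size=n)

# Hat matrix of ridge without intercept: H = X (X'X + alpha*I)^-1 X'
H = X @ np.linalg.solve(X.T @ X + alpha * np.eye(p), X.T)
# Closed-form LOO residuals: (y_i - yhat_i) / (1 - H_ii)
loo_shortcut = (y - H @ y) / (1.0 - np.diag(H))

# Brute force: refit the ridge n times, each time leaving one sample out
loo_brute = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    coef = np.linalg.solve(
        X[mask].T @ X[mask] + alpha * np.eye(p), X[mask].T @ y[mask]
    )
    loo_brute[i] = y[i] - X[i] @ coef

# The two agree to numerical precision
np.testing.assert_allclose(loo_shortcut, loo_brute)
```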
So forget whatever I said before; @StefanieSenger was right from the start, this is just an issue with sample_weight. So if we want to have the same call as with a scorer, we would need the following function:

```python
def _score_without_scorer(self, squared_errors, score_params):
    """Perform scoring using squared errors when the scorer is None."""
    axis = 0 if self.alpha_per_target else None
    weights = score_params.get("sample_weight", None)
    return -_average(squared_errors, axis=axis, weights=weights)
```

Basically, mean divides by the number of elements while _average divides by the sum of the sample_weight.

One thing to notice: if we use this function, it means that we will always forward sample_weight to the score function. So we lose the possibility of the equivalent of get_scorer("neg_mean_squared_error").set_score_request(sample_weight=False) in this case.
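A small NumPy illustration of that difference, including the axis=0 branch used by alpha_per_target (np.average stands in for the private _average helper; the numbers are made up):

```python
import numpy as np

# squared errors for 4 samples and 2 targets; the last sample is an
# outlier that gets zero weight
squared_errors = np.array(
    [[1.0, 10.0],
     [2.0, 20.0],
     [3.0, 30.0],
     [100.0, 0.0]]
)
weights = np.array([1.0, 1.0, 1.0, 0.0])

# mean divides by n_samples=4, average divides by weights.sum()=3
unweighted = -squared_errors.mean(axis=0)      # -26.5 and -15.0 per target
weighted = -np.average(squared_errors, axis=0, weights=weights)  # -2.0 and -20.0

print(unweighted)
print(weighted)
```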
I am confused about the premises of the discussion, which boils down to the question whether sample_weight should be used consistently only with metadata routing enabled, or also with it disabled.
When we enable metadata routing, do we want to mimic the old behaviour (meaning the different scores should re-appear), or would we route sample_weight even if it's not passed otherwise?
It seems that the desired result of the tests was to have the same scores if metadata routing is enabled, accepting that they are not the same without routing enabled, but then the discussion switched to fixing the inconsistency of the original code?
Sorry, I have the impression that this is all crystal clear to everyone but myself. I don't want to re-open any closed question, only to retrospectively follow the discussion.
We had a small sync talk with @StefanieSenger, but here is a summary of whatever happened in this thread :). Posting it here to make sure the thread becomes clearer.
Yep, the discussion went sideways. Sorry about that. I used this thread as a debugging thread :).
To summarize, I'm not too worried about the consistency between disabling/enabling metadata routing, because we might not reach the proper behaviour without metadata routing and a meta-estimator. What is really important here is to have the right behaviour when metadata routing is enabled. It means:

- when `scoring is None`, we need to make sure that `sample_weight` is taken into account. Basically, this is not configurable by the user since they do not pass a scorer.
- so the behaviour of `scoring is None` is equivalent to passing a scorer using the mean squared error and setting the score parameter to accept sample weight. We should be able to test this equivalence.

For me this is the behaviour that we should make sure is right. So it results in adding sample_weight or score_params to _score_without_scorer.
@agramfort in case you have some bandwidth for this.
Just a couple of nitpicks. Otherwise LGTM.
Thanks @OmarManzoor All good on my side.
Reference Issues/PRs
Towards: #22893
What does this implement/fix? Explain your changes.
Any other comments?
CC: @adrinjalali @glemaitre