FEA Add metadata routing to RidgeCV and RidgeClassifierCV #27560


Merged: 20 commits, Mar 5, 2024

Conversation

OmarManzoor
Contributor

Reference Issues/PRs

Towards: #22893

What does this implement/fix? Explain your changes.

  • Adds metadata routing to RidgeCV and RidgeClassifierCV

Any other comments?

CC: @adrinjalali @glemaitre

@github-actions

github-actions bot commented Oct 10, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 513b936.

Member

@adrinjalali adrinjalali left a comment


I need to have a closer look at this.

@glemaitre glemaitre self-requested a review October 23, 2023 11:11
@adrinjalali adrinjalali self-requested a review October 26, 2023 11:24
@adrinjalali adrinjalali self-requested a review December 4, 2023 11:56
@glemaitre glemaitre self-requested a review January 9, 2024 15:07
@OmarManzoor
Contributor Author

@adrinjalali I did some refactoring. Could you kindly have a look?

@OmarManzoor
Contributor Author

@adrinjalali Any updates on this PR?

Member

@adrinjalali adrinjalali left a comment


Sorry, this was lost in my notifications. LGTM other than the nits.

def _get_scorer(self):
return check_scoring(self, scoring=self.scoring, allow_none=True)

def _score_without_scorer(self, squared_errors):
Member


a very small docstring for these two methods would be nice.

Contributor Author


Added.

Contributor


I am wondering why we don't pass score_params to it?

Member


In this case we can only compute the MSE, so there are no score_params to pass. However, I'm concerned about sample_weight, which is the only parameter we could take into account for the GCV scheme:

G_inverse_diag, c = solve(float(alpha), y, sqrt_sw, X_mean, *decomposition)
if scorer is None:
squared_errors = (c / G_inverse_diag) ** 2
alpha_score = self._score_without_scorer(squared_errors=squared_errors)
if self.store_cv_results:
self.cv_results_[:, i] = squared_errors.ravel()

So I see that we use sample_weight to solve the problem. However, the squared errors are not weighted by sample_weight. In the case where we have a scorer, I'm under the impression that we would reapply the weights to the metric. Am I missing something?
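
As a minimal numpy sketch (illustrative numbers, not from the PR), this is the difference an unweighted mean versus a weighted average makes when a zero weight is supposed to discard a sample:

```python
import numpy as np

# Illustrative numbers (not from the PR): the last sample is an outlier
# whose sample_weight of 0 is supposed to remove it from the score.
squared_errors = np.array([0.1, 0.2, 0.1, 100.0])
sample_weight = np.array([1.0, 1.0, 1.0, 0.0])

# Unweighted mean: the zero-weighted outlier still dominates the score.
unweighted_score = -np.mean(squared_errors)  # -25.1

# Weighted average: the zero-weighted outlier is ignored.
weighted_score = -np.average(squared_errors, weights=sample_weight)  # ~ -0.133

print(unweighted_score, weighted_score)
```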

Contributor Author


I think this is the default path used when we don't have any scorer. Here we compute the squared errors with a single line of code, as shown above. Do we need any params to correctly compute the alpha_score from the squared errors?

Member

@glemaitre glemaitre Sep 11, 2024


So here would be such an example:

import numpy as np
import sklearn
from sklearn.linear_model import RidgeCV
from sklearn.metrics import get_scorer
from sklearn.pipeline import make_pipeline

sklearn.set_config(enable_metadata_routing=True)

rng = np.random.default_rng(0)
coef = rng.uniform(size=(10,))
X = rng.normal(size=(1_000, 10))
y = X @ coef
y[-1] = 1_000  # add outlier
sample_weight = np.ones_like(y)
sample_weight[-1] = 0.0

ridge_no_scoring = make_pipeline(
    RidgeCV(scoring=None).set_score_request(
        sample_weight=True
    ).set_fit_request(sample_weight=True)
)
ridge_no_scoring.fit(
    X, y, sample_weight=sample_weight
)
scorer = get_scorer("neg_mean_squared_error").set_score_request(
    sample_weight=True
)
ridge_with_scorer = make_pipeline(
    RidgeCV(scoring=scorer).set_score_request(
        sample_weight=True
    ).set_fit_request(sample_weight=True)
)
ridge_with_scorer.fit(
    X, y, sample_weight=sample_weight
)

print(
    f"Ridge with no scoring: {ridge_no_scoring[-1].best_score_}\n"
    f"Ridge with a scorer: {ridge_with_scorer[-1].best_score_}"
)
np.testing.assert_allclose(
    ridge_no_scoring[-1].coef_, ridge_with_scorer[-1].coef_
)

Output:

Ridge with no scoring: -4.0321032019694904e-08
Ridge with a scorer: -4.036139341310801e-08

I would have expected a much bigger difference if the weights were not taken into account, so I wonder whether something more subtle is going on. Here, we can increase the value of the outlier and see that the error does not increase at all.

Member


So looking closer, the following line is used to generate the predictions (as described in http://cbcl.mit.edu/publications/ps/MIT-CSAIL-TR-2007-025.pdf):

predictions = y - (c / G_inverse_diag)

However, if we use the prediction function from the RidgeCV, we would have something equivalent to:

X @ safe_sparse_dot(c.T, X)

And the results are slightly different. I have to check in more detail what the reason is.
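
For reference, the shortcut above is the standard ridge leave-one-out identity: the LOO residual equals the in-sample residual divided by one minus the hat-matrix diagonal. It can be checked on a tiny dense problem (a hand-rolled sketch of the textbook result, not scikit-learn internals):

```python
import numpy as np

rng = np.random.default_rng(42)
n, p, alpha = 20, 3, 1.0
X = rng.normal(size=(n, p))
y = X @ rng.normal(size=p) + 0.1 * rng.normal(size=n)

# Ridge hat matrix H = X (X'X + alpha I)^-1 X'
A = np.linalg.inv(X.T @ X + alpha * np.eye(p))
H = X @ A @ X.T
residuals = y - H @ y

# Shortcut: LOO residual_i = residual_i / (1 - H_ii)
loo_shortcut = residuals / (1 - np.diag(H))

# Brute force: refit without sample i, then predict on it.
loo_brute = np.empty(n)
for i in range(n):
    mask = np.arange(n) != i
    Xi, yi = X[mask], y[mask]
    coef = np.linalg.solve(Xi.T @ Xi + alpha * np.eye(p), Xi.T @ yi)
    loo_brute[i] = y[i] - X[i] @ coef

assert np.allclose(loo_shortcut, loo_brute)
```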

Member

@glemaitre glemaitre Sep 11, 2024


So forget whatever I said before: @StefanieSenger was right from the start, this is just an issue with sample_weight. If we want the same behaviour as with a scorer, we would need the following function:

    def _score_without_scorer(self, squared_errors, score_params):
        """Performs scoring using squared errors when the scorer is None."""
        axis = 0 if self.alpha_per_target else None
        weights = score_params.get("sample_weight", None)
        return -_average(squared_errors, axis=axis, weights=weights)

Basically, mean divides by the number of elements while _average divides by the sum of the sample weights.

One thing to note: if we use this function, we will always forward sample_weight to the score function. So we lose the possibility of reproducing get_scorer("neg_mean_squared_error").set_score_request(sample_weight=False) in this case.
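
A quick numpy illustration of that point, with np.average standing in for the private _average helper:

```python
import numpy as np

squared_errors = np.array([1.0, 4.0, 9.0])
sample_weight = np.array([2.0, 1.0, 1.0])

# mean divides by the number of elements: (1 + 4 + 9) / 3
by_count = np.mean(squared_errors)

# a weighted average divides by the sum of the weights: (2*1 + 1*4 + 1*9) / 4
by_weights = np.average(squared_errors, weights=sample_weight)

print(by_count, by_weights)  # 4.666..., 3.75
```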

Contributor

@StefanieSenger StefanieSenger Sep 13, 2024


I am confused about the premises of the discussion, which boils down to the question of whether sample_weight should be used consistently only with metadata routing enabled, or also with it disabled.

When we enable metadata routing, do we want to mimic the old behaviour (meaning the different scores should re-appear), or would we route sample_weight even if it's not passed otherwise?

It seems that the desired result of the tests was to have the same scores when metadata routing is enabled, accepting that they differ without routing enabled, but then the discussion switched to fixing the inconsistency of the original code?

Sorry, I have the impression that this is all crystal clear to everyone but myself. I don't want to re-open any closed question, only to retrospectively follow the discussion.

Member


We had a short sync call with @StefanieSenger, but here is a summary of what happened in this thread :). Posting it here to make the thread clearer.

Yep, the discussion went sideways. Sorry about that. I used this thread as a debugging thread :).

To summarize, I'm not worried about consistency between disabling and enabling metadata routing, because we might not be able to reach the proper behaviour without metadata routing and a meta-estimator. What is really important here is to have the right behaviour when metadata routing is enabled. That means:

  • when scoring is None, we need to make sure that sample_weight is taken into account. This is not configurable by the user, since no scorer is passed.
  • the behaviour with scoring=None is therefore equivalent to passing a mean squared error scorer whose score request is set to accept sample_weight. We should be able to test this equivalence.

For me this is the behaviour that we should make sure is right. So it results in adding sample_weight or score_params to _score_without_scorer.
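
The intended equivalence can be sketched with plain numpy (hand-rolled, not the PR's actual test): the weighted scoring=None path should match an explicitly weighted negative MSE:

```python
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.normal(size=100)
y_pred = y_true + rng.normal(scale=0.1, size=100)
sample_weight = rng.uniform(size=100)

squared_errors = (y_true - y_pred) ** 2

# scoring=None path: weighted average of the per-sample squared errors.
score_none = -np.average(squared_errors, weights=sample_weight)

# scorer path: weighted negative mean squared error computed explicitly.
score_scorer = -(sample_weight @ squared_errors) / sample_weight.sum()

assert np.isclose(score_none, score_scorer)
```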

@adrinjalali
Member

@agramfort in case you have some bandwidth for this.

Member

@glemaitre glemaitre left a comment


Just a couple of nitpicks. Otherwise LGTM.

@glemaitre glemaitre merged commit ae7c3bd into scikit-learn:main Mar 5, 2024
@glemaitre
Member

Thanks @OmarManzoor. All good on my side. Merging this one.
