FIX `RecursionError` bug with metadata routing in metaestimators with scoring #28712

StefanieSenger · 2024-03-27T15:05:36Z

I found a bug that causes a RecursionError whenever RidgeCV or RidgeClassifierCV route metadata without the scoring init param being set (so it defaults to None).

I wrote a fix for that: a condition added in _BaseRidgeCV._get_scorer() now makes _get_scorer return None (instead of entering a recursive loop by creating a new _PassthroughScorer in check_scoring()). This was the behaviour before metadata routing was introduced, so I think this is what we want here.

This case was not tested yet, so I have added a test.

I have also improved the documentation a bit along the way by adding a link.
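To make the failure mode concrete, here is a toy model of the recursion and of the first patch. All names (`ToyMetaEstimator`, `ToyPassthroughScorer`, `ToyNamedScorer`, `toy_check_scoring`, the `guard` flag) are hypothetical stand-ins, not scikit-learn's implementation, and routing is modelled as plain dicts. The assumption being illustrated: the meta-estimator's routing includes its scorer's routing, while a passthrough scorer reports its estimator's full routing, so the two call each other until Python gives up. Returning None when `scoring` is None breaks the cycle.

```python
class ToyPassthroughScorer:
    """Hypothetical stand-in: delegates its routing to the wrapped estimator."""

    def __init__(self, estimator):
        self._estimator = estimator

    def get_metadata_routing(self):
        # Passing through the estimator's *full* routing is what loops.
        return self._estimator.get_metadata_routing()


class ToyNamedScorer:
    """Hypothetical stand-in for a scorer built from an explicit `scoring`."""

    def __init__(self, name):
        self.name = name

    def get_metadata_routing(self):
        return {"score": {}}


def toy_check_scoring(estimator, scoring=None):
    # Stand-in for check_scoring: with no scoring, wrap the estimator itself.
    if scoring is None:
        return ToyPassthroughScorer(estimator)
    return ToyNamedScorer(scoring)


class ToyMetaEstimator:
    def __init__(self, scoring=None, guard=False):
        self.scoring = scoring
        self.guard = guard  # hypothetical switch for the first patch

    def _get_scorer(self):
        if self.guard and self.scoring is None:
            # First patch: do not build a passthrough scorer at all.
            return None
        return toy_check_scoring(self, scoring=self.scoring)

    def get_metadata_routing(self):
        routing = {"fit": {"sample_weight": True}}
        scorer = self._get_scorer()
        if scorer is not None:
            # The scorer's routing is requested while building our own.
            routing["scorer"] = scorer.get_metadata_routing()
        return routing


def recurses(meta_estimator):
    """Return True if building the routing hits a RecursionError."""
    try:
        meta_estimator.get_metadata_routing()
    except RecursionError:
        return True
    return False
```

With these definitions, `recurses(ToyMetaEstimator())` is True and `recurses(ToyMetaEstimator(guard=True))` is False, while an explicit `scoring` never recursed in the first place.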

github-actions · 2024-03-27T15:06:58Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 2ea8b32. Link to the linter CI: here

OmarManzoor

LGTM. Thank you for handling this fix @StefanieSenger

adrinjalali

This is not the right solution. The issue is that _PassthroughScorer returns what's returned by the estimator, and that's never cached. I'm looking at the code to see what we can do to fix it.

adrinjalali · 2024-04-02T05:42:07Z

sklearn/linear_model/tests/test_ridge.py

+    X = np.array([[0, 1], [2, 2], [4, 6], [9, 0], [2, 4]])
+    y = [1, 2, 3, 4, 5]
+
+    pipe = SimplePipeline(
+        [
+            ConsumingTransformer()
+            .set_fit_request(sample_weight=True)
+            .set_transform_request(sample_weight=True),
+            ConsumingTransformer()
+            .set_fit_request(sample_weight=True)
+            .set_transform_request(sample_weight=True),
+            metaestimator().set_fit_request(sample_weight=True),
+        ]
+    )
+
+    params = {"sample_weight": [1, 1, 1, 1, 1]}
+
+    pipe.fit(X, y, **params)


The actual minimal reproducer would be:

metaestimator().get_metadata_routing()

So we can remove the rest.

Yes, this is another (and more basic) way to trigger the error.

sklearn/linear_model/_ridge.py

StefanieSenger

Hm, I think we could also allow None to be returned before creating the _PassthroughScorer in check_scoring. Even so, I prefer the previous solution, because it preserves the behaviour from before metadata routing was introduced.

def _get_scorer(self):
    if self.scoring is not None:
        return check_scoring(self, scoring=self.scoring, allow_none=True)

will return None (as before) when self.scoring is None.

What do you think, @adrinjalali and @OmarManzoor?

StefanieSenger · 2024-04-09T10:33:22Z

sklearn/linear_model/tests/test_ridge.py

@@ -48,7 +48,6 @@
    ignore_warnings,
 )
 from sklearn.utils.fixes import (
-    _IS_32BIT,


I wonder why, after my last push, this recent addition isn't imported from fixes anymore... Ruff moved it up.

Okay, that, not surprisingly, raised an error. I moved it back to where it was and ruff seems to be content with it. 🤷

adrinjalali · 2024-04-09T12:47:47Z

Another issue with this solution as is, is that it replaces existing scorer request with an empty one, and this scorer happens to support sample_weight.

StefanieSenger · 2024-04-10T07:56:03Z

> Another issue with this solution as is, is that it replaces existing scorer request with an empty one, and this scorer happens to support sample_weight.

Yes, true. I will revert this then, since the other patch at least kept check_scoring intact and also reproduced the behaviour from before metadata routing was introduced (returning None).

StefanieSenger · 2024-04-18T17:36:49Z

About the pickling error:

With _PassthroughScorer now inheriting from _MetadataRequester, many metaestimators now have estimator.scorer_.set_score_request, which makes them unpicklable.

check_estimators_pickle thus fails.

I think none of these metaestimators would have been picklable since metadata routing was introduced, so I wonder how they have not failed this check before. I also wonder why only a few metaestimators are tested with this check. Do we simply exclude estimators that would not pass this test?

@adrinjalali, can you help here?

StefanieSenger · 2024-04-19T08:23:21Z

The current solution fails test_PassthroughScorer_metadata_request, because _PassthroughScorer now doesn't pass through all the routed metadata anymore, but only its own.

I am not sure if this is doing any harm though.

adrinjalali

I've pushed a commit with the fix, making sure set_score_request is a method instead of an instance attribute, since the attribute cannot be pickled.

adrinjalali · 2024-04-19T07:25:27Z

sklearn/metrics/_scorer.py

+        if hasattr(estimator, "set_score_request"):
+            self.set_score_request = estimator.set_score_request
+
+        requests = get_routing_for_object(self._estimator)


This is exactly the culprit causing the recursion error that we're trying to avoid.

Yes, I have seen it and changed it back.

adrinjalali · 2024-04-19T07:25:59Z

sklearn/metrics/_scorer.py

    def __init__(self, estimator):
        self._estimator = estimator
+        if hasattr(estimator, "set_score_request"):
+            self.set_score_request = estimator.set_score_request


This turns a descriptor (a class attribute, which is not pickled) into an instance attribute, which does need to be pickled, and that fails because its value is a function.

So the set_score_request method builds data specific to each instance, and that data can be pickled.

But it seemed to me that estimator.set_score_request had already been an instance attribute/method before, just belonging to the estimator instead of to the scorer instance. Edit: okay, they are not.
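The pickling problem can be reproduced with the standard library alone. In this sketch, `RequestMethod` is a hypothetical simplification of a descriptor whose `__get__` builds a fresh closure on every access (not scikit-learn's actual class), and the `Toy*` classes are likewise illustrative. Storing that closure on an instance makes the instance unpicklable, because pickle serialises functions by qualified name and a local function has none it can import; a regular method lives on the class, so nothing extra ends up in the instance `__dict__`.

```python
import pickle


class RequestMethod:
    """Hypothetical descriptor that builds a setter closure per access."""

    def __init__(self, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self

        # A local function: pickle cannot look it up by qualified name.
        def func(**kwargs):
            instance.requests.setdefault(self.name, {}).update(kwargs)
            return instance

        return func


class ToyEstimator:
    set_score_request = RequestMethod("score")

    def __init__(self):
        self.requests = {}


class ToyScorerCopiesAttribute:
    def __init__(self, estimator):
        self._estimator = estimator
        # Stores the closure in the instance __dict__, so pickle must handle it.
        self.set_score_request = estimator.set_score_request


class ToyScorerWithMethod:
    def __init__(self, estimator):
        self._estimator = estimator
        self.requests = {}

    def set_score_request(self, **kwargs):
        # Defined on the class: only `requests` is stored per instance.
        self.requests.setdefault("score", {}).update(kwargs)
        return self


def is_picklable(obj):
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True
```

The exact exception raised for a local function differs between Python versions (AttributeError in older releases, PicklingError in newer ones), which is why the helper catches both.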

adrinjalali · 2024-04-19T08:24:52Z

> doesn't pass through all the routed metadata anymore, but only its own.

That's expected now, you can fix the test.

adrinjalali

LGTM.

@OmarManzoor wanna have another look?

OmarManzoor

LGTM. I think there are two instances of the added lines not being covered by tests. Can we add them?

adrinjalali · 2024-04-19T11:24:31Z

For testing the new code, I think we can basically check that check_scoring(estimator, None) returns something where we can both do set_score_request and get_metadata_routing is correct.

StefanieSenger · 2024-04-19T11:48:13Z

> For testing the new code, I think we can basically check that check_scoring(estimator, None) returns something where we can both do set_score_request and get_metadata_routing is correct.

Thanks, @adrinjalali. But I'm not sure: do you mean to call set_score_request on an estimator, pass it through check_scoring(estimator, None), and then do a check similar to the one in test_PassthroughScorer_metadata_request, with something like this:

    assert_request_equal(
        scorer.get_metadata_routing(),
        {"score": {"sample_weight": "alias"}},
    )

Is this a check for correct routing?

There is also _BaseScorer.set_score_request(), which is very similar code, and I would be curious to find out how it is tested. I don't really know how to look for that.

StefanieSenger · 2024-04-19T15:20:37Z

I have tried to write a test for the new set_score_request method. I have to admit that I am not sure whether scorer.get_metadata_routing() and scorer._metadata_request could ever have different values, and whether this is a valid test.

Would you like to have a look, @OmarManzoor or @adrinjalali?

adrinjalali · 2024-04-22T09:08:24Z

sklearn/metrics/tests/test_score_objects.py

+@pytest.mark.usefixtures("enable_slep006")
+def test_PassthroughScorer_set_score_request():
+    """Test that _PassthroughScorer.set_score_request adds the correct metadata request
+    on itself."""
+    meta_est = GridSearchCV(estimator=LinearSVC(), param_grid={"C": [0.1, 1]})
+
+    # make a `_PassthroughScorer` with `check_scoring`:
+    scorer = check_scoring(meta_est, None)
+    scorer.set_score_request(sample_weight=True)
+
+    assert str(scorer.get_metadata_routing()) == str(scorer._metadata_request)


This could be made more minimal:

--- a/sklearn/metrics/tests/test_score_objects.py
+++ b/sklearn/metrics/tests/test_score_objects.py
@@ -1297,13 +1297,11 @@ def test_PassthroughScorer_metadata_request():
 def test_PassthroughScorer_set_score_request():
     """Test that _PassthroughScorer.set_score_request adds the correct metadata request
     on itself."""
-    meta_est = GridSearchCV(estimator=LinearSVC(), param_grid={"C": [0.1, 1]})
-    # make a `_PassthroughScorer` with `check_scoring`:
-    scorer = check_scoring(meta_est, None)
-    scorer.set_score_request(sample_weight=True)
+    scorer = check_scoring(LogisticRegression(), None)
+    scorer.set_score_request(sample_weight='my_weights')
 
-    assert str(scorer.get_metadata_routing()) == str(scorer._metadata_request)
+    assert scorer.get_metadata_routing().score.requests['sample_weight'] == 'my_weights'

Thanks @adrinjalali, but I find this very confusing. What is the behaviour we want _PassthroughScorer to have and WHY?

Consider this test, where I have tried to unify this new test with the one right above:

    """Test that _PassthroughScorer.set_score_request adds the correct metadata request
    on itself."""
    # make a `_PassthroughScorer` with `check_scoring`:
    scorer = check_scoring(LogisticRegression().set_score_request(sample_weight="alias"), None)
    scorer.set_score_request(sample_weight='my_weights')
    assert scorer.get_metadata_routing().score.requests['sample_weight'] == 'my_weights'

If we did not call scorer.set_score_request(sample_weight='my_weights'), we would want to assert that sample_weight == "alias". At least this is what the test above does. Obviously, _PassthroughScorer's functionality has changed and maybe the test above is not valid anymore. But what are we trying to achieve here?

First, we detected a RecursionError in some metaestimators (Stacking* is now also affected), then we fixed it by making _PassthroughScorer less passy-through, but here we are testing on plain consumers (LogisticRegression) and anything that has a score method.

When an estimator uses the default scoring, _PassthroughScorer should actually not change routed metadata. But if we call set_score_request on it, it definitely does change routed metadata, and we like that and want to test that it really does?

Or is this rather a by-product that will never be used, and that we only maintain to keep codecov happy?

You raise a good point there about carrying the request from the original estimator. So the more complete test would be:

est = LogisticRegression().set_score_request(sample_weight="alias")
scorer = check_scoring(est, None)
assert scorer.get_metadata_routing().score.requests['sample_weight'] == 'alias'

scorer.set_score_request(sample_weight='my_weights')
assert scorer.get_metadata_routing().score.requests['sample_weight'] == 'my_weights'

# making sure changing the passthrough object doesn't affect the estimator.
assert est.get_metadata_routing().score.requests['sample_weight'] == 'alias'

This is about having a correct public API, it's not about just making codecov happy.

Note that we're also making sure we have a non-regression test for the original issue. This is another test to make sure the added functionality behaves correctly.

Okay, I think I now see what we want. Let me sum up:

We want a default scorer (_PassthroughScorer) that adopts the routing set on the estimator it scores for, but that doesn't force the estimator to change its own routing.
Since the default scorer now inherits from _MetadataRequester, it acts as a real consumer (before, it only behaved as if it were one), and we can set its own routing. This is part of the public API, because check_scoring is.

Unfortunately, however, _PassthroughScorer does change its estimator's routing: the last assert fails.

I will try to find out why.

adrinjalali · 2024-04-22T15:42:37Z

sklearn/metrics/_scorer.py

    def __init__(self, estimator):
        self._estimator = estimator

+        requests = MetadataRequest(owner=self._estimator.__class__.__name__)
+        try:
+            requests.score = estimator._metadata_request.score


This will fix the issue

Suggested change

- requests.score = estimator._metadata_request.score
+ requests.score = deepcopy(estimator._metadata_request.score)

Wow, yes. So simple.
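Why the deepcopy matters can be shown with a dict-based toy. `ToyRequestingEstimator`, `ToyPassthroughScorer` and the `use_deepcopy` flag are hypothetical, with `_metadata_request` modelled as `{method: {param: alias}}` rather than scikit-learn's MetadataRequest. The scorer builds its own request from only the estimator's own `score` request (never from its full routing, which avoids the recursion). Without a copy, the scorer and the estimator share one object, so `scorer.set_score_request` silently rewrites the estimator's request, which is exactly the failing last assert in the test above.

```python
from copy import deepcopy


class ToyRequestingEstimator:
    def __init__(self):
        # Hypothetical stand-in for `_metadata_request`: method -> {param: alias}.
        self._metadata_request = {"score": {"sample_weight": None}}

    def set_score_request(self, **kwargs):
        self._metadata_request["score"].update(kwargs)
        return self

    def get_metadata_routing(self):
        return self._metadata_request


class ToyPassthroughScorer:
    def __init__(self, estimator, use_deepcopy=True):
        self._estimator = estimator
        score_request = estimator._metadata_request["score"]
        # Copy only the estimator's own `score` request; never ask for its
        # full routing, which may in turn include this scorer.
        self._metadata_request = {
            "score": deepcopy(score_request) if use_deepcopy else score_request
        }

    def set_score_request(self, **kwargs):
        # Mutates the scorer's request; shared with the estimator if not copied.
        self._metadata_request["score"].update(kwargs)
        return self

    def get_metadata_routing(self):
        return self._metadata_request
```

With `use_deepcopy=True` the scorer starts from the estimator's `"alias"` and can be changed to `"my_weights"` while the estimator keeps `"alias"`; with `use_deepcopy=False` the estimator ends up with `"my_weights"` too.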

adrinjalali

This looks good to me. Another look @OmarManzoor ?

StefanieSenger added 2 commits March 27, 2024 15:49

fix bug with metadata routing in RidgeCV and RidgeClassifierCV with default scoring

b973d9b

add test description

338a6c2

github-actions bot added module:linear_model module:metrics labels Mar 27, 2024

StefanieSenger mentioned this pull request Mar 27, 2024

FEA metadata routing for StackingClassifier and StackingRegressor #28701

Merged

changelog

479f5ac

OmarManzoor approved these changes Apr 1, 2024

adrinjalali reviewed Apr 2, 2024

StefanieSenger and others added 3 commits April 8, 2024 17:45

rst reference links

4937f20

Merge branch 'main' into routing_bug

a444b24

allowing None before making _PassthroughScorer

7654629

StefanieSenger commented Apr 9, 2024

move _IS_32BIT import to correct place

4a16e14

StefanieSenger added 4 commits April 10, 2024 09:58

revert to first patch

7edbc34

adrins code

d5a05a5

needing neste try except blocks to catch different AttributeErrors

24bf722

first get routing for estimator, then add scorer routing to it

fcd9647

StefanieSenger changed the title ~~FIX RecursionError bug with metadata routing in RidgeCV and RidgeClassifierCV~~ FIX RecursionError bug with metadata routing in metaestimators with scoring Apr 18, 2024

StefanieSenger and others added 3 commits April 19, 2024 10:09

requests must not access _estimators routing (revert previous change)

6f9ab56

make set_score_request a method instead of an attribute

627031e

Merge branch 'routing_bug' of ssh.github.com:StefanieSenger/scikit-learn into routing_bug

7295318

adrinjalali reviewed Apr 19, 2024

adjust test

0295112

adrinjalali approved these changes Apr 19, 2024

OmarManzoor approved these changes Apr 19, 2024

add test

def37db

adrinjalali reviewed Apr 22, 2024

StefanieSenger added 2 commits April 22, 2024 12:35

modify test after review

5fbbacb

unify tests

04585ab

adrinjalali reviewed Apr 22, 2024

StefanieSenger added 4 commits April 22, 2024 18:38

get owners right

ac14f50

deepcopy

50594d5

fix changelog

e63bc64

MethodMetadataRequest owners need to stay as before for correct error messages

2ea8b32

adrinjalali approved these changes Apr 23, 2024

OmarManzoor merged commit 78675d1 into scikit-learn:main Apr 23, 2024

StefanieSenger deleted the routing_bug branch April 23, 2024 11:56
