SLEP006: CalibratedClassifierCV #24126

Conversation
This PR adds metadata routing to CalibratedClassifierCV (CCV). CCV uses a subestimator to create (out of sample) probabilities, which are in turn used to calibrate the probabilities. The metaestimator uses sample_weight. The subestimator may or may not use sample_weight and additional metadata. So far, the check was whether the subestimator has sample_weight in its fit signature; if so, the weights were routed, otherwise not. This is, however, not always ideal, e.g. when the subestimator is itself a pipeline (scikit-learn#21134). With routing, this problem disappears. In addition to these changes, the tests in test_metaestimator_metadata_routing.py have been amended to make them more generic, as right now, they are specific to multioutput.
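For context, a rough sketch of how this looks from the user side, assuming the routing API as it eventually landed (the set_fit_request method and the enable_metadata_routing config flag); details may differ from the state of this PR:

import numpy as np
import sklearn
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# flag name as in later releases; the PR itself may gate this differently
sklearn.set_config(enable_metadata_routing=True)

X, y = make_classification(random_state=0)
sample_weight = np.ones(len(y))

pipe = make_pipeline(
    # the scaler explicitly declines the weights ...
    StandardScaler().set_fit_request(sample_weight=False),
    # ... while the final step requests them for its fit
    LogisticRegression().set_fit_request(sample_weight=True),
)
# with routing, the weights reach the pipeline's final step even though
# "sample_weight" does not appear in Pipeline.fit's signature
CalibratedClassifierCV(pipe).fit(X, y, sample_weight=sample_weight)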
Linting errors seem to be unrelated to this PR.

Syncing with …?

I think since this PR is against …

Yeah, I'll sync that branch with …

I updated, but the linting still fails for unrelated reasons. It also fails on the … I think this PR can thus be reviewed despite the linting errors.
Thanks @BenjaminBossan, this looks great!
@@ -259,6 +259,31 @@ def __init__(
        self.ensemble = ensemble
        self.base_estimator = base_estimator

    def _get_estimator(self):
note to other reviewers: this is only a refactoring. Used in fit and get_metadata_routing.
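For illustration, roughly what the helper centralizes (a sketch, not the verbatim PR code):

from sklearn.svm import LinearSVC

def _get_estimator(self):
    """Resolve the estimator to be calibrated; shared by fit and
    get_metadata_routing."""
    if self.base_estimator is None:
        # LinearSVC(random_state=0) has been CCV's historical default
        return LinearSVC(random_state=0)
    return self.base_estimator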
# sample_weight checks
fit_parameters = signature(estimator.fit).parameters
supports_sw = "sample_weight" in fit_parameters
if sample_weight is not None and not supports_sw:
    estimator_name = type(estimator).__name__
    warnings.warn(
        f"Since {estimator_name} does not appear to accept sample_weight, "
        "sample weights will only be used for the calibration itself. This "
        "can be caused by a limitation of the current scikit-learn API. "
        "See the following issue for more details: "
        "https://github.com/scikit-learn/scikit-learn/issues/21134. Be "
        "warned that the result of the calibration is likely to be "
        "incorrect."
    )
note: metadata routing removes the need for this warning. The user will get the right warnings/errors if the metadata is not requested properly.
@@ -380,20 +378,14 @@ def fit(self, X, y, sample_weight=None, **fit_params):
                test=test,
                method=self.method,
                classes=self.classes_,
                supports_sw=supports_sw,
note: we don't need this parameter since routing will know what to route and what not.
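A sketch of the routing pattern that replaces the supports_sw flag, assuming the process_routing helper from the routing utilities; the exact internals in the PR may differ:

from sklearn.utils.metadata_routing import process_routing

def fit(self, X, y, sample_weight=None, **fit_params):
    # ask the router which params each sub-object requested, instead of
    # inspecting the sub-estimator's signature
    routed_params = process_routing(
        self, "fit", sample_weight=sample_weight, **fit_params
    )
    estimator = self._get_estimator()
    # only the metadata the sub-estimator actually requested is forwarded
    estimator.fit(X, y, **routed_params.estimator.fit)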
""" | ||
router = ( | ||
MetadataRouter(owner=self.__class__.__name__) | ||
.add_self(self) |
note: self is added since this CCV is both a consumer and a router. One can do weighted CCV but unweighted fit for the underlying estimator.
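A hedged illustration of that point, again assuming the user-facing API as it later landed:

import numpy as np
import sklearn
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

sklearn.set_config(enable_metadata_routing=True)
X, y = make_classification(random_state=0)

# the sub-estimator explicitly declines the weights ...
unweighted = LogisticRegression().set_fit_request(sample_weight=False)
# ... yet CCV, as a consumer, still uses them for the calibration itself
CalibratedClassifierCV(unweighted).fit(X, y, sample_weight=np.ones(len(y)))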
def _weighted(estimator):
    return estimator.set_fit_request(sample_weight=True)
note: only fit can be weighted in CCV, hence only requesting sample_weight for fit.
method = getattr(instance, method_name)
method(X, y, sample_weight=sample_weight, metadata=metadata)
this assumes only fit and partial_fit route things around, but other methods could do the same, like transform, score, etc.
Do you mean the assumption is implicitly made because of how the method is called? I.e. we need a more generic way to call the method? If so, what needs to be generic: calling with y, calling with sample_weight?
the way the method is called is fine, but if the method being called is not fit or partial_fit, it'll raise an exception that the estimator has not been fitted.
Hmm, I see. I have no really good solution for this that would work in all cases. Just something off the top of my head:

# before calling the method
if "fit" not in method_name:
    instance.fit(X, y)  # <= would probably still fail on some transformers
...
yeah that would work I think.
Done
validate_keys : bool, default=True
    Whether to check if the requested parameters fit the actual parameters
    of the method.
note: this was added so that we could simply add an instance of this descriptor to CheckingClassifier
- Add __copy__ method to Registry
- Fix parameter docstrings
- Don't repeat metaestimator ids code
- check_recorded_metadata runs for all registered estimators
- More fine-grained check for warnings, so as _not_ to error on unrelated warnings
I addressed some of your comments and had some questions on others, please take a look.
Change around structure of the generic test to make more sense. Use the values to check for specific arguments that should be warned on, instead of all arguments at once.
Nice!
# only check keys whose value was explicitly passed
expected_keys = {key for key, val in records.items() if val is not None}
assert set(kwargs.keys()) == expected_keys
Thinking more about this, the change is making the test weaker, and it makes it weaker everywhere. I think a safer option would be to leave this function as is, and change record_metadata to accept a record_none arg, which is True by default, and in our Consumer estimators in this PR we can set that arg to False.
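A minimal sketch of that suggestion (record_metadata is the helper from these test modules; this is not its verbatim code):

def record_metadata(obj, method, record_none=True, **kwargs):
    """Store the metadata a consumer received, keyed by method name."""
    if not record_none:
        # drop kwargs that were left at their default of None
        kwargs = {key: val for key, val in kwargs.items() if val is not None}
    if not hasattr(obj, "_records"):
        obj._records = {}
    obj._records[method] = kwargs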
Makes sense, I changed it as you suggested.
if method_name in warns_on:
    # this method is expected to warn, not raise
    continue
same as the above test, if the method's name is in warns_on, it doesn't mean it always warns; it warns only for those attributes which are listed there. So we need to test for the other attributes.
You're right, the error test now works analogously to the warning test.
There is now an option for record_metadata to not store None values from kwargs. This is used in the tests now.
There was no error because it's not being used right now.
Analogous to the warning test, we want to check each argument in isolation for the error case.
There could be methods like "score" that require a fitted metaestimator. Therefore, it is fitted before calling the tested method, unless the tested method is a fitting method. Note that right now, this never happens, so in a way that code path is untested.
If record_none is False, kwargs whose values are None are skipped. This is
so that checks on keyword arguments whose default was not changed are
skipped.
I see that record_metadata and check_recorded_metadata are only used for testing. It is strange how some tests such as test_simple_metadata_routing expect None to be recorded while tests in test_metaestimators_metadata_routing.py expect them not to be recorded. Can we assume that None is never recorded, consistently?
We record and check None to make sure that if a metadata is not requested, it is not passed, not even as None. The tests which don't record None work because the user is not passing any metadata as None, and we need to ignore them because the default value is None and explicitly set in those sub-estimator methods.
Alternatively, we could change the default to "default", let record_metadata know what the default value is, and only ignore that. It might be cleaner.
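A sketch of that sentinel idea (hypothetical names throughout, reusing the record_metadata sketch above):

_DEFAULT = "default"

def fit(self, X, y, sample_weight=_DEFAULT, metadata=_DEFAULT):
    passed = {
        key: val
        for key, val in {"sample_weight": sample_weight, "metadata": metadata}.items()
        # an explicitly passed None is kept; only untouched defaults drop out
        if val is not _DEFAULT
    }
    record_metadata(self, "fit", record_none=True, **passed)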
What do you think about always recording None and having a flag in check_recorded_metadata to switch between the two modes of checking? Concretely:

import operator

def check_recorded_metadata(obj, method, strict=True, **kwargs):
    """Check whether the expected metadata is passed to the object's method."""
    records = getattr(obj, "_records", dict()).get(method, dict())
    # strict: exact key match; non-strict: expected keys form a subset
    cmp = operator.eq if strict else operator.le
    assert cmp(set(kwargs.keys()), set(records.keys()))
    for key, value in kwargs.items():
        assert records[key] is value
I kinda prefer the current implementation because it's easier for the test to test exactly what it needs to. What you have here is kind of a subset of what the current implementation does; as in, there's no way to test if the router has routed None explicitly or not, and that's something we need to check. I might be missing something here though.
Specifically, overriding set_fit_request.
Other than this (#24126 (comment)), which kinda makes the tests more robust, I'm happy with the PR.
Give option to not check routed metadata if it is the literal string "default" (instead of checking for None).
Why did it not fail when @thomasjpfan pushed? 🤯
I don't mind the custom group splitter either way. LGTM.
sklearn/tests/test_calibration.py
class MyGroupKFold(GroupKFold):
    """Custom splitter that checks that the values of groups are correct"""

    def split(self, X, y=None, groups=None):
        assert (groups == split_groups).all()
        return super().split(X, y=y, groups=groups)
GroupKFold raises if groups is None anyway, you won't need this custom class to test.
This was more about checking that the correct groups data is being passed. If you think it's not necessary, I'd rather remove it to simplify the test.
I don't think we have to check for the correctness of the groups. Those mechanisms are tested elsewhere.
I removed the custom class.
Could you or someone else please push an empty commit for CI? It seems like CircleCI hasn't solved the issue yet.
This is already covered by the general routing tests.
LGTM
Reference Issues/PRs
#22893

What does this implement/fix? Explain your changes.
This PR adds metadata routing to CalibratedClassifierCV (CCV). CCV uses a subestimator to create (out of sample) probabilities, which are in turn used to calibrate the probabilities.
The metaestimator uses sample_weight. The subestimator may or may not use sample_weight and additional metadata. So far, the check was whether the subestimator has sample_weight in its signature; if so, the weights were routed, otherwise not. This is, however, not always ideal, e.g. when the subestimator is itself a pipeline (#21134). With routing, this problem disappears.

Any other comments?
The majority of the work here was done pair-programming with @adrinjalali. Therefore, having a fresh set of eyes to review would be appreciated.
In addition to these changes, the tests in test_metaestimator_metadata_routing.py have been amended to make them more generic, as right now, they are specific to multioutput.
A current limitation of the generic tests is that check_recorded_metadata cannot be performed for CCV. The reason is that CCV internally creates a slice of the metadata before passing it to the subestimator, so exact equality fails in this case. The possibility was discussed to check for exact equality or for the passed data being a subset; this would work in this case but not in others, e.g. when sample weights are normalized. Therefore, the solution for now is that in the tests, it can be declared that this specific metaestimator opts out of check_recorded_metadata.
@adrinjalali I still don't use the exact values in "warns_on", please let me know how to use them exactly. I thought it's easier to discuss this with the code out.