Base sample-prop implementation and docs (alternative to #21284) #22083

adrinjalali · 2021-12-27T17:12:19Z

This is an alternative to #21284.

The main motivation behind the changes is the repetition @glemaitre noticed between the routing logic in the get_metadata_request methods and the logic that would go in fit, transform, etc.

To review this PR, you probably should start with the plot_metadata_routing file.

This PR simplifies the developer API substantially, such that a simple pipeline can be implemented as:

class SimplePipeline(ClassifierMixin, BaseEstimator):
    _required_parameters = ["transformer", "classifier"]

    def __init__(self, transformer, classifier):
        self.transformer = transformer
        self.classifier = classifier

    def fit(self, X, y, **fit_params):
        params = process_routing(self, "fit", fit_params)

        self.transformer_ = clone(self.transformer).fit(X, y, **params.transformer.fit)
        X_transformed = self.transformer_.transform(X, **params.transformer.transform)

        self.classifier_ = clone(self.classifier).fit(
            X_transformed, y, **params.classifier.fit
        )
        return self

    def predict(self, X, **predict_params):
        params = process_routing(self, "predict", predict_params)

        X_transformed = self.transformer_.transform(X, **params.transformer.transform)
        return self.classifier_.predict(X_transformed, **params.classifier.predict)

    def get_metadata_routing(self):
        router = (
            MetadataRouter()
            .add(
                transformer=self.transformer,
                method_mapping=MethodMapping()
                .add(method="fit", used_in="fit")
                .add(method="transform", used_in="fit")
                .add(method="transform", used_in="predict"),
            )
            .add(classifier=self.classifier, method_mapping="one-to-one")
        )
        return router.serialize()
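
To make the routing idea above concrete without depending on this branch, here is a rough, library-free sketch. Everything in it (toy_process_routing, the plain-dict requests and mapping) is hypothetical and only mimics the shape of the result used above as params.transformer.fit, params.classifier.fit, and so on; it is not how process_routing is implemented.

```python
# Toy illustration only: a hypothetical stand-in for the routing step,
# not scikit-learn's process_routing.
def toy_process_routing(requests, mapping, caller, params):
    """Split ``params`` given to ``caller`` among the sub-objects.

    requests: {child: {method: set of metadata names that method requests}}
    mapping:  {child: [(callee, caller), ...]}
    """
    routed = {child: {} for child in mapping}
    for child, pairs in mapping.items():
        for callee, used_in in pairs:
            if used_in != caller:
                continue  # this callee is not invoked from ``caller``
            wanted = requests.get(child, {}).get(callee, set())
            routed[child][callee] = {
                name: value for name, value in params.items() if name in wanted
            }
    return routed


# Assumed requests: both sub-objects want sample_weight in fit only.
requests = {
    "transformer": {"fit": {"sample_weight"}, "transform": set()},
    "classifier": {"fit": {"sample_weight"}, "predict": set()},
}
# Same (callee, caller) pairs as the MethodMapping in SimplePipeline above.
mapping = {
    "transformer": [("fit", "fit"), ("transform", "fit"), ("transform", "predict")],
    "classifier": [("fit", "fit"), ("predict", "predict")],
}
routed = toy_process_routing(requests, mapping, "fit", {"sample_weight": [1.0, 2.0]})
```

Here routed["transformer"]["fit"] carries sample_weight, routed["transformer"]["transform"] is empty, and the classifier only gets a "fit" entry because predict is not called from fit.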

This PR does NOT change the user API compared to #21284.

cc @jnothman @glemaitre @thomasjpfan @agramfort

note from the other PR

This PR is into the sample-props branch and not main. The idea is to break #20350 into smaller PRs for easier review and discussion rounds.

This PR adds the base implementation, and some documentation and a few tests. The tests are re-done from the previous PR. You can probably start with examples/metadata_routing.py to get a sense of how things work, and then check the implementation.

This PR does NOT touch splitters and scorers, those and all meta-estimators will be done in future PRs.

EDIT: process_routing is now a function rather than a decorator.
EDIT: Some updates and summary of current discussions: #22083 (comment)
EDIT: a timeline of this feature being merged into main is written under scikit-learn/enhancement_proposals#65 (comment)
EDIT: this PR creates set_{method}_request instead of {method}_requests now.

adrinjalali · 2022-02-28T15:00:21Z

@jnothman per your suggestion I changed the get_routing_for_object to raise an error instead of returning an empty request if it doesn't recognize the object. Now I remember the reason the function wasn't raising an error was that I didn't want to break people's code if they used a third party estimator which doesn't implement get_metadata_routing (yet). What's your take on that?

jnothman · 2022-02-28T23:41:11Z

> Now I remember the reason the function wasn't raising an error was that I didn't want to break people's code if they used a third party estimator which doesn't implement get_metadata_routing (yet). What's your take on that?

I think you're right that this should not raise an error, such that route_params will return an empty dict for that case. (And this should be tested.)
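
A minimal sketch of the behaviour agreed on here, assuming a toy representation where routing info is a plain dict of {method: set of requested names}. The names toy_get_routing_for_object and toy_route_params are hypothetical and only echo the real helper names; the point is that an object without get_metadata_routing yields an empty request, so nothing is routed to it and nothing breaks.

```python
def toy_get_routing_for_object(obj=None):
    """Return the routing info of ``obj``, or an empty request if it has none."""
    getter = getattr(obj, "get_metadata_routing", None)
    if callable(getter):
        return getter()
    # Unknown object (e.g. a third-party estimator): it requests nothing.
    return {}


def toy_route_params(obj, method, params):
    """Keep only the metadata that ``obj.method`` has requested."""
    requested = toy_get_routing_for_object(obj).get(method, set())
    return {name: value for name, value in params.items() if name in requested}


class ThirdPartyEstimator:
    """Does not implement get_metadata_routing (yet)."""

    def fit(self, X, y):
        return self


class AwareEstimator:
    """Implements the (toy) routing protocol."""

    def get_metadata_routing(self):
        return {"fit": {"sample_weight"}}


params = {"sample_weight": [1.0, 1.0]}
for_unknown = toy_route_params(ThirdPartyEstimator(), "fit", params)
for_aware = toy_route_params(AwareEstimator(), "fit", params)
```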

lorentzenchr

Slowly progressing...

sklearn/utils/_metadata_requests.py

lorentzenchr · 2022-03-01T09:18:43Z

sklearn/utils/_metadata_requests.py

+
+            The default (UNCHANGED) retains the existing request. This allows
+            you to change the request for some parameters and not others.
+


Very good discussion!
I'll try to summarize the options:

1. Handle fit_transform as separate method with its own set_fit_transform_requests.

2. Merge the requests of fit and transform (error if inconsistent).

3. Distinguish between fit_transform that only calls .fit(X).transform(X) and the rest (where it does something meaningful).

Even if it's appealing, I don't like the 3rd option. I have a slight preference for the 2nd option: merge the requests.
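
For illustration, the second option could look roughly like the sketch below. It assumes each method's request is a plain {param: alias} dict; merge_requests is a hypothetical helper name, not something in this PR.

```python
def merge_requests(fit_request, transform_request):
    """Merge two {param: alias} dicts; raise if they disagree on a param."""
    merged = dict(fit_request)
    for param, alias in transform_request.items():
        if param in merged and merged[param] != alias:
            raise ValueError(
                f"Inconsistent requests for {param!r}: "
                f"fit has {merged[param]!r}, transform has {alias!r}"
            )
        merged[param] = alias
    return merged


# Compatible requests merge into one request for fit_transform.
fit_transform_request = merge_requests(
    {"sample_weight": True}, {"sample_weight": True, "groups": False}
)
```

With this approach a user who sets conflicting requests on fit and transform gets an error from fit_transform instead of one of the two requests silently winning.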

lorentzenchr · 2022-03-01T09:55:20Z

sklearn/utils/_metadata_requests.py

+    owner : str
+        A display name for the object owning these requests.
+
+    method : str
+        The name of the method to which these requests belong.


Forgive my stupid questions. What's the difference between owner, method and callee, caller?

Not stupid at all. owner is only used to enhance the error message if an error is raised. method is the method for which this MethodMetadataRequest is used to store the request data.

If a meta-estimator is a consumer and a router, e.g. fit (caller) consumes sample_weight but also routes sample_weight to the sub-estimator's fit (callee), then both caller and callee methods will have a MethodMetadataRequest attached to them, with the corresponding method name.
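
A toy picture of the distinction, under the assumption that a per-method request is just an owner name, a method name and a {param: requested?} dict. ToyMethodRequest and its select method are hypothetical; owner only shows up in the error message, as described above.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMethodRequest:
    owner: str   # display name, only used in error messages
    method: str  # the method these requests belong to
    requests: dict = field(default_factory=dict)

    def select(self, params):
        """Return the requested subset of ``params``; reject unknown names."""
        unknown = sorted(set(params) - set(self.requests))
        if unknown:
            raise TypeError(
                f"{self.owner}.{self.method} got unexpected metadata: {unknown}"
            )
        return {name: value for name, value in params.items() if self.requests[name]}


# The meta-estimator's fit is the caller, the sub-estimator's fit the callee;
# each has its own request object with the corresponding method name.
caller = ToyMethodRequest("MetaEstimator", "fit", {"sample_weight": True})
callee = ToyMethodRequest("SubEstimator", "fit", {"sample_weight": True})

weights = {"sample_weight": [0.5, 1.5]}
consumed = caller.select(weights)   # used by the meta-estimator itself
forwarded = callee.select(weights)  # passed on to the sub-estimator's fit
```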

lorentzenchr · 2022-03-01T09:57:40Z

sklearn/utils/_metadata_requests.py

+        self.method = method
+
+    @property
+    def requests(self):


A docstring would be nice, something like "Dictionary of key-value pairs: param, alias."

Is it a property without setter for write protection?

Yes, users should use add_request to modify this dict.
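
The pattern being discussed is the usual read-only property. A small sketch, with a hypothetical ToyMethodMetadataRequest class standing in for the real one:

```python
class ToyMethodMetadataRequest:
    def __init__(self, owner, method):
        self.owner = owner
        self.method = method
        self._requests = {}

    @property
    def requests(self):
        """Dictionary of the form ``{param: alias}``."""
        return self._requests

    def add_request(self, param, alias):
        """The supported way to change the stored requests."""
        self._requests[param] = alias
        return self


req = ToyMethodMetadataRequest(owner="MyEstimator", method="fit")
req.add_request(param="sample_weight", alias=True)

try:
    req.requests = {}  # no setter is defined, so rebinding fails
except AttributeError:
    rebinding_blocked = True
else:
    rebinding_blocked = False
```

Note that this only prevents rebinding the attribute; the returned dict itself is still mutable, so the protection is a convention rather than a guarantee.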

lorentzenchr

Finished with class MethodMetadataRequest.

sklearn/utils/_metadata_requests.py

lorentzenchr

class MetadataRequest

sklearn/utils/_metadata_requests.py

lorentzenchr · 2022-03-02T16:07:21Z

sklearn/utils/_metadata_requests.py

+    # this is here for us to use this attribute's value instead of doing
+    # `isinstance`` in our checks, so that we avoid issues when people vendor
+    # this file instad of using it directly from scikit-learn.
+    type = "request"


Seems like an important name. How about metadata_type, routing_type? Just type seems like provoking name clashes.

I meant the attribute name type, not its values like "request". As this attribute is not inherited, e.g. via _MetadataRequester, it seems fine. I was worried about name clashes.

ah I see. Yeah I wouldn't want to have two different names for it since this way I can just check for the value of the variable w/o having to care which instance type it is. I think it should be private anyway, so making it private.
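
To illustrate why a marker attribute is used instead of isinstance, here is a small sketch with hypothetical classes. A vendored copy of the file defines a different class object, so isinstance against the original class fails, while the attribute check still works.

```python
class ToyRequest:
    _type = "request"


class ToyRouter:
    _type = "router"


class VendoredToyRequest:
    """Stands in for the same class copied into another package."""

    _type = "request"


def is_request(obj):
    # Compare the marker value rather than the class identity.
    return getattr(obj, "_type", None) == "request"


vendored = VendoredToyRequest()
by_isinstance = isinstance(vendored, ToyRequest)
by_marker = is_request(vendored)
```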

sklearn/utils/_metadata_requests.py

lorentzenchr · 2022-03-02T16:30:56Z

sklearn/utils/_metadata_requests.py

+    # `isinstance`` in our checks, so that we avoid issues when people vendor
+    # this file instad of using it directly from scikit-learn.
+    type = "router"


Same typos and same comment about naming of type as in MetadataRequest.

sklearn/utils/_metadata_requests.py

lorentzenchr · 2022-03-04T12:54:10Z

sklearn/utils/_metadata_requests.py

+        self._self = None
+        self.owner = owner
+
+    def add_self(self, obj):


I found this very confusing at first read. IIUC, it adds obj to _self, and _self is an attribute with special treatment/meaning in routing.

Tried adding some clarification here and above.
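
One way to read the role of _self, sketched with a hypothetical ToyRouter that stores requests as plain {param: requested?} dicts: _self holds what the router consumes itself, kept apart from what it routes to its children. This is an interpretation of the thread, not the actual MetadataRouter code.

```python
class ToyRouter:
    def __init__(self, owner):
        self.owner = owner
        self._self = None    # the router's own request, as a consumer
        self._children = {}  # requests of the objects it routes to

    def add_self(self, request):
        """Record what the router itself consumes."""
        self._self = dict(request)
        return self

    def add(self, name, request):
        """Record what a child object requests."""
        self._children[name] = dict(request)
        return self

    def split(self, params):
        """Return (consumed by the router, routed per child)."""
        own = self._self or {}
        consumed = {k: v for k, v in params.items() if own.get(k)}
        routed = {
            name: {k: v for k, v in params.items() if request.get(k)}
            for name, request in self._children.items()
        }
        return consumed, routed


router = (
    ToyRouter(owner="MetaEstimator")
    .add_self({"sample_weight": True})
    .add("estimator", {"sample_weight": True, "groups": False})
)
consumed, routed = router.split({"sample_weight": [1.0], "groups": [0]})
```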

lorentzenchr

@adrinjalali Really great work!

I would appreciate the cast of a searching 3rd eye (=reviewer) as I might very well have overlooked or not understood something. This is an important change that would deserve such an investment.

sklearn/utils/_metadata_requests.py

sklearn/tests/test_metadata_routing.py

jnothman · 2022-03-09T13:01:54Z

> I would appreciate the cast of a searching 3rd eye (=reviewer) as I might very well have overlooked or not understood something. This is an important change that would deserve such an investment.

I am of the opinion that since we have had the SLEP accepted, and this is not being merged to master, we have enough layers of assurance; rather, we will get more helpful review once we're implementing specific routers. @adrinjalali please ping when there are follow-up PRs or issues.

lorentzenchr · 2022-03-10T11:55:17Z

> > I would appreciate the cast of a searching 3rd eye (=reviewer) as I might very well have overlooked or not understood something. This is an important change that would deserve such an investment.
>
> I am of the opinion that since we have had the SLEP accepted, and this is not being merged to master, we have enough layers of assurance; rather, we will get more helpful review once we're implementing specific routers. @adrinjalali please ping when there are follow-up PRs or issues.

That's totally fine with me. I just expressed "my feelings". Let's merge as soon as the last nitpicks are addressed.

adrinjalali · 2022-03-10T12:08:43Z

Nice. I'll merge when the CI is green (applied the latest nits I think).

I think the only major thing we have here is the fit_transform issue, and will open a separate issue to talk about that specific topic.

jnothman

Can we get a TODO list of what proceeds from here, and where additional contributors can play a part? We haven't made those changes to the glossary, for instance.

* initial base implementation commit * fix test_props and the issue with attribute starting with __ * skip doctest in metadata_routing.rst for now * DOC explain why aliasing on sub-estimator of a consumer/router is useful * reduce diff * DOC add user guide link to method docstrings * DOC apply Thomas's suggestions to the rst file * CLN address a few comments in docs * ignore sentinel docstring check * handling backward compatibility and deprecation prototype * Update examples/plot_metadata_routing.py Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com> * make __metadata_request__* format more intuitive and less redundant * metadata_request_factory always returns a copy * fix tests for the changed __metadata_request__* format * in example: foo->sample_weight, bar->groups * get_method_input->get_input * minor comments from Guillaume * fix estimator checks tests * Improved sample props developer API * fixes, updated doc, decorator * Add docstrings and some API cleanup * unify serialize/deserialize methods * Add more docstring to process_routing * fix MetadataRouter.get_params parameter mismatch * DOC add missing name to MethodMetadataRequest.deserialize docstring * DOC add MethodMapping.add docstring * DOC fix colons after versionadded * fix {method}_requests return type annotation * metadata_request_factory -> metadata_router_factory and docstring fixes * move 'me' out of the map in MetadataRouter * more docstring refinements * cleanup API addresses and create a utils.metadata_routing sub-folder * fix module import issue * more tests and a few bug fixes * Joel's comments * make process_routing a function * docstring fix * ^type -> $type * remove deserialize, return instance, and add type as an attribute * remove sentinels and use strings instead * make RequestType searchable and check for valid identifier * Route -> MethodPair * remove unnecessary sorted * clarification on usage of the process_routing func in the example * only print methods with non-empty 
requests * fix test_string_representations * remove source build cache from CircleCI (temporarily) * Trigger CI * Invalidate linux-arm64 ccache my changing the key * Trigger CI * method, used_in -> callee, caller * show RequestType instead of RequestType.value in _serialize() * more informative error messages * fix checking for conflicting keys * get_router_for_object -> get_routing_for_object * \{method\}_requests -> set_\{method\}_request * address metadata_routing.rst comments * some test enhancements * TypeError for extra arguments * add_request: prop -> param * original_names -> return_alias * add more tests for MetadataRouter and MethodMapping * more suggestions from Joel's review * fix return type * apply more suggestions from Joel's review * Christian\'s suggestions * more notes from Christian * test_get_routing_for_object returns empty requests on unknown objects * more notes from Christian * remove double line break * more notes from Christian * more notes from Christian * make type private * add more comments/docs * fix test * fix nits * add forgotten nit Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

adrinjalali and others added 25 commits October 8, 2021 17:27

initial base implementation commit

dbead5c

fix test_props and the issue with attribute starting with __

7868950

skip doctest in metadata_routing.rst for now

5793318

DOC explain why aliasing on sub-estimator of a consumer/router is useful

6696497

reduce diff

c0841c8

DOC add user guide link to method docstrings

1aff2eb

DOC apply Thomas's suggestions to the rst file

1457293

CLN address a few comments in docs

af86e82

Merge remote-tracking branch 'upstream/sample-props' into sample-prop…

4c228cf

…s-base

ignore sentinel docstring check

11649d9

handling backward compatibility and deprecation prototype

b5c962c

Update examples/plot_metadata_routing.py

fb200e2

Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>

make __metadata_request__* format more intuitive and less redundant

6f849b2

metadata_request_factory always returns a copy

82b2128

Merge remote-tracking branch 'upstream/main' into sample-props-base

6f3f590

fix tests for the changed __metadata_request__* format

16c47b2

in example: foo->sample_weight, bar->groups

1c591fe

get_method_input->get_input

93d448e

minor comments from Guillaume

167e4c2

Merge branch 'sample-props-base' of github.com:adrinjalali/scikit-lea…

3d199ee

…rn into sample-props-base

fix estimator checks tests

20fe48a

Improved sample props developer API

39a462d

fixes, updated doc, decorator

bd5ae36

Add docstrings and some API cleanup

79b43f1

unify serialize/deserialize methods

515d00c

github-actions bot added the module:utils label Dec 27, 2021

adrinjalali added the No Changelog Needed label Dec 27, 2021

adrinjalali commented Dec 27, 2021

adrinjalali mentioned this pull request Dec 27, 2021

Base sample-prop implementation and docs #21284

Closed

Merge remote-tracking branch 'upstream/sample-props' into sample-prop…

01c942a

…s-base-alternate2

Christian's suggestions

c380ad7

more notes from Christian

bae8402

lorentzenchr reviewed Mar 1, 2022

adrinjalali added 3 commits March 1, 2022 13:36

test_get_routing_for_object returns empty requests on unknown objects

8ca978b

more notes from Christian

1aed83c

remove double line break

5fda075

lorentzenchr reviewed Mar 2, 2022

more notes from Christian

7663cfe

lorentzenchr reviewed Mar 2, 2022

more notes from Christian

2f51480

lorentzenchr reviewed Mar 4, 2022

adrinjalali added 3 commits March 7, 2022 11:36

make type private

d87cbc2

add more comments/docs

46dccf2

fix test

e5d46e3

lorentzenchr approved these changes Mar 8, 2022

fix nits

70f8c6c

adrinjalali added 2 commits March 10, 2022 14:10

add forgotten nit

560a2da

Merge branch 'sample-props' into sample-props-base-alternate2

e932501

adrinjalali merged commit 0b298ed into scikit-learn:sample-props Mar 10, 2022

adrinjalali deleted the sample-props-base-alternate2 branch March 10, 2022 14:39

jnothman reviewed Mar 14, 2022

adrinjalali mentioned this pull request Mar 18, 2022

SLEP006 - Metadata Routing task list #22893

Open

28 tasks

lorentzenchr mentioned this pull request Apr 30, 2025

ENH add X_val and y_val to HGBT.fit #27124

Merged

Jacob-Stevens-Haas mentioned this pull request May 16, 2025

Describe set_{method}_request() API, expose _MetadataRequester, or expose _BaseScorer #31360

Open

