Skip to content

DOC Clarify metadata routing docs from _metadata_requests.py module #31419

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

StefanieSenger
Copy link
Contributor

@StefanieSenger StefanieSenger commented May 23, 2025

What does this implement/fix? Explain your changes.

This PR aims to clarify the documentation (internal and public) on the _metadata_requests.py module.

It's also meant to simplify further work on this module for us (me).

This PR entails:

  • for public docstrings:
    • add links to the glossary
    • add more details
  • internal and public documentation:
    • concretised general terms (object, param, prop) with what is meant in this context in method docstrings, when a special term is needed to explain the usecase
      • I went though each term individually and kept object and param in parts of the docstrings where they seem to refer to more technical aspects (objects as class instances, params as something to pass to or accepted by methods)
      • got rid of prop and property as terms completely, since I find this is very ambiguous
      • I've kept all variable names as they are for now (some might profit from being re-named in the future)
  • correct some mistakes

Copy link

github-actions bot commented May 23, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: d3cd3c9. Link to the linter CI: here

@StefanieSenger StefanieSenger added the Metadata Routing all issues related to metadata routing, slep006, sample props label May 23, 2025
Comment on lines -1017 to -1018
If the router is also a consumer, it also checks for warnings of
`self`'s/consumer's requested metadata.
Copy link
Contributor Author

@StefanieSenger StefanieSenger May 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have removed the two lines on checking warnings, since this method only does the smaller part of the work of checking for warnings and thus this might not be as relevant here. But it would actually be nice, to do this all in one place. The rest of the warning checking is done in MethodMetadataRequest._route_params().

For another PR, I was thinking of trying to put all the warning checking into here (MetadataRouter.route_params()) and then documenting it with:

Metadata requests are also validated and may trigger warnings for newly added 
metadata. This includes the router’s own requests if it also acts as a consumer.

Would that be a good idea?

@StefanieSenger StefanieSenger marked this pull request as ready for review May 24, 2025 13:01
Comment on lines -332 to +344
Specifies which metadata should be routed to `param`
Specifies which metadata should be routed to the method that owns this
`MethodMetadataRequest`.
- str: the name (or alias) of metadata given to a meta-estimator that
should be routed to this parameter.
should be routed to the method that owns this `MethodMetadataRequest`.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was wrong before.

Comment on lines 782 to 793
This class is used by router objects to store and handle metadata routing.
Routing information is stored as a dictionary of the form ``{"object_name":
This class is used by :term:`meta-estimators` or functions that can route metadata
to store and handle their metadata routing. Routing information is stored in a
dictionary-like structure of the form ``{"object_name":
Copy link
Contributor Author

@StefanieSenger StefanieSenger May 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finding different words for "router objects" to prevent confusion with this classes' own name.

Comment on lines +12 to +14
The routing is coordinated by building ``MetadataRequest`` objects
for objects that consume metadata, and ``MetadataRouter`` objects for objects that
can route metadata, which are then aligned during a call to `process_routing()`."
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The part explaining that MetadataRequest objects and MetadataRouter objects are aligned during a call to process_routing() is crucial to communicate the general idea.

Comment on lines -252 to +261
item : object
The given item to be checked if it can be an alias.
item : str
The given name to be checked if it can be an alias for the metadata.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to be a copy-paste from the docstring of request_is_valid and even though a "str" is an "object", it'll help users more to be more specific here. I would also rename the item param into name.

But do request_is_alias and request_is_valid need to be in the public API at all?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's still an object here, particularly, anything in VALID_REQUEST_VALUES. Also, this function checks if the given object is a string, so the input doesn't have to be a string.

@StefanieSenger
Copy link
Contributor Author

@adrinjalali, would you like to have a look?

@lucyleeow lucyleeow self-requested a review May 30, 2025 05:22
Copy link
Member

@lucyleeow lucyleeow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a very thoughtful change and a nice improvement! I feel like I understand this stuff better (though still some confusion).

The module docstring _metadata_requests.py is quite long, it maybe nice to just visually break it up by adding a section header (even though this will never be rendered) for the part that talks about MetadataRequest and MetadataRouter.

Additionally, it feels like the last paragraph:

The ``set_{method}_request`` methods are dynamically generated for estimators
which inherit from the ``BaseEstimator``. This is done by attaching instances
of the ``RequestMethod`` descriptor to classes, which is done in the
``_MetadataRequester`` class, and ``BaseEstimator`` inherits from this mixin.
This mixin also implements the ``get_metadata_routing``, which meta-estimators
need to override, but it works for simple consumers as is.

belongs further up, with the section that talks about MetadataRequest.

Another suggestion which doesn't have to be made by you/in this PR:

I feel like the 'alias' part is half explained here:

The ``alias`` above in the ``add`` method has to be either a string (an alias),
or a {True (requested), False (unrequested), None (error if passed)}``. There
are some other special values such as ``UNUSED`` and ``WARN`` which are used
for purposes such as warning of removing a metadata in a child class, but not
used by the end users.

and half explained here (too long to show all of it, but here is the start):

# Each request value needs to be one of the following values, or an alias.

It would be nice to have it in one place. Also it would be nice to more explicitly explain things "error if passed" -> raise error if this parameter is passed (I think this is what is meant?)

can route metadata, which are then aligned during a call to `process_routing()`."
The ``MetadataRequest`` and ``MetadataRouter`` objects are constructed via a
``get_metadata_routing`` method, which all estimators (should) therefore implement.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why the brackets?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was, I think, supposed to express that third party developers and users who implement custom meta-estimators should also implement a get_metadata_routing method, while it is not strictly required to write a scikit-learn compatible estimator.

I have now re-phrased it to only match scikit-learn's native estimators. A guide on how third party developers can use scikit-learns metadata routing API is already existing in the user guide and better explained there as well.

Copy link
Contributor Author

@StefanieSenger StefanieSenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reviewing, @lucyleeow!

I went through all your comments and have further refined the docs, especially the module docstring. I agree it is quite long, but I have broken it up with section headings and also re-ordered it a bit.

Now, it might be also exportable / usable in the docs that get rendered (maybe in the example linked below), and I will think about it.

The last paragraph I have left where it is for now, putting "implementation details" as a sub-heading, because it seems not so relevant to understand how to implement a custom estimator that routes metadata, which was the lens I tried to read this through. I can also move it up, if that helps though.

I will keep your comment on the alias in mind. I have not fully wrapped my head around it yet.

can route metadata, which are then aligned during a call to `process_routing()`."
The ``MetadataRequest`` and ``MetadataRouter`` objects are constructed via a
``get_metadata_routing`` method, which all estimators (should) therefore implement.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was, I think, supposed to express that third party developers and users who implement custom meta-estimators should also implement a get_metadata_routing method, while it is not strictly required to write a scikit-learn compatible estimator.

I have now re-phrased it to only match scikit-learn's native estimators. A guide on how third party developers can use scikit-learns metadata routing API is already existing in the user guide and better explained there as well.

Comment on lines 26 to 28
``get_metadata_routing`` returns a ``MetadataRouter`` object. It stores
information on which method from the router object calls which in a consumer's
object, because this is how the metadata is to be passed.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, exactly! It would look something like this:

{'estimator': {'mapping': [{'caller': 'fit', 'callee': 'fit'}, {'caller': 'predict', 'callee': 'predict'}, {'caller': 'score', 'callee': 'score'}], 'router': {'fit': {'sample_weight': True}, 'predict': {'groups': None}, 'score': {'sample_weight': None}}}}

Though, now I am wondering why the second part is called "router" when in other parts of the docs the same concept is referred to as "request"...

Do you think it makes sense to change that, too?

@adrinjalali, would you verify this can be changed? What would be your take on that?

also returns a ``MetadataRouter`` object. Its
routing information includes both information about the object itself (added
via ``MetadataRouter.add_self_request``), as well as the routing information
for its sub-estimators.
A ``MetadataRequest`` instance includes one ``MethodMetadataRequest`` per
method in ``METHODS``, which includes ``fit``, ``score``, etc.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I had spared that out until now. But you are right to ask me to clarify this as well. So I have tried to re-write this paragraph. Please let me know what you think. :)

My understanding is that each consuming estimator communicates their requests by their _get_metadata_request method, which in the simplest case inspects the methods' signatures and builds a dictionary with all the requested metadata from it, which is then stored as an estimator's _metadata_request attribute. This attribute is created while running set_{method}_request via RequestMethod.__get__ (not sure how it gets there though, just inspecting the output of my debugging session).

Something like metadatarequest.fit.add does not exist in the code as far as I can tell. I would assume the syntax has changed and metadatarequest.fit.add(param="sample_weight", alias="my_weights") refers to method_metadata_request.add_request(param=prop, alias=alias), which - yes - is done per method and add became add_request.

Comment on lines +826 to 827
the router itself needs to be included in the routing. The passed object
can be an estimator or a
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my current understanding, it will mostly be an estimator and I wonder more about how this would work / what could happen for this to be another MetadataRequest.

If you look at the example for developers writing their own meta-estimators, there we would add self into .add_self_request() when the MetadataRouter is build:

class RouterConsumerClassifier(MetaEstimatorMixin, ClassifierMixin, BaseEstimator):
    def __init__(self, estimator):
        self.estimator = estimator

    def get_metadata_routing(self):
        router = (
            [MetadataRouter](https://scikit-learn.org/stable/modules/generated/sklearn.utils.metadata_routing.MetadataRouter.html#sklearn.utils.metadata_routing.MetadataRouter)(owner=self.__class__.__name__)
            # defining metadata routing request values for usage in the meta-estimator
            .add_self_request(self)
            # defining metadata routing request values for usage in the sub-estimator
            .add(
                estimator=self.estimator,
                method_mapping=[MethodMapping](https://scikit-learn.org/stable/modules/generated/sklearn.utils.metadata_routing.MethodMapping.html#sklearn.utils.metadata_routing.MethodMapping)()
                .add(caller="fit", callee="fit")
                .add(caller="predict", callee="predict")
                .add(caller="score", callee="score"),
            )
        )
        return router

``MetadataRouter`` includes information about sub-objects' routing and how
methods are mapped together. For instance, the information about which methods
of a sub-estimator are called in which methods of the meta-estimator are all
stored here. Conceptually, this information looks like:
Copy link
Contributor Author

@StefanieSenger StefanieSenger Jun 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Avoiding the word "store" for MetadataRouter, since it gets re-build on demand and is not attached to any object.

Comment on lines +826 to 827
the router itself needs to be included in the routing. The passed object
can be an estimator or a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay that makes more sense, and combined with your explanation in #31419 (comment) I understand better.
The MetadataRequest use case is indeed intriguing. I can imagine 'how' it could work but not sure in what situation it would be used.
Anyway thanks!

Comment on lines 1174 to 1193
REQUESTER_DOC = """ Request metadata passed to the ``{method}`` method.

Note that this method is only relevant if
``enable_metadata_routing=True`` (see :func:`sklearn.set_config`).
Please see :ref:`User Guide <metadata_routing>` on how the routing
# the methods.
REQUESTER_DOC = """Configure whether metadata should be requested and passed to the
``{method}`` method.
Copy link
Contributor Author

@StefanieSenger StefanieSenger Jun 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rephrased to avoid reading 'request' as a noun and to emphasize that this method configures metadata handling not definitely passes any.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Users and developers almost never need to directly add a new ``MethodMetadataRequest``,
to the consumer's `_metadata_request` attribute, since these are generated
automatically. This attribute is created while running `set_{method}_request` methods
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
automatically. This attribute is created while running `set_{method}_request` methods
automatically. This attribute is modified while running `set_{method}_request` methods

they're created at construction time

Comment on lines +61 to +62
The ``MetadataRouter`` object is created anew whenever a router calls a method from a
sub-estimator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know what you mean here, but it's somewhat misleading.

I would maybe phrase is as:

The MetadataRouter objects are never stored and are always recreated whenever
the object's get_metadata_routing method is called.

Comment on lines -252 to +261
item : object
The given item to be checked if it can be an alias.
item : str
The given name to be checked if it can be an alias for the metadata.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's still an object here, particularly, anything in VALID_REQUEST_VALUES. Also, this function checks if the given object is a string, so the input doesn't have to be a string.

@@ -287,7 +301,7 @@ def request_is_valid(item):


class MethodMetadataRequest:
"""A prescription of how metadata is to be passed to a single method.
"""Container for metadata routing rules associated with a single method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Container for metadata routing rules associated with a single method.
"""Container for metadata requests associated with a single method.

@@ -478,7 +493,7 @@ def _route_params(self, params, parent, caller):
return res

def _consumes(self, params):
"""Check whether the given parameters are consumed by this method.
"""Check whether the given metadata is consumed by this method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Check whether the given metadata is consumed by this method.
"""Check whether the given metadata are consumed by this method.

params is a list of metadata names

@@ -548,7 +563,7 @@ def __init__(self, owner):
)

def consumes(self, method, params):
"""Check whether the given parameters are consumed by the given method.
"""Check whether the given metadata is consumed by the given method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Check whether the given metadata is consumed by the given method.
"""Check whether the given metadata are consumed by the given method.

Comment on lines +826 to 827
the router itself needs to be included in the routing. The passed object
can be an estimator or a
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we've had a usecase for it being anything other than self. You could argue that, that information can be retrieved from owner and instead of having a add_self_request we could add a owner_is_consumer to the constructor.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Metadata Routing all issues related to metadata routing, slep006, sample props module:utils No Changelog Needed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants