-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
DOC Clarify metadata routing docs from _metadata_requests.py
module
#31419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
DOC Clarify metadata routing docs from _metadata_requests.py
module
#31419
Conversation
If the router is also a consumer, it also checks for warnings of | ||
`self`'s/consumer's requested metadata. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have removed the two lines on checking warnings, since this method only does the smaller part of the work of checking for warnings and thus this might not be as relevant here. But it would actually be nice, to do this all in one place. The rest of the warning checking is done in MethodMetadataRequest._route_params()
.
For another PR, I was thinking of trying to put all the warning checking into here (MetadataRouter.route_params()
) and then documenting it with:
Metadata requests are also validated and may trigger warnings for newly added
metadata. This includes the router’s own requests if it also acts as a consumer.
Would that be a good idea?
Specifies which metadata should be routed to `param` | ||
Specifies which metadata should be routed to the method that owns this | ||
`MethodMetadataRequest`. | ||
- str: the name (or alias) of metadata given to a meta-estimator that | ||
should be routed to this parameter. | ||
should be routed to the method that owns this `MethodMetadataRequest`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was wrong before.
sklearn/utils/_metadata_requests.py
Outdated
This class is used by router objects to store and handle metadata routing. | ||
Routing information is stored as a dictionary of the form ``{"object_name": | ||
This class is used by :term:`meta-estimators` or functions that can route metadata | ||
to store and handle their metadata routing. Routing information is stored in a | ||
dictionary-like structure of the form ``{"object_name": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finding different words for "router objects" to prevent confusion with this classes' own name.
The routing is coordinated by building ``MetadataRequest`` objects | ||
for objects that consume metadata, and ``MetadataRouter`` objects for objects that | ||
can route metadata, which are then aligned during a call to `process_routing()`." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The part explaining that MetadataRequest
objects and MetadataRouter
objects are aligned during a call to process_routing()
is crucial to communicate the general idea.
item : object | ||
The given item to be checked if it can be an alias. | ||
item : str | ||
The given name to be checked if it can be an alias for the metadata. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be a copy-paste from the docstring of request_is_valid
and even though a "str" is an "object", it'll help users more to be more specific here. I would also rename the item
param into name
.
But do request_is_alias
and request_is_valid
need to be in the public API at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's still an object
here, particularly, anything in VALID_REQUEST_VALUES
. Also, this function checks if the given object is a string, so the input doesn't have to be a string.
@adrinjalali, would you like to have a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a very thoughtful change and a nice improvement! I feel like I understand this stuff better (though still some confusion).
The module docstring _metadata_requests.py
is quite long, it maybe nice to just visually break it up by adding a section header (even though this will never be rendered) for the part that talks about MetadataRequest
and MetadataRouter
.
Additionally, it feels like the last paragraph:
scikit-learn/sklearn/utils/_metadata_requests.py
Lines 69 to 74 in 4560abc
The ``set_{method}_request`` methods are dynamically generated for estimators | |
which inherit from the ``BaseEstimator``. This is done by attaching instances | |
of the ``RequestMethod`` descriptor to classes, which is done in the | |
``_MetadataRequester`` class, and ``BaseEstimator`` inherits from this mixin. | |
This mixin also implements the ``get_metadata_routing``, which meta-estimators | |
need to override, but it works for simple consumers as is. |
belongs further up, with the section that talks about MetadataRequest
.
Another suggestion which doesn't have to be made by you/in this PR:
I feel like the 'alias' part is half explained here:
scikit-learn/sklearn/utils/_metadata_requests.py
Lines 39 to 43 in 4560abc
The ``alias`` above in the ``add`` method has to be either a string (an alias), | |
or a {True (requested), False (unrequested), None (error if passed)}``. There | |
are some other special values such as ``UNUSED`` and ``WARN`` which are used | |
for purposes such as warning of removing a metadata in a child class, but not | |
used by the end users. |
and half explained here (too long to show all of it, but here is the start):
# Each request value needs to be one of the following values, or an alias. |
It would be nice to have it in one place. Also it would be nice to more explicitly explain things "error if passed" -> raise error if this parameter is passed (I think this is what is meant?)
sklearn/utils/_metadata_requests.py
Outdated
can route metadata, which are then aligned during a call to `process_routing()`." | ||
The ``MetadataRequest`` and ``MetadataRouter`` objects are constructed via a | ||
``get_metadata_routing`` method, which all estimators (should) therefore implement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the brackets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was, I think, supposed to express that third party developers and users who implement custom meta-estimators should also implement a get_metadata_routing
method, while it is not strictly required to write a scikit-learn compatible estimator.
I have now re-phrased it to only match scikit-learn's native estimators. A guide on how third party developers can use scikit-learns metadata routing API is already existing in the user guide and better explained there as well.
Co-authored-by: Lucy Liu <jliu176@gmail.com>
…ikit-learn into metadata_internal_docs
Co-authored-by: Lucy Liu <jliu176@gmail.com>
…ikit-learn into metadata_internal_docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for reviewing, @lucyleeow!
I went through all your comments and have further refined the docs, especially the module docstring. I agree it is quite long, but I have broken it up with section headings and also re-ordered it a bit.
Now, it might be also exportable / usable in the docs that get rendered (maybe in the example linked below), and I will think about it.
The last paragraph I have left where it is for now, putting "implementation details" as a sub-heading, because it seems not so relevant to understand how to implement a custom estimator that routes metadata, which was the lens I tried to read this through. I can also move it up, if that helps though.
I will keep your comment on the alias in mind. I have not fully wrapped my head around it yet.
sklearn/utils/_metadata_requests.py
Outdated
can route metadata, which are then aligned during a call to `process_routing()`." | ||
The ``MetadataRequest`` and ``MetadataRouter`` objects are constructed via a | ||
``get_metadata_routing`` method, which all estimators (should) therefore implement. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That was, I think, supposed to express that third party developers and users who implement custom meta-estimators should also implement a get_metadata_routing
method, while it is not strictly required to write a scikit-learn compatible estimator.
I have now re-phrased it to only match scikit-learn's native estimators. A guide on how third party developers can use scikit-learns metadata routing API is already existing in the user guide and better explained there as well.
sklearn/utils/_metadata_requests.py
Outdated
``get_metadata_routing`` returns a ``MetadataRouter`` object. It stores | ||
information on which method from the router object calls which in a consumer's | ||
object, because this is how the metadata is to be passed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, exactly! It would look something like this:
{'estimator': {'mapping': [{'caller': 'fit', 'callee': 'fit'}, {'caller': 'predict', 'callee': 'predict'}, {'caller': 'score', 'callee': 'score'}], 'router': {'fit': {'sample_weight': True}, 'predict': {'groups': None}, 'score': {'sample_weight': None}}}}
Though, now I am wondering why the second part is called "router" when in other parts of the docs the same concept is referred to as "request"...
Do you think it makes sense to change that, too?
@adrinjalali, would you verify this can be changed? What would be your take on that?
sklearn/utils/_metadata_requests.py
Outdated
also returns a ``MetadataRouter`` object. Its | ||
routing information includes both information about the object itself (added | ||
via ``MetadataRouter.add_self_request``), as well as the routing information | ||
for its sub-estimators. | ||
A ``MetadataRequest`` instance includes one ``MethodMetadataRequest`` per | ||
method in ``METHODS``, which includes ``fit``, ``score``, etc. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To be honest, I had spared that out until now. But you are right to ask me to clarify this as well. So I have tried to re-write this paragraph. Please let me know what you think. :)
My understanding is that each consuming estimator communicates their requests by their _get_metadata_request
method, which in the simplest case inspects the methods' signatures and builds a dictionary with all the requested metadata from it, which is then stored as an estimator's _metadata_request
attribute. This attribute is created while running set_{method}_request
via RequestMethod.__get__
(not sure how it gets there though, just inspecting the output of my debugging session).
Something like metadatarequest.fit.add
does not exist in the code as far as I can tell. I would assume the syntax has changed and metadatarequest.fit.add(param="sample_weight", alias="my_weights")
refers to method_metadata_request.add_request(param=prop, alias=alias)
, which - yes - is done per method and add
became add_request
.
the router itself needs to be included in the routing. The passed object | ||
can be an estimator or a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my current understanding, it will mostly be an estimator and I wonder more about how this would work / what could happen for this to be another MetadataRequest
.
If you look at the example for developers writing their own meta-estimators, there we would add self
into .add_self_request()
when the MetadataRouter
is build:
class RouterConsumerClassifier(MetaEstimatorMixin, ClassifierMixin, BaseEstimator):
def __init__(self, estimator):
self.estimator = estimator
def get_metadata_routing(self):
router = (
[MetadataRouter](https://scikit-learn.org/stable/modules/generated/sklearn.utils.metadata_routing.MetadataRouter.html#sklearn.utils.metadata_routing.MetadataRouter)(owner=self.__class__.__name__)
# defining metadata routing request values for usage in the meta-estimator
.add_self_request(self)
# defining metadata routing request values for usage in the sub-estimator
.add(
estimator=self.estimator,
method_mapping=[MethodMapping](https://scikit-learn.org/stable/modules/generated/sklearn.utils.metadata_routing.MethodMapping.html#sklearn.utils.metadata_routing.MethodMapping)()
.add(caller="fit", callee="fit")
.add(caller="predict", callee="predict")
.add(caller="score", callee="score"),
)
)
return router
``MetadataRouter`` includes information about sub-objects' routing and how | ||
methods are mapped together. For instance, the information about which methods | ||
of a sub-estimator are called in which methods of the meta-estimator are all | ||
stored here. Conceptually, this information looks like: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoiding the word "store" for MetadataRouter
, since it gets re-build on demand and is not attached to any object.
the router itself needs to be included in the routing. The passed object | ||
can be an estimator or a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay that makes more sense, and combined with your explanation in #31419 (comment) I understand better.
The MetadataRequest
use case is indeed intriguing. I can imagine 'how' it could work but not sure in what situation it would be used.
Anyway thanks!
Co-authored-by: Lucy Liu <jliu176@gmail.com>
sklearn/utils/_metadata_requests.py
Outdated
REQUESTER_DOC = """ Request metadata passed to the ``{method}`` method. | ||
|
||
Note that this method is only relevant if | ||
``enable_metadata_routing=True`` (see :func:`sklearn.set_config`). | ||
Please see :ref:`User Guide <metadata_routing>` on how the routing | ||
# the methods. | ||
REQUESTER_DOC = """Configure whether metadata should be requested and passed to the | ||
``{method}`` method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rephrased to avoid reading 'request' as a noun and to emphasize that this method configures metadata handling not definitely passes any.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
Users and developers almost never need to directly add a new ``MethodMetadataRequest``, | ||
to the consumer's `_metadata_request` attribute, since these are generated | ||
automatically. This attribute is created while running `set_{method}_request` methods |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
automatically. This attribute is created while running `set_{method}_request` methods | |
automatically. This attribute is modified while running `set_{method}_request` methods |
they're created at construction time
The ``MetadataRouter`` object is created anew whenever a router calls a method from a | ||
sub-estimator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know what you mean here, but it's somewhat misleading.
I would maybe phrase is as:
The
MetadataRouter
objects are never stored and are always recreated whenever
the object'sget_metadata_routing
method is called.
item : object | ||
The given item to be checked if it can be an alias. | ||
item : str | ||
The given name to be checked if it can be an alias for the metadata. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's still an object
here, particularly, anything in VALID_REQUEST_VALUES
. Also, this function checks if the given object is a string, so the input doesn't have to be a string.
@@ -287,7 +301,7 @@ def request_is_valid(item): | |||
|
|||
|
|||
class MethodMetadataRequest: | |||
"""A prescription of how metadata is to be passed to a single method. | |||
"""Container for metadata routing rules associated with a single method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Container for metadata routing rules associated with a single method. | |
"""Container for metadata requests associated with a single method. |
@@ -478,7 +493,7 @@ def _route_params(self, params, parent, caller): | |||
return res | |||
|
|||
def _consumes(self, params): | |||
"""Check whether the given parameters are consumed by this method. | |||
"""Check whether the given metadata is consumed by this method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Check whether the given metadata is consumed by this method. | |
"""Check whether the given metadata are consumed by this method. |
params is a list of metadata names
@@ -548,7 +563,7 @@ def __init__(self, owner): | |||
) | |||
|
|||
def consumes(self, method, params): | |||
"""Check whether the given parameters are consumed by the given method. | |||
"""Check whether the given metadata is consumed by the given method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Check whether the given metadata is consumed by the given method. | |
"""Check whether the given metadata are consumed by the given method. |
the router itself needs to be included in the routing. The passed object | ||
can be an estimator or a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we've had a usecase for it being anything other than self
. You could argue that, that information can be retrieved from owner
and instead of having a add_self_request
we could add a owner_is_consumer
to the constructor.
What does this implement/fix? Explain your changes.
This PR aims to clarify the documentation (internal and public) on the
_metadata_requests.py
module.It's also meant to simplify further work on this module for us (me).
This PR entails:
object
,param
,prop
) with what is meant in this context in method docstrings, when a special term is needed to explain the usecaseobject
andparam
in parts of the docstrings where they seem to refer to more technical aspects (objects
as class instances,params
as something to pass to or accepted by methods)prop
andproperty
as terms completely, since I find this is very ambiguous