-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
FEAT enable default routing strategy machinery #30946
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thanks, @adrinjalali, for the PR. I opened a PR to this PR to actually define the default policy for Once we agree on this, and possibly on a little refactoring to also do it for the This would require reworking the example further, so I prefer to decouple the two concerns to ease the review process. |
Doing your proposed change in I personally prefer to have a more "explicit" way to enable default routing than saying
As for |
I feel the current naming "default_routing" could be confusing because "default" is polysemic. Maybe keeping I am not sure how to name the policy implemented in this PR ("instance_defaults", "auto" ?) nor the old policy ("classwise_defaults" ?). But for backward compatibility we should maybe keep the old one as the default, until we are confident about the right policy and make it the new default in a future release ? |
I also think that's a problem. Also the name of the new method/attribute Just thinking aloud:
|
@@ -0,0 +1,7 @@ | |||
- |Feature| Added support for default metadata routing through | |||
``set_config(enable_metadata_routing="default_routing")``. This enables | |||
automatic routing of common metadata like ``sample_weight`` and ``groups`` by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
automatic routing of common metadata like ``sample_weight`` and ``groups`` by | |
automatic routing of ``sample_weight`` and ``groups`` metadata (also ...) by |
Can we be very explicit when communicating what exactly is included in the default routing? In the changelog and in the user guide, I would list all of them.
print_routing(clf) | ||
|
||
# %% | ||
# The routing can still be modified using set_*_request methods |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# The routing can still be modified using set_*_request methods | |
# Users can still modify the routing using set_*_request methods |
routing. | ||
|
||
2. Before the user sets any specific routing, via `_get_default_requests`, it | ||
provides the default metadata routing configuration for the instance. This |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
provides the default metadata routing configuration for the instance. This | |
provides the available metadata routing configuration for the instance. This |
Default has too many meanings already, so I think it would be cool to find another term, such as available maybe?
For example, if a method's signature includes `sample_weight`, this method will: | ||
- During class creation: Create a `set_{method}_request` method to configure | ||
how `sample_weight` should be routed | ||
- Right after initialization: Provide the default routing configuration for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Right after initialization: Provide the default routing configuration for | |
- Right after initialization: Provide the available routing configuration for |
for method, method_requests in defaults.items(): | ||
for pname, value in method_requests.items(): | ||
getattr(requests, method).add_request(param=pname, alias=value) | ||
return requests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the purpose of _get_default_requests()
returning the class requests if _default_routing_enabled()
is False?
Wouldn't it be a bug if an estimator doesn't implement their own default request and then as a result metadata is routed to all the methods where that is available in the routing?
Thinking about us then adding a new routing, and suddenly the default request would change.
Edit: Correcting myself here a bit: I have seen that this method gets called by _get_metadata_request()
and of cause something needs to get passed there also if _default_routing_enabled()
is False. But then I would think what gets passed is not a default request anymore. I am still concerned about the terminology.
# Instance-level default requests can override class-level defaults | ||
# and add new method requests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# Instance-level default requests can override class-level defaults | |
# and add new method requests | |
# Sets default requests for the instance chosen from the available | |
# __metadata_request__* class attributes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand my own suggestion anymore.
Better like this?
# Instance-level default requests can override class-level defaults | |
# and add new method requests | |
# Sets default requests for metadata routing at instance level. |
- Collecting metadata request info from `__metadata_request__*` class | ||
attributes | ||
- Analyzing method signatures for implicit metadata parameters | ||
The collected information is used to create `set_{method}_request` methods | ||
(e.g. set_fit_request) that allow runtime configuration of metadata | ||
routing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Collecting metadata request info from `__metadata_request__*` class | |
attributes | |
- Analyzing method signatures for implicit metadata parameters | |
The collected information is used to create `set_{method}_request` methods | |
(e.g. set_fit_request) that allow runtime configuration of metadata | |
routing. | |
- Analyzing method signatures for implicit metadata parameters | |
The collected information is used to create `set_{method}_request` methods | |
(e.g. set_fit_request) that allow runtime configuration of metadata | |
routing. | |
- Collecting metadata request info from `__metadata_request__*` class | |
attributes |
I would turn the order around to be the same as the methods does it (lowers mental load for third party developers who want to use these.
(e.g. set_fit_request) that allow runtime configuration of metadata | ||
routing. | ||
|
||
2. Before the user sets any specific routing, via `_get_default_requests`, it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2. Before the user sets any specific routing, via `_get_default_requests`, it | |
2. Before the user sets any specific routing, via `__sklearn_default_request__`, it |
It is __sklearn_default_request__
, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding to this, I feel the term "user" here is ambiguous. The developers of any scikit-learn compatible estimators are meant here, if I get it right.
2. Before the user sets any specific routing, via `_get_default_requests`, it | |
2. Before any specific routing gets configured at instance level via | |
`__sklearn_default_request__`, it |
assert _default_routing_enabled() == default_routing | ||
|
||
|
||
def test_default_instance_routing_overrides_class_level(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is already test_setting_default_requests() in this file and maybe these tests can be either merged or more clearer separated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have gone over this again to be able to compare with #31401 and left some new comments. (Take them for what they are worth.)
Also after some time having passed, my main concern is that the term default
is used to mean different states here.
# There are two ways to set default metadata routing requests in scikit-learn: | ||
# | ||
# 1. Class-level defaults using `__metadata_request__method` class attributes, | ||
# which sets default request values for all instances of a class, and can even | ||
# remove a metadata from the metadata routing machinery if necessary. | ||
# 2. Instance-level defaults using `__sklearn_default_requests__` method, | ||
# which set default request values at instance level, which means you can customize | ||
# the default request values on a per-instance basis. Note that this only takes | ||
# effect if ``set_config(enable_metadata_routing="default_routing")`` is called. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have thought that I had understood this at one point, but I am not sure anymore:
So if a user (end user, not third party developer) enables "default_routing" in the configs, then a default routing strategy (which is defined at instance level by us or by a third party library) gets used.
But when is the class level default routing enabled then?
And why is it necessary to provide two different ways to set default requests? In my understanding, we can do everything and more on instance level alone.
# 1. Class-level defaults using `__metadata_request__method` class attributes, | ||
# which sets default request values for all instances of a class, and can even | ||
# remove a metadata from the metadata routing machinery if necessary. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't quite get when the class level routing configuration would be enabled? On set_config(enable_metadata_routing=True)
?
And what would it mean when it changes the "metadata routing machinery"? Would there be any routing with set_config(enable_metadata_routing=True)
that is independent from what is defined in __metadata_request__method
?
Edit: My confusion came from the fact that I thought that "remove a metadata from the metadata routing machinery" meant to set routing to it to False
, and consequently they must have been set to True
before.
Now I see, that class level configuration is only used in the MetadataRequest
object, if set_config(enable_metadata_routing=True)
and that "remove a metadata from the metadata routing machinery" actually means to disable users to set it at all by fully remove it, doesn't it?
- --> If so, then I would think it's better either reformulate this or not to mention it here so prominently, since that would be a very special case and this half sentence is a bit confusing.
|
||
class DefaultRoutingClassifier(ClassifierMixin, BaseEstimator): | ||
# Class-level default request for fit method | ||
__metadata_request__fit = {"sample_weight": True} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On naming __metadata_request__fit
and the other attributes:
For better maintainability, I would pick names for these attributes that don't contain any method names.
These will show up for any contributor in scikit-learn and third party library upon inspection or debugging and the first thought would be that this is a callable method, when in fact it is not callable.
# Instance-level default requests can override class-level defaults | ||
# and add new method requests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand my own suggestion anymore.
Better like this?
# Instance-level default requests can override class-level defaults | |
# and add new method requests | |
# Sets default requests for metadata routing at instance level. |
(e.g. set_fit_request) that allow runtime configuration of metadata | ||
routing. | ||
|
||
2. Before the user sets any specific routing, via `_get_default_requests`, it |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding to this, I feel the term "user" here is ambiguous. The developers of any scikit-learn compatible estimators are meant here, if I get it right.
2. Before the user sets any specific routing, via `_get_default_requests`, it | |
2. Before any specific routing gets configured at instance level via | |
`__sklearn_default_request__`, it |
This creates the machinery needed to setup a default routing for metadata on the estimator level.