Skip to content

FEAT enable default routing strategy machinery #30946

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

adrinjalali
Copy link
Member

This creates the machinery needed to setup a default routing for metadata on the estimator level.

Copy link

github-actions bot commented Mar 5, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: fcbc431. Link to the linter CI: here

@adrinjalali adrinjalali marked this pull request as ready for review March 6, 2025 12:16
@adrinjalali
Copy link
Member Author

@ogrisel ogrisel self-requested a review March 7, 2025 19:14
@virchan virchan added the Metadata Routing all issues related to metadata routing, slep006, sample props label Mar 9, 2025
@ogrisel
Copy link
Member

ogrisel commented Mar 11, 2025

Thanks, @adrinjalali, for the PR. I opened a PR to this PR to actually define the default policy for sample_weight in BaseEstimator:

adrinjalali#5

Once we agree on this, and possibly on a little refactoring to also do it for the groups metadata of splits, I would like to follow up with another PR to this PR to actually make the default policy enabled by default when enable_metadata_routing=True and instead expose a new enabled_metadata_routing="empty_requests", enabled_metadta_routing="verbose_requests" or enabled_metadta_routing="explicit_requests" to implement the fully empty requests scheme.

This would require reworking the example further, so I prefer to decouple the two concerns to ease the review process.

@adrinjalali
Copy link
Member Author

Doing your proposed change in BaseEstimator would affect all children, including third party estimators, in a non-equal-footing way. Especially when combining this with enable_metadata_routing=True, by default all metadata in sklearn that we care about is requested, but the metadata accepted by third party libs is not.

I personally prefer to have a more "explicit" way to enable default routing than saying enable_metadata_routing=True, but if we're doing this, to keep the consistency, I see two paths forward:

  1. Request all metadata by default with enable_metadata_routing=True, which would include groups and sample_weight
  2. Enabling this default routing for objects which actually have sample_weight somewhere. For instance, for score in ClassifierMixin and RegressorMixin. It should solve our issue w/o making "smart" decisions. We can then have a FitSampleWeightRequesterMixin to add to the ones who have fit methods with sample_weight, and add that explicitly to those estimators / estimator groups (like a base class for a group of estimators).

As for groups in split objects, I intentionally didn't include that, since there's a difference between groups and sample_weight. In split objects, they raise if groups is not requested. So there's no scenario where it can be not passed by the split object works; whereas sample_weight is always optional, and all objects who accept it, work w/o it. Therefore I think it makes sense to keep groups setting as is.

@antoinebaker
Copy link
Contributor

antoinebaker commented Mar 18, 2025

I feel the current naming "default_routing" could be confusing because "default" is polysemic.
enable_metadata_routing="default_routing" means, as a request policy, to use the instance level requests as defaults. But it could be misleading to call it "default_routing", as it is not currently the default strategy, and isn't about the routing mechanics per se but more about the requests.

Maybe keeping enable_metadata_routing : bool, None as before, and adding a new config variable could help ?
Something like metadata_request_policy : str to choose among the various policies we would like to implement and test.
For instance metadata_request_policy = "empty", "explicit", "verbose" for the ones mentioned in #30946 (comment).

I am not sure how to name the policy implemented in this PR ("instance_defaults", "auto" ?) nor the old policy ("classwise_defaults" ?). But for backward compatibility we should maybe keep the old one as the default, until we are confident about the right policy and make it the new default in a future release ?

@StefanieSenger
Copy link
Contributor

StefanieSenger commented Mar 20, 2025

I feel the current naming "default_routing" could be confusing because "default" is polysemic.

I also think that's a problem.

Also the name of the new method/attribute __sklearn_default_request__ is so similar to the __metadata_request__* attributes, that it requires getting used to this part of the code to not constantly confuse them. It would be better to use "speaking names".

Just thinking aloud:

__sklearn_default_request__ could maybe be __sklearn_default_metadata_requests__ ?

__metadata_request__* could maybe be __sklearn_available_metadata_requests__*?

@@ -0,0 +1,7 @@
- |Feature| Added support for default metadata routing through
``set_config(enable_metadata_routing="default_routing")``. This enables
automatic routing of common metadata like ``sample_weight`` and ``groups`` by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
automatic routing of common metadata like ``sample_weight`` and ``groups`` by
automatic routing of ``sample_weight`` and ``groups`` metadata (also ...) by

Can we be very explicit when communicating what exactly is included in the default routing? In the changelog and in the user guide, I would list all of them.

print_routing(clf)

# %%
# The routing can still be modified using set_*_request methods
Copy link
Contributor

@StefanieSenger StefanieSenger Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# The routing can still be modified using set_*_request methods
# Users can still modify the routing using set_*_request methods

routing.

2. Before the user sets any specific routing, via `_get_default_requests`, it
provides the default metadata routing configuration for the instance. This
Copy link
Contributor

@StefanieSenger StefanieSenger Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
provides the default metadata routing configuration for the instance. This
provides the available metadata routing configuration for the instance. This

Default has too many meanings already, so I think it would be cool to find another term, such as available maybe?

For example, if a method's signature includes `sample_weight`, this method will:
- During class creation: Create a `set_{method}_request` method to configure
how `sample_weight` should be routed
- Right after initialization: Provide the default routing configuration for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Right after initialization: Provide the default routing configuration for
- Right after initialization: Provide the available routing configuration for

for method, method_requests in defaults.items():
for pname, value in method_requests.items():
getattr(requests, method).add_request(param=pname, alias=value)
return requests
Copy link
Contributor

@StefanieSenger StefanieSenger Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of _get_default_requests() returning the class requests if _default_routing_enabled() is False?

Wouldn't it be a bug if an estimator doesn't implement their own default request and then as a result metadata is routed to all the methods where that is available in the routing?

Thinking about us then adding a new routing, and suddenly the default request would change.

Edit: Correcting myself here a bit: I have seen that this method gets called by _get_metadata_request() and of cause something needs to get passed there also if _default_routing_enabled() is False. But then I would think what gets passed is not a default request anymore. I am still concerned about the terminology.

Comment on lines +488 to +489
# Instance-level default requests can override class-level defaults
# and add new method requests
Copy link
Contributor

@StefanieSenger StefanieSenger Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Instance-level default requests can override class-level defaults
# and add new method requests
# Sets default requests for the instance chosen from the available
# __metadata_request__* class attributes.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand my own suggestion anymore.

Better like this?

Suggested change
# Instance-level default requests can override class-level defaults
# and add new method requests
# Sets default requests for metadata routing at instance level.

Comment on lines +1457 to +1462
- Collecting metadata request info from `__metadata_request__*` class
attributes
- Analyzing method signatures for implicit metadata parameters
The collected information is used to create `set_{method}_request` methods
(e.g. set_fit_request) that allow runtime configuration of metadata
routing.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Collecting metadata request info from `__metadata_request__*` class
attributes
- Analyzing method signatures for implicit metadata parameters
The collected information is used to create `set_{method}_request` methods
(e.g. set_fit_request) that allow runtime configuration of metadata
routing.
- Analyzing method signatures for implicit metadata parameters
The collected information is used to create `set_{method}_request` methods
(e.g. set_fit_request) that allow runtime configuration of metadata
routing.
- Collecting metadata request info from `__metadata_request__*` class
attributes

I would turn the order around to be the same as the methods does it (lowers mental load for third party developers who want to use these.

(e.g. set_fit_request) that allow runtime configuration of metadata
routing.

2. Before the user sets any specific routing, via `_get_default_requests`, it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
2. Before the user sets any specific routing, via `_get_default_requests`, it
2. Before the user sets any specific routing, via `__sklearn_default_request__`, it

It is __sklearn_default_request__, isn't it?

Copy link
Contributor

@StefanieSenger StefanieSenger May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding to this, I feel the term "user" here is ambiguous. The developers of any scikit-learn compatible estimators are meant here, if I get it right.

Suggested change
2. Before the user sets any specific routing, via `_get_default_requests`, it
2. Before any specific routing gets configured at instance level via
`__sklearn_default_request__`, it

assert _default_routing_enabled() == default_routing


def test_default_instance_routing_overrides_class_level():
Copy link
Contributor

@StefanieSenger StefanieSenger Mar 20, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already test_setting_default_requests() in this file and maybe these tests can be either merged or more clearer separated?

Copy link
Contributor

@StefanieSenger StefanieSenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have gone over this again to be able to compare with #31401 and left some new comments. (Take them for what they are worth.)

Also after some time having passed, my main concern is that the term default is used to mean different states here.

Comment on lines +470 to +478
# There are two ways to set default metadata routing requests in scikit-learn:
#
# 1. Class-level defaults using `__metadata_request__method` class attributes,
# which sets default request values for all instances of a class, and can even
# remove a metadata from the metadata routing machinery if necessary.
# 2. Instance-level defaults using `__sklearn_default_requests__` method,
# which set default request values at instance level, which means you can customize
# the default request values on a per-instance basis. Note that this only takes
# effect if ``set_config(enable_metadata_routing="default_routing")`` is called.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have thought that I had understood this at one point, but I am not sure anymore:

So if a user (end user, not third party developer) enables "default_routing" in the configs, then a default routing strategy (which is defined at instance level by us or by a third party library) gets used.

But when is the class level default routing enabled then?

And why is it necessary to provide two different ways to set default requests? In my understanding, we can do everything and more on instance level alone.

Comment on lines +472 to +474
# 1. Class-level defaults using `__metadata_request__method` class attributes,
# which sets default request values for all instances of a class, and can even
# remove a metadata from the metadata routing machinery if necessary.
Copy link
Contributor

@StefanieSenger StefanieSenger May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't quite get when the class level routing configuration would be enabled? On set_config(enable_metadata_routing=True)?

And what would it mean when it changes the "metadata routing machinery"? Would there be any routing with set_config(enable_metadata_routing=True) that is independent from what is defined in __metadata_request__method?

Edit: My confusion came from the fact that I thought that "remove a metadata from the metadata routing machinery" meant to set routing to it to False, and consequently they must have been set to True before.

Now I see, that class level configuration is only used in the MetadataRequest object, if set_config(enable_metadata_routing=True) and that "remove a metadata from the metadata routing machinery" actually means to disable users to set it at all by fully remove it, doesn't it?

  • --> If so, then I would think it's better either reformulate this or not to mention it here so prominently, since that would be a very special case and this half sentence is a bit confusing.


class DefaultRoutingClassifier(ClassifierMixin, BaseEstimator):
# Class-level default request for fit method
__metadata_request__fit = {"sample_weight": True}
Copy link
Contributor

@StefanieSenger StefanieSenger May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On naming __metadata_request__fit and the other attributes:

For better maintainability, I would pick names for these attributes that don't contain any method names.

These will show up for any contributor in scikit-learn and third party library upon inspection or debugging and the first thought would be that this is a callable method, when in fact it is not callable.

Comment on lines +488 to +489
# Instance-level default requests can override class-level defaults
# and add new method requests
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand my own suggestion anymore.

Better like this?

Suggested change
# Instance-level default requests can override class-level defaults
# and add new method requests
# Sets default requests for metadata routing at instance level.

(e.g. set_fit_request) that allow runtime configuration of metadata
routing.

2. Before the user sets any specific routing, via `_get_default_requests`, it
Copy link
Contributor

@StefanieSenger StefanieSenger May 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding to this, I feel the term "user" here is ambiguous. The developers of any scikit-learn compatible estimators are meant here, if I get it right.

Suggested change
2. Before the user sets any specific routing, via `_get_default_requests`, it
2. Before any specific routing gets configured at instance level via
`__sklearn_default_request__`, it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Metadata Routing all issues related to metadata routing, slep006, sample props module:model_selection module:utils
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants