-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[WIP] Sample property routing #9566
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Can you show an example with a pipeline, where some estimators support sample weights and others don't? How does it look with FeatureUnion? Why did I think you did what looks to me like "shallow" routing in that you specify routing at every level for one level below, right? (Not sure how that deals with FeatureUnion right now because that doesn't do routing). Did you want to allow renaming? What would that look like? |
I've not implemented in Currently router = check_routing(self.prop_routing,
dests=[[name, '*']
for name, _ in self.transformer_list],
default={'*': '*'})
props_per_transformer, unused = router(fit_params)
if unused:
raise ValueError(...) Yes, it is shallow. I've not yet come up with a deep approach, but perhaps it is possible. I think the present solution is going to be much easier if we want to maintain backwards compatibility. Yes, it supports renaming (see the One thing I remain confused about is what happens when someone calls
Not supporting renaming is simpler in this case. However, you can imagine we might want to support routing in |
We can withhold both renaming and passing props to I'll have to think about the mechanics of a "deep" spec but I think I'd rather local ("shallow") routing as that makes it easier to then wrap something. |
I forgot to respond to this:
Supporting is not sufficient to be passed something. The user needs to be explicit about where a prop should be passed. If we allow it to be based on supporting rather than the user requesting, then changing an estimator from Rather, if you want to route a single p = Pipeline(('a', MyTransformer()), ('b', MyTransformer()), ('c', MyTransformer()),
prop_routing={'a': 'sample_weight', 'c': 'sample_weight'})
p.fit(X, y, sample_weight=sample_weight) Currently Pipeline defines the destination alias p = Pipeline(('a', MyTransformer()), ('b', MyTransformer()), ('c', MyTransformer()),
prop_routing={'*': 'sample_weight'})
p.fit(X, y, sample_weight=sample_weight) As use-cases become apparent, we may decide to add other destination aliases to Pipeline including:
|
Took me a while to go through all the comments on #4497, some other threads, and here. Is this proposal now the way to go (with possible minor changes)? If yes, I could give you a hand if you wanna prioritize and revive it. |
Same as @adrinjalali, if this proposal is the way to go, I would help to make it happen. |
I think there is still no consensus. |
But facts on the ground might make consensus happen. There are limitations to this approach:
|
(I'm not very excited about this part of the roadmap and would rather think about freezing first - also estimator tags are ready for review ;) |
(You're not excited about it because you don't think it's needed, or
because you think any solution is bound to be ugly?)
|
I think there is a need, but I don't think it impacts a lot of users. I think other API issues impact more users. |
Sorry that I duplicated some work in #15370, didn't see this PR until now. But maybe it will help give some ideas regarding that there are common use cases where one also needs sample properties passed to transform from both train and test. So I didn't name it |
Reference issues
In an ideal world:
Functionality
As derived from my ranting at Thierry's scikit-learn/enhancement_proposals#6, this:
Pipeline.fit
groups
to*SearchCV.fit
check_routing
docs), in which the*SearchCV
default policy, but not the Pipeline default policy, can be expressedprop_routing
parameter to facilitate thisExample?
Ideally we should be able to get nested CV with groups and weighted scoring working:
It's pretty intricate, but I'm not sure we're going to get better than this!
TODO (and help wanted!)
check_routing
API is as we want (please comment!)check_routing
Pipeline
and*SearchCV
has_fit_parameter
where possibleThe last steps there are the most arduous, and if we decide this is the way forward, I would welcome collaboration when we get to that stage.