FEA Metadata routing for VotingClassifier and VotingRegressor #27584
Conversation
I wouldn't change test_metaestimators_metadata_routing.py and _metadata_requests.py. You can write the tests the same way as done for ColumnTransformer and Pipeline.
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
I went through your comments, @adrinjalali, and applied your suggestions (except for one that I didn't know how to apply).
Would you have a look?
Thanks a lot so far. :)
sklearn/ensemble/_voting.py
Outdated
def _more_tags(self):
    return {"preserves_dtype": []}
It would be nice to have a comment explaining why this is needed.
To be honest, I don't remember anymore and cannot re-create the error that made me add it. So I think I should leave it out, then.
with pytest.raises(ValueError, match=re.escape(error_message)):
    est.fit(X, y, sample_weight=sample_weight, metadata=metadata)
We also need a test to check that get_metadata_routing() works before calling fit:
@pytest.mark.usefixtures("enable_slep006")
@pytest.mark.parametrize(
"Estimator, Child",
[(VotingClassifier, ConsumingClassifier), (VotingRegressor, ConsumingRegressor)],
)
def test_get_metadata_routing_without_fit(Estimator, Child):
# Test that get_metadata_routing() doesn't raise when called before fit.
est = Estimator([("sub_est", Child())])
est.get_metadata_routing()
Why would we need that test here specifically? It seems like a test for get_metadata_routing() itself, not a test for the routing in VotingClassifier and VotingRegressor.
It's a test for this specific implementation of get_metadata_routing for this estimator. It is there to avoid this bug: #28239

I made the improvements according to your review, @adrinjalali.
    [(VotingClassifier, ConsumingClassifier), (VotingRegressor, ConsumingRegressor)],
)
def test_get_metadata_routing_without_fit(Estimator, Child):
    # Test that metadata_routing() doesn't raise when called before fit.
Suggested change:
- # Test that metadata_routing() doesn't raise when called before fit.
+ """Test that get_metadata_routing() works regardless of the Child's
+ consumption of any metadata."""
Maybe this helps to explain why this test is here.
I cannot fully see the connection with #28239; can you please explain, @adrinjalali?
The core of that issue was that this method would fail if the sub-estimator didn't support any metadata at all. So we test it here.
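To make the failure mode concrete, here is a hedged toy sketch. The class and function names are hypothetical, not scikit-learn internals: a get_metadata_routing-style method that assumes every sub-estimator requests at least one piece of metadata crashes on a sub-estimator that requests none, even though fit() was never called.

```python
class ConsumingChild:
    """Declares that its fit() can consume sample_weight (toy example)."""
    fit_requests = ("sample_weight",)


class NonConsumingChild:
    """Supports no metadata at all (toy example)."""
    fit_requests = ()


def naive_get_metadata_routing(estimators):
    # Buggy: est.fit_requests[0] assumes at least one request exists,
    # so a non-consuming sub-estimator raises IndexError.
    return {name: est.fit_requests[0] for name, est in estimators}


def robust_get_metadata_routing(estimators):
    # Fixed: tolerate sub-estimators that request nothing and simply
    # report an empty request list for them.
    return {name: list(est.fit_requests) for name, est in estimators}
```

With a NonConsumingChild, the naive version raises before fit is ever called, while the robust version just reports an empty request list; calling the routing method before fit in a test is what catches this class of bug.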
@adrinjalali I can see that in issue #28239 the routing failed on fit when the sub-estimator cannot handle any sample weights. Your solution from #28240 was to make sure sample_weight becomes part of fit_params before the routing is started, and you added a test (test_non_consuming_estimator_works) that tests exactly that. VotingClassifier is not covered by that test, but it surely needs a similar one. I still cannot see how test_get_metadata_routing_without_fit would test the same case as test_non_consuming_estimator_works.
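As I read the fix being discussed, the idea can be sketched roughly as follows. This is an illustrative toy, not scikit-learn's actual internals, and all names here are made up: sample_weight is folded into fit_params before routing starts, so a sub-estimator that consumes no metadata simply receives nothing instead of breaking the routing.

```python
class RecordingEstimator:
    """Toy estimator that records which metadata its fit() received."""

    def __init__(self, requested=()):
        self.requested = set(requested)  # metadata this estimator opts into
        self.received = None

    def fit(self, X, y, **metadata):
        self.received = dict(metadata)
        return self


def fit_with_routing(estimators, X, y, sample_weight=None, **fit_params):
    if sample_weight is not None:
        # Fold sample_weight into fit_params *before* routing starts,
        # so it is treated like any other routed parameter.
        fit_params["sample_weight"] = sample_weight
    for _, est in estimators:
        # Forward only the parameters each sub-estimator requested;
        # a non-consuming estimator just gets an empty dict.
        routed = {k: v for k, v in fit_params.items() if k in est.requested}
        est.fit(X, y, **routed)
```

Under this sketch, a consuming sub-estimator sees sample_weight while a non-consuming one is fitted without any metadata, and neither path raises.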
@pytest.mark.usefixtures("enable_slep006")
@pytest.mark.parametrize(
    "Estimator, Child",
    [(VotingClassifier, ConsumingClassifier), (VotingRegressor, ConsumingRegressor)],
Suggested change:
- [(VotingClassifier, ConsumingClassifier), (VotingRegressor, ConsumingRegressor)],
+ [(VotingClassifier, NonConsumingClassifier), (VotingRegressor, NonConsumingRegressor)],
After #28240 is merged, to be more explicit.
#28240 is merged, but I'm not sure if I understand your comment.
Looking good.
LGTM.
Thank you for this informative PR!
I leveraged some of these tests in #28432, which is good since the tests are almost 100% re-usable and test the Bagging* classes similarly.
LGTM. I will apply the nitpicks and enable auto-merge.
Reference Issues/PRs
Towards #22893
What does this implement/fix? Explain your changes.
Adds metadata routing to VotingClassifier. The challenge here was that it takes a list of (name, est) tuples as an init argument instead of only an estimator. I have modified test_metaestimators_metadata_routing.py and _metadata_requests.py to handle this.
Any other comments?
The main question is whether the modifications of the test and the routing file should stay as they are.
All tests pass as is (except for the old tests for VotingRegressor, StackingClassifier, and StackingRegressor). These three have to get their routing implemented in the same PR, I think, because they all share a common function (_fit_single_estimator) with VotingClassifier, and they also take a list of (name, est) tuples instead of a single estimator. I'll wait with this until test_metaestimators_metadata_routing.py and _metadata_requests.py look as they should.
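For readers unfamiliar with the feature, here is a purely illustrative toy (not scikit-learn code) of the calling pattern this kind of routing enables: each sub-estimator opts in to metadata via a set_fit_request-style method, and the voting-style meta-estimator forwards only what was requested when its own fit() receives the metadata. All class names below are invented for the sketch.

```python
class ToyEstimator:
    """Toy sub-estimator with a SLEP006-style metadata opt-in."""

    def __init__(self):
        self.requested = set()
        self.received = None

    def set_fit_request(self, **flags):
        # Mimics the opt-in call, e.g. set_fit_request(sample_weight=True).
        self.requested = {name for name, wanted in flags.items() if wanted}
        return self

    def fit(self, X, y, **metadata):
        self.received = dict(metadata)
        return self


class ToyVoting:
    """Toy voting-style meta-estimator over (name, estimator) tuples."""

    def __init__(self, estimators):
        self.estimators = estimators

    def fit(self, X, y, **metadata):
        for _, est in self.estimators:
            # Route each metadata item only to sub-estimators that asked for it.
            routed = {k: v for k, v in metadata.items() if k in est.requested}
            est.fit(X, y, **routed)
        return self
```

In this sketch, an estimator that called set_fit_request(sample_weight=True) receives the weights, while one that requested nothing is fitted without any metadata.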