Skip to content

FEA Metadata routing for VotingClassifier and VotingRegressor #27584

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Feb 22, 2024

Conversation

StefanieSenger
Copy link
Contributor

@StefanieSenger StefanieSenger commented Oct 14, 2023

Reference Issues/PRs

Towards #22893

What does this implement/fix? Explain your changes.

Adds metadata routing to VotingClassifier. The challenge here was that it takes a list of (name, est) tuples as an init argument instead of only an estimator. I have modified test_metaestimators_metadata_routing.py and _metadata_requests.py for handling this.

Any other comments?

The main question is, if the modifications of the test and the routing file should stay as they are.

All tests pass as it is (except for the old tests for VotingRegressor, StackingClassifier and StackingRegressor). These three have to get their routing implemented in the same PR, I think, because they all share a common function (_fit_single_estimator) with VotingClassifier. They also take a list of (name, est) tuples instead of a single estimator.

I'll wait with this until test_metaestimators_metadata_routing.py and _metadata_requests.py look as they should.

@github-actions
Copy link

github-actions bot commented Oct 14, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: d7d14c7. Link to the linter CI: here

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't change the test_metaestimators_metadata_routing.py and _metadata_requests.py. You can write tests the same way as done for ColumnTransformer and Pipeline.

@StefanieSenger StefanieSenger changed the title FEA Metadata routing for VotingClassifier FEA Metadata routing for VotingClassifier and VotingRegressor Oct 24, 2023
StefanieSenger and others added 6 commits January 5, 2024 11:35
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Copy link
Contributor Author

@StefanieSenger StefanieSenger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through your comments, @adrinjalali, and applied your suggestions (except for one, that I didn't know how to).
Would you have a look?
Thanks a lot so far. :)

Comment on lines 174 to 175
def _more_tags(self):
return {"preserves_dtype": []}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be nice to have a comment as why this is needed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I don't remember anymore and cannot re-create the error that made me put it.
So, I think I should leave it away then.


with pytest.raises(ValueError, match=re.escape(error_message)):
est.fit(X, y, sample_weight=sample_weight, metadata=metadata)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also need a test to check get_metadata_routing works before calling fit:

@pytest.mark.usefixtures("enable_slep006")
@pytest.mark.parametrize(
    "Estimator, Child",
    [(VotingClassifier, ConsumingClassifier), (VotingRegressor, ConsumingRegressor)],
)
def test_get_metadata_routing_without_fit(Estimator, Child):
    # Test that get_metadata_routing() doesn't raise when called before fit.
    est = Estimator([("sub_est", Child())])
    est.get_metadata_routing()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why would we need that test here specifically? It seems like a test for testing get_metadata_routing(), not a test for the routing in VotingClassifier and VotingRegressor.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a test for this specific implementation of get_metadata_routing for this estimator. It is to avoid this bug: #28239

@StefanieSenger
Copy link
Contributor Author

I made the improvements according to your review, @adrinjalali.
Would you take another look? What do you think about the preserves_dtype tag? All common tests pass without it...

[(VotingClassifier, ConsumingClassifier), (VotingRegressor, ConsumingRegressor)],
)
def test_get_metadata_routing_without_fit(Estimator, Child):
# Test that metadata_routing() doesn't raise when called before fit.
Copy link
Contributor Author

@StefanieSenger StefanieSenger Feb 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# Test that metadata_routing() doesn't raise when called before fit.
"""Test that get_metadata_routing() works regardless of the Child's
consumption of any metadata."""

Maybe this helps to explain why this test is here.

I cannot fully see the connection with #28239, can you please explain, @adrinjalali?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core of that issue was that this method would fail if the sub-estimator didn't support any metadata at all. So we test it here.

Copy link
Contributor Author

@StefanieSenger StefanieSenger Feb 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adrinjalali
I can see that in issue #28239 the routing failed on fit, when the sub-estimator cannot handle any sample weights.
Your solution from #28240 was to make sure sample_weight is getting to be part for fit_params before the routing is started and you added a test (test_non_consuming_estimator_works) that tests exactly that.
VotingClassifier is not covered by that test, but it surely needs a similar test. I still cannot see how test_get_metadata_routing_without_fit would test for the same case as test_non_consuming_estimator_works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pytest.mark.usefixtures("enable_slep006")
@pytest.mark.parametrize(
"Estimator, Child",
[(VotingClassifier, ConsumingClassifier), (VotingRegressor, ConsumingRegressor)],
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[(VotingClassifier, ConsumingClassifier), (VotingRegressor, ConsumingRegressor)],
[(VotingClassifier, NonConsumingClassifier), (VotingRegressor, NonConsumingRegressor)],

After #28240 is merged, to be more explicit.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#28240 is merged, but I'm not sure if I understand your comment.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good.

@pytest.mark.usefixtures("enable_slep006")
@pytest.mark.parametrize(
"Estimator, Child",
[(VotingClassifier, ConsumingClassifier), (VotingRegressor, ConsumingRegressor)],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#28240 is merged, but I'm not sure if I understand your comment.

[(VotingClassifier, ConsumingClassifier), (VotingRegressor, ConsumingRegressor)],
)
def test_get_metadata_routing_without_fit(Estimator, Child):
# Test that metadata_routing() doesn't raise when called before fit.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core of that issue was that this method would fail if the sub-estimator didn't support any metadata at all. So we test it here.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Copy link
Member

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this informative PR!

I leveraged some of these tests in #28432, which is good since the tests are almost 100% re-usable and test the Bagging* classes similarly.

@glemaitre glemaitre self-requested a review February 22, 2024 15:48
Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I will apply the nitpicks and make the auto-merge

@glemaitre glemaitre enabled auto-merge (squash) February 22, 2024 16:46
@glemaitre glemaitre merged commit 77a63e7 into scikit-learn:main Feb 22, 2024
@StefanieSenger StefanieSenger deleted the routing_VotingClassifier branch February 22, 2024 17:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:ensemble module:utils Waiting for Second Reviewer First reviewer is done, need a second one!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants