-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[FEA] Add metadata routing to BaggingClassifier
and BaggingRegressor
#28432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
BaggingClassifier
and BaggingRegressor
BaggingClassifier
and BaggingRegressor
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
…into baggingmeta
Some of the CI checks were skipped. Unsure if that is due to me |
…into baggingmeta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice PR, thanks @adam2392 !
sklearn/ensemble/_bagging.py
Outdated
# Row sampling can be achieved either through setting sample_weight or | ||
# by indexing. The former is more efficient. Therefore, use this method | ||
# if possible, otherwise use indexing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if this comment is related to this section, is it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is for the two methods of sub-sampling (one uses bootstrap sample weights). The other uses indices. I moved the comment towards the if/else block.
# Metadata Routing Tests | ||
# ====================== |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't Bagging
be a part of test_metaestimator_metadata_routing.py
set?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I added it to that section, but one major issue is that Bagging* can have different sample_weight
used in its sub-estimators due to the fact that it is used via sub-sampling the sample indices. As a result, the following check fails:
assert registry
if preserves_metadata is True:
for estimator in registry:
check_recorded_metadata(estimator, method_name, **method_kwargs)
How do you suggest handling this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For reference, here is the error that gets raised: https://dev.azure.com/scikit-learn/scikit-learn/_build/results?buildId=64160&view=logs&j=f71949a9-f9d9-549e-cf45-2e99c7b412d1&t=d5baef2b-8bda-5a4c-e848-157b5caff279&l=1071
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I guess Actually, I guess instead I can make the preserves_metadata=False
, and the tests will pass: e2dbc63
Let me know if this is fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it's either preserve_metadata=False
or preserve_metadata="subset"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I set that to false inside the meta estimator tests. I can remove these unit tests then in test_bagging.py to reduce the diff.
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Thanks for the review! I think I addressed most of your comments and then left a few questions for follow-up |
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. thanks.
@OmarManzoor maybe you could have a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @adam2392. Just a few minor comments otherwise LGTM.
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
Thanks @OmarManzoor for the quick review :) |
Reference Issues/PRs
Towards: #22893
What does this implement/fix? Explain your changes.
Adds metadata routing to
BaggingClassifier
andBaggingRegressor
. The unit tests are inspired from #27584 where the metadata routing is not only checked on the meta classifier/regressor, but also on each estimator inestimator.estimators_
Any other comments?
A continuation of #24250
cc: @adrinjalali