Skip to content

[FEA] Add metadata routing to BaggingClassifier and BaggingRegressor #28432

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 29 commits into from
Feb 21, 2024

Conversation

adam2392
Copy link
Member

@adam2392 adam2392 commented Feb 15, 2024

Reference Issues/PRs

Towards: #22893

What does this implement/fix? Explain your changes.

Adds metadata routing to BaggingClassifier and BaggingRegressor. The unit tests are inspired from #27584 where the metadata routing is not only checked on the meta classifier/regressor, but also on each estimator in estimator.estimators_

Any other comments?

A continuation of #24250

cc: @adrinjalali

Signed-off-by: Adam Li <adam2392@gmail.com>
Copy link

github-actions bot commented Feb 15, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: debe632. Link to the linter CI: here

Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392 adam2392 marked this pull request as ready for review February 15, 2024 21:49
Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392 adam2392 changed the title [ENH] Add metadata routing to BaggingClassifier and BaggingRegressor [FEA] Add metadata routing to BaggingClassifier and BaggingRegressor Feb 15, 2024
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392
Copy link
Member Author

Some of the CI checks were skipped. Unsure if that is due to me

Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@adrinjalali adrinjalali self-requested a review February 18, 2024 18:07
Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice PR, thanks @adam2392 !

Comment on lines 136 to 138
# Row sampling can be achieved either through setting sample_weight or
# by indexing. The former is more efficient. Therefore, use this method
# if possible, otherwise use indexing.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this comment is related to this section, is it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is for the two methods of sub-sampling (one uses bootstrap sample weights). The other uses indices. I moved the comment towards the if/else block.

Comment on lines 948 to 949
# Metadata Routing Tests
# ======================
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't Bagging be a part of test_metaestimator_metadata_routing.py set?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I added it to that section, but one major issue is that Bagging* can have different sample_weight used in its sub-estimators due to the fact that it is used via sub-sampling the sample indices. As a result, the following check fails:

            assert registry
            if preserves_metadata is True:
                for estimator in registry:
                    check_recorded_metadata(estimator, method_name, **method_kwargs)

How do you suggest handling this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I guess Actually, I guess instead I can make the preserves_metadata=False, and the tests will pass: e2dbc63

Let me know if this is fine?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's either preserve_metadata=False or preserve_metadata="subset"

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I set that to false inside the meta estimator tests. I can remove these unit tests then in test_bagging.py to reduce the diff.

adam2392 and others added 2 commits February 19, 2024 10:10
Co-authored-by: Adrin Jalali <adrin.jalali@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
@adam2392
Copy link
Member Author

Nice PR, thanks @adam2392 !

Thanks for the review! I think I addressed most of your comments and then left a few questions for follow-up

@adam2392 adam2392 requested a review from adrinjalali February 19, 2024 16:50
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. thanks.

@OmarManzoor maybe you could have a look?

Copy link
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @adam2392. Just a few minor comments otherwise LGTM.

adam2392 and others added 4 commits February 21, 2024 09:05
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
Co-authored-by: Omar Salman <omar.salman@arbisoft.com>
@OmarManzoor OmarManzoor enabled auto-merge (squash) February 21, 2024 15:05
@OmarManzoor OmarManzoor merged commit 8c99d92 into scikit-learn:main Feb 21, 2024
@adam2392 adam2392 deleted the baggingmeta branch February 21, 2024 15:14
@adam2392
Copy link
Member Author

Thanks @OmarManzoor for the quick review :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants