FIX bagging with metadata routing and estimator implement __len__ #28734

glemaitre · 2024-03-31T18:54:50Z

I caught a regression in imbalanced-learn: when the estimator passed to Bagging* implements __len__ (e.g. RandomForest*), _get_estimator ends up calling __len__ with the current pattern.

The problem is that __len__ relies on a fitted attribute, while _get_estimator is called before fit.

The fix is to check for None instead to decide when to create a default estimator.

ping @adrinjalali @OmarManzoor @adam2392 since it was introduced in #28432

No changelog needed since we did not yet release this bug ;)

github-actions · 2024-03-31T18:56:05Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: f5252d5. Link to the linter CI: here

glemaitre · 2024-03-31T19:05:01Z

I got another regression for which I wrote the test. I have to check what is the reason for raising an error.

glemaitre · 2024-03-31T19:09:31Z

> I got another regression for which I wrote the test. I have to check what is the reason for raising an error.

It looks like we did not have a safeguard on using the routing API in this case.

sklearn/ensemble/tests/test_bagging.py

adam2392

Thanks for the fix on the regression!

Wasn't 100% sure I follow your PR description. Sorry :p.

Do you mind describing what the issue was that occurred? Is it that __len__ errors out?

Co-authored-by: Adam Li <adam2392@gmail.com>

glemaitre · 2024-04-01T09:06:11Z

> Do you mind describing what the issue was that occurred? Is it that __len__ errors out?

When evaluating self.estimator or DecisionTreeClassifier(), the or operator triggers a call to self.estimator.__len__ for some reason. I should check the call stack to understand the reason for it, but for sure the new pattern is exactly what we want.

glemaitre · 2024-04-01T09:09:52Z

> I should check the call stack to understand the reason for it, but for sure the new pattern is exactly what we want.

Thinking about it, I assume that the first thing Python does is check whether the object is None or an "empty" container, and thus it triggers the __len__ call.
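The behaviour described above can be reproduced without scikit-learn. The sketch below is only an illustration: `UnfittedForest` and `DefaultTree` are hypothetical stand-ins, not the real estimators, and `estimators_` is assumed to be an attribute that exists only after fitting. Python's truth test first looks for `__bool__` and, when that is missing, falls back to `__len__`, which is why the `or` pattern ends up touching the fitted attribute.

```python
class UnfittedForest:
    """Hypothetical stand-in for an ensemble whose __len__ needs a fitted attribute."""

    def __len__(self):
        # `estimators_` would only be set by fit, so this raises
        # AttributeError on an unfitted instance.
        return len(self.estimators_)


class DefaultTree:
    """Hypothetical stand-in for the default base estimator."""


def get_estimator_with_or(estimator):
    # Old pattern: `or` truth-tests `estimator`. With no __bool__ defined,
    # Python falls back to calling __len__.
    return estimator or DefaultTree()


def get_estimator_with_none_check(estimator):
    # New pattern: an identity comparison never calls __len__.
    if estimator is None:
        return DefaultTree()
    return estimator


forest = UnfittedForest()

try:
    get_estimator_with_or(forest)
    or_pattern_error = None
except AttributeError as exc:
    or_pattern_error = exc

# The `or` pattern fails before fit, while the None check returns the
# user-provided estimator untouched.
assert isinstance(or_pattern_error, AttributeError)
assert get_estimator_with_none_check(forest) is forest
assert isinstance(get_estimator_with_none_check(None), DefaultTree)
```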

adrinjalali

Otherwise LGTM.

adrinjalali · 2024-04-02T04:41:51Z

sklearn/ensemble/_bagging.py

-        return self.estimator or DecisionTreeClassifier()
+        if self.estimator is None:
+            return DecisionTreeClassifier()
+        return self.estimator


Why the diff here? I find the existing code clear and shorter.

glemaitre · 2024-04-02

Because it fails when enabling metadata routing with a non-default estimator.
See this #28734 (comment)

adrinjalali · 2024-04-02T04:43:04Z

sklearn/ensemble/_bagging.py

-        return self.estimator or DecisionTreeRegressor()
+        if self.estimator is None:
+            return DecisionTreeRegressor()
+        return self.estimator


adam2392

Makes sense to me. Thanks for catching this!

adam2392 · 2024-04-02T13:43:57Z

The CI failures do not look related to this.

thomasjpfan

I feel like BaseEnsemble.__len__ should return 0 when it is not fitted. This way Python would consider it falsey.

glemaitre · 2024-04-02T21:54:29Z

> I feel like BaseEnsemble.__len__ should return 0 when it is not fitted. This way Python would consider it falsey.

I'm fine implementing this behaviour.

However, I think this is orthogonal: with the proposed behaviour, self.estimator or DecisionTreeClassifier() would return DecisionTreeClassifier() when we expect to get self.estimator (it would actually be a silent bug).

@thomasjpfan did I miss something in your proposal?

thomasjpfan · 2024-04-03T02:15:48Z

> However, I think this is orthogonal: with the proposed behaviour, self.estimator or DecisionTreeClassifier() would return DecisionTreeClassifier() when we expect to get self.estimator (it would actually be a silent bug).

Ah yes, you are correct. We still need this PR.
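The silent bug discussed in this exchange can be sketched in plain Python. This is an illustration under assumptions, not scikit-learn code: `ZeroLenEnsemble`, its `fitted_members` attribute, and `DefaultTree` are hypothetical names. If `__len__` returned 0 before fit, the `or` pattern would no longer raise, but it would quietly discard the estimator the user passed in.

```python
class ZeroLenEnsemble:
    """Hypothetical ensemble whose __len__ returns 0 until it is fitted."""

    def __init__(self):
        # Hypothetical attribute that fit would fill with sub-estimators.
        self.fitted_members = []

    def __len__(self):
        return len(self.fitted_members)


class DefaultTree:
    """Hypothetical stand-in for the default base estimator."""


user_estimator = ZeroLenEnsemble()

# `or` pattern: the unfitted ensemble has length 0, so it is falsy and
# the default is returned instead of the user's estimator.
picked_with_or = user_estimator or DefaultTree()

# None check: only a missing estimator triggers the default.
picked_with_none_check = (
    DefaultTree() if user_estimator is None else user_estimator
)

assert isinstance(picked_with_or, DefaultTree)
assert picked_with_none_check is user_estimator
```

Either way `__len__` is defined for unfitted estimators, the explicit `is None` check is what keeps the user-provided estimator.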

adrinjalali · 2024-04-03T09:32:31Z

I'm not sure if we should return 0 on an unfitted estimator though. I think __len__ is undefined when the estimator is not fitted, so I might prefer to raise UnfittedError in that case.

WDYT @thomasjpfan

thomasjpfan

LGTM

thomasjpfan · 2024-04-03T13:54:39Z

> I'm not sure if we should return 0 on an unfitted estimator though. I think __len__ is undefined when the estimator is not fitted, so I might prefer to raise UnfittedError in that case.

That is okay with me too.

FIX regression in bagging with metadata routing and estimator implement __len__

738f236

github-actions bot added the module:ensemble label Mar 31, 2024

another regression

d11176b

only use routing API when routing enabled

baa0555

glemaitre added the No Changelog Needed label Mar 31, 2024

refactor

b9ed2b0

adam2392 reviewed Apr 1, 2024

adam2392 reviewed Apr 1, 2024

Update sklearn/ensemble/tests/test_bagging.py

4600e9e

Co-authored-by: Adam Li <adam2392@gmail.com>

glemaitre added 2 commits April 1, 2024 11:19

avoid warning

e53b409

Merge remote-tracking branch 'origin/main' into regression_28432

f5252d5

adrinjalali reviewed Apr 2, 2024

adam2392 approved these changes Apr 2, 2024

adrinjalali approved these changes Apr 2, 2024

adam2392 mentioned this pull request Apr 2, 2024

FEA SLEP006: Metadata routing for SelfTrainingClassifier #28494

Merged

thomasjpfan reviewed Apr 2, 2024

thomasjpfan approved these changes Apr 3, 2024

thomasjpfan merged commit cfd8091 into scikit-learn:main Apr 3, 2024
