-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
FIX bagging with metadata routing and estimator implement __len__ #28734
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I got another regression for which I wrote the test. I have to check what is the reason for raising an error. |
It looks like we did not have a safeguard on using the routing API in this case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the fix on the regression!
Wasn't 100% sure I follow your PR description. Sorry :p.
Do you mind describing what the issue was that occurred? Is it that __len__
errors out?
Co-authored-by: Adam Li <adam2392@gmail.com>
When calling |
Thinking about it, I assume that the first thing that Python do is to check if the object is None or "empty" container and thus it should trigger the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Otherwise LGTM.
return self.estimator or DecisionTreeClassifier() | ||
if self.estimator is None: | ||
return DecisionTreeClassifier() | ||
return self.estimator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the diff here? I find the existing code clear and shorter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it fails when enabling metadata routing with a non-default estimator.
See this #28734 (comment)
return self.estimator or DecisionTreeRegressor() | ||
if self.estimator is None: | ||
return DecisionTreeRegressor() | ||
return self.estimator |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense to me. Thanks for catching this!
The CI failures do not look related to this. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like BaseEnsemble.__len__
should return 0 when it is not fitted. This way Python would consider it falsey.
I'm fine implementing this behaviour. However, I'm thinking that this is orthogonal because implementing the proposed behaviour and using @thomasjpfan did I miss something in your proposal. |
Ah yes, you are correct. We still need this PR. |
I'm not sure if we should return 0 on an unfitted estimator though. I think WDYT @thomasjpfan |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
That is okay with me to. |
I could catch a regression in
imbalanced-learn
whenestimator
inBagging*
implement__len__
(e.g.RandomForest*
) where_get_estimator
will trigger calling__len__
with the current pattern.The problem is that
__len__
relies on fitted attribute while_get_estimator
is called beforefit
.This fix is check for
None
instead to know when to create a default estimator.ping @adrinjalali @OmarManzoor @adam2392 since it was introduced in #28432
No changelog needed since we did not yet release this bug ;)