[MRG+1] Multioutput bagging #4848
Conversation
bump

    proba[k] += all_proba[j][k]

    for k in range(self.n_outputs_):
        proba[k] /= self.n_estimators
Do we actually need a loop here?
It's needed to get a probability. Do you have a suggestion?
Why not simply

    proba /= self.n_estimators

unless proba is not a NumPy array?
It's a list of the probability arrays, one per output, so it's not a single NumPy array.
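The distinction discussed above can be sketched as follows: for multi-output data, proba is a plain Python list of per-output arrays, so in-place division must be applied per element (variable names here are illustrative, not the PR's exact code):

```python
import numpy as np

n_estimators = 10
n_samples = 5
n_classes = [2, 4, 3]  # the class count may differ per output
n_outputs = len(n_classes)

# A list of (n_samples, n_classes_k) arrays, one per output, holding
# summed votes. `proba /= n_estimators` would fail on the list itself.
proba = [np.full((n_samples, k), float(n_estimators)) for k in n_classes]

# The loop divides each per-output array in place.
for k in range(n_outputs):
    proba[k] /= n_estimators
```

With a single-output ndarray the loop would indeed be unnecessary; it is the list-of-arrays representation that forces it.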
Other than my comment, +1 for merge
Thanks for the review @glouppe !
AppVeyor is working well!
Can you please add a new entry in
sklearn/ensemble/bagging.py (outdated)

        proba_est = estimator.predict_proba(X[:, features])
        if n_outputs == 1:
            proba_est = [proba_est]
    except AttributeError:
Why not use hasattr as previously?
I can switch to hasattr.
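The two detection styles under discussion can be sketched with toy classes (these are not the PR's code; the diff above catches AttributeError around the predict_proba call, while the review suggests an upfront hasattr check):

```python
class SoftClassifier:
    """Toy estimator exposing predict_proba (illustrative only)."""
    def predict_proba(self, X):
        return [[0.5, 0.5] for _ in X]

class HardClassifier:
    """Toy estimator with predict only (illustrative only)."""
    def predict(self, X):
        return [0 for _ in X]

def supports_proba(estimator):
    # hasattr-based check, as the reviewer suggests: decide up front
    # instead of catching AttributeError after the call.
    return hasattr(estimator, "predict_proba")
```

Both approaches behave the same for plain attribute lookup; hasattr makes the intent explicit and avoids accidentally swallowing an AttributeError raised inside predict_proba itself.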
@ogrisel I have taken into account your comments.
bump. Fix merge conflict.
Just out of curiosity: what is your application for multi-output?
I am still pretty torn on multi-output stuff. It adds so much complexity, breaks the API for
I admit I have the same issue. I don't know if I would implement multi-output in RFs if I had to redo it from scratch...
I am doing a lot of multi-label learning and multi-output regression. Using trees or variations as base learners, I am stuck with the current tree API and also often face issue #2451.
So downgrading trees to multi-label would help you?
I think that we will see more and more usage of multi-label in the future. Not to say that I have an opinion on this actual PR: I haven't looked at it.
@GaelVaroquaux I am totally for multi-label; the question is more multi-output multi-class, which doesn't seem to be a common setting.
I am with you, completely. My remark was more in the sense of multi-output trees. But we should wait a bit: Jacob is experimenting with some rewrites of the tree modules that seem to be giving a factor 2 speed-up in RFs.
Given that it's an interface problem and that I am already working around it, it wouldn't help me.
Maybe, instead of having a list of arrays, one per output, we should have a three-dimensional array of shape (n_samples, n_classes, n_outputs). If the number of classes is not identical across outputs, we could set the missing probabilities to zero. At least we would get fast NumPy operations and preserve the whole feature. For
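The proposed packing could look something like this (pack_proba is a hypothetical helper, not part of the PR): per-output probability arrays are stacked into one 3-D array, zero-padding outputs that have fewer classes, so that averaging becomes a single vectorized operation.

```python
import numpy as np

def pack_proba(proba_list):
    """Pack a list of (n_samples, n_classes_k) arrays into one
    (n_samples, max_n_classes, n_outputs) array, zero-padding
    outputs with fewer classes. Illustrative sketch only."""
    n_samples = proba_list[0].shape[0]
    max_classes = max(p.shape[1] for p in proba_list)
    packed = np.zeros((n_samples, max_classes, len(proba_list)))
    for k, p in enumerate(proba_list):
        packed[:, :p.shape[1], k] = p
    return packed

# Two outputs: one binary, one with 3 classes.
proba_list = [np.full((4, 2), 0.5), np.full((4, 3), 1.0 / 3)]
packed = pack_proba(proba_list)
# Averaging over estimators is now one vectorized op, e.g.:
# packed /= n_estimators
```

The padded entries stay at zero, so they never contribute probability mass; the trade-off is that each output carries max_n_classes columns even when it has fewer classes.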
Closing as superseded by #8547. |
This pull request brings multi-output support (#3449) to the bagging meta estimators.
It differs from #3449 in that the averaging implementation is shared between single-output and multi-output data.
I haven't implemented a multi-output decision function, as no base estimator currently supports this.
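For context, scikit-learn forests already follow the multi-output predict_proba convention this PR brings to bagging: a list of arrays, one per output. A minimal sketch with made-up data (the data and parameters here are illustrative):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.RandomState(0)
X = rng.rand(20, 4)
# Two targets: one binary, one with 3 classes.
y = np.column_stack([
    np.array([0, 1] * 10),       # binary output
    np.array([0, 1, 2, 0] * 5),  # 3-class output
])

clf = RandomForestClassifier(n_estimators=5, random_state=0).fit(X, y)
# predict_proba returns one (n_samples, n_classes_k) array per output.
proba = clf.predict_proba(X)
```

Each list element sums to 1 row-wise, exactly as in the single-output case; the bagging averaging in this PR has to preserve that per-output structure.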