[MRG] ENH: multi-output support for BaggingRegressor #8547
Conversation
(force-pushed f8f4c27 to a178084)
This is nice... Also, could you add a whatsnew entry?
                                boston.target,
                                random_state=rng)

reg1 = BaggingRegressor(base_estimator=Ridge(), n_estimators=1,
Can you test for n_estimators > 1?
Ya, you're absolutely right. n_estimators=1 makes no sense within a test called "bagging" :). Fixed.
sklearn/ensemble/bagging.py
Outdated
@@ -281,7 +283,7 @@ def _fit(self, X, y, max_samples=None, max_depth=None, sample_weight=None):
        random_state = check_random_state(self.random_state)

        # Convert data
-       X, y = check_X_y(X, y, ['csr', 'csc'])
+       X, y = check_X_y(X, y, ['csr', 'csc'], multi_output=self.multi_output)
I don't get why we can't just use multi_output=True
If multi_output is false, we don't want y to have more than one column, no?
Ya, @johannah, I don't see why you'd rather not have multi_output=True in that function call.
On second thought, I think @jnothman has a point. We can make BaggingRegressor support multi-output targets simply by:
- forcing multi_output=True in the check_X_y(...) call;
- skipping the _validate_y(...) step.

I guess the question is: should we really be skipping this?
check_X_y is clearly documented to say that this parameter controls whether multioutput is allowed, not required. 1d y will remain 1d. The only difference might be for column vectors.
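That point about check_X_y semantics is easy to verify directly: multi_output=True merely permits a 2d y, it does not reshape a 1d one. A quick illustrative check (not part of the PR):

```python
import numpy as np
from sklearn.utils import check_X_y

X = np.random.RandomState(0).rand(10, 3)
y1d = np.arange(10, dtype=float)

# multi_output=True allows 2d y but leaves 1d y untouched
_, y_out = check_X_y(X, y1d, multi_output=True)
assert y_out.ndim == 1

_, y2_out = check_X_y(X, np.column_stack([y1d, y1d]), multi_output=True)
assert y2_out.shape == (10, 2)

# with the default multi_output=False, a 2-column y is rejected
raised = False
try:
    check_X_y(X, np.column_stack([y1d, y1d]))
except ValueError:
    raised = True
assert raised
```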
(force-pushed 82973cc to 13ef011)
(force-pushed 86e1d16 to 2bbf3ab)
I don't think we need a classifier parameter to make this work, do we?
@johannah: I don't get your question; could you please be more precise?
(force-pushed 8e68e4e to 168eaf3)
We can skip or change whatever is making inappropriate constraints.
(force-pushed e3acfa4 to 7b9b39f)
@@ -843,6 +843,9 @@ Trees and ensembles
   :class:`ensemble.VotingClassifier` to fit underlying estimators in parallel.
   :issue:`5805` by :user:`Ibraim Ganiev <olologin>`.

+- :class:`ensemble.BaggingRegressor` now supports multi-output targets.
This needs to move to the v0.19 section.
@@ -390,8 +390,9 @@ def _set_oob_score(self, X, y):
        """Calculate out of bag predictions and score."""

    def _validate_y(self, y):
        # Default implementation
        return column_or_1d(y, warn=True)
So there should still be a check_array call here. And if not, we should delete this method.
Indeed, this might be right, since the validation is done by the downstream estimator.
We've had several contributions of multi-output support for bagging. I think keeping it minimal here (not trying to do classification at the same time) is a good idea.
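If the method is kept rather than deleted, the shape being discussed could look something like the sketch below (a hypothetical free-function version, not code from the PR: it keeps a check_array call for multi-output targets and falls back to the existing column_or_1d behaviour otherwise, leaving deeper validation to the downstream estimator):

```python
import numpy as np
from sklearn.utils import check_array, column_or_1d

def validate_bagging_y(y, multi_output=False):
    """Sketch of a _validate_y replacement that keeps a check_array call
    while allowing multi-output targets when requested."""
    if multi_output:
        # accept 1d or 2d y; keep the original dtype
        return check_array(y, ensure_2d=False, dtype=None)
    # current default behaviour: warn on column vectors and flatten
    return column_or_1d(y, warn=True)
```

For example, validate_bagging_y(np.ones((5, 2)), multi_output=True) passes a (5, 2) target through unchanged, while the default path still flattens a (5, 1) column vector with a warning.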
reg1.fit(X_train, y_train)
for n_targets in [1, 2]:
    y_train_ = np.ndarray((len(y_train), n_targets))
    y_train_.T[:] = y_train.copy()
I'm finding this hard to understand.
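For what it's worth, the np.ndarray + transposed broadcast in the diff is just tiling the 1d targets across columns; assuming y_train is 1d, an equivalent but arguably more readable construction is:

```python
import numpy as np

y_train = np.arange(5, dtype=float)  # stand-in for the test's 1d targets

for n_targets in [1, 2]:
    # each of the n_targets columns is a copy of y_train, same as the
    # y_train_.T[:] = y_train.copy() broadcast in the diff
    y_train_ = np.tile(y_train.reshape(-1, 1), (1, n_targets))
    assert y_train_.shape == (len(y_train), n_targets)
    assert np.array_equal(y_train_[:, -1], y_train)
```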
(force-pushed 7b9b39f to 893eab9)
What's the status on this?

I'd also really like to see this implemented.

Hi!

Closing as implemented in #9707
Closes #8993, #3449, #4848. This minimalistic patch adds multi-output support to BaggingRegressor.