[MRG+1] Multioutput bagging #4848


Closed
wants to merge 4 commits

Conversation

@arjoly (Member) commented Jun 11, 2015

This pull request brings multi-output support (#3449) to the bagging meta estimators.

It differs from #3449 in that the averaging implementation is shared between single-output and multi-output data.

I haven't implemented a multi-output decision function, as no base estimator currently supports this.

@arjoly (Member Author) commented Jun 11, 2015

test_common has been hugely improved recently.

@arjoly (Member Author) commented Jul 13, 2015

bump

        proba[k] += all_proba[j][k]

for k in range(self.n_outputs_):
    proba[k] /= self.n_estimators
Contributor:

Do we actually need a loop here?

Member Author:

It's needed to get a probability. Do you have a suggestion?

Contributor:

proba /= self.n_estimators

unless proba is not a Numpy array?

Member Author:

It's a list of the probability arrays, one per output.
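The structure under discussion can be sketched as follows. This is a hypothetical mimic, not the actual bagging code: `all_proba[j]` stands in for the prediction of estimator `j`, itself a Python list with one `(n_samples, n_classes_k)` array per output, so a plain `proba /= self.n_estimators` cannot apply to the list and the per-output loop is needed:

```python
import numpy as np

# Hypothetical stand-in for the per-estimator predictions: two outputs,
# with different class counts (2 and 3), as multi-output allows.
n_estimators = 3
n_outputs = 2
all_proba = [
    [np.full((4, 2), 0.5), np.full((4, 3), 1.0 / 3.0)]
    for _ in range(n_estimators)
]

# Accumulate per output, as in the snippet above.
proba = [np.zeros_like(all_proba[0][k]) for k in range(n_outputs)]
for j in range(n_estimators):
    for k in range(n_outputs):
        proba[k] += all_proba[j][k]

# `proba` is a list, not an ndarray, so `/=` must be applied element-wise:
for k in range(n_outputs):
    proba[k] /= n_estimators
```

Because the outputs can have different numbers of classes, the arrays are ragged and cannot be stacked into one ndarray without padding, which is why the loop survives review here.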

@glouppe (Contributor) commented Jul 19, 2015

Other than my comment, +1 for merge

@arjoly (Member Author) commented Jul 20, 2015

Thanks for the review @glouppe !

@glouppe glouppe changed the title [MRG] Multioutput bagging [MRG+1] Multioutput bagging Jul 20, 2015
@arjoly (Member Author) commented Jul 22, 2015

AppVeyor is working well!

@ogrisel (Member) commented Jul 24, 2015

Can you please add a new entry in whats_new.rst?

try:
    proba_est = estimator.predict_proba(X[:, features])
    if n_outputs == 1:
        proba_est = [proba_est]
except AttributeError:
Member:

why not use hasattr as previously?

Member Author:

I can switch to hasattr.
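The two styles being compared can be sketched generically (this is an illustration, not the actual bagging code):

```python
class NoProba:
    """Hypothetical base estimator without predict_proba."""
    def predict(self, X):
        return [0] * len(X)

est = NoProba()
X = [[0.0], [1.0]]

# EAFP style, as in the snippet above: attempt the call, catch the error.
try:
    proba = est.predict_proba(X)
except AttributeError:
    proba = None

# LBYL style with hasattr, as suggested in the review.
if hasattr(est, "predict_proba"):
    proba2 = est.predict_proba(X)
else:
    proba2 = None
```

One argument for the `hasattr` check: `except AttributeError` also silently swallows an AttributeError raised *inside* a `predict_proba` that does exist, masking real bugs.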

@arjoly (Member Author) commented Jul 24, 2015

@ogrisel I have taken into account your comments.

@arjoly (Member Author) commented Jul 29, 2015

Bump: fixed the merge conflict.

@amueller (Member):

just out of curiosity: what is your application for multi-output?

@amueller (Member):

I am still pretty torn on multi-output stuff. It adds so much complexity, breaks the API for predict_proba and I have not really seen it used anywhere....

@glouppe (Contributor) commented Jul 30, 2015

I am still pretty torn on multi-output stuff. It adds so much complexity, breaks the API for predict_proba and I have not really seen it used anywhere....

I admit I have the same issue. I don't know if I would implement multi-output in RFs if I had to redo it from scratch...

@arjoly (Member Author) commented Jul 31, 2015

I am doing a lot of multi-label learning and multi-output regression. Using trees or variations thereof as base learners, I am stuck with the current tree API and also often face issue #2451.

@amueller (Member):

so downgrading trees to multi-label would help you?

@GaelVaroquaux (Member):

I think that we will see more and more usage of multi-label in the future: it tends to map better to actual real-world problems.

Not to say that I have an opinion on this actual PR: I haven't looked at it in detail.

@amueller (Member):

@GaelVaroquaux I am totally for multi-label, the question is more multi-output multi-class, which doesn't seem to be a common setting.

@GaelVaroquaux (Member) commented Jul 31, 2015 via email

@arjoly (Member Author) commented Jul 31, 2015

so downgrading trees to multi-label would help you?

Given that it's an interface problem and that I am already working around it, it wouldn't help me.

@arjoly (Member Author) commented Aug 13, 2015

Maybe instead of having a list of arrays, one per output, we should have a three-dimensional array of shape (n_samples, n_classes, n_outputs). If the numbers of classes are not identical across outputs, we could set the padded probabilities to zero. At least we would get fast numpy operations and preserve the whole feature.

For decision_function, I don't know how the dummy classes should be set.
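The padding idea could be sketched like this; `stack_proba` is a hypothetical helper, not part of the PR or of scikit-learn's API. It stacks the per-output arrays into one `(n_samples, max_n_classes, n_outputs)` array, zero-filling the slots for outputs with fewer classes:

```python
import numpy as np

def stack_proba(proba_list):
    """Stack per-output probability arrays of shape (n_samples, n_classes_k)
    into one (n_samples, max_n_classes, n_outputs) array, zero-padded where
    an output has fewer classes. Hypothetical illustration only."""
    n_samples = proba_list[0].shape[0]
    n_outputs = len(proba_list)
    max_classes = max(p.shape[1] for p in proba_list)
    out = np.zeros((n_samples, max_classes, n_outputs))
    for k, p in enumerate(proba_list):
        out[:, : p.shape[1], k] = p
    return out

# Two outputs with 2 and 3 classes respectively, as in the PR's setting.
proba_list = [np.full((4, 2), 0.5), np.full((4, 3), 1.0 / 3.0)]
stacked = stack_proba(proba_list)
print(stacked.shape)  # (4, 3, 2)
```

With this layout, averaging over estimators becomes a single vectorized `sum(...) / n_estimators` over 3-D arrays, at the cost of carrying zero columns for the smaller outputs.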

@jnothman jnothman modified the milestone: 0.19 Jun 18, 2017
@amueller amueller added the Superseded (PR has been replaced by a newer PR) label Aug 6, 2019
Base automatically changed from master to main January 22, 2021 10:48
@cmarmo (Contributor) commented May 2, 2022

Closing as superseded by #8547.

@cmarmo cmarmo closed this May 2, 2022
Labels
module:ensemble, Superseded (PR has been replaced by a newer PR)
7 participants