
AverageRegressor? #10743

Closed
amueller opened this issue Mar 2, 2018 · 11 comments · Fixed by #12513

Comments

@amueller
Member

amueller commented Mar 2, 2018

Should we add the regressor equivalent of VotingClassifier, which would just compute averages? (I vaguely remember seeing that somewhere, but now I can't find the issue or PR.)

@mohamed-ali
Contributor

@amueller, I'd like to work on this issue, if no other PR is available.

@mohamed-ali
Contributor

mohamed-ali commented Mar 2, 2018

@amueller I think the closest existing estimator to this new AverageRegressor is BaggingRegressor (http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.BaggingRegressor.html).

BaggingRegressor fits its base regressors on random subsets of the data, all using the same base_estimator:

A Bagging regressor is an ensemble meta-estimator that fits base regressors each on random subsets of the original dataset and then aggregate their individual predictions (either by voting or by averaging) to form a final prediction.

The new AverageRegressor would be a bit different, in that it takes a list of estimators and fits each of them on the whole training set (instead of a random subset). Other than that, I guess both are similar in principle.
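A minimal sketch of the behavior described above, done by hand since no AverageRegressor class exists in scikit-learn at this point (estimator choices here are just for illustration):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=100, n_features=4, random_state=0)

# Unlike BaggingRegressor, every estimator is fit on the *whole* training set,
# and the estimators can be heterogeneous (like VotingClassifier's).
estimators = [LinearRegression(), Ridge(alpha=1.0),
              DecisionTreeRegressor(max_depth=3, random_state=0)]
for est in estimators:
    est.fit(X, y)

# The final prediction is the plain average of the individual predictions.
y_pred = np.mean([est.predict(X) for est in estimators], axis=0)
print(y_pred.shape)  # (100,)
```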

@jnothman
Member

jnothman commented Mar 2, 2018 via email

@mohamed-ali
Contributor

mohamed-ali commented Mar 2, 2018

@jnothman, I see averaging frequently in Kaggle competitions as one of the ensembling techniques; however, stacking has proven to be more valuable than simple averaging. I think we can consider averaging a special case of stacking where, instead of using a new estimator to aggregate the predictions of the previous estimators, we use the simple function sum(all_y_hat)/n_estimators.

I guess the argument for adding AverageRegressor as a separate regressor is to keep the API consistent with VotingClassifier and, also, to cover the most popular ensembling techniques in current use.
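The averaging-as-a-special-case-of-stacking view can be sketched like this (a rough illustration, not scikit-learn API; the names level_one and stacker are made up here):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge

X, y = make_regression(n_samples=80, n_features=3, random_state=1)
base = [LinearRegression().fit(X, y), Ridge(alpha=0.5).fit(X, y)]

# Stacking feeds the base predictions (one column each) to a
# second-level estimator.
level_one = np.column_stack([est.predict(X) for est in base])

# Averaging is the degenerate stacker: fixed equal weights, no fitting.
avg_pred = level_one.mean(axis=1)  # sum(all_y_hat) / n_estimators

# A learned stacker would instead fit an estimator on those columns.
stacker = LinearRegression().fit(level_one, y)
stacked_pred = stacker.predict(level_one)
```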

@mohamed-ali
Contributor

As far as I know, there are four categories of ensembling techniques. For each of them, sklearn implements the following:

  • Voting/Averaging: VotingClassifier for classification, but nothing yet for regression (averaging).
  • Bagging: BaggingRegressor, BaggingClassifier
  • Stacking (or blending): the aforementioned pull request [MRG+1] Stacking classifier with pipelines API #8960 will introduce it.
  • Boosting: GradientBoostingClassifier, GradientBoostingRegressor

So, I think that it's relevant to implement the AverageRegressor for the sake of completeness.

@agramfort
Member

agramfort commented Mar 4, 2018 via email

@mohamed-ali
Contributor

mohamed-ali commented Mar 5, 2018

Can I start working on a PR for this, or should I wait until consensus is reached?

@jnothman
Member

jnothman commented Mar 5, 2018 via email

@mohamed-ali
Contributor

mohamed-ali commented Mar 25, 2018

@amueller @jnothman @agramfort, I understand that the decision hasn't been made yet, but I thought it might be useful to have a concrete PR in case we decide in favor of including ensemble.AverageRegressor.

The work has been pushed here: #10868.

I look forward to your reviews.

@stsouko

stsouko commented Nov 3, 2018

I forked #10868 and refactored @mohamed-ali's code.
Still to do: the user guide; it will follow soon.

@stsouko

stsouko commented Nov 19, 2018

Code is ready to merge.
@amueller, can you review PR #12513?
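For context, #12513 appears to be what ultimately shipped in scikit-learn (0.21+) as ensemble.VotingRegressor, which fits each estimator on the full training set and averages their predictions (optionally weighted). A minimal usage sketch:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import VotingRegressor
from sklearn.linear_model import LinearRegression, Ridge

X, y = make_regression(n_samples=60, n_features=3, random_state=2)

# Each named estimator is fit on the whole of X, y; predict() returns
# the (optionally weighted) average of the individual predictions.
vr = VotingRegressor(estimators=[('lr', LinearRegression()),
                                 ('ridge', Ridge(alpha=1.0))])
vr.fit(X, y)
print(vr.predict(X).shape)  # (60,)
```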
