[MRG+1] [BUG] AdaBoostRegressor should not raise errors if the base_estimator does not support sample_weights #5207


Merged
merged 1 commit into scikit-learn:master from MechCoder:ensemble_sample_weight_bug on Sep 9, 2015

Conversation

MechCoder
Member

AdaBoostRegressor should work without sample_weights in the base estimator

The random weighted sampling is done internally in the _boost method in AdaBoostRegressor.
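
For illustration, a minimal sketch of the behaviour this PR is about, using KNeighborsRegressor as an example of a base estimator whose fit method does not accept sample_weight (an illustrative choice, not part of the PR itself; base_estimator reflects the API at the time):

import numpy as np

from sklearn.ensemble import AdaBoostRegressor
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.RandomState(0)
X = rng.rand(50, 3)
y = X.sum(axis=1) + 0.1 * rng.randn(50)

# KNeighborsRegressor.fit(X, y) has no sample_weight parameter, but AdaBoost.R2
# only needs the weights for its internal bootstrap, so this should fit
# without raising once the check is removed.
reg = AdaBoostRegressor(base_estimator=KNeighborsRegressor(),
                        n_estimators=5, random_state=0)
reg.fit(X, y)
print(reg.predict(X[:3]))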

@MechCoder
Member Author

@glouppe @arjoly Could you please verify whether this is a bug or not? I do not know the internals of boosting well enough to say whether the sample_weights provided to fit in the meta-estimator are the same as those passed to fit in the base estimators, but from how it is implemented in AdaBoostClassifier, it seems that way.

@vstolbunov
Contributor

Interesting. I am encountering a failing test in PR #5204, within test_sample_weight_missing() in ensemble/tests/test_weight_boosting.py, because it uses an AdaBoostRegressor with LogisticRegression and the ValueError is not raised. Connected?

@MechCoder
Member Author

Yes :) It might be better to keep the changes independent though, i.e., wait till this PR is merged and then rebase the sample_weights PR on top of it.

@vstolbunov
Contributor

What is test_sample_weight_missing meant to check for? I think the parts of that test which use LogisticRegression can be removed in #5204 since sample_weight would now be a parameter?

@MechCoder
Member Author

I think the idea is to raise an error for (i.e., not support) base estimators which do not accept sample_weights. However, I suppose an error should be raised only when sample_weights is not None in those cases?

Anyhow, the checks for LogisticRegression can be removed in the other PR, or the solver can be explicitly set to liblinear, since that should also raise a ValueError (although in a different part of the code).

@MechCoder
Member Author

Actually, this is not a bug. The weighting of samples is taken care of by the block of code above, which does a weighted random sampling of n_samples, hence we need not pass the sample_weights.

AdaBoostRegressor should work without the base estimator needing sample_weights.
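
Roughly, the sampling step being referred to works like this (a simplified sketch of the AdaBoost.R2 resampling idea, not the actual weight_boosting.py source; names are illustrative):

import numpy as np

def weighted_bootstrap_fit(estimator, X, y, sample_weight, random_state):
    # Build the cumulative distribution of the normalized weights.
    cdf = np.cumsum(sample_weight / sample_weight.sum())
    cdf[-1] = 1.0  # guard against floating-point drift
    # Inverse-transform sampling: map uniform draws to sample indices.
    uniform_samples = random_state.random_sample(X.shape[0])
    bootstrap_idx = cdf.searchsorted(uniform_samples, side="right")
    # The base estimator only ever sees the resampled data;
    # sample_weight is never passed to its fit method.
    estimator.fit(X[bootstrap_idx], y[bootstrap_idx])
    return estimator, bootstrap_idx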

@MechCoder MechCoder force-pushed the ensemble_sample_weight_bug branch 2 times, most recently from 9bd22a5 to b86b846 on September 3, 2015 19:04
@MechCoder MechCoder changed the title from "AdaBoostRegressor does not pass sample_weights to the base estimator" to "AdaBoostRegressor should not raise errors if the base_estimator does not support sample_weights" on Sep 3, 2015
@MechCoder
Member Author

@vstolbunov I've updated the pull request. Let me know what you think.

@vstolbunov
Contributor

Good point. Looks good to me.

@MechCoder MechCoder changed the title from "AdaBoostRegressor should not raise errors if the base_estimator does not support sample_weights" to "[MRG] [BUG] AdaBoostRegressor should not raise errors if the base_estimator does not support sample_weights" on Sep 3, 2015
@glouppe
Contributor

glouppe commented Sep 4, 2015

I don't understand this PR. AdaBoostRegressor needs the base estimator to use the sample weights that are computed at each iteration (see https://github.com/MechCoder/scikit-learn/blob/ensemble_sample_weight_bug/sklearn/ensemble/weight_boosting.py#L1053 for regression). Therefore the base estimator must support sample weights.

@MechCoder
Member Author

I am confused: in the code, the base estimator does not require these weights to be passed to its fit method here (https://github.com/MechCoder/scikit-learn/blob/ensemble_sample_weight_bug/sklearn/ensemble/weight_boosting.py#L1020), and the handling of sample_weights is done externally from L1011 to L1016, right? Where else is it passed to the base estimator?

@glouppe
Contributor

glouppe commented Sep 4, 2015

Indeed, that is strange. I think this was done on purpose but I am not sure anymore. @ndawe do you remember?

@MechCoder
Member Author

Are you okay with removing the sample_weights check for now (as done in this PR) and opening an issue to discuss this in the future?

@ndawe
Member

ndawe commented Sep 6, 2015

This check for sample_weight support was not in the original implementation that I wrote (as I recall). Git blame shows @akshayah3. For regression, sample_weight support is not required in the base estimator since AdaBoost.R2 only uses the weights in the bootstrapping. So this check should be removed.
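
For context, the guard being removed for the regressor amounts to a has_fit_parameter check on the base estimator, along these lines (a sketch of the pattern, not the exact lines in weight_boosting.py):

from sklearn.tree import DecisionTreeRegressor
from sklearn.utils.validation import has_fit_parameter

base_estimator = DecisionTreeRegressor()

# The classifier genuinely needs a check like this, since it passes
# sample_weight to the base estimator's fit; for the regressor it is
# unnecessary, because AdaBoost.R2 only uses the weights to draw the
# bootstrap sample.
if not has_fit_parameter(base_estimator, "sample_weight"):
    raise ValueError("%s doesn't support sample_weight."
                     % base_estimator.__class__.__name__)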

@MechCoder MechCoder force-pushed the ensemble_sample_weight_bug branch from b86b846 to 89f4c8b on September 6, 2015 12:51
@MechCoder
Member Author

@ndawe Thanks! Would you be able to do a quick code review? This is blocking another LogisticRegression PR.

@ndawe
Member

ndawe commented Sep 6, 2015

+1

Just had a quick look and this looks fine to me.

@ndawe ndawe changed the title from "[MRG] [BUG] AdaBoostRegressor should not raise errors if the base_estimator does not support sample_weights" to "[MRG+1] [BUG] AdaBoostRegressor should not raise errors if the base_estimator does not support sample_weights" on Sep 6, 2015
"""
class DummyEstimator(BaseEstimator):

def fit(self, X, y, sample_weight=None):
Member

sample_weight=None isn't needed here, right?

Member Author

Ah yes, that was the point of the test in the first place :/
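
In other words, the estimator in the test needs to genuinely lack the keyword, along the lines of the following sketch (illustrative only, not the committed test code):

import numpy as np

from sklearn.base import BaseEstimator
from sklearn.ensemble import AdaBoostRegressor

class DummyEstimator(BaseEstimator):
    # A trivial regressor whose fit signature has no sample_weight,
    # so the test actually exercises the case this PR is about.

    def fit(self, X, y):
        self.mean_ = np.mean(y)
        return self

    def predict(self, X):
        return np.full(X.shape[0], self.mean_)

rng = np.random.RandomState(0)
X, y = rng.rand(30, 2), rng.rand(30)
# Should fit without raising, since the weights stay inside _boost.
AdaBoostRegressor(base_estimator=DummyEstimator()).fit(X, y)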

@MechCoder MechCoder force-pushed the ensemble_sample_weight_bug branch from 89f4c8b to bb2aced on September 7, 2015 03:31
@MechCoder
Member Author

Any second reviewers?
ping @glouppe @arjoly

@vstolbunov
Contributor

Hope this gets merged soon so we can have both #5207 and #5204 in the next release! :)

@amueller amueller added this to the 0.17 milestone Sep 8, 2015
@amueller
Member

amueller commented Sep 8, 2015

ping @mblondel from #3711.

@ndawe
Member

ndawe commented Sep 9, 2015

I don't see why we shouldn't just merge this right now :)

@MechCoder
Member Author

I'll leave the bold step of doing this to others :)

@glouppe
Contributor

glouppe commented Sep 9, 2015

Sorry for the late review. I just checked the paper; this is indeed the way it is supposed to work for AdaBoost.R2.

glouppe added a commit that referenced this pull request Sep 9, 2015
[MRG+1] [BUG] AdaBoostRegressor should not raise errors if the base_estimator does not support sample_weights
@glouppe glouppe merged commit 7fccf45 into scikit-learn:master Sep 9, 2015
@MechCoder MechCoder deleted the ensemble_sample_weight_bug branch September 9, 2015 14:27
vstolbunov added a commit to vstolbunov/scikit-learn that referenced this pull request Sep 9, 2015
@amueller
Member

amueller commented Sep 9, 2015

So was #3711 just a mistake?

@glouppe
Contributor

glouppe commented Sep 9, 2015

Yes, it shouldn't have been added for the regressor. (But this was correct for the classifier.)

@ndawe
Member

ndawe commented Sep 10, 2015

Thanks @MechCoder !
