
[MRG] Expose an apply method for gradient boosters #5222


Closed
jmschrei wants to merge 10 commits from the gbt_apply branch

Conversation

jmschrei (Member) commented Sep 7, 2015

In response to #5209, I have added an apply method for gradient boosters. It returns a matrix of shape (n_samples, n_estimators, n_classes), where each entry is the index of the terminal leaf in which the sample ends up. This mostly wraps the DecisionTree apply method and mirrors the RandomForest one. A simple unit test has been added as well.

cc @ogrisel @pprett @glouppe @arjoly
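
For context, a minimal sketch of the leaf-embedding use case from #5209 that this enables; it assumes the shape described above, and the one-hot encoding step is just illustrative, not part of this PR:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=500, random_state=0)
gbt = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(X, y)

# Leaf index of every sample in every tree: shape (n_samples, n_estimators, n_classes),
# i.e. (500, 20, 1) here, since binary problems fit a single tree per stage.
leaves = gbt.apply(X)

# Flatten and one-hot encode the leaf memberships to feed them to a second model.
leaf_features = OneHotEncoder().fit_transform(leaves.reshape(len(X), -1))
print(leaf_features.shape)
```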

glouppe (Contributor) commented on the diff, Sep 7, 2015

```python
return the index of the leaf x ends up in.
"""

if self.estimators_ is None or len(self.estimators_) == 0:
```

Maybe some code from https://github.com/jmschrei/scikit-learn/blob/gbt_apply/sklearn/ensemble/gradient_boosting.py#L1068 could be factored out in a _validate_X_predict method?
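
Something along these lines, perhaps; a rough sketch of the suggested refactoring, with the class name, error message, and dtype assumed rather than taken from the actual code:

```python
from sklearn.utils import check_array
from sklearn.exceptions import NotFittedError


class BaseGradientBoostingSketch:
    def _validate_X_predict(self, X):
        """Shared input validation for predict/decision_function/apply."""
        # Hypothetical helper; the real check lives in gradient_boosting.py.
        if getattr(self, "estimators_", None) is None or len(self.estimators_) == 0:
            raise NotFittedError(
                "Estimator not fitted, call `fit` before making predictions.")
        # The trees expect C-contiguous float32 input.
        return check_array(X, dtype="float32", order="C")

    def apply(self, X):
        X = self._validate_X_predict(X)
        # ... then collect each fitted tree's apply(X) into the output array.
        ...
```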

amueller (Member) commented Sep 7, 2015

I thought we didn't have this method because the base estimator doesn't need to be a tree?

jmschrei (Member, Author) commented Sep 7, 2015

This doesn't consider the base estimator, only the successive trees grown off the gradient of the base estimator.

jmschrei (Member, Author) commented Sep 7, 2015

Ah, I'm sorry, you meant that you can gradient boost models other than trees. Currently it is hard-coded to trees, and I haven't seen any movement towards supporting arbitrary base estimators for gradient boosting thus far. Maybe @ogrisel can comment more?

@jmschrei jmschrei closed this Sep 8, 2015
@jmschrei jmschrei deleted the gbt_apply branch September 8, 2015 09:08
@jmschrei jmschrei restored the gbt_apply branch September 8, 2015 09:08
@jmschrei jmschrei reopened this Sep 8, 2015
jmschrei (Member, Author) commented Sep 8, 2015

I accidentally muddled this PR with my other PR. I am closing this one and will open a new one.

ogrisel (Member) commented Sep 8, 2015

> I thought we didn't have this method because the base estimator doesn't need to be a tree?

I don't know if it's useful to gradient boost other models in practice.

I know XGBoost can use linear models as base learner. However I have a hard time understanding why this is not equivalent to fitting a linear model directly to the loss function. Would love a simple explanation for that. Maybe @pprett knows?

kastnerkyle (Member) commented Sep 8, 2015
I think it is similar to stacked linear models - by training over different subsets (in this case residuals, etc.) you can get different results than just one linear model. The subset operation (I think) acts as a weird kind of non-linearity.


tqchen commented Sep 9, 2015

The linear model in xgboost is exactly the same as fitting a linear model to the loss function with parallel coordinate descent. It is implemented behind the same interface as gradient boosting because they are connected in nature.

i.e. fitting additive linear models in the gradient boosting way is equivalent to coordinate descent for a single linear model.
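
For what it's worth, the simpler half of that equivalence can be checked numerically: an additive ensemble of linear base learners collapses to a single linear model whose coefficients are the sum of the per-round coefficients, so appending a new linear learner each round and updating one weight vector in place are the same procedure. A minimal NumPy sketch (not xgboost code, and using a full least-squares refit per round rather than true single-coordinate updates):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.randn(200)

# Boosting style: each round appends a new linear base learner fit to the residuals.
coefs, pred = [], np.zeros_like(y)
for _ in range(50):
    w = np.linalg.lstsq(X, y - pred, rcond=None)[0]
    coefs.append(0.1 * w)                       # shrinkage / learning rate
    pred += X @ coefs[-1]

# Descent style: keep a single weight vector and improve it in place each round.
w_single = np.zeros(X.shape[1])
for _ in range(50):
    w_single += 0.1 * np.linalg.lstsq(X, y - X @ w_single, rcond=None)[0]

# The ensemble of 50 linear models is itself just one linear model.
print(np.allclose(np.sum(coefs, axis=0), w_single))   # True
```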

ogrisel (Member) commented Sep 10, 2015

Thanks @tqchen. That sounds different from what we would get by using linear models as base estimators in scikit-learn gb classes though.

tqchen commented Sep 10, 2015

Yes, I guess this is because of the difference in interface. xgboost's gbm class is an update-style interface, where the choice can be made either to add a new estimator or to improve the current estimator based on the statistics. For the linear model, this allows modifying the current estimator in place with respect to the loss function.


7 participants