
[MRG] Expose an apply method for gradient boosters #5222


Closed
jmschrei wants to merge 10 commits from the gbt_apply branch

Conversation

jmschrei (Member) commented Sep 7, 2015

In response to #5209, I have added an apply method for gradient boosters. It returns a matrix of shape (n_samples, n_estimators, n_classes), where each entry is the index of the terminal leaf in which the sample ends up. This mostly wraps the DecisionTree apply method and mirrors the RandomForest one. A simple unit test has been added as well.

cc @ogrisel @pprett @glouppe @arjoly
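
For context, a minimal sketch of the leaf-embedding use case from #5209 that this enables; it assumes the shape described above, and the one-hot encoding step is just illustrative, not part of this PR:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=500, random_state=0)
gbt = GradientBoostingClassifier(n_estimators=20, random_state=0).fit(X, y)

# Leaf index of every sample in every tree: shape (n_samples, n_estimators, n_classes),
# i.e. (500, 20, 1) here, since binary problems fit a single tree per stage.
leaves = gbt.apply(X)

# Flatten and one-hot encode the leaf memberships to feed them to a second model.
leaf_features = OneHotEncoder().fit_transform(leaves.reshape(len(X), -1))
print(leaf_features.shape)
```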

glouppe (Contributor) commented on the diff, Sep 7, 2015

```python
return the index of the leaf x ends up in.
"""

if self.estimators_ is None or len(self.estimators_) == 0:
```

Maybe some code from https://github.com/jmschrei/scikit-learn/blob/gbt_apply/sklearn/ensemble/gradient_boosting.py#L1068 could be factored out in a _validate_X_predict method?
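
Something along these lines, perhaps; a rough sketch of the suggested refactoring, with the class name, error message, and dtype assumed rather than taken from the actual code:

```python
from sklearn.utils import check_array
from sklearn.exceptions import NotFittedError


class BaseGradientBoostingSketch:
    def _validate_X_predict(self, X):
        """Shared input validation for predict/decision_function/apply."""
        # Hypothetical helper; the real check lives in gradient_boosting.py.
        if getattr(self, "estimators_", None) is None or len(self.estimators_) == 0:
            raise NotFittedError(
                "Estimator not fitted, call `fit` before making predictions.")
        # The trees expect C-contiguous float32 input.
        return check_array(X, dtype="float32", order="C")

    def apply(self, X):
        X = self._validate_X_predict(X)
        # ... then collect each fitted tree's apply(X) into the output array.
        ...
```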

amueller (Member) commented Sep 7, 2015

I thought we didn't have this method because the base estimator doesn't need to be a tree?

jmschrei (Member, Author) commented Sep 7, 2015

This doesn't consider the base estimator, only the successive trees grown off the gradient of the base estimator.

jmschrei (Member, Author) commented Sep 7, 2015

Ah, I'm sorry, you meant that you can gradient boost models other than trees. Currently it is hard-coded to trees, and I haven't seen any movement towards supporting arbitrary base estimators for gradient boosting thus far. Maybe @ogrisel can comment more?

@jmschrei jmschrei closed this Sep 8, 2015
@jmschrei jmschrei deleted the gbt_apply branch September 8, 2015 09:08
@jmschrei jmschrei restored the gbt_apply branch September 8, 2015 09:08
@jmschrei jmschrei reopened this Sep 8, 2015
jmschrei (Member, Author) commented Sep 8, 2015

I accidentally muddled this PR with my other PR. I am closing this one and will open a new one.

ogrisel (Member) commented Sep 8, 2015

> I thought we didn't have this method because the base estimator doesn't need to be a tree?

I don't know if it's useful to gradient boost other models in practice.

I know XGBoost can use linear models as base learner. However I have a hard time understanding why this is not equivalent to fitting a linear model directly to the loss function. Would love a simple explanation for that. Maybe @pprett knows?

kastnerkyle (Member) commented Sep 8, 2015
I think it is similar to stacked linear models - by training over different subsets (in this case residuals, etc.) you can get different results than just one linear model. The subset operation (I think) acts as a weird kind of non-linearity.


tqchen commented Sep 9, 2015

The linear model in xgboost is exactly the same as fitting a linear model to the loss function with parallel coordinate descent. It is implemented behind the same interface as gradient boosting because they are connected in nature.

i.e. fitting additive linear models in the gradient boosting way is equivalent to coordinate descent for a single linear model.
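
For what it's worth, the simpler half of that equivalence can be checked numerically: an additive ensemble of linear base learners collapses to a single linear model whose coefficients are the sum of the per-round coefficients, so appending a new linear learner each round and updating one weight vector in place are the same procedure. A minimal NumPy sketch (not xgboost code, and using a full least-squares refit per round rather than true single-coordinate updates):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.randn(200)

# Boosting style: each round appends a new linear base learner fit to the residuals.
coefs, pred = [], np.zeros_like(y)
for _ in range(50):
    w = np.linalg.lstsq(X, y - pred, rcond=None)[0]
    coefs.append(0.1 * w)                       # shrinkage / learning rate
    pred += X @ coefs[-1]

# Descent style: keep a single weight vector and improve it in place each round.
w_single = np.zeros(X.shape[1])
for _ in range(50):
    w_single += 0.1 * np.linalg.lstsq(X, y - X @ w_single, rcond=None)[0]

# The ensemble of 50 linear models is itself just one linear model.
print(np.allclose(np.sum(coefs, axis=0), w_single))   # True
```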

ogrisel (Member) commented Sep 10, 2015

Thanks @tqchen. That sounds different from what we would get by using linear models as base estimators in scikit-learn gb classes though.

tqchen commented Sep 10, 2015

Yes, I guess this is because of the difference in interface. xgboost's gbm class is an update-style interface, where the choice can be made either to add a new estimator or to improve the current estimator based on the statistics. For the linear model, this allows modifying the current estimator in place with respect to the loss function.


7 participants