[MRG+1] Apply method added to GradientBoosting #5228


Merged: 1 commit into scikit-learn:master on Sep 10, 2015

Conversation

@jmschrei (Member) commented Sep 8, 2015

Fixed the issues in #5222. cc @glouppe @ogrisel @arjoly @amueller

Apologies for the convoluted PR.

@jmschrei jmschrei changed the title ENH apply method added to GradientBoosting [MRG] Apply method added to GradientBoosting Sep 8, 2015
@jmschrei (Member Author) commented Sep 8, 2015

Example updated as requested by @glouppe


if self.estimators_ is None or len(self.estimators_) == 0:
    raise NotFittedError("Estimator not fitted, "
                         "call `fit` before exploiting the model.")
Review comment (Contributor):

I know this is a bit out of scope wrt this PR, but I think factoring out this check and what is done at https://github.com/jmschrei/scikit-learn/blob/gb_apply/sklearn/ensemble/gradient_boosting.py#L1068 would be a nice thing to do.
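
For illustration, a factored-out version of this check, along the lines of the _check_initialized helper that shows up later in this PR, might look like this (a sketch, not the exact code that landed):

def _check_initialized(self):
    """Raise NotFittedError if the ensemble has not been fitted yet."""
    # Assumes NotFittedError is already imported in gradient_boosting.py.
    if self.estimators_ is None or len(self.estimators_) == 0:
        raise NotFittedError("Estimator not fitted, "
                             "call `fit` before exploiting the model.")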

@jmschrei (Member Author) commented Sep 8, 2015

Comments have been incorporated, thanks!

@glouppe glouppe changed the title [MRG] Apply method added to GradientBoosting [MRG+1] Apply method added to GradientBoosting Sep 8, 2015
@glouppe (Contributor) commented Sep 8, 2015

Great! Thanks for the quick changes. +1 on my side

@glouppe (Contributor) commented Sep 9, 2015

Ping @arjoly @pprett @betatim

@betatim (Member) commented Sep 9, 2015

I am happy with the change to the example. Nice work.

My only nitpick is that the shape of the returned array differs from the one returned by RandomForestClassifier and friends, but I can't think of a way to fix that.
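
To make the shape difference concrete, a small sketch (the shapes follow the discussion in this thread):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=100, n_classes=3, n_informative=4,
                           random_state=0)

rf = RandomForestClassifier(n_estimators=10).fit(X, y)
gb = GradientBoostingClassifier(n_estimators=10).fit(X, y)

# RandomForestClassifier.apply returns one leaf index per tree.
print(rf.apply(X).shape)  # (100, 10)

# Gradient boosting fits one tree per class per stage, so its apply
# carries an extra class axis in the multi-class case.
print(gb.apply(X).shape)  # (100, 10, 3)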

@arjoly (Member) commented Sep 9, 2015

Can you add a test for regression and a test for multi-class classification?
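
A hedged sketch of what such tests might assert (test names are hypothetical, not the tests that actually landed):

from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import (GradientBoostingClassifier,
                              GradientBoostingRegressor)


def test_apply_regression_shape():
    X, y = make_regression(n_samples=50, random_state=0)
    est = GradientBoostingRegressor(n_estimators=5).fit(X, y)
    # The regressor flattens away the class axis: (n_samples, n_estimators).
    assert est.apply(X).shape == (50, 5)


def test_apply_multiclass_shape():
    X, y = make_classification(n_samples=50, n_classes=3, n_informative=4,
                               random_state=0)
    est = GradientBoostingClassifier(n_estimators=5).fit(X, y)
    # One tree per class per stage: (n_samples, n_estimators, n_classes).
    assert est.apply(X).shape == (50, 5, 3)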

- grd_enc.fit(gradient_apply(grd, X_train))
- grd_lm.fit(grd_enc.transform(gradient_apply(grd, X_train_lr)), y_train_lr)
+ grd_enc.fit(grd.apply(X_train)[:,:,0])
+ grd_lm.fit(grd_enc.transform(grd.apply(X_train_lr)[:,:,0]), y_train_lr)
Review comment (Member):

nitpick: does [:,:,0] comply with flake8?

Reply (Member Author):

I don't know what flake8 is. @ogrisel ?

Reply (Contributor):

It is a command line tool that checks the PEP8 and PEP257 style guidelines (which we try to follow).

In particular, here I suppose Arnaud would have expected [:, :, 0] rather than [:,:,0].

Reply (Member):

+1 for respecting PEP8 and being consistent with the style of the code base. @jmschrei, if you use the Atom editor you can:

1. pip install flake8 to install the flake8 command in your PATH
2. install the linter and linter-flake8 packages in your Atom environment to run the checks in your editor

Alternatively you can run:

flake8 sklearn/path/to/module.py

from the command line.
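
For context, the example under discussion feeds the leaf indices returned by apply into a one-hot encoder and a linear model. A minimal self-contained sketch of that pattern, reusing the variable names from the diff above (the data setup and hyperparameters are illustrative assumptions):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder

X, y = make_classification(n_samples=1000, random_state=0)
# Use half the data for the trees, half for the linear model.
X_train, X_train_lr, y_train, y_train_lr = train_test_split(
    X, y, test_size=0.5, random_state=0)

grd = GradientBoostingClassifier(n_estimators=10).fit(X_train, y_train)
grd_enc = OneHotEncoder()
grd_lm = LogisticRegression()

# Binary problem: drop the singleton class axis, leaving
# an array of shape (n_samples, n_estimators).
grd_enc.fit(grd.apply(X_train)[:, :, 0])
grd_lm.fit(grd_enc.transform(grd.apply(X_train_lr)[:, :, 0]), y_train_lr)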

@jmschrei (Member Author) commented Sep 9, 2015

Unit tests added. I would support keeping the same shape for all gradient boosting models. The gradient boosting shape has to differ from RF's in multi-class cases, so we might as well not make it even more complicated.


for i in range(n_estimators):
    for j in range(n_classes):
        # Record the leaf index each sample reaches in the tree
        # fitted for stage i and class j.
        leaves[:, i, j] = self.estimators_[i, j].apply(X)
Review comment (Member):

Please call .apply(X, check_input=False) as the inputs have already been checked previously.
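
Applied to the loop above, that suggestion would read (sketch):

for i in range(n_estimators):
    for j in range(n_classes):
        # Inputs were already validated once at the top of apply,
        # so skip the per-tree input checks.
        leaves[:, i, j] = self.estimators_[i, j].apply(X, check_input=False)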

@jmschrei (Member Author) commented Sep 9, 2015

Comments have been incorporated. GradientBoostingRegressor has its own method that wraps the base call with its own documentation, returning [n_samples, n_estimators]. flake8 has been run on the example and produces no warnings. Commits have been squashed, the branch has been rebased on master (to include #5230), and all unit tests pass.

self._check_initialized()
X = self.estimators_[0, 0]._validate_X_predict(X, check_input=True)

n_estimators, n_classes = self.estimators_.shape
Review comment (Member):

Maybe add an inline comment here that says that n_classes is 1 both for binary classification and in a regression context.

Reply (Member Author):

n_classes is 0; we changed the shape to [n_samples, n_estimators] in a regression context.

Reply (Member):

It's got to be 1. Otherwise there would be no leaf data at all.
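
The suggested inline comment might then read:

# self.estimators_ has shape (n_estimators, n_classes); n_classes is 1
# for both binary classification and regression.
n_estimators, n_classes = self.estimators_.shape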

@ogrisel (Member) commented Sep 9, 2015

Apart from the nitpick, this LGTM as well.


leaves = super(GradientBoostingRegressor, self).apply(X)
# Drop the singleton class axis so the regressor returns
# an array of shape (n_samples, n_estimators).
leaves = leaves.reshape(X.shape[0], self.estimators_.shape[0])
return leaves
Review comment (Member):

An empty line is missing after this one.
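
For illustration, a hedged usage sketch of the regressor's wrapper:

from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=100, random_state=0)
est = GradientBoostingRegressor(n_estimators=20).fit(X, y)

# The regressor reshapes away the singleton class axis,
# returning an array of shape (n_samples, n_estimators).
print(est.apply(X).shape)  # (100, 20)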

@arjoly (Member) commented Sep 9, 2015

+1 also, once the remaining comments are addressed.

@jmschrei (Member Author) commented:

Changes incorporated, all tests pass.

plt.plot(fpr_grd_lm, tpr_grd_lm, label='GBT + LR')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.title('ROC curve')
Review comment (Member):

Maybe change the title to make it explicit that this is a zoom in on the top left corner of the ROC curves.
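
For instance, something like:

plt.title('ROC curve (zoomed in at top left)')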

@jmschrei (Member Author) commented:

@ogrisel fixed

@ogrisel (Member) commented Sep 10, 2015

Thanks, LGTM as well. merging!

ogrisel added a commit that referenced this pull request Sep 10, 2015
[MRG+1] Apply method added to GradientBoosting
@ogrisel merged commit 470b9a4 into scikit-learn:master on Sep 10, 2015
@jmschrei (Member Author) commented:

🎺

@arjoly (Member) commented Sep 10, 2015

Great! :-)
