
GBRT API consistency #1088


Closed
pprett wants to merge 14 commits
Conversation

@pprett (Member) commented Aug 30, 2012

This is a cherry-pick of #1036 that addresses #1085 and some other API issues of the GBRT module.

Summary:

  • Better docstrings
  • GradientBoostingClassifier now has `staged_decision_function`, `staged_predict_proba`, and `staged_predict`

These methods are important for efficient model selection for GBRT models.
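To see why, here is a minimal sketch (mine, not from the PR) of using a staged method to pick the number of boosting stages from a single fit; the dataset, the holdout split, and the parameter values are purely illustrative:

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier

    # illustrative synthetic data and a simple holdout split
    X, y = make_classification(n_samples=1000, random_state=0)
    X_train, X_valid = X[:800], X[800:]
    y_train, y_valid = y[:800], y[800:]

    clf = GradientBoostingClassifier(n_estimators=200)
    clf.fit(X_train, y_train)

    # staged_predict yields the model's predictions after each boosting
    # stage, so a single fit scores every candidate value of n_estimators
    scores = [np.mean(pred == y_valid) for pred in clf.staged_predict(X_valid)]
    best_n = int(np.argmax(scores)) + 1
    print("best number of stages: %d" % best_n)

Without the staged variants, the same selection would require refitting the model once per candidate value of n_estimators.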

"call `fit` before `feature_importances_`.")
total_sum = np.zeros((self.n_features, ), dtype=np.float64)
for stage in self.estimators_:
stage_sum = sum(tree.compute_feature_importances(method='gini')
@pprett (Member, Author):
This is important: GBRT now uses 'gini' for feature importances. This was a bug in the current code that @glouppe pointed out a while ago, but I thought it was ok - sorry about that. The feature importances don't differ much, though.
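As a usage illustration (again mine, with made-up data), the importances are read off a fitted model via the feature_importances_ property shown in the diff:

    from sklearn.datasets import make_friedman1
    from sklearn.ensemble import GradientBoostingRegressor

    # in make_friedman1 only the first five features carry signal
    X, y = make_friedman1(n_samples=500, random_state=0)
    est = GradientBoostingRegressor(n_estimators=100).fit(X, y)

    # one value per feature, accumulated over the trees of every stage;
    # calling this before fit() raises the ValueError from the hunk above
    print(est.feature_importances_)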

@glouppe (Contributor):

I am curious, what made you change your mind? :)

@pprett (Member, Author):

On 2012/9/3, Gilles Louppe (notifications@github.com) wrote:

In sklearn/ensemble/gradient_boosting.py:

         for i in range(self.n_estimators):
             predict_stage(self.estimators_, i, X, self.learn_rate, score)
             yield score

    +    @property
    +    def feature_importances_(self):
    +        if self.estimators_ is None or len(self.estimators_) == 0:
    +            raise ValueError("Estimator not fitted, "
    +                             "call `fit` before `feature_importances_`.")
    +        total_sum = np.zeros((self.n_features, ), dtype=np.float64)
    +        for stage in self.estimators_:
    +            stage_sum = sum(tree.compute_feature_importances(method='gini')

> I am curious, what made you change your mind? :)

The fact that you were right and I was wrong :-) - I had misread/misinterpreted the prose in ESL: "improvement in squared error risk over that for a constant fit in the entire region" - the original publication is much more specific about that.

I've noticed, though, that the feature rankings were remarkably stable despite the weighting by n_left and n_right.



Peter Prettenhofer
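For reference, this is the tree variable-importance definition pprett alludes to, as given in ESL (my transcription, so treat the notation as approximate): for a single tree T with J leaves,

    \mathcal{I}_\ell^2(T) = \sum_{t=1}^{J-1} \hat{i}_t^2 \, \mathbb{1}\left(v(t) = \ell\right),
    \qquad
    \mathcal{I}_\ell^2 = \frac{1}{M} \sum_{m=1}^{M} \mathcal{I}_\ell^2(T_m),

where the first sum runs over the J - 1 internal nodes, \hat{i}_t^2 is the improvement in squared-error risk produced by the split at node t, v(t) is the variable split on at that node, and the second formula averages the per-tree importances over the M boosted trees. The point of the fix above is that the old code added an extra weighting by n_left and n_right that is not part of this definition, though, as pprett notes, the resulting rankings barely changed.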

@ogrisel (Member) commented Aug 30, 2012

LGTM, +1 for API consistency. Can you document the changes in what's new?

@pprett (Member, Author) commented Aug 30, 2012

Sure - I'll finish that tomorrow.

@ogrisel (Member) commented Sep 3, 2012

@pprett thanks for the whatsnew fix. 👍 for merging by rebase on top of master.

@pprett (Member, Author) commented Sep 4, 2012

Thanks @ogrisel - I'll wait for another pair of eyes before I merge.

@glouppe (Contributor) commented Sep 4, 2012

I am +1 for merge.

@pprett (Member, Author) commented Sep 4, 2012

thx @glouppe - then I'll merge (by rebase)

@glouppe (Contributor) commented Sep 4, 2012

(By the way, how do you merge by rebase? And what are the advantages with respect to a usual merge? I never understood that :s)

@pprett (Member, Author) commented Sep 4, 2012

The only advantage I can see is that rebase keeps the history more consistent: when you work on a branch for quite a while and then merge to master, chances are that the relevant commits end up scattered all over the master history.

The drawback of rebasing is that your commits get new sha1 hashes, which causes pain when other people work on your branch. According to http://git-scm.com/book/en/Git-Branching-Rebasing you should not rebase once you have pushed your branch to a public repo (e.g. a github PR) - I totally agree.

Apart from that I hardly know git rebase, so take this with a grain of salt.

@pprett (Member, Author) commented Sep 4, 2012

Merged by rebase

@pprett closed this Sep 4, 2012
@GaelVaroquaux (Member) commented Sep 4, 2012

> (By the way, how do you merge by rebase?

  1. Update master,
  2. Pull and check out the corresponding branch,
  3. `git rebase master`,
  4. `git checkout master`,
  5. `git merge the_branch`,
  6. `git push`.

It's a bit tedious. If I get a lot of rebase conflicts, I switch to a merge, which can be easier.

> And what are the advantages with respect to a usual merge? I never understood that :s)

It gives you a linear history in which each patch is applied one after the other. That makes bisecting bugs much easier.

@GaelVaroquaux (Member) commented Sep 4, 2012

> Merged by rebase

Great, thanks.

@glouppe (Contributor) commented Sep 4, 2012

Didn't know that, thanks :)

