GBRT API consistency #1088
Conversation
…via Decorator pattern
"call `fit` before `feature_importances_`.") | ||
total_sum = np.zeros((self.n_features, ), dtype=np.float64) | ||
for stage in self.estimators_: | ||
stage_sum = sum(tree.compute_feature_importances(method='gini') |
This is important: GBRT now uses 'gini' for feature importances. This was a bug in the current code that @glouppe pointed out a while ago, but I thought it was ok - sorry about that. Feature importances don't differ much, though.
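For context, a minimal usage sketch of reading the (Gini-based) importances after fitting; this is not code from the PR, and the dataset and parameters are made up for illustration:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

# Toy data, purely for illustration.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

clf = GradientBoostingClassifier(n_estimators=100, random_state=0)
clf.fit(X, y)

# feature_importances_ averages the per-tree, Gini-based importances over
# all boosting stages; accessing it before `fit` raises a ValueError.
ranking = np.argsort(clf.feature_importances_)[::-1]
print(ranking)  # feature indices, most important first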
I am curious, what made you change your mind? :)
2012/9/3 Gilles Louppe notifications@github.com
In sklearn/ensemble/gradient_boosting.py:
for i in range(self.n_estimators):
    predict_stage(self.estimators_, i, X, self.learn_rate, score)
    yield score

@property
def feature_importances_(self):
    if self.estimators_ is None or len(self.estimators_) == 0:
        raise ValueError("Estimator not fitted, "
                         "call `fit` before `feature_importances_`.")
    total_sum = np.zeros((self.n_features, ), dtype=np.float64)
    for stage in self.estimators_:
        stage_sum = sum(tree.compute_feature_importances(method='gini')
I am curious, what made you change your mind? :)
The fact that you were right and I was wrong :-) - I misread the prose in ESL: "improvement in squared error risk over that for a constant fit in the entire region" - the original publication is much more specific about that.
I've noticed, though, that feature rankings were remarkably stable despite the weighting by n_left and n_right.
Peter Prettenhofer
LGTM, +1 for API consistency. Can you document the changes in what's new?
sure - I'll finish that tomorrow
@pprett thanks for the whatsnew fix. 👍 for merging by rebase on top of master.
thanks @ogrisel - I'll wait for another pair of eyes before I merge
I am +1 for merge.
thx @glouppe - then I'll merge (by rebase)
(By the way, how do you merge by rebase? And what are the advantages with respect to a usual merge? I never understood that :s)
The only advantage I can see is that rebase makes the history more consistent - when you work on a branch for quite a while and then merge to master, chances are the relevant commits end up scattered all over the master history. The drawback of rebasing is that your commits get new sha1 hashes, which causes pain when other people work on your branch. According to http://git-scm.com/book/en/Git-Branching-Rebasing you should not rebase once you have pushed your branch to a public repo (e.g. a github PR) - I totally agree. Apart from that I hardly know git rebase, so take this with a grain of salt.
Merged by rebase |
It's a bit tedious. If I get a lot of rebase conflicts, I switch to a merge, which can be easier.
It gives you a linear history in which each patch is applied one after the other. That makes bisecting bugs much easier.
great. Thanks |
Didn't know that, thanks :)
This is a cherry-pick of #1036 that addresses #1085 and some other API issues of the GBRT module.
Summary:
`GradientBoostingClassifier` now has `staged_decision_function`, `staged_predict_proba`, and `staged_predict`. These methods are important for efficient model selection for GBRT models (see the sketch below).
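As a sketch of the kind of model selection these staged methods enable (not code from this PR; the dataset and parameters are illustrative): fit once with many stages, then score every intermediate model on held-out data instead of refitting for each candidate n_estimators.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split  # sklearn.cross_validation at the time of this PR

X, y = make_classification(n_samples=1000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(n_estimators=200, random_state=0)
clf.fit(X_train, y_train)

# staged_predict yields the prediction after each boosting stage, so a
# single fit gives test accuracy for every n_estimators <= 200.
test_score = np.array([np.mean(y_pred == y_test)
                       for y_pred in clf.staged_predict(X_test)])
best_n = test_score.argmax() + 1
print("best number of stages:", best_n)

The same pattern works with staged_decision_function or staged_predict_proba when a different scoring metric is needed.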