
WIP GBRT with built-in cross-validation #1036


Closed
wants to merge 7 commits

Conversation

pprett
Member

@pprett commented Aug 18, 2012

Two new classes, GradientBoostingClassifierCV and GradientBoostingRegressorCV, which pick n_estimators based on cross-validation.

GradientBoostingClassifierCV fits a GradientBoostingClassifier with max_estimators for each fold; it picks n_estimators based on the min deviance averaged over all test sets. Finally, it trains the model on the whole training set using the found n_estimators.
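
A minimal sketch of that selection procedure, written against today's scikit-learn API rather than this PR's code (KFold, log_loss and staged_predict_proba stand in for the PR's internals; the helper name and the data are illustrative):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import KFold

def pick_n_estimators(X, y, max_estimators=500, n_folds=5):
    # Fit one model with max_estimators per fold and record the test-set
    # deviance (log loss) after every boosting stage.
    fold_deviance = np.zeros((n_folds, max_estimators))
    for i, (train, test) in enumerate(KFold(n_splits=n_folds).split(X)):
        model = GradientBoostingClassifier(n_estimators=max_estimators)
        model.fit(X[train], y[train])
        for j, proba in enumerate(model.staged_predict_proba(X[test])):
            fold_deviance[i, j] = log_loss(y[test], proba)
    # Pick the stage with the lowest deviance averaged over all test sets
    # (stage j corresponds to j + 1 estimators) and refit on the whole set.
    best_n = int(np.argmin(fold_deviance.mean(axis=0))) + 1
    return GradientBoostingClassifier(n_estimators=best_n).fit(X, y), best_n

X, y = make_classification(n_samples=400, random_state=0)
model, best_n = pick_n_estimators(X, y, max_estimators=200)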

GradientBoostingClassifierCV is implemented as a GradientBoostingClassifier decorator. It solely implements fit; otherwise it delegates to GradientBoostingClassifier (see __getattr__ and __setattr__).
The current implementation might pose some problems if the client uses isinstance rather than duck typing: a GradientBoostingClassifierCV instance is not an instance of GradientBoostingClassifier. I would really appreciate any remarks/feedback on this issue.
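
A generic sketch of this delegation pattern (not the PR's actual code, which goes through BaseEstimator.__setattr__; here plain object.__setattr__ is used to store the wrapped model without recursion):

from sklearn.ensemble import GradientBoostingClassifier

class EstimatorCV:
    """Wraps an estimator and forwards everything it does not define itself."""

    def __init__(self, **kwargs):
        # Bypass our own __setattr__ so '_model' ends up on the wrapper itself.
        object.__setattr__(self, '_model', GradientBoostingClassifier(**kwargs))

    def __getattr__(self, name):
        # Only called when 'name' is not found on the wrapper: delegate.
        return getattr(self._model, name)

    def __setattr__(self, name, value):
        # Parameter assignments on the wrapper go to the wrapped model.
        setattr(self._model, name, value)

    def fit(self, X, y):
        # fit is the only method implemented here; the CV logic would live in it.
        self._model.fit(X, y)
        return self

# Note: isinstance(EstimatorCV(), GradientBoostingClassifier) is False, which is
# exactly the duck-typing concern raised above.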

I tried to adhere to the interface of RidgeCV.

Additionally, I refactored the prediction routines in order to remove code duplication. staged_predict and staged_predict_proba have been added to GradientBoostingClassifier.

Limitations:

  • It is currently hard-wired to pick n_estimators based on deviance (no support for custom loss functions yet) - is this needed?
  • No joblib support yet
  • Only cross-validation support - should we add held-out and OOB estimation too?

kwargs.pop('max_estimators', 1000))

kwargs['n_estimators'] = self.max_estimators
BaseEstimator.__setattr__(self, '_model', self._model_class(**kwargs))
Member


I wonder whether the sub-model instantiation should be deferred to the fit call and the attribute storing the sub-estimator renamed to self.model_. That would require storing the model init kwargs as an attribute, though.

Member


Actually that would break the __setattr__ and __getattr__ override so please ignore my previous comment.

@ogrisel
Member

ogrisel commented Aug 18, 2012

Nice, this sounds like a great feature for Kaggle competitors :)

  • About the isinstance issue, I don't think this is an issue. AFAIK there is no assumption that a ModelCV class should be a subclass of the vanilla Model class.
  • Custom loss for model selection would be nice but can come in a later PR IMO
  • Leveraging OOB samples for model selection and / or to return test-set accuracy estimate would be a boon too IMHO but can also probably be done in a later PR.
  • Could you add some smoke tests for pickling and for use in a pipeline / wrapping in GridSearchCV (e.g. on the learning rate), so as to check that the base-class get/setattr override does not break those meta tools? Maybe this can be done as part of the common testing framework by @amueller (I am not yet familiar with it). A rough sketch of such a smoke test follows this list.
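
A rough sketch of what such a smoke test could look like; GradientBoostingClassifierCV is the class proposed in this PR (not part of released scikit-learn), and the import path, grid values and modern GridSearchCV location are assumptions:

import pickle
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifierCV  # proposed in this PR
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)

# The pickling round-trip must preserve the fitted state despite the
# __getattr__/__setattr__ overrides.
clf = GradientBoostingClassifierCV(max_estimators=100).fit(X, y)
clf2 = pickle.loads(pickle.dumps(clf))
assert (clf2.predict(X) == clf.predict(X)).all()

# get_params/set_params must keep working so GridSearchCV can clone the estimator.
grid = GridSearchCV(GradientBoostingClassifierCV(max_estimators=50),
                    param_grid={'learn_rate': [0.1, 0.01]}, cv=3)
grid.fit(X, y)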

@amueller
Member

@ogrisel I am not entirely sure what you want to check. Just that GridSearchCV works?
We could do that in the common test framework very easily for all classifiers. I am not sure it actually works, though. For multi-class it should be fine; binary might not work yet, see the inconsistent shape discussion.

@ogrisel
Member

ogrisel commented Aug 18, 2012

Indeed, maybe a custom test for the GBRT*CV models then. The params for the grid are model specific anyway.

@mblondel
Member

It may be nice to support tuning parameters other than n_estimators in GradientBoostingClassifierCV directly. Using GridSearchCV on top of GradientBoostingClassifierCV is possible, but it would be semantically a bit different: it would result in a greedy approximation, tuning n_estimators with the other parameters fixed first and then tuning the remaining parameters. On the other hand, if tuning other parameters were supported directly in GradientBoostingClassifierCV, it would be an exhaustive search. In terms of implementation, the idea would be to generate parameter combinations with IterGrid but to handle n_estimators specially for efficiency.

@pprett
Member Author

pprett commented Aug 20, 2012

thanks for the feedback!

@mblondel I'm not in favor of supporting tuning of other parameters; n_estimators has strong interactions with the other tuning parameters (esp. learn_rate). E.g. changing learn_rate=0.001 to learn_rate=0.0001 likely requires 10x more boosting iterations to reach the same training deviance. In fact, I'd actually use GridSearchCV on top of GradientBoostingClassifierCV if I had sufficient computational resources.

If computational resources are an issue, I'd use GridSearchCV for parameter tuning, choosing n_estimators as large as possible (depending on computational resources). Then, I'd use GradientBoostingClassifierCV to determine n_estimators.

@ogrisel
Member

ogrisel commented Aug 20, 2012

@pprett Would be great to wrap up this in a "Parameters selection tips" section in the narrative doc of the GBRT models.

@pprett
Member Author

pprett commented Aug 20, 2012

I agree, but I would certainly end up copying Greg Ridgeway's definitive guide to parameter selection in GBM (it is linked in the docs but it deserves more promotion): http://cran.r-project.org/web/packages/gbm/gbm.pdf .

@mblondel
Member

@mblondel I'm not in favor of supporting tuning of other parameters; n_estimators has strong interactions with the other tuning parameters (esp. learn_rate). E.g. changing learn_rate=0.001 to learn_rate=0.0001 likely requires 10x more boosting iterations to reach the same training deviance. In fact, I'd actually use GridSearchCV on top of GradientBoostingClassifierCV if I had sufficient computational resources.

So, to summarize, you would use a sane default learning_rate and optimize n_estimators only in most cases?

If computational resources are an issue, I'd use GridSearchCV for parameter tuning, choosing n_estimators as large as possible (depending on computational resources). Then, I'd use GradientBoostingClassifierCV to determine n_estimators.

I don't understand that. For me, GradientBoostingClassifierCV should always be more efficient for choosing n_estimators (the process is incremental and the prediction scores are readily available).

@pprett
Member Author

pprett commented Aug 20, 2012

If computational resources are an issue, I'd use GridSearchCV for parameter tuning, choosing n_estimators as large as possible (depending on computational resources). Then, I'd use GradientBoostingClassifierCV to determine n_estimators.

I don't understand that. For me, GradientBoostingClassifierCV should always be more efficient for choosing n_estimators (the process is incremental and the prediction scores are readily available).

Sorry, I meant that I'd use GridSearchCV for tuning max_depth, min_samples_split and learn_rate with a fixed n_estimators (as large as possible, e.g. 3000). Then, I'd tune n_estimators via GradientBoostingClassifierCV, fixing the other parameters at the values found by GridSearchCV. Does this make sense now?
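
A sketch of this two-stage workflow, with made-up grid values, today's import paths and parameter names (learning_rate rather than the learn_rate used in this thread); GradientBoostingClassifierCV is the class proposed in this PR, so its import is hypothetical:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.ensemble import GradientBoostingClassifierCV  # proposed in this PR
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=0)

# Stage 1: tune the tree and shrinkage parameters with a fixed, generous budget.
grid = GridSearchCV(
    GradientBoostingClassifier(n_estimators=3000),
    param_grid={'max_depth': [3, 5],
                'min_samples_split': [2, 20],
                'learning_rate': [0.1, 0.01]},
    cv=3)
grid.fit(X, y)

# Stage 2: with those parameters fixed, let the CV wrapper pick n_estimators.
clf = GradientBoostingClassifierCV(max_estimators=3000, **grid.best_params_)
clf.fit(X, y)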

@mblondel
Member

@pprett yes :)

@pprett
Member Author

pprett commented Aug 22, 2012

@mblondel please ignore my first response to your comment - I was thinking about this issue yesterday and it does make sense to wrap GridSearchCV within GradientBoostingClassifierCV** - the way I described it here would be rather wasteful in terms of computational resources (2x CV)... Apart from that I'm not totally sure about the difference between the two approaches... I need to spend more time on this issue. Anyway, sorry for the noise.

** also described in this thread on the ML http://www.mail-archive.com/scikit-learn-general@lists.sourceforge.net/msg03395.html

PS: I promise next time I'll think before I write

@mblondel
Member

No worries.

To tune parameters other than n_estimators directly in GradientBoostingClassifierCV, the idea that I had in mind was to use IterGrid to generate all parameter combinations (n_estimators excepted). For example, given {'learn_rate': [0.05, 0.01, 0.001], 'subsample': [0.25, 0.5, 0.75]}, one parameter combination would be {'learn_rate': 0.05, 'subsample': 0.25}. With these two values fixed, it is possible to choose n_estimators efficiently. This process must then be repeated for each fold (to compute the average score of a parameter combination).

One thing that worries me about using GridSearchCV on top of GradientBoostingClassifierCV is that the train / validation split will have to be made twice (once inside GridSearchCV, and once inside GradientBoostingClassifierCV). Not good if you don't have much data.

Supporting other parameters in GradientBoostingClassifierCV will however increase the implementation complexity...

@pprett
Member Author

pprett commented Aug 22, 2012

2012/8/22 Mathieu Blondel notifications@github.com:

No worries.

To tune parameters other than n_estimators directly in GradientBoostingClassifierCV, the idea that I had in mind was to use IterGrid to generate all parameter combinations (n_estimators excepted). For example, given {'learn_rate': [0.05, 0.01, 0.001], 'subsample': [0.25, 0.5, 0.75]}, one parameter combination would be {'learn_rate': 0.05, 'subsample': 0.25}. With these two values fixed, it is possible to choose n_estimators efficiently. This process must then be repeated for each fold (to compute the average score of a parameter combination).

Exactly: for each grid point and fold you get an array of deviance scores (shape=n_estimators); for each grid point you then need to compute the mean deviance scores across all folds and pick the n_estimators with the lowest (mean) deviance.

One thing that worries me about using GridSearchCV on top of GradientBoostingClassifierCV is that the train / validation split will have to be made twice (once inside GridSearchCV, and once inside GradientBoostingClassifierCV). Not good if you don't have much data.

I agree.

Supporting other parameters in GradientBoostingClassifierCV will however increase the implementation complexity...

IterGrid and KFold do most of the heavy lifting - I simply pass the results of each grid point / fold combination into itertools.groupby in order to group by grid point id and then compute the mean deviance for each grid point. I'll push an update in the evening.

thanks!
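
A rough sketch of that grid point / fold bookkeeping in terms of today's ParameterGrid (the successor of IterGrid), KFold and itertools.groupby; the function name, defaults and grid values are illustrative:

from itertools import groupby

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import KFold, ParameterGrid  # ParameterGrid replaced IterGrid

def search(X, y, param_grid, max_estimators=300, n_folds=3):
    # One entry per (grid point, fold): the per-stage deviances on that test set.
    results = []
    for gid, params in enumerate(ParameterGrid(param_grid)):
        for train, test in KFold(n_splits=n_folds).split(X):
            model = GradientBoostingClassifier(n_estimators=max_estimators,
                                               **params).fit(X[train], y[train])
            dev = np.array([log_loss(y[test], proba)
                            for proba in model.staged_predict_proba(X[test])])
            results.append((gid, dev))

    # results is already ordered by grid point id, so groupby needs no sorting.
    best = (np.inf, None, None)  # (mean deviance, grid point id, n_estimators)
    for gid, group in groupby(results, key=lambda r: r[0]):
        mean_dev = np.mean([d for _, d in group], axis=0)
        j = int(np.argmin(mean_dev))
        if mean_dev[j] < best[0]:
            best = (mean_dev[j], gid, j + 1)
    return best

X, y = make_classification(n_samples=300, random_state=0)
best_dev, best_gid, best_n = search(X, y, {'learning_rate': [0.1, 0.01],
                                           'subsample': [0.5, 1.0]})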



@GaelVaroquaux
Member

On Wed, Aug 22, 2012 at 01:18:28AM -0700, Mathieu Blondel wrote:

One thing that worries me about using GridSearchCV on top of GradientBoostingClassifierCV is that the train / validation split will have to be made twice (once inside GridSearchCV, and once inside GradientBoostingClassifierCV). Not good if you don't have much data.

That's one reason why we need to be able to have cross-validation-like objects use a validation set. There is quite a bit of design work to do here...

@amueller
Member

Did someone say API design?

If we were able to pass the CV-like object the test split of the grid search, we'd be fine, right? If we added another function fit_with_validation(X_train, y, X_test) and let GridSearchCV check whether the object has it, the splitting-twice problem would be averted.
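
A hypothetical sketch of that protocol: neither fit_with_validation nor this GridSearchCV behaviour exists in scikit-learn; the helper below only illustrates how the duck-typed hand-off of the validation split could look:

def _fit_one_grid_point(estimator, X_train, y_train, X_valid, y_valid):
    # Hypothetical grid-search internals: if the estimator is CV-like and
    # exposes fit_with_validation, hand it the grid search's own validation
    # split instead of letting it split the training data a second time.
    if hasattr(estimator, 'fit_with_validation'):
        estimator.fit_with_validation(X_train, y_train, X_valid)
    else:
        estimator.fit(X_train, y_train)
    return estimator.score(X_valid, y_valid)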

@raghavrv
Member

raghavrv commented Dec 2, 2015

Revived by @vighneshbirodkar in #5689

@amueller
Member

Fixed in #7071

@amueller closed this Sep 27, 2018