WIP GBRT with built-in cross-validation #1036
Conversation
```python
kwargs.pop('max_estimators', 1000))
kwargs['n_estimators'] = self.max_estimators
BaseEstimator.__setattr__(self, '_model', self._model_class(**kwargs))
```
I wonder if the sub-model instantiation should not be deferred to the `fit` call, and the sub-estimator-storing attribute renamed to `self.model_`. That would require storing the model init kwargs as an attribute, though.
Actually, that would break the `__setattr__` and `__getattr__` overrides, so please ignore my previous comment.
Nice, this sounds like a great feature for Kaggle competitors :)
@ogrisel I am not entirely sure what you want to check. Just that `GridSearchCV` works?
Indeed, maybe a custom test for the GBRT*CV models then. The params for the grid are model-specific anyway.
It may be nice to support tuning other parameters than `n_estimators`.
Thanks for the feedback! @mblondel I'm not a fan of supporting tuning other parameters; if computational resources are an issue, I'd use
@pprett It would be great to wrap this up in a "Parameter selection tips" section in the narrative docs of the GBRT models.
I agree, but I would certainly end up copying Greg Ridgeway's definitive guide to parameter selection in GBM (it is linked in the docs but deserves more promotion): http://cran.r-project.org/web/packages/gbm/gbm.pdf
So, to summarize, you would use a sane default `learning_rate` and optimize `n_estimators` only in most cases? I don't understand that. For me
Sorry, I meant I'd use
@pprett yes :)
@mblondel please ignore my first response to your comment - I was thinking about this issue yesterday and it does make sense to wrap **

** also described in this thread on the ML: http://www.mail-archive.com/scikit-learn-general@lists.sourceforge.net/msg03395.html

PS: I promise next time I'll think before I write
No worries. To tune other parameters than `n_estimators`,

One thing that worries me about using

Supporting other parameters in
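For context, tuning other parameters is already possible by wrapping the plain estimator in `GridSearchCV`, as mentioned above. A minimal sketch against the current scikit-learn API (the dataset and grid values are arbitrary examples):

```python
# Tune parameters other than n_estimators with a plain grid search over the
# existing estimator; the CV classes in this PR would only pick n_estimators.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, random_state=0)

param_grid = {
    'learning_rate': [0.01, 0.1, 0.5],
    'max_depth': [1, 3, 5],
}
search = GridSearchCV(GradientBoostingClassifier(n_estimators=100),
                      param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

The obvious cost is that every grid point refits from scratch, which is exactly the inefficiency the staged evaluation in this PR avoids for `n_estimators`.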
thanks!
That's one reason why we need to be able to have cross-validation like G
Did someone say API design? If we were able to pass the CV-like object the test split of the grid search, we'd be fine, right?
Revived by @vighneshbirodkar in #5689

Fixed in #7071
Two new classes `GradientBoostingClassifierCV` and `GradientBoostingRegressorCV` which pick `n_estimators` based on cross-validation. `GradientBoostingClassifierCV` fits a `GradientBoostingClassifier` with `max_estimators` for each fold; it picks `n_estimators` based on the min deviance averaged over all test sets. Finally, it trains the model on the whole training set using the found `n_estimators`.
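A minimal sketch of that selection rule, assuming today's `model_selection` API and using log-loss (which equals the binomial deviance up to scaling); this is not the PR's actual code:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=500, random_state=0)
max_estimators = 200

cv = KFold(n_splits=5)
deviance = np.zeros((cv.get_n_splits(), max_estimators))
for i, (train, test) in enumerate(cv.split(X)):
    model = GradientBoostingClassifier(n_estimators=max_estimators)
    model.fit(X[train], y[train])
    # staged_predict_proba yields the test predictions after every stage,
    # so a single fit scores all candidate values of n_estimators
    for stage, proba in enumerate(model.staged_predict_proba(X[test])):
        deviance[i, stage] = log_loss(y[test], proba)

# pick n_estimators minimizing the deviance averaged over all test sets
best_n = int(np.argmin(deviance.mean(axis=0))) + 1

# finally, train on the whole training set using the found n_estimators
final_model = GradientBoostingClassifier(n_estimators=best_n).fit(X, y)
```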
`GradientBoostingClassifierCV` is implemented as a `GradientBoostingClassifier` decorator. It solely implements `fit`; otherwise, it delegates to `GradientBoostingClassifier` (see `__getattr__` and `__setattr__`).
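The delegation pattern looks roughly like this (an illustrative sketch, not the PR's exact code; the diff above goes through `BaseEstimator.__setattr__` rather than `object.__setattr__`):

```python
from sklearn.ensemble import GradientBoostingClassifier

class GradientBoostingClassifierCV:
    def __init__(self, **kwargs):
        # bypass our own __setattr__ override when storing the sub-model
        object.__setattr__(self, '_model', GradientBoostingClassifier(**kwargs))

    def fit(self, X, y):
        # would run the cross-validated selection of n_estimators described
        # above, then refit self._model on the whole training set
        raise NotImplementedError

    def __getattr__(self, name):
        # only called for attributes not found on the wrapper itself, so
        # predict, predict_proba, score, ... are delegated transparently
        return getattr(self._model, name)

    def __setattr__(self, name, value):
        # parameters set on the wrapper are forwarded to the sub-model
        setattr(self._model, name, value)
```

This is what makes the `isinstance` caveat below unavoidable: the wrapper behaves like a `GradientBoostingClassifier` but is not one.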
The current implementation might pose some problems if the client uses `isinstance` rather than duck typing: a `GradientBoostingClassifierCV` instance is not an instance of `GradientBoostingClassifier`. I would really appreciate any remarks/feedback on this issue.

I tried to adhere to the interface of `RidgeCV`.

Additionally, I refactored the prediction routines to remove code duplication.
`staged_predict` and `staged_predict_proba` have been added to `GradientBoostingClassifier` (see the usage sketch below).

Limitations:

- `n_estimators` is picked based on deviance (no support for custom loss functions yet) - is this needed?
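As a quick illustration of those staged methods (a hedged example against the current scikit-learn API; the dataset and sizes are arbitrary):

```python
# Evaluate one fitted model after every boosting stage on held-out data,
# without refitting once per candidate n_estimators.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(n_estimators=100).fit(X_train, y_train)
acc = [(pred == y_test).mean() for pred in clf.staged_predict(X_test)]
best_stage = int(np.argmax(acc)) + 1
print("best held-out accuracy %.3f at n_estimators=%d" % (max(acc), best_stage))
```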