On 06/21/2012 06:04 PM, Peter Prettenhofer wrote:
> 2012/6/21 Emanuele Olivetti <emanu...@relativita.com>:
>> Dear All,
>>
>> I am interested in attempting model selection with GridSearchCV() on
>> GradientBoostingRegressor(). I am quite new to boosting, but I see from
>> the nice examples in the sklearn documentation [0] that once n_estimators
>> is fixed, it is possible to evaluate the classifier at each boosting
>> iteration through GradientBoostingRegressor.staged_decision_function()
>> and similar tools (oob_score_, staged_predict).
>>
>> As the figures in those examples show, the score on the test set (e.g.
>> deviance) sometimes has a minimum, and it would be nice to find that
>> minimum during model selection in order to score a given set of parameter
>> values on it. How can that be done within GridSearchCV?
>
> Hi Emanuele,
>
> there is no straightforward solution to this yet, but it can be
> accomplished by overwriting/monkey-patching
> ``GradientBoostingRegressor.score``.
> Within ``score`` you call ``self.staged_predict``, select the best
> boosting iteration, and return the result of the evaluation metric you
> choose.
>
> Here is a quick example (not tested)::
>
>     def custom_score(self, X, y_true):
>         scores = [mse(y_true, y_pred) for y_pred in self.staged_predict(X)]
>         best_iter = np.argmin(scores)
>         best_score = scores[best_iter]
>
>         # set model to best iteration
>         self.n_estimators = best_iter + 1
>         self.estimators_ = self.estimators_[:self.n_estimators]
>
>         # GridSearchCV maximizes the score, so return the negated error
>         return -best_score
>
>     GradientBoostingRegressor.score = custom_score
>
> The drawback of this approach is that you cannot use the ``score_func``
> and ``loss_func`` arguments of ``GridSearchCV``, because if they are
> set, ``GridSearchCV`` will use them instead of ``estimator.score``.
>
> I'm currently working on a PR which extends the functionality of the
> gradient boosting module, including some convenience methods for
> finding the "optimal" number of estimators (= iterations). I'll keep
> you posted.
Hi Peter,

Indeed a neat solution! Thanks a lot, it works very well :-)

Looking forward to your PR for improved model selection in gradient boosting,

Emanuele
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
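[Editor's note: Peter's "quick example (not tested)" can be assembled into a self-contained sketch. The toy dataset (``make_regression``), the parameter grid, and the use of negated MSE as the returned score are assumptions for illustration, not part of the original suggestion; the import path ``sklearn.model_selection`` is the modern location of ``GridSearchCV``.]

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV


def custom_score(self, X, y_true):
    """Score every boosting stage and truncate the model at the best one."""
    scores = [mean_squared_error(y_true, y_pred)
              for y_pred in self.staged_predict(X)]
    best_iter = int(np.argmin(scores))

    # Truncate the fitted ensemble to the best iteration.
    self.n_estimators = best_iter + 1
    self.estimators_ = self.estimators_[:self.n_estimators]

    # GridSearchCV picks the parameter set with the *highest* score,
    # so return the negated error (lower MSE -> higher score).
    return -scores[best_iter]


# Monkey-patch the class so every clone made by GridSearchCV uses it.
GradientBoostingRegressor.score = custom_score

# Illustrative toy problem and grid (assumed, not from the thread).
X, y = make_regression(n_samples=200, n_features=10, random_state=0)
grid = GridSearchCV(
    GradientBoostingRegressor(n_estimators=50, random_state=0),
    param_grid={"learning_rate": [0.1, 0.5]},
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```

Because no ``scoring`` argument is passed to ``GridSearchCV``, it falls back to the estimator's own ``score`` method, which is exactly the hook the monkey-patch exploits.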