2012/6/21 Emanuele Olivetti <emanu...@relativita.com>:
> Dear All,
>
> I am interested in attempting model selection with GridSearchCV() on
> GradientBoostingRegressor(). I am quite new to boosting, but I see from
> the nice examples in the sklearn documentation [0] that, once n_estimators
> is fixed, it is possible to evaluate the classifier at each boosting
> iteration through GradientBoostingRegressor.staged_decision_function()
> and similar things (oob_score_, staged_predict).
>
> As the figures of the examples show, the score on the test set (e.g.
> deviance) sometimes has a minimum, and it would be nice to find it during
> model selection in order to score the given set of parameter values on it.
> How can I do that within GridSearchCV?
Hi Emanuele,

there is no straightforward solution to this yet, but it can be
accomplished by overwriting/monkey-patching
``GradientBoostingRegressor.score``. Within ``score`` you call
``self.staged_predict``, select the best boosting iteration, and return
the result of the evaluation metric of your choice (negated, since
``GridSearchCV`` picks the parameters with the highest score). Here is a
quick example (not tested)::

    import numpy as np
    from sklearn.ensemble import GradientBoostingRegressor

    def mse(y_true, y_pred):
        # mean squared error
        return np.mean((y_true - y_pred) ** 2.0)

    def custom_score(self, X, y_true):
        # test error at each boosting iteration
        scores = [mse(y_true, y_pred) for y_pred in self.staged_predict(X)]
        best_iter = np.argmin(scores)
        # set the model to the best iteration
        self.n_estimators = best_iter + 1
        self.estimators_ = self.estimators_[:self.n_estimators]
        # GridSearchCV maximizes the score, so return the negated error
        return -scores[best_iter]

    GradientBoostingRegressor.score = custom_score

The drawback of this approach is that you cannot use the ``score_func``
and ``loss_func`` arguments of ``GridSearchCV``, because if they are set
``GridSearchCV`` will use them instead of ``estimator.score``.

I'm currently working on a PR which extends the functionality of the
gradient boosting module, including some convenience methods for finding
the "optimal" number of estimators (= iterations). I'll keep you posted.

best,
 Peter

>
> What I would like to do is to define sets of GradientBoosting parameter
> values, e.g.
> {'learn_rate':[0.05, 0.01, 0.001], 'subsample':[0.25, 0.5, 0.75], ... etc.}
> and then do a grid search to decide which set of values gives the minimum
> score, e.g. mse, at the minimum of the related graph "score vs. boosting
> iteration".
> Moreover, it would be great to keep track of the boosting iteration at
> which this minimum occurs.
>
> I am reading the documentation but I cannot understand how to do that.
> Could you help me?
>
> Best,
>
> Emanuele
>
>
> [0]:
> http://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regression.html#example-ensemble-plot-gradient-boosting-regression-py
> http://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regularization.html#example-ensemble-plot-gradient-boosting-regularization-py

--
Peter Prettenhofer
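For readers wiring this up end to end, here is a minimal sketch (untested)
of the grid search itself. It assumes the monkey-patched ``score`` from the
reply above, the 0.11-era names ``sklearn.grid_search`` and ``learn_rate``
(later versions use ``sklearn.model_selection`` and ``learning_rate``), and
``make_friedman1`` as stand-in data; ``n_estimators=500`` is an arbitrary
upper bound on the number of boosting iterations::

    from sklearn.cross_validation import train_test_split
    from sklearn.datasets import make_friedman1
    from sklearn.ensemble import GradientBoostingRegressor
    from sklearn.grid_search import GridSearchCV

    # stand-in regression data
    X, y = make_friedman1(n_samples=1200, noise=1.0, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # the parameter sets from the question
    param_grid = {'learn_rate': [0.05, 0.01, 0.001],
                  'subsample': [0.25, 0.5, 0.75]}

    est = GradientBoostingRegressor(n_estimators=500)
    grid = GridSearchCV(est, param_grid, cv=3)
    grid.fit(X_train, y_train)  # candidates are ranked by the patched score

    print(grid.best_score_)  # negated mse of the winning candidate

    # GridSearchCV refits the winner on the full training data, so the
    # refitted model has all 500 stages again; scoring it once on held-out
    # data truncates it to the iteration where the test error is minimal
    grid.best_estimator_.score(X_test, y_test)
    print(grid.best_estimator_.n_estimators)

Because ``custom_score`` truncates whichever model it scores, the final
``score`` call on held-out data is what makes ``n_estimators`` on the
refitted estimator reflect the boosting iteration of the minimum.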