On 06/21/2012 06:04 PM, Peter Prettenhofer wrote:
> 2012/6/21 Emanuele Olivetti <emanu...@relativita.com>:
>> Dear All,
>>
>> I am interested in attempting model selection with GridSearchCV() on
>> GradientBoostingRegressor(). I am quite new to boosting, but I see from
>> the nice examples in the sklearn documentation [0] that once n_estimators
>> is fixed, it is possible to evaluate the classifier at each boosting
>> iteration through GradientBoostingRegressor.staged_decision_function()
>> and similar tools (oob_score_, staged_predict).
>>
>> As the figures in those examples show, the score on the test set (e.g.
>> deviance) sometimes has a minimum, and it would be nice to find that
>> minimum during model selection in order to score a given set of parameter
>> values on it. How can that be done within GridSearchCV?
>
> Hi Emanuele,
>
> there is no straightforward solution to this yet, but it can be
> accomplished by overwriting/monkey-patching
> ``GradientBoostingRegressor.score``.
> Within ``score`` you call ``self.staged_predict``, select the best
> boosting iteration, and return the result of the evaluation metric you
> choose.
>
> Here is a quick example (not tested)::
>
>     def custom_score(self, X, y_true):
>         scores = [mse(y_true, y_pred) for y_pred in self.staged_predict(X)]
>         best_iter = np.argmin(scores)
>         best_score = scores[best_iter]
>
>         # set model to best iteration
>         self.n_estimators = best_iter + 1
>         self.estimators_ = self.estimators_[:self.n_estimators]
>
>         # GridSearchCV maximizes the score, so return the negated error
>         return -best_score
>
>     GradientBoostingRegressor.score = custom_score
>
> The drawback of this approach is that you cannot use the ``score_func``
> and ``loss_func`` arguments of ``GridSearchCV``, because if they are
> set, ``GridSearchCV`` will use them instead of ``estimator.score``.
>
> I'm currently working on a PR which extends the functionality of the
> gradient boosting module, including some convenience methods for
> finding the "optimal" number of estimators (= iterations). I'll keep
> you posted.
Hi Peter,

Indeed a neat solution! Thanks a lot, it works very well :-)

Looking forward to your PR for improved model selection in gradient boosting,

Emanuele
_______________________________________________
Scikit-learn-general mailing list
Scikit-learn-general@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
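[Editor's note: Peter's "quick example (not tested)" can be assembled into a self-contained sketch. The toy dataset (``make_regression``), the parameter grid, and the use of negated MSE as the returned score are assumptions for illustration, not part of the original suggestion; the import path ``sklearn.model_selection`` is the modern location of ``GridSearchCV``.]

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV


def custom_score(self, X, y_true):
    """Score every boosting stage and truncate the model at the best one."""
    scores = [mean_squared_error(y_true, y_pred)
              for y_pred in self.staged_predict(X)]
    best_iter = int(np.argmin(scores))

    # Truncate the fitted ensemble to the best iteration.
    self.n_estimators = best_iter + 1
    self.estimators_ = self.estimators_[:self.n_estimators]

    # GridSearchCV picks the parameter set with the *highest* score,
    # so return the negated error (lower MSE -> higher score).
    return -scores[best_iter]


# Monkey-patch the class so every clone made by GridSearchCV uses it.
GradientBoostingRegressor.score = custom_score

# Illustrative toy problem and grid (assumed, not from the thread).
X, y = make_regression(n_samples=200, n_features=10, random_state=0)
grid = GridSearchCV(
    GradientBoostingRegressor(n_estimators=50, random_state=0),
    param_grid={"learning_rate": [0.1, 0.5]},
    cv=3,
)
grid.fit(X, y)
print(grid.best_params_)
```

Because no ``scoring`` argument is passed to ``GridSearchCV``, it falls back to the estimator's own ``score`` method, which is exactly the hook the monkey-patch exploits.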