2012/6/21 Emanuele Olivetti <emanu...@relativita.com>:
> Dear All,
>
> I am interested in attempting model selection with GridSearchCV() on
> GradientBoostingRegressor(). I am quite new to boosting, but I see from
> the nice examples in the sklearn documentation [0] that once n_estimators
> is fixed, it is possible to evaluate the model at each boosting
> iteration through GradientBoostingRegressor.staged_decision_function()
> and similar things (oob_score_, staged_predict).
>
> As the figures in those examples show, the score on the test set (e.g.
> deviance) sometimes has a minimum, and it would be nice to find that
> minimum during model selection in order to score the given set of
> parameter values. How can that be done within GridSearchCV?

Hi Emanuele,

there is no straightforward solution to this yet, but it can be
accomplished by overwriting/monkey-patching
``GradientBoostingRegressor.score``.
Within ``score`` you call ``self.staged_predict``, select the best
boosting iteration, and return the result of the evaluation metric you
choose.

Here is a quick example (not tested)::

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

def custom_score(self, X, y_true):
    # mean squared error at each boosting iteration via staged_predict
    scores = [np.mean((y_true - y_pred) ** 2)
              for y_pred in self.staged_predict(X)]
    best_iter = np.argmin(scores)
    best_score = scores[best_iter]

    # set the model to the best iteration
    self.n_estimators = best_iter + 1
    self.estimators_ = self.estimators_[:self.n_estimators]

    # GridSearchCV treats higher scores as better, so negate the MSE
    return -best_score

GradientBoostingRegressor.score = custom_score

The drawback of this approach is that you cannot use the ``score_func``
and ``loss_func`` arguments of ``GridSearchCV``, because if they are
set ``GridSearchCV`` will use them instead of ``estimator.score``.
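
For illustration, here is an equally untested sketch of how the patched
estimator could be plugged into a grid search. The parameter grid is the
one from your mail, ``make_friedman1`` merely stands in for real data,
and ``custom_score`` is the function defined above::

from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.grid_search import GridSearchCV

X, y = make_friedman1(n_samples=1200)  # placeholder data
X_train, y_train = X[:1000], y[:1000]
X_test, y_test = X[1000:], y[1000:]

# patch the score method as above
GradientBoostingRegressor.score = custom_score

param_grid = {'learn_rate': [0.05, 0.01, 0.001],
              'subsample': [0.25, 0.5, 0.75]}

# no score_func/loss_func here, so GridSearchCV falls back to
# estimator.score, i.e. the patched custom_score
grid = GridSearchCV(GradientBoostingRegressor(n_estimators=500), param_grid)
grid.fit(X_train, y_train)

# scoring truncates the (refitted) best estimator, after which its
# n_estimators attribute records the boosting iteration of the minimum
grid.best_estimator_.score(X_test, y_test)
print(grid.best_estimator_.n_estimators)

Here ``n_estimators=500`` is just an upper bound on the number of
boosting iterations considered by ``custom_score``.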

I'm currently working on a PR which extends the functionality of the
gradient boosting module, including some convenience methods for
finding the "optimal" number of estimators (= iterations). I'll keep
you posted.

best,
 Peter

>
> What I would like to do is to define sets of GradientBoosting parameter
> values, e.g.
> {'learn_rate': [0.05, 0.01, 0.001], 'subsample': [0.25, 0.5, 0.75], etc.}
> and then do a grid search to decide which set of values gives the minimum
> score, e.g. MSE, at the minimum of the related graph "score vs boosting
> iteration".
> Moreover, it would be great to keep track of at which boosting iteration
> this minimum occurs.
>
> I am reading the documentation but I cannot understand how to do that. Could
> you help me?
>
> Best,
>
> Emanuele
>
>
> [0]:
> http://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regression.html#example-ensemble-plot-gradient-boosting-regression-py
> http://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regularization.html#example-ensemble-plot-gradient-boosting-regularization-py



-- 
Peter Prettenhofer
