Hi Emanuele, hi Peter.
@Emanuele: You could also try to use IterGrid instead of GridSearchCV.
This might mean having to do some things by hand, but should work.
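For reference, IterGrid simply expands the parameter dict into every combination, which you then loop over by hand. A minimal sketch of that expansion using only itertools (the ``param_grid`` values here are made up, and the fit/score step is left as comments since it depends on your data):

```python
# Sketch of what IterGrid does: expand a dict of parameter lists into
# all combinations, which you can then loop over and score by hand.
from itertools import product

param_grid = {'learn_rate': [0.05, 0.01], 'subsample': [0.5, 0.75]}

keys = sorted(param_grid)
combinations = [dict(zip(keys, values))
                for values in product(*(param_grid[k] for k in keys))]

for params in combinations:
    # est = GradientBoostingRegressor(n_estimators=500, **params)
    # est.fit(X_train, y_train)
    # stage_scores = [mse(y_test, y_pred)
    #                 for y_pred in est.staged_predict(X_test)]
    # keep the params with the smallest min(stage_scores), and its argmin
    pass
```

Doing it by hand like this also lets you record at which boosting iteration the minimum occurs.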

@Peter: Could your improvements also be applied to RandomForests
and the oob score? Having a method there would also be quite nice.

Cheers,
Andy

On 21.06.2012 18:04, Peter Prettenhofer wrote:
> 2012/6/21 Emanuele Olivetti<emanu...@relativita.com>:
>> Dear All,
>>
>> I am interested in attempting model selection with GridSearchCV() on
>> GradientBoostingRegressor(). I am quite new to boosting, but I see from
>> the nice examples in the sklearn documentation [0] that once n_estimators
>> is fixed, it is possible to evaluate the model at each boosting
>> iteration through GradientBoostingRegressor.staged_decision_function()
>> and related tools (oob_score_, staged_predict).
>>
>> As the figures in those examples show, the score on the test set (e.g.
>> deviance) sometimes has a minimum, and it would be nice to find it during
>> model selection in order to score the given set of parameter values.
>> How can this be done within GridSearchCV?
> Hi Emanuele,
>
> there is no straightforward solution to this yet, but it can be
> accomplished by overwriting/monkey-patching
> ``GradientBoostingRegressor.score``.
> Within ``score`` you call ``self.staged_predict``, select the best
> boosting iteration, and return the result of the evaluation metric of
> your choice.
>
> Here is a quick example (not tested)::
>
> # mse = your error metric, e.g. mean squared error from sklearn.metrics
> def custom_score(self, X, y_true):
>      scores = [mse(y_true, y_pred) for y_pred in self.staged_predict(X)]
>      best_iter = np.argmin(scores)
>      best_score = scores[best_iter]
>
>      # set model to best iteration
>      self.n_estimators = best_iter + 1
>      self.estimators_ = self.estimators_[:self.n_estimators]
>
>      # GridSearchCV maximizes ``score``, so return the negated error
>      return -best_score
>
> GradientBoostingRegressor.score = custom_score
>
> The drawback of this approach is that you cannot use the ``score_func``
> and ``loss_func`` arguments of ``GridSearchCV``, because if they are
> set, ``GridSearchCV`` will use them instead of ``estimator.score``.
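The class-level patching above works because Python looks up ``score`` on the class at call time, so every instance picks up the replacement. A toy illustration of just that mechanic (the ``ToyRegressor`` class here is made up, standing in for GradientBoostingRegressor so the snippet runs without sklearn):

```python
# Toy illustration of class-level monkey-patching: replace a method on
# the class so that every instance uses the new implementation.
class ToyRegressor:
    def staged_predict(self, X):
        # pretend each boosting stage divides the predictions further
        for stage in range(1, 4):
            yield [x / stage for x in X]

    def score(self, X, y_true):
        return 0.0  # default score we want to override

def custom_score(self, X, y_true):
    # pick the stage with the smallest mean squared error
    def mse(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)
    scores = [mse(y_true, y_pred) for y_pred in self.staged_predict(X)]
    return min(scores)

ToyRegressor.score = custom_score  # patch at class level

est = ToyRegressor()
print(est.score([3.0, 6.0], [1.0, 2.0]))  # stage 3 predicts exactly: 0.0
```

Instances created before the patch are affected too, since method lookup goes through the class.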
>
> I'm currently working on a PR that extends the functionality of the
> gradient boosting module, including some convenience methods for
> finding the "optimal" number of estimators (= iterations). I'll keep
> you posted.
>
> best,
>   Peter
>
>> What I would like to do is to define sets of GradientBoosting parameter
>> values, e.g.
>> {'learn_rate': [0.05, 0.01, 0.001], 'subsample': [0.25, 0.5, 0.75], etc.}
>> and then run a grid search to decide which set of values gives the minimum
>> score, e.g. mse, at the minimum of the corresponding "score vs boosting
>> iteration" curve.
>> Moreover it would be great to keep track of at which boosting iteration
>> this minimum occurs.
>>
>> I am reading the documentation but I cannot understand how to do that. Could
>> you help me?
>>
>> Best,
>>
>> Emanuele
>>
>>
>> [0]:
>> http://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regression.html#example-ensemble-plot-gradient-boosting-regression-py
>> http://scikit-learn.org/stable/auto_examples/ensemble/plot_gradient_boosting_regularization.html#example-ensemble-plot-gradient-boosting-regularization-py
>>
>>
>> ------------------------------------------------------------------------------
>> Live Security Virtual Conference
>> Exclusive live event will cover all the ways today's security and
>> threat landscape has changed and how IT managers can respond. Discussions
>> will include endpoint security, mobile security and the latest in malware
>> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
>> _______________________________________________
>> Scikit-learn-general mailing list
>> Scikit-learn-general@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
>

