[MRG+2] Early stopping for Gradient Boosting Classifier/Regressor #7071
Conversation
mean = np.mean(scores)
params['n_estimators'] = int(np.mean(n_est_list))
score_tuple = _CVScoreTuple(params, mean, scores)
grid_scores.append(_CVScoreTuple(params, mean, scores))
Maybe. We should not use _CVScoreTuple; we should try to get rid of that one. The other EstimatorCV models implement scores_, right? I feel like they should all have a consistent interface.
A consistent interface like GridSearchCV's results_?
Yes, I'd like to see a consistent interface like GridSearchCV's results, and always had that in mind. I don't suppose we'll see it in 0.18. Sorry I wasn't fastidious enough to post this as a new issue months ago.
Also, any advice on what kind of tests I should add here? And finally, I would like to know…
Force-pushed from ba42a94 to 5d742df.
Tests: that this does the same as a naive grid search, and that it stops early when it should.
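The "naive grid search" baseline suggested here can be sketched as an exhaustive search over `n_estimators` with `GridSearchCV`. This is a minimal illustration using only released scikit-learn APIs, not the PR's own `GradientBoostingClassifierCV` code:

```python
# Naive baseline: pick n_estimators by exhaustive cross-validated search,
# against which an early-stopping result could be compared.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=0)

grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_grid={'n_estimators': [10, 50, 100]},
    cv=3)
grid.fit(X, y)

# The selected number of estimators, found by brute force.
print(grid.best_params_['n_estimators'])
```

Early stopping should arrive at a comparable estimator count without fitting every candidate to completion.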
doc/modules/classes.rst
Outdated
@@ -383,6 +383,7 @@ Samples generator
ensemble.ExtraTreesRegressor
ensemble.GradientBoostingClassifier
ensemble.GradientBoostingRegressor
ensemble.GradientBoostingClassifierCV
RegressorCV also
Added below.
    for train, validation in cv.split(X)
    for params in param_iter)

n_splits = int(len(out)/len(param_iter))
PEP 8; also use integer division (`//`) rather than `int(... / ...)`.
@amueller could you enter a captcha before reviewing? :p
I remember distinctly that you asked for a review yesterday ;)
Thanks a tonne for looking into the PR. I'll address the comments and update you!
gb.n_estimators = i
gb.fit(X_train, y_train)

scores = np.roll(scores, shift=-1)
this seems an odd way of doing this. Why not append to a list?
We need a way to retain the last 10 scores, hence the shift-register-style setup.
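The reviewer's suggestion can also be met with a bounded deque, which keeps only the most recent scores with O(1) appends. This is an illustrative alternative, not the PR's actual `np.roll`-based code:

```python
from collections import deque

# Retain only the last 10 scores: older entries are evicted automatically.
last_scores = deque(maxlen=10)
for k in range(25):
    last_scores.append(0.1 * k)  # stand-in for a per-iteration validation score

# Only the 10 most recent values survive.
assert len(last_scores) == 10
assert abs(last_scores[0] - 1.5) < 1e-9   # oldest retained score
assert abs(last_scores[-1] - 2.4) < 1e-9  # newest score
```

An early-stopping check would then compare the newest score against the best of the retained window.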
doc/whats_new.rst
Outdated
via ``n_iter_no_change``, ``validation_fraction`` and `tol`. :issue:`7071`
by `Raghav RV`_

- Added :class:`multioutput.ClassifierChain` for multi-label
merge error
No, this is in master. I don't know why it shows in the diff.
doc/whats_new.rst
Outdated
@@ -81,6 +81,13 @@ New features

Classifiers and regressors

- :class:`ensemble.GradientBoostingClassifier` and
  :class:`ensemble.GradientBoostingRegressor` now support early stopping
  via ``n_iter_no_change``, ``validation_fraction`` and `tol`. :issue:`7071`
why did tol only get single backticks? ;)
Given that there are a couple of PRs about related interfaces open, I think we have extra reason to not release this in 0.19.
@jnothman I have moved the whatsnew entry into a new section, 0.20.
This example illustrates how early stopping can be used in the
:class:`sklearn.ensemble.GradientBoostingClassifier` model to achieve
almost the same accuracy as a model built without early stopping,
using many fewer estimators. This can save memory and prediction time.
This can significantly reduce training time, memory usage and prediction latency.
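The intended usage of the three new parameters matches the API as it was eventually released in scikit-learn 0.20 (the version numbers in the docstring diffs below were still being corrected from 0.19). A minimal sketch:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, random_state=0)

clf = GradientBoostingClassifier(
    n_estimators=500,          # upper bound on boosting stages
    validation_fraction=0.1,   # fraction of training data held out for the check
    n_iter_no_change=10,       # stop after 10 stages without improvement
    tol=1e-4,                  # minimum score gain that counts as improvement
    random_state=42)
clf.fit(X, y)

# n_estimators_ records how many stages were actually fit,
# which is at most n_estimators and typically far fewer.
print(clf.n_estimators_)
```

Fewer fitted stages directly translate into smaller models and faster `predict` calls, which is the saving the example text describes.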
early stopping. Must be between 0 and 1.
Only used if ``n_iter_no_change`` is set to an integer.

.. versionadded:: 0.19
This should be changed to 0.20.
early stopping. Must be between 0 and 1.
Only used if ``early_stopping`` is True.

.. versionadded:: 0.19
same comment for this class docstring: 0.20.
LGTM besides small comments.
def test_gradient_boosting_validation_fraction():
    X, y = make_classification(n_samples=1000, random_state=0)

    gbc = GradientBoostingClassifier(n_estimators=100,
This comment by @jnothman has not been addressed.
def test_gradient_boosting_validation_fraction():
    X, y = make_classification(n_samples=1000, random_state=0)

    gbc = GradientBoostingClassifier(n_estimators=100,
We still use nosetests in Travis and AppVeyor at this time, though.
Merged. Thanks for this nice contribution @raghavrv!
Finishing up @vighneshbirodkar's #5689 (also refer to #1036).

Enables early stopping for gradient boosted models via the new parameters n_iter_no_change, validation_fraction and tol. (This takes inspiration from our MLPClassifier.)

This has been rewritten after IRL discussions with @agramfort and @ogrisel.

@amueller @agramfort @MechCoder @vighneshbirodkar @ogrisel @glouppe @pprett