
[MRG+2] Early stopping for Gradient Boosting Classifier/Regressor #7071


Merged
merged 69 commits into scikit-learn:master from gbcv on Aug 9, 2017

Conversation

raghavrv
Member

@raghavrv raghavrv commented Jul 24, 2016

Finishing up @vighneshbirodkar's #5689 (Also refer #1036)

Enables early stopping for gradient boosted models via the new parameters n_iter_no_change, validation_fraction and tol.

(This takes inspiration from our MLPClassifier)

This has been rewritten after IRL discussions with @agramfort and @ogrisel.

@amueller @agramfort @MechCoder @vighneshbirodkar @ogrisel @glouppe @pprett
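
For readers following along, a minimal usage sketch of the parameters described above (API as merged, available from scikit-learn 0.20 onwards; the dataset and parameter values here are illustrative only):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=1000, random_state=0)

# Stop adding stages once the validation score has not improved by at least
# `tol` for `n_iter_no_change` consecutive iterations.
gbc = GradientBoostingClassifier(n_estimators=1000,
                                 n_iter_no_change=5,
                                 validation_fraction=0.1,
                                 tol=1e-4,
                                 random_state=0)
gbc.fit(X, y)

# n_estimators_ reports how many stages were actually fitted, which can be
# far fewer than n_estimators when early stopping kicks in.
print(gbc.n_estimators_)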

mean = np.mean(scores)
params['n_estimators'] = int(np.mean(n_est_list))
score_tuple = _CVScoreTuple(params, mean, scores)
grid_scores.append(score_tuple)
Member Author

@amueller @jnothman Should we use GridSearchCV like results_ here?

Member

Maybe. We should not use _CVScoreTuple, we should try to get rid of that one. The other EstimatorCV models implement scores_, right? I feel like they should all have a consistent interface.

Member Author

A consistent interface that is like GridSearchCV's results_?

Member

Yes, I'd like to see a consistent interface like GridSearchCV's results, and always had that in mind. I don't suppose we'll see it in 0.18. Sorry I wasn't fastidious enough to post this as a new issue months ago.
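
For context, the GridSearchCV results interface referenced here is presumably what shipped as the cv_results_ attribute in 0.18; a rough sketch of its shape, with an illustrative estimator and parameter grid:

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=200, random_state=0)
search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid={'max_depth': [2, 3]}, cv=3)
search.fit(X, y)

# cv_results_ is a dict of arrays with one entry per parameter candidate,
# including keys such as 'params', 'mean_test_score' and 'rank_test_score'.
print(sorted(search.cv_results_.keys()))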

@raghavrv
Member Author

Also, any advice on what kind of tests I should add here?

@raghavrv
Member Author

And finally, I would like to know:

  • Do we need the monitor callable at fit of GradientBoosting*CV too? Would that be redundant with the stop_rounds param?
  • Should we port stop_rounds to GradientBoostingClassifier/Regressor too?

@raghavrv raghavrv force-pushed the gbcv branch 3 times, most recently from ba42a94 to 5d742df on July 26, 2016 14:08
@amueller
Member

Tests: check that this does the same as a naive grid search, and that it stops early when it should.
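
A rough sketch of the second kind of test against the parameters that were eventually merged (the function name and dataset are illustrative, not the actual test added in this PR):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

def check_gradient_boosting_stops_early():
    X, y = make_classification(n_samples=1000, random_state=0)

    gbc = GradientBoostingClassifier(n_estimators=1000, n_iter_no_change=10,
                                     validation_fraction=0.1, tol=1e-4,
                                     random_state=0)
    gbc.fit(X, y)

    # With early stopping enabled, training should halt well before all
    # 1000 requested stages are fitted.
    assert gbc.n_estimators_ < gbc.n_estimators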

@@ -383,6 +383,7 @@ Samples generator
ensemble.ExtraTreesRegressor
ensemble.GradientBoostingClassifier
ensemble.GradientBoostingRegressor
ensemble.GradientBoostingClassifierCV
Member

RegressorCV also

Member Author

Added below

@amueller
Member

amueller commented Jul 26, 2016

n_iter_combo is not documented. I'd really rather not implement randomized search here, and wait until we have the right API with warm starts to do that with arbitrary search methods.

for train, validation in cv.split(X)
for params in param_iter)

n_splits = int(len(out)/len(param_iter))
Member

pep8, also //

@raghavrv
Member Author

@amueller could you enter captcha before reviewing? :p

@amueller
Member

I remember distinctly that you asked for a review yesterday ;)

@raghavrv
Member Author

Thanks a tonne for looking into the PR. I'll address the comments and update you!

gb.n_estimators = i
gb.fit(X_train, y_train)

scores = np.roll(scores, shift=-1)
Member

this seems an odd way of doing this. Why not append to a list?

Member Author

We need a way to retain the last 10 scores, hence the shift-register style setup.
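
A small standalone illustration of that shift-register idea (the scores fed in are made up):

import numpy as np

n_iter_no_change = 10
scores = np.full(n_iter_no_change, -np.inf)   # buffer holding the last 10 scores

for new_score in [0.71, 0.74, 0.75, 0.76]:    # scores from successive boosting stages
    scores = np.roll(scores, shift=-1)        # shift everything left, dropping the oldest
    scores[-1] = new_score                    # newest score goes in the last slot

# A stopping rule can then compare the newest score against the retained
# previous ones and halt once the improvement falls below tol.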

via ``n_iter_no_change``, ``validation_fraction`` and `tol`. :issue:`7071`
by `Raghav RV`_

- Added :class:`multioutput.ClassifierChain` for multi-label
Member

merge error

Member Author

No, this is in master. I don't know why it shows in the diff.

@@ -81,6 +81,13 @@ New features

Classifiers and regressors

- :class:`ensemble.GradientBoostingClassifier` and
:class:`ensemble.GradientBoostingRegressor` now support early stopping
via ``n_iter_no_change``, ``validation_fraction`` and `tol`. :issue:`7071`
Member

why did tol only get single backticks? ;)

@jnothman
Member

jnothman commented Jul 27, 2017 via email

@raghavrv
Member Author

raghavrv commented Aug 9, 2017

@jnothman Have moved the whatsnew entry into a new section - 0.20

This example illustrates how early stopping can be used in the
:class:`sklearn.ensemble.GradientBoostingClassifier` model to achieve
almost the same accuracy as a model built without early stopping,
using many fewer estimators. This can save memory and prediction time.
Member

This can significantly reduce training time, memory usage and prediction latency.

early stopping. Must be between 0 and 1.
Only used if ``n_iter_no_change`` is set to an integer.

.. versionadded:: 0.19
Member

This should be changed to 0.20.

early stopping. Must be between 0 and 1.
Only used if early_stopping is True

.. versionadded:: 0.19
Member

same comment for this class docstring: 0.20.

Member

@ogrisel ogrisel left a comment

LGTM besides small comments.

def test_gradient_boosting_validation_fraction():
X, y = make_classification(n_samples=1000, random_state=0)

gbc = GradientBoostingClassifier(n_estimators=100,
Member

This comment by @jnothman has not been addressed.

def test_gradient_boosting_validation_fraction():
X, y = make_classification(n_samples=1000, random_state=0)

gbc = GradientBoostingClassifier(n_estimators=100,
Member

We still use nosetests on Travis and AppVeyor at this time, though.

@raghavrv
Member Author

raghavrv commented Aug 9, 2017

(embedded image)

@ogrisel ogrisel merged commit 312b1df into scikit-learn:master Aug 9, 2017
@ogrisel
Member

ogrisel commented Aug 9, 2017

Merged. Thanks for this nice contribution @raghavrv!


9 participants