FEA Turn on early stopping in histogram GBDT by default #14516
Conversation
Thanks for giving this a shot @johannfaouzi.
Please add a simple parametrized test that makes sure ES is enabled by default.
Regarding the estimator checks failures, I think the simplest way to go for now is to update `set_checking_parameters` in `utils/estimator_checks.py` and deactivate early stopping here.
Thanks for the help regarding test_estimators!
Is `test_early_stopping_default` not enough? I added this small test in `test_gradient_boosting.py`.
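For reference, a minimal parametrized version of such a test could look like this (a sketch, not the test as merged; it only asserts the new default value):

```python
import pytest

# The experimental import was required for these estimators at the time.
from sklearn.experimental import enable_hist_gradient_boosting  # noqa
from sklearn.ensemble import (HistGradientBoostingClassifier,
                              HistGradientBoostingRegressor)


@pytest.mark.parametrize('Estimator', [HistGradientBoostingClassifier,
                                       HistGradientBoostingRegressor])
def test_early_stopping_default(Estimator):
    # Early stopping should be on by default.
    est = Estimator()
    assert est.n_iter_no_change == 10
```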
In sklearn/ensemble/_hist_gradient_boosting/gradient_boosting.py:
> @@ -622,7 +622,9 @@ class HistGradientBoostingRegressor(BaseHistGradientBoosting, RegressorMixin):
for big datasets (n_samples >= 10 000). The input data ``X`` is pre-binned
into integer-valued bins, which considerably reduces the number of
splitting points to consider, and allows the algorithm to leverage
- integer-based data structures. For small sample sizes,
+ integer-based data structures. Early stopping is the default behavior, as
I'd rather have this after or at the end of this paragraph
Nitpick, but LGTM.
Thanks @johannfaouzi !!
So the new behavior is that early stopping is enabled with `n_iter_no_change=10`, `validation_fraction` is 0.1, and the scorer is the estimator's default scorer. I'll make sure to document this in the user guide in #14525.
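Concretely, the new defaults mean the following two estimators are configured identically (a sketch; at the time these estimators still required the experimental import):

```python
from sklearn.experimental import enable_hist_gradient_boosting  # noqa
from sklearn.ensemble import HistGradientBoostingRegressor

# With the new defaults, these two are equivalent: a 10% validation split
# is held out, and fitting stops after 10 iterations without improvement
# of the default score on that split.
est_default = HistGradientBoostingRegressor(random_state=0)
est_explicit = HistGradientBoostingRegressor(
    n_iter_no_change=10, validation_fraction=0.1, random_state=0)
```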
@johannfaouzi, we're planning to add support for sample weights. It's a non-trivial change since it would require changes pretty much everywhere in the GBDT code, including (and mostly) in Cython. We'd be happy to provide guidance if that's something you'd like to work on. I'd definitely tag this as a hard PR, but it could be a great learning experience ;). LMK.
@adrinjalali might want to review this?
Thanks for your proposal! Unfortunately I think that it would be a bit too time-consuming for me, as I have never written a single line of Cython (I am a big fan of numba).
No problem!
Ha, I don't see that happening anytime soon. When we were developing pygbm I found numba to be fairly unstable compared to Cython. And as far as I know they still don't support compilation caching, which is mandatory for us (pygbm takes about 10 seconds to compile every single time you run it). And I'm not even talking about the fact that we don't like adding dependencies ^^
There has been a cache option for a while, so that's probably not what you mean (but I clearly lack the knowledge in that area to understand the difference).
That's what I thought :)
@NicolasHug scipy is going to add numba as a dependency soon(ish), so sklearn will have it by default... at least if Ralf gets his way ;)
(HistGradientBoostingRegressor(random_state=0), 'recursion')]
(HistGradientBoostingRegressor(random_state=0, n_iter_no_change=None),
 'brute'),
(HistGradientBoostingRegressor(random_state=0, n_iter_no_change=None),
why would the test fail with early stopping?
there are very small training sets where early stopping leads to bad results
The dataset has 100 samples (the default value for our `make_regression`), right? I kinda feel like the default parameters should give reasonable results for the default parameters of our data generators. Would it be too hard to have an `auto` early stopping parameter that figures out whether it's a good idea or not?
I really miss having a nice and more dynamic early stopping parameter, like a policy, a function, or an object.
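To make the wish concrete (a purely hypothetical API, not something this PR or scikit-learn implements): early stopping could accept a callable that inspects the validation scores so far and decides whether to stop.

```python
# Hypothetical policy sketch -- not a scikit-learn API.
def stop_when_no_improvement(scores, n_iter_no_change=10, tol=1e-7):
    """Return True if none of the last n_iter_no_change scores beat the
    best score seen before them (higher is better)."""
    if len(scores) <= n_iter_no_change:
        return False
    best_before = max(scores[:-n_iter_no_change])
    return all(s < best_before + tol for s in scores[-n_iter_no_change:])

# An estimator could then call the policy once per boosting iteration,
# e.g. HistGradientBoostingRegressor(early_stopping=stop_when_no_improvement)
# (hypothetical signature).
```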
I think I remember a dataset with 40 samples. Not sure.
> Would it be too hard to have an `auto` early stopping parameter that figures out whether it's a good idea or not?
I don't think that's the place for this discussion.
See also the latest comments on #11324
The tests fail with early stopping, otherwise I would not have changed this (I didn't know about the `inspection` module before). I can undo the changes locally to see where it fails exactly, if you are interested.
@NicolasHug I think this is a relevant place to discuss this here, because clearly we're changing our defaults in a way that makes the defaults break on a lot of our examples. That kind of brings into question whether that's actually an improvement. Having to change the doctests to adhere to the lower accuracies is not a good sign either.
In other words, I think this has less to do with how to test, and more with what good default settings are.
Ok, I meant to say that if we want to have a discussion around reworking our early-stopping mechanism, we should probably open an issue for it.
I'm not particularly uncomfortable with estimators not passing our common tests with the default values. I don't even think we can achieve that.
Take the first failing test, `check_estimators_dtypes`. This one has a dataset with `n_samples=20` and 3 classes. The default for `validation_fraction` is 0.1, which leads to `test_size=2`, so `train_test_split` fails with `The test_size = 2 should be greater or equal to the number of classes = 3`.
What is the right course of action here? I don't think it's worth changing the default, or having a "clever" `'auto'` value just for this test to pass. It's not worth changing the test either. This would be another form of the VW thing that you mentioned before. The test isn't even remotely related to accuracy/overfitting (so it's not related to early stopping either: there's no harm in disabling it).
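The failure is easy to reproduce in isolation, outside the estimator (a minimal sketch of what the common test runs into; the classifier stratifies its validation split):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 20 samples and 3 classes, as in check_estimators_dtypes.
rng = np.random.RandomState(0)
X = rng.rand(20, 5)
y = np.array([0, 1, 2] * 6 + [0, 1])

# A 10% validation fraction of 20 samples gives test_size=2 < 3 classes:
# ValueError: The test_size = 2 should be greater or equal to the
# number of classes = 3
train_test_split(X, y, test_size=0.1, stratify=y)
```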
Otherwise LGTM
As mentioned in the docstrings, HistGBM is not really suited for small datasets. However, the tests use relatively small datasets (it makes them faster). Moreover, on small datasets with early stopping, the validation set is very small, which makes the variance of the performance evaluation quite high. Maybe we should add a new parameter that only enables early stopping above a minimum number of samples. 1000 is just a placeholder, I don't know what a good value could be.
I would agree with an `'auto'` option.
I'm OK with an 'auto' default but I personally have no idea what a sensible default would be. Maybe @ogrisel would have some ideas.
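For the record, here is one way an `'auto'` value could be resolved at fit time (a sketch only; the 10 000-sample threshold echoes the "big datasets" guidance in the docstring and is not a decided value):

```python
def _resolve_early_stopping(early_stopping, n_samples, threshold=10_000):
    # Sketch: only enable early stopping when the dataset is large enough
    # for the held-out validation set to give a low-variance score estimate.
    if early_stopping == 'auto':
        return n_samples > threshold
    return bool(early_stopping)
```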
Thanks @johannfaouzi, still LGTM.
The `test_warm_start_early_stopping` test should be updated to:
- set `early_stopping=True`
- change the assert to `assert 0 < n_iter_second_fit - n_iter_first_fit < n_iter_no_change`, to avoid the test succeeding for the wrong reason (like it does right now); see the sketch below.
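A sketch of the suggested shape of the updated test (names follow the existing test; not the exact merged code):

```python
from sklearn.datasets import make_regression
from sklearn.experimental import enable_hist_gradient_boosting  # noqa
from sklearn.ensemble import HistGradientBoostingRegressor

X, y = make_regression(random_state=0)
n_iter_no_change = 5
est = HistGradientBoostingRegressor(
    max_iter=100, early_stopping=True,
    n_iter_no_change=n_iter_no_change, random_state=0)
est.fit(X, y)
n_iter_first_fit = est.n_iter_

est.set_params(warm_start=True)
est.fit(X, y)
n_iter_second_fit = est.n_iter_

# The warm-started fit should add a few iterations before stopping again,
# but fewer than n_iter_no_change; a zero or large difference would mean
# the test passes for the wrong reason.
assert 0 < n_iter_second_fit - n_iter_first_fit < n_iter_no_change
```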
Could this get some love maybe @ogrisel @adrinjalali @glemaitre ?
This PR somehow exploded, and it was partly my fault 🙈
I like the changes now. Do we somehow test its effectiveness, i.e. check that it improves performance?
doc/whats_new/v0.23.rst
- |Feature| :func:`inspection.partial_dependence` and
  :func:`inspection.plot_partial_dependence` now support the fast 'recursion'
  method for both estimators. :pr:`13769` by `Nicolas Hug`_.

:mod:`sklearn.feature_extraction`
O_o
@@ -710,21 +712,25 @@ class HistGradientBoostingRegressor(RegressorMixin, BaseHistGradientBoosting):
        and add more estimators to the ensemble. For results to be valid, the
        estimator should be re-trained on the same data only.
        See :term:`the Glossary <warm_start>`.
    scoring : str or callable or None, optional (default=None)
    early_stopping : 'auto' or bool (default='auto')
Suggested change:
- early_stopping : 'auto' or bool (default='auto')
+ early_stopping : 'auto' or bool, default='auto'

but also happy to have all of them fixed in another PR
yeah we should probably have the whole file fixed in another PR, since all the docstrings follow the old style right now
Benchmarks start here: #14516 (comment)
ah yeah, now I remember: in the end there wasn't anything we could easily add to the tests.
Thanks @adrinjalali. Codecov seems unhappy, but it's giving me a 500. @ogrisel I'll merge in the coming days unless you have any additional comments.
LGTM. I think that we could move forward @NicolasHug
Yup, thanks for the reminder!
Thanks for the nice work @johannfaouzi !
Thanks @johannfaouzi
Reference Issues/PRs
Fixes #14503.
What does this implement/fix? Explain your changes.
This PR enables early stopping by default in HGBM:
- `n_iter_no_change=10` instead of `n_iter_no_change=None` (not sure about this value, I looked at "Should we turn on early stopping in HistGradientBoosting by default?" #14303). Early stopping is the default behavior, as it usually makes the fitting process much faster without a substantial difference in terms of predictive performance.
- `random_state` has to be fixed in the examples (for the cross-validation), because the performance (`clf.score(X, y)`) depends on the splitting.
- `n_iter_no_change=None` has been added to some existing tests, so that they can check the expected behavior.
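For reference, opting out of the new default (e.g. in tests that need deterministic iteration counts) is a one-liner (sketch):

```python
from sklearn.experimental import enable_hist_gradient_boosting  # noqa
from sklearn.ensemble import HistGradientBoostingRegressor

# Disable early stopping to restore the previous behavior: the model
# always fits the full max_iter iterations.
est = HistGradientBoostingRegressor(n_iter_no_change=None)
```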