[MRG] accelerate plot_gradient_boosting_regularization.py example #21598 #21611
Conversation
If the final output hasn't changed, we may be able to push further and speed up the example even more. Thanks for the work @sply88
The original figure in the example looks like this: [figure omitted]. We could speed it up a bit more, to around 9 s, by using only 400 boosting iterations. The x-axis of the figure in my original PR comment would then end at 400, and the yellow and blue lines would no longer cross. I don't think this would be a big issue, because it would still be obvious that shrinkage is good and no shrinkage (the blue and yellow lines) is bad.
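The trade-off discussed here can be checked with a short sketch that approximates the example's setup (the sample counts, tree depth, and learning rates below are assumptions for illustration, not the exact values merged in the PR):

```python
# Sketch: compare test error curves for shrinkage vs. no shrinkage
# with a reduced number of boosting iterations (values are assumptions).
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier

X, y = datasets.make_hastie_10_2(n_samples=3000, random_state=1)
X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

n_estimators = 400  # reduced from 1000, as discussed above

for label, learning_rate in [("shrinkage", 0.2), ("no shrinkage", 1.0)]:
    clf = GradientBoostingClassifier(
        n_estimators=n_estimators,
        learning_rate=learning_rate,
        max_depth=2,
        random_state=2,
    )
    clf.fit(X_train, y_train)
    # Test error at every boosting stage, as plotted in the example.
    test_error = [
        1.0 - (pred == y_test).mean() for pred in clf.staged_predict(X_test)
    ]
    print(f"{label}: final test error {test_error[-1]:.3f}")
```

Plotting `test_error` against the stage index for each setting reproduces the shape of the example's figure with the x-axis ending at 400.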
X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]
You could also try reducing the number of samples in the `make_hastie_10_2` line above.
Thank you for the PR @sply88!
@@ -32,17 +32,17 @@
 from sklearn import datasets

-X, y = datasets.make_hastie_10_2(n_samples=12000, random_state=1)
+X, y = datasets.make_hastie_10_2(n_samples=3000, random_state=1)
What do you think of the following settings?
from sklearn.model_selection import train_test_split
X, y = datasets.make_hastie_10_2(n_samples=4000, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=0)
original_params = {
"n_estimators": 400,
...
}
It looks like it keeps a very similar message as the original:
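The suggested settings can be completed into a runnable sketch. The parameters filling in the elided `...` below (`max_leaf_nodes`, `max_depth`, `min_samples_split`, `random_state`) are assumptions for illustration, not necessarily what the PR uses:

```python
# Sketch of the suggested configuration (filled-in params are assumptions).
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = datasets.make_hastie_10_2(n_samples=4000, random_state=1)
# test_size=0.8 keeps only 800 training samples, which speeds up fitting.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.8, random_state=0
)

original_params = {
    "n_estimators": 400,
    # Hypothetical values standing in for the "..." above:
    "max_leaf_nodes": 4,
    "max_depth": None,
    "min_samples_split": 5,
    "random_state": 2,
}

clf = GradientBoostingClassifier(learning_rate=0.1, **original_params)
clf.fit(X_train, y_train)
score = clf.score(X_test, y_test)
print(f"test accuracy: {score:.3f}")
```

With only 800 training samples and 400 trees, the fit takes a few seconds while the qualitative shape of the figure is preserved.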
Thank you for the update @sply88!
LGTM
…t-learn#21598 (scikit-learn#21611)
* accelerate plot_gradient_boosting_regularization.py example scikit-learn#21598
* speed up with fewer samples and fewer trees
* use train_test_split instead of slicing
Speeds up ../examples/ensemble/plot_gradient_boosting_regularization.py (Issue #21598) by reducing `n_estimators` from 1000 to 600. The reduction of `n_estimators` is compensated by increasing the learning rate from 0.1 to 0.2 (for models with shrinkage). For me the example now runs in 13 s (previously 30+ s).
The main message of the final figure does not change:
