sply88
Contributor

@sply88 sply88 commented Nov 9, 2021

Speeds up ../examples/ensemble/plot_gradient_boosting_regularization.py (Issue #21598) by

  • reducing number of samples in train and test datasets from 2000 to 1500
  • reducing n_estimators from 1000 to 600

Reduction of n_estimators is compensated by increasing the learning rate from 0.1 to 0.2 (for models with shrinkage).

For me, the example now runs in about 13 s (previously more than 30 s).

The main message of the final figure does not change:
image
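
As a rough illustration of the configuration described in this comment, here is a minimal sketch that fits one model without shrinkage and one with `learning_rate=0.2`, both with 600 estimators on 1500 training samples. The values `n_samples=3000`, `max_leaf_nodes=4` and `random_state=2` are assumptions made for the sketch and are not stated in the comment; the timings it prints will differ from machine to machine.

```python
import time

from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier

# Assumption: 3000 samples in total, so that slicing at 1500 gives
# 1500 training and 1500 test samples.
X, y = datasets.make_hastie_10_2(n_samples=3000, random_state=1)
X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

# n_estimators comes from the PR description; the other two values
# are assumptions for this sketch.
params = {"n_estimators": 600, "max_leaf_nodes": 4, "random_state": 2}

# learning_rate=1.0 means no shrinkage; 0.2 is the value proposed in the PR.
settings = {
    "no shrinkage": {"learning_rate": 1.0},
    "shrinkage": {"learning_rate": 0.2},
}

results = {}
for label, extra in settings.items():
    clf = GradientBoostingClassifier(**params, **extra)
    start = time.perf_counter()
    clf.fit(X_train, y_train)
    elapsed = time.perf_counter() - start
    accuracy = clf.score(X_test, y_test)
    results[label] = (accuracy, elapsed)
    print(f"{label}: test accuracy={accuracy:.3f}, fit time={elapsed:.2f}s")
```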

@sply88 sply88 changed the title from "accelerate plot_gradient_boosting_regularization.py example #21598" to "[MRG] accelerate plot_gradient_boosting_regularization.py example #21598" Nov 9, 2021
@adrinjalali adrinjalali mentioned this pull request Nov 10, 2021
@adrinjalali
Member

If the final output hasn't changed, we may be able to push further and speed up the example even more. Thanks for the work @sply88

@sply88
Contributor Author

sply88 commented Nov 11, 2021

Original figure in example looks like this:
image

I could speed it up a bit more, to around 9 s, by using only 400 boosting iterations. The x-axis of the figure in my original PR comment would then end at 400, and the yellow and blue lines would no longer cross. I don't think this would be a big issue, because it would still be obvious that shrinkage is good and no shrinkage (e.g. the blue and yellow lines) is bad.
What do you think @adrinjalali?
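
To make the trade-off concrete, the per-iteration test deviance can be computed from staged predictions, so the curve up to 400 iterations is simply the first 400 points of the longer curve. The sketch below is an assumption-laden illustration, not the example's actual code: the data sizes, `max_leaf_nodes=4` and `random_state=2` are assumed, and deviance is taken here as twice the mean log loss.

```python
import numpy as np
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss

# Assumed data setup: 1500 training and 1500 test samples.
X, y = datasets.make_hastie_10_2(n_samples=3000, random_state=1)
X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]

clf = GradientBoostingClassifier(
    n_estimators=600, learning_rate=0.2, max_leaf_nodes=4, random_state=2
)
clf.fit(X_train, y_train)

# staged_predict_proba yields the predicted probabilities after each
# boosting stage, so we get one deviance value per iteration.
test_deviance = np.array(
    [
        2 * log_loss(y_test, proba, labels=clf.classes_)
        for proba in clf.staged_predict_proba(X_test)
    ]
)

# Ending the x-axis at 400 iterations corresponds to keeping the first
# 400 points of the curve.
truncated = test_deviance[:400]
print(len(test_deviance), len(truncated))
```

Plotting `truncated` instead of `test_deviance` shows what the figure would look like with the shorter x-axis, without refitting anything.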

Comment on lines 41 to 42
X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]
Member

you could also try reducing the number of samples in the make_hastie_10_2 line above

Contributor Author

I have reduced both the number of samples and the number of estimators to get the runtime down to 5 s.
The output is below. I think the main message is still obvious.
image

Member

@thomasjpfan thomasjpfan left a comment

Thank you for the PR @sply88 !

@@ -32,17 +32,17 @@
from sklearn import datasets


-X, y = datasets.make_hastie_10_2(n_samples=12000, random_state=1)
+X, y = datasets.make_hastie_10_2(n_samples=3000, random_state=1)
Member

What do you think of using the following settings?

from sklearn.model_selection import train_test_split

X, y = datasets.make_hastie_10_2(n_samples=4000, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=0)

original_params = {
    "n_estimators": 400,
    ...
}

It looks like this keeps a message very similar to the original:

Figure_1
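
For reference, a small sketch of what the suggested split produces. It uses only the calls quoted in the suggestion; the `print` line is added for illustration. Note that `test_size=0.8` keeps 20% of the 4000 samples for training and sends the rest to the test set.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split

X, y = datasets.make_hastie_10_2(n_samples=4000, random_state=1)

# test_size=0.8 sends 80% of the samples to the test set,
# leaving 20% (800 of 4000) for training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.8, random_state=0
)

print(X_train.shape, X_test.shape)  # (800, 10) (3200, 10)
```

A smaller training set is what keeps the fit fast, while the larger test set should keep the deviance curves comparatively smooth.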

Member

@thomasjpfan thomasjpfan left a comment

Thank you for the update @sply88 !

LGTM

@adrinjalali adrinjalali merged commit f19bf4c into scikit-learn:main Nov 29, 2021
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
…t-learn#21598 (scikit-learn#21611)

* accelerate plot_gradient_boosting_regularization.py example scikit-learn#21598

* speed up by less samples and less trees

* use train_test_split instead of slicing
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Dec 24, 2021
…t-learn#21598 (scikit-learn#21611)

* accelerate plot_gradient_boosting_regularization.py example scikit-learn#21598

* speed up by less samples and less trees

* use train_test_split instead of slicing
glemaitre pushed a commit that referenced this pull request Dec 25, 2021
#21611)

* accelerate plot_gradient_boosting_regularization.py example #21598

* speed up by less samples and less trees

* use train_test_split instead of slicing