[DOC] Speed up plot_gradient_boosting_quantile.py example #21666


Merged

Conversation

marenwestermann
Member

Reference Issues/PRs

Addresses #21598

What does this implement/fix? Explain your changes.

Speeds up ../examples/ensemble/plot_gradient_boosting_quantile.py. On my computer the runtime improved from 78 seconds to 54 seconds. The section that makes this example slow is the grid search in the section "Tuning the hyper-parameters of the quantile regressors". I sped up this section by changing and removing parameters in the grid search, and adjusted the documentation accordingly because the results now differ from the previous ones. The example could be sped up a lot more by removing the grid search entirely, but I don't know if that is desirable.

Any other comments?

@adrinjalali
Member

@lorentzenchr it'd be nice if we could reduce the time of this example much further, but I'm kinda out of ideas.

@lorentzenchr
Member

lorentzenchr commented Nov 15, 2021

@adrinjalali I don't know if you want to hear this one: Merge #20567 and follow-up PRs, then use HistGradientBoostingRegressor😉
Another possibility worth trying would be HalvingRandomSearchCV instead of RandomizedSearchCV.
TBH, I'm fascinated by the fact that quantile estimation seems to be a hard problem in general compared to estimation of the expectation.

Edit: Meanwhile, #20567 is merged and the follow-up for quantile HGBT is #21800.


@lorentzenchr lorentzenchr left a comment

The changes to gain speed are good. I would, however, put back the information about the different optimal tree depths for the different quantiles.

Comment on lines 265 to 267
# We observe that the search procedure identifies that deeper trees are needed
# to get a good fit for the 5th percentile regressor. Deeper trees are more
# expressive and less likely to underfit.
# We observe that the hyper-parameters that were hand-tuned for the median
# regressor are in the same range as the hyper-parameters suitable for the 5th
# percentile regressor
Member

This is not the same message anymore.

Comment on lines 289 to 294
# This time, shallower trees are selected and lead to a more constant piecewise
# and therefore more robust estimation of the 95th percentile. This is
# beneficial as it avoids overfitting the large outliers of the log-normal
# additive noise.
#
# We can confirm this intuition by displaying the predicted 90% confidence
# interval comprised by the predictions of those two tuned quantile regressors:
# the prediction of the upper 95th percentile has a much coarser shape than the
# prediction of the lower 5th percentile:
# The result shows that the hyper-parameters for the 95th percentile regressor
# identified by the grid search are roughly in the same range as the hand-
# tuned hyper-parameters for the median regressor and the hyper-parameters
# identified by the grid search for the 5th percentile regressor. However, the
# hyper-parameter grid searches did lead to an improved 90% confidence
# interval which can be seen below:

@lorentzenchr lorentzenchr Nov 19, 2021

This change also loses some, in my opinion, valuable information.

Member

I think the issue here is that the original hyperparameter space didn't include a 0.2 learning rate, which is why the statement held. Expanding the learning-rate space makes the example faster and also makes it actually choose shallow trees in the first place. So I'm not sure the original statement was really true to begin with.

Or do you mean something else @lorentzenchr ?

Member

In the last plot, it's visible that the lower (5%) quantile is much more fine-grained than the upper (95%) quantile. This, however, is less obvious from the newly tuned parameters. I would at least comment on the plot.

Member Author

Regarding the tree depth: yes, based on my results deeper trees are actually not needed to achieve a good fit for the 5th percentile regressor. This is why I changed the documentation.
Regarding the granularity of the 5th and 95th percentile: true, with my changes the documentation on this gets lost. I'll change this.

Member Author

I'll also try using HalvingRandomSearchCV as suggested and will report back my results.

Member

@marenwestermann Do you need any help or just more time?

Member Author

@lorentzenchr Thank you for checking in! I was moving house in the meantime. :) I addressed the comments, let me know if you would like more changes.

Maren Westermann added 2 commits November 30, 2021 17:08
@marenwestermann
Member Author

I was able to reduce the runtime to 23 seconds on my computer (from initially 78 seconds) using HalvingRandomSearchCV.


@adrinjalali adrinjalali left a comment

@lorentzenchr
Member

I was able to reduce the runtime to 23 seconds on my computer (from initially 78 seconds) using HalvingRandomSearchCV.

That's great.
Does someone know if the CI failure is related to this PR?

@adrinjalali
Member

Unrelated, merging latest main would fix the issue.


@lorentzenchr lorentzenchr left a comment

LGTM
@marenwestermann Thank you for this work. Could you merge main and if CI is green I'll be happy to merge.


@ogrisel ogrisel left a comment

Great work and nice improvement!

@ogrisel ogrisel merged commit cc534f8 into scikit-learn:main Dec 2, 2021
@marenwestermann marenwestermann deleted the gradient-boosting-quantile branch December 3, 2021 09:28
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Dec 24, 2021
…earn#21666)



Co-authored-by: Maren Westermann <maren.westermann@free-now.com>
Co-authored-by: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
glemaitre pushed a commit that referenced this pull request Dec 25, 2021

Co-authored-by: Maren Westermann <maren.westermann@free-now.com>
Co-authored-by: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
6 participants