[DOC] Speed up plot_gradient_boosting_quantile.py example #21666


Merged

Conversation

marenwestermann
Member

Reference Issues/PRs

Addresses #21598

What does this implement/fix? Explain your changes.

Speeds up ../examples/ensemble/plot_gradient_boosting_quantile.py. On my computer the runtime improved from 78 seconds to 54 seconds. The section that makes this example slow is the grid search in the section "Tuning the hyper-parameters of the quantile regressors". I sped up this section by changing and removing parameters in the grid search, and adjusted the documentation accordingly because the results now differ from the previous ones. The example could be sped up a lot more by removing the grid search entirely, but I don't know if that is desirable.

Any other comments?

@adrinjalali
Member

@lorentzenchr it'd be nice if we could reduce the time of this example much further, but I'm kinda out of ideas.

@lorentzenchr
Member

lorentzenchr commented Nov 15, 2021

@adrinjalali I don't know if you want to hear this one: Merge #20567 and follow-up PRs, then use HistGradientBoostingRegressor😉
Another possibility worth trying would be HalvingRandomSearchCV instead of RandomizedSearchCV.
TBH, I'm fascinated by the fact that quantile estimation seems to be a hard problem in general compared to estimation of the expectation.

Edit: Meanwhile, #20567 is merged and the follow-up for quantile HGBT is #21800.


@lorentzenchr lorentzenchr left a comment

The changes to gain speed are good. I would, however, put back the information about the different optimal tree depths for the different quantiles.

Comment on lines 265 to 267
# We observe that the search procedure identifies that deeper trees are needed
# to get a good fit for the 5th percentile regressor. Deeper trees are more
# expressive and less likely to underfit.
# We observe that the hyper-parameters that were hand-tuned for the median
# regressor are in the same range as the hyper-parameters suitable for the 5th
# percentile regressor
Member

This is not the same message anymore.

Comment on lines 289 to 294
# This time, shallower trees are selected and lead to a more constant piecewise
# and therefore more robust estimation of the 95th percentile. This is
# beneficial as it avoids overfitting the large outliers of the log-normal
# additive noise.
#
# We can confirm this intuition by displaying the predicted 90% confidence
# interval comprised by the predictions of those two tuned quantile regressors:
# the prediction of the upper 95th percentile has a much coarser shape than the
# prediction of the lower 5th percentile:
# The result shows that the hyper-parameters for the 95th percentile regressor
# identified by the grid search are roughly in the same range as the hand-
# tuned hyper-parameters for the median regressor and the hyper-parameters
# identified by the grid search for the 5th percentile regressor. However, the
# hyper-parameter grid searches did lead to an improved 90% confidence
# interval which can be seen below:

@lorentzenchr lorentzenchr Nov 19, 2021

This change also loses some, in my opinion, valuable information.

Member

I think the issue here is that the original hyperparameter space didn't include a 0.2 learning rate, which is why the statement held. Expanding the learning-rate space makes the example faster and also makes it actually choose shallow trees in the first place. So I'm not sure the original statement was really true to begin with.

Or do you mean something else @lorentzenchr ?

Member

In the last plot, it's visible that the lower (5%) quantile is much more fine-grained than the upper (95%) quantile. This, however, is less obvious from the newly tuned parameters. I would at least comment on the plot.

Member Author

Regarding the tree depth: yes, based on my results deeper trees are actually not needed to achieve a good fit for the 5th percentile regressor. This is why I changed the documentation.
Regarding the granularity of the 5th and 95th percentile: true, with my changes the documentation on this gets lost. I'll change this.

Member Author

I'll also try using HalvingRandomSearchCV as suggested and will report back my results.

Member

@marenwestermann Do you need any help or just more time?

Member Author

@lorentzenchr Thank you for checking in! I was moving house in the meantime. :) I addressed the comments, let me know if you would like more changes.

Maren Westermann added 2 commits November 30, 2021 17:08
@marenwestermann
Member Author

I was able to reduce the runtime to 23 seconds on my computer (from initially 78 seconds) using HalvingRandomSearchCV.


@adrinjalali adrinjalali left a comment

@lorentzenchr
Member

I was able to reduce the runtime to 23 seconds on my computer (from initially 78 seconds) using HalvingRandomSearchCV.

That's great.
Does someone know if the CI failure is related to this PR?

@adrinjalali
Member

Unrelated, merging latest main would fix the issue.


@lorentzenchr lorentzenchr left a comment

LGTM
@marenwestermann Thank you for this work. Could you merge main and if CI is green I'll be happy to merge.


@ogrisel ogrisel left a comment

Great work and nice improvement!

@ogrisel ogrisel merged commit cc534f8 into scikit-learn:main Dec 2, 2021
@marenwestermann marenwestermann deleted the gradient-boosting-quantile branch December 3, 2021 09:28
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Dec 24, 2021
…earn#21666)



Co-authored-by: Maren Westermann <maren.westermann@free-now.com>
Co-authored-by: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
glemaitre pushed a commit that referenced this pull request Dec 25, 2021

Co-authored-by: Maren Westermann <maren.westermann@free-now.com>
Co-authored-by: Tom Dupré la Tour <tom.dupre-la-tour@m4x.org>
6 participants