-
-
Notifications
You must be signed in to change notification settings - Fork 26k
[MRG] Fast PDPs for histogram-based GBDT #13769
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Fast PDPs for histogram-based GBDT #13769
Conversation
Ping me when this is ready to be reviewed ;-) |
…ikit-learn into fast_partial_dep_hist_gbdt
Ready @glemaitre ;) |
I have pushed an updated example to use the HistGradientBoosting regressor in the PDP example. I have also fixed the example to:
Here are the results: You can observe that the Neural Net PDP now agrees with the GBRT PDB for all the features. It was not the case with a weaker model. |
thanks!! |
The new plots match Figure 10.16 (p 374) of ESL II even better than the PDP plots of the non-histogram based GBRT. |
The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here a bunch of minor comments:
In the plots of the example above, I find it weird that the |
I found the cause of the shift: the offset in y is supposed to happen before the train / test split, otherwise it's either not taken into account or the r2 score cannot be interpreted easily :) |
This reverts commit f0f8641. Actually viridis is already good enough on recent matplotlib versions and we want to continue supporting older matplotlib versions.
Here is the rendering of the example: One can see that the links to the from sklearn.experimental import enable_hist_gradient_boosting # noqa is present in The same problem appears in the API table of contents: https://58079-843222-gh.circle-artifacts.com/0/doc/modules/classes.html#module-sklearn.ensemble |
This is probably caused by #13824 that was merged to master concurrently. Let me try to merge master and fix in this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I addressed my own nitpicks. LGTM.
…st_partial_dep_hist_gbdt
…ikit-learn into fast_partial_dep_hist_gbdt
…st_partial_dep_hist_gbdt
I believe this PR is ready to merge. @glemaitre any further comment? |
I find the first plot a bit small: Would it be better to have 2 separate figures? Otherwise LGTM. |
They already are two images. It's sphinx-gallery that's displaying them side by side. But I agree this could be improved by having two code blocks: one for MLPRegressor and one for GBRT. Each figure would appear under the matching code block and that should improve readability. |
…st_partial_dep_hist_gbdt
made 2 code blocks, plots are now bigger |
Can you use tight layout or constraint layout to make the ylabels not overlap the plots? |
The new one looks slightly better, though it's hard to find something that works both for local and sphinx plots. |
The rendering is ok for now. Thanks @NicolasHug |
This PR implements fast partial dependence computation for the new histogram-based GBDTs.
Both
BaseGradientBoosting
andBaseHistGradientBoosting
now have a_compute_partial_dependence_recursion()
method.The cython code for computing PDPs of the histogram-based predictors is very similar to that of the regular trees.