[MRG] Fast PDPs for histogram-based GBDT #13769

NicolasHug · 2019-05-02T15:43:15Z

This PR implements fast partial dependence computation for the new histogram-based GBDTs.

Both BaseGradientBoosting and BaseHistGradientBoosting now have a _compute_partial_dependence_recursion() method.

The cython code for computing PDPs of the histogram-based predictors is very similar to that of the regular trees.
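
For readers unfamiliar with the recursion method, here is a minimal pure-Python sketch of the weighted tree traversal it is generally based on. The `Node` class, its field names, and the toy tree are hypothetical and chosen only for illustration; they are not the node layout or the Cython code used in this PR. For a boosted ensemble, the per-tree values would additionally be summed over all trees.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical tree node; not scikit-learn's internal node layout."""
    value: float = 0.0        # prediction stored at a leaf
    feature: int = -1         # index of the split feature (internal nodes)
    threshold: float = 0.0    # samples with x[feature] <= threshold go left
    n_samples: int = 0        # training samples that reached this node
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None


def partial_dependence_recursion(node, target_features, grid_point, weight=1.0):
    """Weighted traversal of one tree for a single grid point.

    grid_point maps each target feature index to the value being evaluated.
    """
    if node.is_leaf:
        return weight * node.value
    if node.feature in target_features:
        # Split on a target feature: follow the branch chosen by the grid value.
        go_left = grid_point[node.feature] <= node.threshold
        child = node.left if go_left else node.right
        return partial_dependence_recursion(child, target_features, grid_point, weight)
    # Split on another feature: visit both children, weighting each by the
    # fraction of training samples that went that way.
    left_frac = node.left.n_samples / node.n_samples
    right_frac = node.right.n_samples / node.n_samples
    return (
        partial_dependence_recursion(node.left, target_features, grid_point, weight * left_frac)
        + partial_dependence_recursion(node.right, target_features, grid_point, weight * right_frac)
    )


# Toy tree: the root splits on feature 0, its left child splits on feature 1.
tree = Node(
    feature=0, threshold=0.5, n_samples=10,
    left=Node(
        feature=1, threshold=2.0, n_samples=4,
        left=Node(value=1.0, n_samples=1),
        right=Node(value=3.0, n_samples=3),
    ),
    right=Node(value=10.0, n_samples=6),
)

pd_x1 = partial_dependence_recursion(tree, {1}, {1: 1.0})  # 0.4 * 1.0 + 0.6 * 10.0
pd_x0 = partial_dependence_recursion(tree, {0}, {0: 0.2})  # 0.25 * 1.0 + 0.75 * 3.0
```

The traversal never touches the training data itself, only the sample counts stored in the tree, which is why this approach can be much faster than averaging predictions over a dataset.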

glemaitre · 2019-05-02T20:29:12Z

Ping me when this is ready to be reviewed ;-)

NicolasHug · 2019-05-03T20:11:05Z

Ready @glemaitre ;)

ogrisel · 2019-05-09T15:49:44Z

I have pushed an update to the PDP example so that it uses the HistGradientBoosting regressor.

I have also fixed the example to:

- remove the main() function, which is no longer needed with loky;
- put the analysis inside the code;
- make the MLPRegressor reach an R2 score close to 0.82 so that it is comparable with the GBRT model (this requires quantile-based feature scaling, a deeper model, and early stopping to avoid long fit times).

Here are the results:

You can observe that the neural net PDP now agrees with the GBRT PDP for all the features. This was not the case with a weaker model.

NicolasHug · 2019-05-09T15:54:56Z

thanks!!

ogrisel · 2019-05-09T15:56:12Z

The new plots match Figure 10.16 (p 374) of ESL II even better than the PDP plots of the non-histogram based GBRT.

ogrisel · 2019-05-09T16:28:29Z

The test_fastica_simple failure is unrelated. I will open a PR to make it deterministic.

ogrisel

Here are a bunch of minor comments:

sklearn/ensemble/_hist_gradient_boosting/_predictor.pyx

sklearn/inspection/partial_dependence.py

sklearn/ensemble/gradient_boosting.py

sklearn/ensemble/_hist_gradient_boosting/predictor.py

ogrisel · 2019-05-09T16:59:36Z

In the plots of the example above, I find it weird that the MLPRegressor partial dependence values are all shifted by +2 w.r.t. the GBRT values. If I switch to method="brute" I also get the shift by +2.
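
As a rough illustration of what the brute method computes, the pure-Python sketch below averages predictions over a dataset with one feature forced to each grid value. The helper `brute_partial_dependence` and the two toy models are hypothetical and are not scikit-learn's implementation. It also shows why a constant offset between two models' targets appears as a constant shift between their partial dependence curves, whichever computation method is used.

```python
def brute_partial_dependence(predict, X, feature, grid):
    """Average predict() over X with column `feature` forced to each grid value."""
    averaged = []
    for value in grid:
        # Copy every row, overwriting only the feature of interest.
        X_mod = [row[:feature] + [value] + row[feature + 1:] for row in X]
        preds = [predict(row) for row in X_mod]
        averaged.append(sum(preds) / len(preds))
    return averaged


# Two toy "models" that differ only by a constant offset of 2 (hypothetical).
def model_a(row):
    return 3.0 * row[0] + row[1]


def model_b(row):
    return model_a(row) + 2.0


X = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
grid = [0.0, 1.0, 2.0]

pd_a = brute_partial_dependence(model_a, X, feature=0, grid=grid)
pd_b = brute_partial_dependence(model_b, X, feature=0, grid=grid)

# The two curves have the same shape; every grid point differs by the offset.
shifts = [b - a for a, b in zip(pd_a, pd_b)]
```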

ogrisel · 2019-05-09T17:30:17Z

I found the cause of the shift: the offset in y is supposed to happen before the train / test split, otherwise it's either not taken into account or the r2 score cannot be interpreted easily :)
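
A small pure-Python sketch of why the order matters, assuming the offset is a mean-centering of the target (the thread does not spell out the exact form of the offset). The data and the "perfect" predictions are made up for illustration.

```python
def r2_score(y_true, y_pred):
    """Coefficient of determination, written out by hand for this sketch."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
offset = sum(y) / len(y)  # assumed offset: the mean of the target

# Offset applied BEFORE the split: train and test targets share one scale.
y_centered = [v - offset for v in y]
y_test_centered = y_centered[4:]

# Pretend a model fit on the centered training targets predicts the
# centered test targets perfectly (hypothetical predictions).
y_pred = list(y_test_centered)

# Scored against test targets that received the same offset.
r2_consistent = r2_score(y_test_centered, y_pred)

# Scored against raw test targets that never received the offset:
# the same predictions now look very poor.
y_test_raw = y[4:]
r2_mismatched = r2_score(y_test_raw, y_pred)
```

Applying the offset to the full target first and splitting afterwards keeps both subsets on the same scale, so the score stays interpretable.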

ogrisel · 2019-05-10T07:07:49Z

Here is the rendering of the example:

https://58079-843222-gh.circle-artifacts.com/0/doc/auto_examples/inspection/plot_partial_dependence.html#sphx-glr-auto-examples-inspection-plot-partial-dependence-py

One can see that the links to the sklearn.ensemble.HistGradientBoostingRegressor class do not work because of the experimental setup. This is unexpected because the line:

from sklearn.experimental import enable_hist_gradient_boosting  # noqa

is present in doc/conf.py.

The same problem appears in the API table of contents:

https://58079-843222-gh.circle-artifacts.com/0/doc/modules/classes.html#module-sklearn.ensemble

ogrisel · 2019-05-10T07:31:07Z

This is probably caused by #13824 that was merged to master concurrently. Let me try to merge master and fix in this PR.

ogrisel

I addressed my own nitpicks. LGTM.

ogrisel · 2019-07-03T21:58:15Z

I believe this PR is ready to merge. @glemaitre any further comment?

glemaitre · 2019-07-03T22:10:09Z

I find the first plot a bit small:
https://62011-843222-gh.circle-artifacts.com/0/doc/auto_examples/inspection/plot_partial_dependence.html#sphx-glr-auto-examples-inspection-plot-partial-dependence-py

Would it be better to have 2 separate figures?

Otherwise LGTM.

ogrisel · 2019-07-03T22:25:20Z

They already are two images. It's sphinx-gallery that's displaying them side by side. But I agree this could be improved by having two code blocks: one for MLPRegressor and one for GBRT. Each figure would appear under the matching code block and that should improve readability.

NicolasHug · 2019-07-05T13:32:15Z

https://63577-843222-gh.circle-artifacts.com/0/doc/auto_examples/inspection/plot_partial_dependence.html

made 2 code blocks, plots are now bigger

amueller · 2019-07-05T16:18:07Z

Can you use tight layout or constrained layout so that the ylabels do not overlap the plots?

NicolasHug · 2019-07-05T18:09:24Z

The new one looks slightly better, though it's hard to find something that works both for local and sphinx plots.

glemaitre · 2019-07-11T12:53:17Z

The rendering is ok for now. Thanks @NicolasHug

NicolasHug added 3 commits April 27, 2019 15:47

fast partial dep cleaning

290669d

minor pep8

cd1e4a3

WIP

028beda

NicolasHug added 8 commits May 2, 2019 19:15

Merge branch 'master' into fast_partial_dep_hist_gbdt

72269fc

Merge branch 'master' into fast_partial_dep_hist_gbdt

e54fe79

tests and fixes

444c4f6

docstrings

cd7d64d

docstrings

b60120c

Merge branch 'fast_partial_dep_hist_gbdt' of github.com:NicolasHug/sc…

52ee07b

…ikit-learn into fast_partial_dep_hist_gbdt

pep8

4ec6818

more docs

a16007b

NicolasHug changed the title ~~[WIP] Fast PDPs for histogram-based GBDT~~ [MRG] Fast PDPs for histogram-based GBDT May 3, 2019

glemaitre self-requested a review May 6, 2019 09:44

ogrisel added 2 commits May 9, 2019 16:57

Use fast gradient boosting in PDP example

ed06695

Better MLP, better notebook layout

dc5f944

This was referenced May 9, 2019

FastICA unit test fails at random #1349

Closed

[MRG] Better fix the rng seed in test_fastica_simple #13848

Merged

ogrisel reviewed May 9, 2019

Fix shift in y in example

03308bd

ogrisel added 3 commits May 9, 2019 20:59

Small example reorg

341bcc3

Make the example run faster without changing too much the plots

18862a4

Avoid oversubscription in Circle CI docker container

369cd5d

ogrisel added 3 commits May 9, 2019 23:32

One more tweak to the example

879147f

Better colormap for the 2D interaction plot

f0f8641

Revert "Better colormap for the 2D interaction plot"

e7a700a

This reverts commit f0f8641. Actually viridis is already good enough on recent matplotlib versions and we want to continue supporting older matplotlib versions.

ogrisel mentioned this pull request May 10, 2019

FIX API links to experimental classes #13854

Merged

Various nitpicks

67b708d

ogrisel approved these changes May 23, 2019

NicolasHug added 3 commits May 25, 2019 09:08

Merge branch 'master' of github.com:scikit-learn/scikit-learn into fa…

0319e9d

…st_partial_dep_hist_gbdt

Merge branch 'fast_partial_dep_hist_gbdt' of github.com:NicolasHug/sc…

8a81d69

…ikit-learn into fast_partial_dep_hist_gbdt

Merge branch 'master' of github.com:scikit-learn/scikit-learn into fa…

2e7a79d

…st_partial_dep_hist_gbdt

NicolasHug added 2 commits July 4, 2019 18:03

Merge branch 'master' of github.com:scikit-learn/scikit-learn into fa…

6100123

…st_partial_dep_hist_gbdt

bigger plots?

fa5485a

minor change in words

fcd0fc2

tight layout?

bc30c60

glemaitre merged commit 0eaaeaf into scikit-learn:master Jul 11, 2019

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

EHN Fast PDPs for histogram-based GBDT (scikit-learn#13769)

490cb47
