WIP Adds Generalized Additive Models with Bagged Hist Gradient Boosting Trees #19914
Conversation
Follow up from the meeting: @lorentzenchr From reading @GaelVaroquaux's comments, I see GAMs as "interpretable" along the same lines as logistic regression's coefficients, which comes with the same pitfalls. The big difference is that GAMs (tree based or spline based) use a function to represent the conditional dependence. From the meeting, I mentioned that I went with this gradient boosted version of GAM because the other versions do not scale. There are newer versions of GAMs that scale, implemented by R's bam. The paper it is based on has been cited 209 times. I think moving forward, we can get the same type of behavior by adding …
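For reference, the generic GAM form under discussion (standard textbook definition, not taken from this thread):

$$ g\big(\mathbb{E}[y \mid x]\big) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_p(x_p) $$

where $g$ is a link function and each $f_j$ is a univariate shape function, fitted either with splines or, as in this PR, with boosted trees restricted to a single feature.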
From the meeting, I mentioned that I went with this gradient boosted version of GAM because the other versions do not scale. There are newer versions of GAMs that scale, implemented by R's bam. The paper it is based on has been cited 209 times.

My view (echoing the FAQ on the inclusion criteria, https://scikit-learn.org/stable/faq.html#what-are-the-inclusion-criteria-for-new-algorithms) is that an algorithm implementing a classic model has more lenient inclusion criteria. So, I would say that if the algorithm of bam solves the exact same problem as classic GAMs, it is definitely eligible for inclusion. Do you know a few details about the algorithm?
I think adding support for interaction constraints to our histogram gradient boosting tree implementation would be interesting. E.g. each tree would be allowed to consider only 1 feature at a time, or 2 or 3 and so on, as an inductive bias to make the decision function easier to grasp: the partial dependence plots would more faithfully reflect the actual behavior of the decision function, even though they would still be problematic to interpret in the presence of correlated features (as is the case for linear models). We could also consider allowing users to pass specifications of groups of features that are allowed to interact with one another, but the API would be more complex to use.
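As a concrete illustration of the single-feature constraint described above, here is a minimal sketch using the interaction_cst parameter that was added to the histogram gradient boosting estimators in a later release (1.2), i.e. not available at the time of this comment:

```python
# Each constraint group holds a single feature index, so every tree can only
# split on one feature and the resulting ensemble is purely additive.
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor

X, y = make_regression(n_samples=1000, n_features=5, random_state=0)

additive_gbrt = HistGradientBoostingRegressor(
    interaction_cst=[[i] for i in range(X.shape[1])],
    random_state=0,
).fit(X, y)
```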
I wonder if fitting an interaction-limited GBRT, followed by fitting a 1d cubic spline approximation of each feature-wise decision function, followed by refitting a linear model on the spline features would not get the best of both worlds: smooth models with a scalable fitting procedure. Probably not for scikit-learn, but I wonder if the GAM community has explored this 2-stage fitting strategy.
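A rough sketch of that two-stage idea (not from this thread): fit an additive HGBT, read each feature-wise function off via partial dependence, smooth it with a cubic spline, then refit a linear model on the spline-evaluated features. It assumes a recent scikit-learn (interaction_cst, and the grid_values key returned by partial_dependence) plus SciPy:

```python
import numpy as np
from scipy.interpolate import UnivariateSpline
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor
from sklearn.inspection import partial_dependence
from sklearn.linear_model import RidgeCV

X, y = make_regression(n_samples=2000, n_features=4, noise=10.0, random_state=0)

# Stage 1: additive (interaction-limited) gradient boosting.
gbrt = HistGradientBoostingRegressor(
    interaction_cst=[[i] for i in range(X.shape[1])], random_state=0
).fit(X, y)

# Stage 2a: approximate each feature-wise decision function with a cubic spline.
splines = []
for i in range(X.shape[1]):
    pd = partial_dependence(gbrt, X, features=[i], grid_resolution=50)
    grid, avg = pd["grid_values"][0], pd["average"][0]
    splines.append(UnivariateSpline(grid, avg, k=3))

# Stage 2b: refit a linear model on the smoothed, spline-evaluated features.
S = np.column_stack([spl(X[:, i]) for i, spl in enumerate(splines)])
smooth_gam = RidgeCV().fit(S, y)
print(smooth_gam.score(S, y))
```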
Just as a cross-reference, the issue for HGBT interaction constraints: #19148.
Thanks, I was looking for this PR in the GitHub search but missed it for some reason (bad keywords). Edit: it's only an issue, I thought there was already a draft PR ;)
For reference, in a private exchange with Yannig Goude, he mentioned that the bam function (in R's mgcv package) is described in Generalized Additive Models for Gigadata: Modeling the U.K. Black Smoke Network Daily Data. The main paper for the original implementation of the non-discretized bam function is Generalized additive models for large data sets. So I think this would be a good reference implementation to compare to if we ever decide to implement spline-based GAMs in scikit-learn.
Somewhat related is the "GA^2M" of Caruana et al. and the subsequent "Explainable Boosting Machine" in Microsoft's InterpretML, see Explainable Boosting in https://github.com/interpretml/interpret#citations. GBMs train on two-feature combinations (and single features), and those become the basis for the GAM. (And there's a ton of weirder stuff in that package's implementation, IIRC bagging the overall GAM, including an iterative fitting of the bases, using RFs as the gradient boosting base model, etc.)
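For anyone who wants to try the model class being referenced, a small usage sketch of InterpretML's EBM (import path and the interactions parameter are as documented for recent versions of that package; worth double-checking against its docs):

```python
# Tree-based GAM via InterpretML's Explainable Boosting Machine.
# interactions=0 keeps the model purely additive (no pairwise terms).
from interpret.glassbox import ExplainableBoostingRegressor
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=1000, n_features=5, random_state=0)

ebm = ExplainableBoostingRegressor(interactions=0).fit(X, y)
print(ebm.predict(X[:5]))
```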
@bmreiniger Rich renamed the model from GA^2M to Explainable Boosting Machine; this is all the same work, I'd say. Also, I don't get the issue with the constant in the example, shouldn't the mean predictor be the first step?
This is sooo awesome!! I'd love to have it. I pinged Harsha, who works on the interpret-ml code; I hope he can have a look.
Hi everyone, I work on interpret, which has the most recent implementation of the boosted tree based GAMs referenced in those papers -- as Andreas mentioned, we just call them Explainable Boosting Machines now in this codebase. Rich and everyone else on our side were excited to see this PR, and we thought we might be able to help clarify some details or contribute to the discussion. For example, many of the algorithmic choices were made to make models identifiable or more explainable (e.g. enhancing smoothness of learned graphs or contributing uncertainty intervals to each shape function). We'd all be happy to talk through these details on a call sometime if @thomasjpfan or anyone else is interested!
I think the inclusion of spline-based GAMs clearly meets our inclusion criteria. Using bam as a reference, as suggested by @ogrisel in #19914 (comment), will be the starting point for that. There is still the question of inclusion for tree-based GAMs. The paper Intelligible Models for HealthCare: Predicting Pneumonia Risk and Hospital 30-day Readmission, published in 2015, has 1k citations according to Google Scholar. Is this enough for inclusion? Would using benchmarks to show that tree-based GAMs perform better than spline-based GAMs be enough for inclusion? Moving forward, I plan to work on spline-based GAMs. I think it will be strange to include the tree-based ones without the spline-based ones.
Moving forward, I plan to work on spline-based GAMs. I think it will be strange to include the tree-based ones without the spline-based ones.
I think that this is a great resolution. For the question above, I cannot answer; such a question is always challenging. Thanks Thomas!!
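For context on the spline-based GAMs mentioned above, here is a minimal sketch built only from pieces that already exist in scikit-learn (SplineTransformer plus a penalized linear model). It only illustrates the model class being discussed, not the planned estimator or R's bam algorithm:

```python
# Minimal spline-based GAM sketch: expand each feature into a cubic B-spline
# basis, then fit a ridge-penalized linear model on the expanded features.
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import SplineTransformer

X, y = make_regression(n_samples=2000, n_features=5, noise=5.0, random_state=0)

spline_gam = make_pipeline(
    SplineTransformer(degree=3, n_knots=10),  # univariate basis per feature
    RidgeCV(alphas=[0.1, 1.0, 10.0]),
).fit(X, y)
print(spline_gam.score(X, y))
```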
I generally agree with @thomasjpfan, but I also think it's strange to include #21020 without including tree-based GAMs first; maybe we can have them all in the same release lol? Deciding to include #21020 while deciding tree-based GAMs are out of scope makes no sense to me, since #21020 is a generalization of the published and relatively well-understood tree-based GAMs of Rich.
I dare to close. It has been inactive for (rounded) 3 years.
This is very WIP!
There are many things left to be done here! Please do not review yet! Here is an example of the estimator in action!
There is still much to be done for this PR, but I want to see if this fits our inclusion criteria. Intelligible Models for Classification and Regression has 259 citations.
Reference Issues/PRs
Closes #3482
What does this implement/fix? Explain your changes.
This implementation builds on parts of HistGradientBoosting* by restricting the splitter to only split on one feature at a time. Intelligible Models for Classification and Regression shows that bagged gradient boosted trees outperform the other GAM fitting approaches (backfitting or splines). The tree-based approach was shown to train faster for bigger datasets.
Using the histograms is based on a follow-up paper, Accurate Intelligible Models with Pairwise Interactions. In this paper, they also describe a way to quickly obtain pairwise interactions. (This can be a follow-up PR.)
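This is not the fast pair-selection procedure from that paper, but as a sketch of how pairwise-limited trees can be expressed today with the interaction_cst parameter (added to scikit-learn later, in 1.2): every branch of a tree is then restricted to at most two features, giving univariate and bivariate terms only, in the spirit of GA^2M:

```python
# Allow every pair of features as a constraint group; single features are
# covered implicitly because each one appears in some pair.
from itertools import combinations
from sklearn.datasets import make_regression
from sklearn.ensemble import HistGradientBoostingRegressor

X, y = make_regression(n_samples=1000, n_features=4, random_state=0)

pairs = [list(p) for p in combinations(range(X.shape[1]), 2)]
ga2m_like = HistGradientBoostingRegressor(
    interaction_cst=pairs, random_state=0
).fit(X, y)
```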
CC @amueller