Generalized additive models (GAMs)? #3482
There is pyearth: https://github.com/jcrudy/py-earth |
Is there anything else related to GAMs that could potentially be added? |
Not sure which of these are actually commonly used and scale to larger datasets.... Have you checked? |
The original GAM paper by Hastie has around 1300 citations and seems to be of much interest to people, and SpAM has around 200 (not sure if citation count is the only important criterion for inclusion, based on your recent addition to the FAQ). I don't think LISO is that popular (I may be mistaken). For SpAM, there is a paper from Yahoo Labs (AAAI 15, by Wei Sun). From the application described in that paper, I think it may scale well to larger datasets. |
@amueller I am interested in working on SpAM... Given the above references do you feel it will be of considerable interest to the devs? If so I'll finish off my other PRs and start with SpAM? |
@fabianp and @eickenberg have an implementation of SpAM somewhere. Can you |
Yup, it's been a while since I've looked at this, and it is only a proof of concept, we provide """as is""" :) https://gist.github.com/eickenberg/fe7010b63a4196f849fa |
Thanks for sharing :) @agramfort can I proceed with this at hand? I am thinking of making a directory for all GAM based models... so we can have something like : |
I guess. Just keep it small so you maximize your chances to get feedback |
gam is a horrible acronym. It's impossible to google or to guess what
how about simply |
I like it. Just like 'linear_models' are really |
I think linear_models is already a bit long, and should probably have been
|
I saw the inclusion of GAMs in GSoC 2015. I just want to know whether the project is assigned to anyone or else I would start developing the GAM models. Please reply |
to me there is still a need for a good and standard GAM implementation in
python like the gam package in R.
|
@agramfort and @sreenivasraghavan71 there is a pr for statsmodels GAM here: statsmodels/statsmodels#2435 They are sorely in need of more code reviewers and maintainers! Help would probably be appreciated. |
Curious about this myself. Noticed this wiki: https://github.com/scikit-learn/scikit-learn/wiki/Google-summer-of-code-(GSOC)-2015 Is this currently in flight? |
Scikit-learn is not participating in the GSoC this year, because all the
core contributors are already too committed to mentor.
|
Does anyone know of an efficient workaround using GAM in Python? |
What do you want to work around? Maybe it is better to ask this on the mailing list with reference to this
|
@saru95 there's pyearth, if that helps. |
@saru95, @amueller I've written a Python implementation of GAMs using penalized B-splines, inspired heavily by Simon Wood and his mgcv CRAN package. Check it out here: https://github.com/dswah/pyGAM I've included several model families, as well as link functions and distributions that you can mix and match to make very custom GAMs. |
I completely agree. Any news on incorporating GAMs into the sklearn arsenal? |
How do I use the SAM package to select features? |
Any updates on adding GAMs to sklearn? |
I'd like this feature a lot as well; seems like including additive models can make sklearn more of a "one stop shop." |
we can't promise to be a one stop shop with one finite set of maintainers.
I definitely think these things should exist in the python ecosystem
whether or not we are able to see them merged in any near future
|
@dswah you could consider migrating your project to scikit-learn-contrib https://github.com/scikit-learn-contrib to get more visibility and expand the developer base. Note that you used the GPL license, so this code cannot enter the sklearn code base. I'd also really love to have a state-of-the-art GAM package in Python. |
There are some more general glms in #9405 too |
There's been a lot of interest in GAMs recently because of interpretability. The EBM paper has 205 citations which is not that great but narrowly passes our threshold. An application paper has over 600 citations though: There's been a more recent evaluation here: disclaimer: this work partially comes out of MS. I had put @thomasjpfan on GAMs previously, though we had looked more at the classical spline stuff. The gradient boosting variant might be more straightforward to implement given what we already have (they are using binning). |
@NicolasHug might also be interested in so far as this is a small extension of HistGradientBoosting ;) |
On that there was also the #17027 proposal more recently. |
It's been a while since I've seen this thread pop up :)
I'll be cheering on whoever takes this on! It would be great to have such
models in sklearn.
https://www.linguee.com/german-english/translation/was+lange+w%C3%A4hrt+wird+endlich+gut.html
|
@eickenberg I assume you mean splines, not additive gradient boosting? |
Indeed, but the gradient boosting variant looks interesting and not that hard to add. |
(Looks like it's a hot topic again: https://arxiv.org/abs/2004.13912 @eugenium pointed me to this) |
Here is code I wrote many years ago when I looked at GAMs.
It implements the so-called 'backfitting' approach, which closely
resembles coordinate descent: you improve the loss one feature at a time.
https://gist.github.com/agramfort/c1d0307f4545a54642f011f79d9966b8
hope this helps. I agree that a good GAM implementation would be very
valuable. My code is a toy.
… |
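To make the backfitting idea above concrete, here is a minimal sketch. It is not the code from the linked gist: the bin-average smoother, the function names `bin_smoother` and `backfit`, and the synthetic data are all assumptions chosen to keep the example dependency-free. Each pass refits one feature's component against the partial residuals left by the others, which is where the resemblance to coordinate descent comes from.

```python
import math
import random


def bin_smoother(x, r, n_bins=10):
    """Hypothetical smoother: average the residuals r within equal-width bins of x."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0
    idx = [min(int((xi - lo) / width), n_bins - 1) for xi in x]
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i, ri in zip(idx, r):
        sums[i] += ri
        counts[i] += 1
    means = [s / c if c else 0.0 for s, c in zip(sums, counts)]
    return [means[i] for i in idx]


def backfit(columns, y, n_iter=20):
    """Fit y ~ intercept + sum_j f_j(x_j) by cycling over the features."""
    n = len(y)
    intercept = sum(y) / n
    fitted = [[0.0] * n for _ in columns]  # f_j evaluated at the training points
    for _ in range(n_iter):
        for j, x in enumerate(columns):
            # Partial residual: remove the intercept and every component except f_j.
            partial = [
                y[i] - intercept
                - sum(fitted[k][i] for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            f_j = bin_smoother(x, partial)
            mean_j = sum(f_j) / n
            fitted[j] = [v - mean_j for v in f_j]  # centre each component
    return intercept, fitted


# Synthetic additive data (an assumption for illustration only).
random.seed(0)
n = 500
x1 = [random.uniform(-3, 3) for _ in range(n)]
x2 = [random.uniform(-3, 3) for _ in range(n)]
y = [math.sin(a) + 0.5 * b * b + random.gauss(0, 0.1) for a, b in zip(x1, x2)]

intercept, fitted = backfit([x1, x2], y)
pred = [intercept + fitted[0][i] + fitted[1][i] for i in range(n)]
mse = sum((p - t) ** 2 for p, t in zip(pred, y)) / n
var_y = sum((t - intercept) ** 2 for t in y) / n
print(f"training MSE {mse:.3f} vs. variance of y {var_y:.3f}")
```

A real implementation would swap the bin averages for a proper smoother (splines, local regression) and handle prediction on new data; this sketch only evaluates the components at the training points.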
Here is mine: https://gist.github.com/eickenberg/fe7010b63a4196f849fa This one does round-robin coordinate descent |
Hot topic and wanted, indeed!
Hinton also published on GAM extensions to DNNs this year:
https://arxiv.org/abs/2004.13912
Danilo
|
Who would be interested in reviewing a PR for GAMs? :) |
I think that a good GAM implementation would be great.
However, we need to spend time on having a good solver.
I worry that Iteratively Reweighted Least Squares solvers will be very slow.
Coordinate descent will probably work well, unless n is very large.
A first step, before a pull request, would be to prototype and compare
solvers (reusing existing implementations as much as possible, such as
those listed in the thread above).
|
@GaelVaroquaux did you see the comment about the gradient boosting version? |
Yes, and I like it :)
Still, it would be nice to do a bit of benchmarking based on the various
bits of code that have been suggested above.
|
I would but I want to review the categorical features first :p |
I'm also in favor of GAMs, though I see them more in the context of GLMs: "GLM + spline + penalty = GAM". The tricky part, in my opinion, is not the solvers but the handling of the features, i.e. the API. In R's mgcv you can specify the spline type and penalty for each feature by name. |
Currently, there are 2 proposals on the table:
I wonder how great a combination of the two would be: Being able to specify for which feature to use smooth splines and for which to use trees? (and which to treat as categorical?) |
Re 2, I would rather have #19914 first, in particular because it does cyclic boosting and is built on well-cited work. I think doing #21020 is also nice, but I think it's less well understood in terms of interpretability. Re API: |
For tree-based GAMs, the point is that interaction constraints give us the possibility to specify feature-wise additivity in link-space (or allow for pairwise interactions…). That's the big step for interpretability. |
@lorentzenchr from an API perspective having general interaction constraints seems more complex than what's standard in GAMs and I guess it's a question whether we want separate classes or not. It seemed a bit weird in terms of inclusion criteria to me (what's the reference for general interactions constraints?). I guess we're weighing what the implementations do more heavily than what the literature does now, which is an option but not something we have decided on. In the end, as long as we have easy ways for users to discover how to do gradient boosting GAMs that are in accordance with what's empirically & academically validated then that's great. |
If we have all the pieces together, I'm not opposed to the idea of a new class that sets some default parameters such that boosted tree-based GAMs become more easily available. |
Apparently the best R package these days is 'mgcv'. Look here for a nice introduction to Generalized Additive Models and some motivation: Seems quite yummy as a regression model (e.g. non-linear but still explainable). |
Hello there,
Thanks for making this fantastic library. I use it every day in my bioinformatics research. We're developing a toolkit for single-cell RNA-seq analysis (http://github.com/yeolab/flotilla) and want to add all current state-of-the-art analyses. Unfortunately, most of these are in R. I can reimplement some of them, but they rely on certain R packages, in particular VGAM, aka Vector Generalized Linear and Additive Models. I've found a few mentions of GAMs here:
Has there been any update on creating these libraries?