What about Gaussian Mixture Regression? #6073
never heard of this. Do you have a ref?
|
@agramfort It's tough to find an isolated reference, but Section II.A from Toda 2007 gives a good review. It's quite popular for spectral mapping. The idea is to model the joint pdf p(x, y) with a GMM and then predict y as the expectation under the conditional distribution p(y | x), i.e. \hat{y} = E[y | x]. In essence, this is a smooth piecewise-linear transformation (approximately linear around cluster centroids). Here's an example: |
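Not the example from the original comment: a minimal sketch of the conditional-expectation idea described above, assuming scikit-learn's `GaussianMixture` with full covariances fitted on stacked `[x, y]` samples. The helper name `gmr_predict` and the toy sine data are hypothetical, chosen only for illustration. For each component, the conditional mean is `mu_y + cov_yx cov_xx^{-1} (x - mu_x)`, and the components are blended with the responsibilities computed from the marginal over x.

```python
import numpy as np
from scipy.stats import multivariate_normal
from sklearn.mixture import GaussianMixture


def gmr_predict(gmm, X, n_x):
    """Conditional mean E[y | x] under a GMM fitted on joint [x, y] samples.

    Hypothetical helper: the first n_x columns of the joint data are x,
    the remaining columns are y. Requires covariance_type="full".
    """
    X = np.atleast_2d(X)
    n_samples = X.shape[0]
    n_y = gmm.means_.shape[1] - n_x
    log_resp = np.empty((n_samples, gmm.n_components))
    cond_means = np.empty((n_samples, gmm.n_components, n_y))
    for k in range(gmm.n_components):
        mu_x, mu_y = gmm.means_[k, :n_x], gmm.means_[k, n_x:]
        cov = gmm.covariances_[k]
        cov_xx, cov_yx = cov[:n_x, :n_x], cov[n_x:, :n_x]
        # Unnormalized log responsibility: log w_k + log N(x | mu_x, cov_xx)
        log_resp[:, k] = np.log(gmm.weights_[k]) + multivariate_normal.logpdf(
            X, mean=mu_x, cov=cov_xx
        )
        # Per-component conditional mean: mu_y + cov_yx cov_xx^{-1} (x - mu_x)
        cond_means[:, k, :] = mu_y + np.linalg.solve(cov_xx, (X - mu_x).T).T @ cov_yx.T
    # Normalize responsibilities in a numerically stable way
    log_resp -= log_resp.max(axis=1, keepdims=True)
    resp = np.exp(log_resp)
    resp /= resp.sum(axis=1, keepdims=True)
    # Blend the per-component conditional means
    return np.einsum("nk,nkd->nd", resp, cond_means)


# Toy data (an assumption for illustration): noisy sine curve
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(x) + 0.1 * rng.normal(size=(500, 1))

gmm = GaussianMixture(n_components=5, covariance_type="full", random_state=0)
gmm.fit(np.hstack([x, y]))

x_new = np.linspace(-3, 3, 5).reshape(-1, 1)
y_hat = gmr_predict(gmm, x_new, n_x=1)
```

The number of components (5 here) is an arbitrary choice; in practice it would need to be selected, for example by comparing BIC across fits.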
interesting trick. I am not sure it satisfies the constraints for adding a
new method to sklearn, but if I were you I would make my code a gist on
gist.github.com and point to it here. At least for now.
|
@agramfort alright then. Once it's ready, it can also be discussed through the mailing list |
Closing for now. Seems kinda related to GPs? But without a solid reference and usage example, it's probably a "no". Feel free to post to scikit-learn-contrib. |
I realize this is closed, but there is a reference to this in section 11.2.4 of Murphy (2012), Machine Learning: A Probabilistic Perspective. As a usage example, see Imai and Tingley (2011) in the American Journal of Political Science. GMM regressions are fairly commonly used in the social sciences, and are well implemented in R (package |
@stnorton I don't think we want to implement everything mentioned in Murphy's book. Is there a reason this is commonly used in social sciences? |
@amueller It's difficult enough to discourage sklearn's use for mixture regressions, especially given that it's so easy in R. There are also no examples of implementation available online (a search for an example for a seminar I'm teaching led me to this thread), leading to the need to reinvent the wheel each time. Mixture regressions are often used in social science for running models with heterogeneous treatment effects where the source of the heterogeneity isn't observed. Obviously, it's not necessary to cater to all research communities, but as Python becomes more popular for social scientists, this would be a nice feature for that research community to have. |
I agree it would be good for the research community to have. I think we'd be happy to add an example, or you could add a package to scikit-learn-contrib. I guess it's something like probs = GaussianMixture().fit(X).predict_proba(X)
expanded = (X[:, :, None] * probs[:, None, :]).reshape(X.shape[0], -1)
lr = LinearRegression().fit(expanded, y) ? |
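A runnable take on the snippet above, offered as a sketch rather than a recommended method. It assumes a GMM fitted on `X` alone, with each feature multiplied by each component's membership probability, so the linear model can learn a separate slope per component. The helper name `expand_by_responsibility`, the choice of 3 components, and the toy data are all hypothetical. Note that this gates on clusters in `X` only, which differs from modelling the joint density of `(x, y)`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.mixture import GaussianMixture


def expand_by_responsibility(X, probs):
    """Multiply every feature by every component probability.

    X has shape (n_samples, n_features); probs has shape
    (n_samples, n_components). The result has shape
    (n_samples, n_features * n_components), where column
    j * n_components + c holds X[:, j] * probs[:, c].
    """
    expanded = X[:, :, None] * probs[:, None, :]
    return expanded.reshape(X.shape[0], -1)


# Toy data (an assumption for illustration): slope changes sign at zero
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 2))
y = np.abs(X[:, 0]) + 0.5 * X[:, 1] + 0.05 * rng.normal(size=300)

# n_components must be > 1 for the expansion to add anything
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
probs = gmm.predict_proba(X)

expanded = expand_by_responsibility(X, probs)
lr = LinearRegression().fit(expanded, y)

# Predicting on new data reuses the fitted mixture for the probabilities
X_new = rng.uniform(-3, 3, size=(4, 2))
y_new = lr.predict(expand_by_responsibility(X_new, gmm.predict_proba(X_new)))
```

With the default `GaussianMixture()` (a single component), every probability is 1 and the expansion reduces to ordinary linear regression, so the number of components has to be set explicitly.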
This is not identical to, but seems related to, the example in
#10852. I'd be very happy
to see an example of this if it is a common technique.
|
A little bit late, but I made this library about 5 years ago because GMR would not fit perfectly to sklearn's estimator interface: https://github.com/AlexanderFabisch/gmr I could move it to scikit-learn-contrib. I don't know if it covers all use cases of GMR though. It was sufficient for my application. |
It's strange that scikit-learn doesn't have a GMM-based regression model. Is there a reason for this? If it's simply because no one has had the time or interest to do it, I can put up a PR.