
incorrect estimated means lead to non positive definite covariance in GMM #4429


Closed

xuewei4d opened this issue Mar 20, 2015 · 18 comments

@xuewei4d
Contributor

I dug into #3945 and #2640. PR #3945 claims:

Due to round off errors, the formula used to compute the weighted covariance matrix may lead to matrices with large negative eigenvalues.

Actually, the reason why the sample code given by @AlexisMignon in #2640 raises an exception is that the simplified formula C = (\sum_i w_i x_i x_i') - \mu \mu' does not apply when \mu is not the mean of X.

The gmm created in this line

gmm = mixt.GMM(2, params="wc", covariance_type="full", min_covar=1e-3)

does not estimate the means in the M-step: gmm.means_ is initialized by k-means and never changes during EM. When the m_step code from before #3945 was merged runs, the mean of one component stays at the k-means initialization [[ 3.42118979 3.53824541]], while the actual weighted mean of that component is [0.02670257 0.02945094] just before the exception is raised. With C = (\sum_i w_i x_i x_i') - \mu \mu', the estimated covariance is of course not positive definite. If we instead use

gmm = mixt.GMM(2, params="wmc", covariance_type="full", min_covar=1e-3)

we do not get the exception at all.
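To illustrate, here is a minimal numpy sketch (not the scikit-learn code) of the point above: with a fixed vector that is not the weighted mean of the data, the shortcut formula can produce large negative eigenvalues, while the centered formula stays positive semi-definite.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(500, 2)                      # toy data
w = np.full(len(X), 1.0 / len(X))          # uniform responsibilities, sum to 1

mu_wrong = np.array([3.42, 3.54])          # e.g. a stale k-means center, far from the data

# Shortcut: C = (sum_i w_i x_i x_i') - mu mu'  -- only valid if mu is the weighted mean
shortcut = (w[:, None] * X).T @ X - np.outer(mu_wrong, mu_wrong)

# Centered: C = sum_i w_i (x_i - mu)(x_i - mu)'  -- positive semi-definite for any mu
centered = (w[:, None] * (X - mu_wrong)).T @ (X - mu_wrong)

print(np.linalg.eigvalsh(shortcut))        # contains a large negative eigenvalue
print(np.linalg.eigvalsh(centered))        # all eigenvalues >= 0 (up to round-off)
```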

In #2640, without the snippet that produces the error, I cannot figure out why the covariance is not positive definite there. In any case, the method log_multivariate_normal_density in gmm.py can be called from other modules, so it would be better to check the covariance before attempting the Cholesky decomposition.

Besides, when initializing a GMM with params='wc', we always run into non-positive-definite covariance problems with the full and tied covariance types; you can reproduce it with any randomly generated data. I think it is strange to build a GMM without estimating the means. It would be better to deprecate any params setting other than 'wmc'.

Finally, I think the original code using C = (\sum_i w_i x_i x_i') - \mu \mu' before #3945 was merged is fine as long as we can guarantee that \mu is the mean of X. But I found that the Matlab code does it the same way as #3945. I cannot see any reason for that other than preventing round-off errors, which I really doubt is the issue.

Ping @amueller

@xuewei4d xuewei4d changed the title incorrect estimated means lead to non positive definite covariance incorrect estimated means lead to non positive definite covariance in GMM Mar 20, 2015
@amueller
Member

Thanks for digging into this. You are right, not estimating the means is a problem.
I also think it is strange, and we should deprecate the usage.

So you are saying Matlab does it the same way we currently do (after the PR)? I looked at vlfeat and they do the same. I guess it is for numerical reasons. I don't see a strong reason to change it back. Do you?

I don't understand your statement "In #2640, without the snippet resulting the error, I also cannot figure out why the covariance is not positive definite. "
Why would you check the covariance before the Cholesky? You could also just raise an error, right?

@xuewei4d
Contributor Author

Besides the deprecation, I don't know why there are both init_params and params, which both accept the 'wmc' format. It seems redundant.

Yes, I agree with you. I don't think we need to change it back in _covar_mstep_full, but do you think we need to change _covar_mstep_tied? Matlab does what we do now after the PR for the full covariance type.

Sorry, I didn't make that clear. I just don't know why @arendu gets the non-positive-definite covariance error in #2640. I think it would be better to catch numpy.linalg.linalg.LinAlgError in _log_multivariate_normal_density_full.
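For illustration, a hedged sketch (not the actual gmm.py code; the function and variable names below are placeholders) of catching the per-component Cholesky failure and turning it into a more informative error:

```python
import numpy as np
from scipy import linalg

def chol_per_component(covars):
    """Cholesky-factor each full covariance matrix, failing with a clear message."""
    chols = []
    for k, cv in enumerate(covars):
        try:
            chols.append(linalg.cholesky(cv, lower=True))
        except linalg.LinAlgError:
            raise ValueError(
                "covariance of component %d is not positive definite; "
                "check min_covar and whether the means are being estimated" % k
            )
    return chols
```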

@amueller
Member

I see. I agree it would be nice to catch the error and raise something more informative.
I think the intention behind the "duplication" between init_params and params is that you might want to a) give a custom initialization but keep on training (init_params allows that), or b) set some fixed parameters and not train them at all (params allows that).
I think we want to deprecate params because estimating some parameters and not others can lead to trouble.

For init_params it is not something we usually support in scikit-learn, but I can see how it could be useful.
A different way to do this would be to actually pass the initialization as parameters, which is what we do in different places, like means_init=some_array, covars_init=some_array etc.
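(For context, a short example of that pattern; it assumes the rewritten sklearn.mixture.GaussianMixture class that shipped later, not the GMM class discussed here:)

```python
import numpy as np
from sklearn.mixture import GaussianMixture

X = np.random.RandomState(0).randn(200, 2)

gm = GaussianMixture(
    n_components=2,
    covariance_type="full",
    weights_init=np.array([0.5, 0.5]),              # explicit initial weights
    means_init=np.array([[0.0, 0.0], [1.0, 1.0]]),  # explicit initial means
).fit(X)
```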

@xuewei4d
Contributor Author

Yeah. We should provide an explicit way to let users initialize the means, covariances, and weights themselves. And do you think we need to check that users initialize them properly, e.g. that the covariances are positive definite and the component weights sum to 1?
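A minimal validation sketch, assuming user-supplied arrays named weights and covars (one full covariance per component); this is not scikit-learn code, just the kind of check meant above:

```python
import numpy as np

def check_user_init(weights, covars):
    """Validate user-supplied mixture weights and full covariance matrices."""
    if not np.allclose(np.sum(weights), 1.0):
        raise ValueError("component weights must sum to 1")
    for k, cov in enumerate(covars):
        if not np.allclose(cov, cov.T):
            raise ValueError("covariance of component %d is not symmetric" % k)
        try:
            # a symmetric matrix is positive definite iff Cholesky succeeds
            np.linalg.cholesky(cov)
        except np.linalg.LinAlgError:
            raise ValueError("covariance of component %d is not positive definite" % k)
```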

I will work on the params deprecation and on catching numpy.linalg.linalg.LinAlgError.

@amueller
Member

Yes, we should check that. I'm +0.5 on changing the way users can initialize. The current way is different from the standard scikit-learn way, but deprecating things is always painful for users.

@amueller
Member

Thanks for working on this :)

@alexis-mignon
Contributor

Hello,

Thanks @xuewei4d for the correct explanation of the reason why the covariance matrix was not positive definite.

However, I ran into the problem because I was in a situation where I did want to keep the means unchanged after initialization. Basically, the idea was to keep the k-means initialization because it tiles the data cloud, while the plain GMM solution tended to end up with overlapping centers and different covariances catching the non-Gaussian tails. I do not claim it is a recommended way to use GMMs; however, it did the job pretty well.

I think it is convenient to keep 'init_params' and 'params' separate and assume that users know what they are doing when changing those values. At most, I would add a note in the class documentation warning that it is not a recommended trick.

@xuewei4d
Contributor Author

@alexis-mignon I understand, so let users worry about how to use init_params and params correctly.

@amueller I will add to my GSoC proposal how to design the interface for passing initialization of mean, covariance, weights in GMM.

@amueller
Member

thanks :)

@alexis-mignon I think it is not a good idea to keep params, because it is a very special use case, and we need to make sure that all seven possible combinations work correctly. They did (do?) not, and it looks to me like quite a hassle for a very specific use case.
Also, I'm not even sure what the settings mean. What does updating the means without adjusting the prior on the mixture components even mean? Do you just compute the responsibilities and use them to update the means, but then discard them and not update the prior?
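To make the question concrete, a rough sketch of a partial M-step driven by a params string (names are illustrative, not the gmm.py internals): with params="m" the responsibilities are still computed in the E-step and used to update the means, but the weights (priors) and covariances are then left untouched.

```python
import numpy as np

def partial_m_step(X, resp, weights, means, covars, params="wmc"):
    """Update only the parameters whose letter appears in `params`."""
    nk = resp.sum(axis=0) + 10 * np.finfo(float).eps   # soft counts per component
    if "w" in params:
        weights = nk / nk.sum()
    if "m" in params:
        means = resp.T @ X / nk[:, None]
    if "c" in params:
        diff = X[None, :, :] - means[:, None, :]       # (n_components, n_samples, n_features)
        covars = np.einsum("nk,kni,knj->kij", resp, diff, diff) / nk[:, None, None]
    return weights, means, covars
```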

@alexis-mignon
Contributor

I see some interest in being able to keep some parameters fixed:

  • You may know the mean values from a physical model but need to estimate the process noise (using a tied model, for instance)
  • You may know your noise model but need to estimate the mean locations (equivalent to deblurring, but with discrete samples)

However, I am not sure the corresponding estimators in those settings are exactly the same as the ones used in the present EM algorithm. The existence of this paper, Gaussian mixture parameter estimation with known means and unknown class-dependent variances, suggests that there might be some specific treatment needed when some parameters are known. However, I don't have access to ScienceDirect and could not read it.

@amueller
Member

Hurray for closed science!
While I agree that there is an application for that, I would claim it is outside the normal sklearn scope. Sklearn is pretty black-boxy and not really good for plugging known quantities into models.
I think you'd be better served with a probabilistic modelling framework for that.

@alexis-mignon
Contributor

What framework would you suggest?

@amueller
Member

I'm not sure what the best option in Python would be. I'd think of something like factory. I think there are some variational inference frameworks in Python, too. Maybe https://github.com/bayespy/bayespy, which also has a list of related packages?

@xuewei4d
Contributor Author

@amueller Besides the params problem, I am curious about the four covariance types spherical, tied, full, diag, since I had never seen that before; at most the case where the covariance is fully estimated but shared across components. I found a comment in the gmm code pointing to Murphy's 1998 paper, Fitting a Conditional Linear Gaussian Distribution (8 citations), which elaborates the update equations for all four covariance types. However, I didn't find similar material in his MLAPP book. I checked the Matlab implementation: there are only two types, diag and full, and both of them can be shared (tied) or not. I think spherical is rarely used. I am just asking whether it is necessary to refactor the covariance updates to work the way Matlab does.
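For readers unfamiliar with the four options, a small sketch of the constraints they correspond to, written as one way of projecting per-component full covariances onto each constraint (this mirrors the convention, not any particular implementation):

```python
import numpy as np

def constrain(full_covs, weights, covariance_type):
    """full_covs: (n_components, n_features, n_features), weights: (n_components,)."""
    if covariance_type == "full":        # one full matrix per component
        return full_covs
    if covariance_type == "tied":        # a single full matrix shared by all components
        return np.einsum("k,kij->ij", weights, full_covs)
    if covariance_type == "diag":        # per-component, diagonal entries only
        return np.array([np.diag(np.diag(c)) for c in full_covs])
    if covariance_type == "spherical":   # per-component sigma^2 * identity
        d = full_covs.shape[1]
        return np.array([np.trace(c) / d * np.eye(d) for c in full_covs])
    raise ValueError("unknown covariance_type: %r" % covariance_type)
```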

@amueller
Member

spherical is the simplest one, right? I don't see a reason to remove it.
Having a tied diagonal covariance might be nice, but no one has ever asked for it, so maybe it is not so important?

@amueller
Member

@tguillemot this is fixed in the new GMM, right?

@tguillemot
Contributor

Sorry I haven't seen this issue.
This is fixed by #6666.

@thomasjpfan
Member

Closing this issue based on #4429 (comment). Also, the issue is about the original GMM class, which is no longer in scikit-learn.

Repository owner moved this from Todo📬 to Done🚀 in Quansight's scikit-learn Project Board Jul 21, 2022