Scaling kills DPGMM [was: mixture.DPGMM not fitting to data] #2454

Closed
caofan opened this issue Sep 18, 2013 · 6 comments

@caofan commented Sep 18, 2013

I am trying out the Gaussian mixture models in the package. I tried to model a mixture with two components, G(1000, 500^2) and G(2000, 600^2). The following is the code:

import numpy as np
from sklearn import mixture

# two Gaussian components: N(1000, 500^2) and N(2000, 600^2)
data = np.random.normal(1000, 500, 1000)
data2 = np.random.normal(2000, 600, 1000)
data = np.concatenate([data, data2]).reshape(-1, 1)  # column vector of samples

model = mixture.DPGMM(n_components=10, alpha=10, n_iter=10000)
model.fit(data)
print(model.means_)

And I got the following means of the components.
[[ 0.13436485]
[ 0.13199086]
[ 0.11750537]
[ 0.10560644]
[ 0.12162311]
[ 0.00204134]
[ 0.12058521]
[ 0.11997703]
[ 0.11944384]
[ 0.11890694]]

It seems the model does not fit the data properly. Is this a bug, or have I got something wrong in my application of the model?

Thanks.
Fan

@arjoly added the Bug label May 11, 2014
@amueller added this to the 0.15.1 milestone Jul 18, 2014
@amueller (Member)

This looks pretty bad :-/

@amueller (Member)

My explanation for this is: the model assumes an N(0, 1) prior on the means [and also a fixed prior on the covariance], which is not reasonable for your data. To make this work, the data should be scaled to zero mean and unit variance; the result would then be much more sensible.
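
A minimal sketch of that prescaling workaround (using StandardScaler; the DPGMM call just mirrors the report above, so this assumes the 0.14-era API):

import numpy as np
from sklearn import mixture
from sklearn.preprocessing import StandardScaler

# same data as in the report, as a column vector
X = np.concatenate([np.random.normal(1000, 500, 1000),
                    np.random.normal(2000, 600, 1000)]).reshape(-1, 1)

# scale to zero mean and unit variance so the N(0, 1) prior on the means is plausible
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

model = mixture.DPGMM(n_components=10, alpha=10, n_iter=10000)
model.fit(X_scaled)

# map the fitted means back to the original units
print(scaler.inverse_transform(model.means_))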

I have too little experience with these kinds of models to say what a good solution would be.
Possible candidates:

  • prescale the data (and adjust the estimated means and precisions accordingly), as sketched above
  • raise a warning?
  • use a hierarchical Bayesian approach?
  • make the priors parameters of the estimator
  • estimate the priors from the data (which is probably the same as just rescaling the data)
  • use a much wider (or non-informative) prior on the means

PS: any Bayesian should feel free to hit me and implement the hierarchical approach.

@amueller changed the title from "mixture.DPGMM not fitting to data" to "Scaling kills DPGMM [was: mixture.DPGMM not fitting to data]" Jan 28, 2015
@amueller (Member)

Thinking about it, I'm not sure — shouldn't 1000 samples be enough to overcome the prior? Hmm...

@amueller (Member)

The derivation of the mean updates at http://scikit-learn.org/dev/modules/dp-derivation.html#the-updates is quite different from the one in Bishop's or Murphy's book. In particular, in the books the variational mean parameters don't depend on the variational precision parameters, whereas they do in the derivation in the docs (which is odd).
I'm a bit tempted to replace the implementation by a close correspondence to Bishop and see how that goes.
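
From memory, Bishop's update for the variational posterior over the component means (PRML §10.2.1, around eqs. 10.60–10.61; treat the exact numbering as approximate) is:

\beta_k = \beta_0 + N_k, \qquad
m_k = \frac{1}{\beta_k}\left(\beta_0 m_0 + N_k \bar{x}_k\right),
\qquad\text{with}\quad
N_k = \sum_n r_{nk}, \quad
\bar{x}_k = \frac{1}{N_k} \sum_n r_{nk} x_n .

So m_k depends only on the responsibilities r_{nk}, not on the Wishart parameters (W_k, \nu_k) of the precision factor — which is the discrepancy noted above.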

@GaelVaroquaux (Member)

> I'm a bit tempted to replace the implementation by a close correspondence to Bishop and see how that goes.

I am not very attached to our implementation. It has given us a lot of problems in the past.

@ogrisel (Member) commented Sep 10, 2016

Closing: the new Dirichlet process GMM rewrite has been merged in master. It is not affected by this bug.
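
For anyone landing here later, a minimal sketch of the same experiment against the rewritten estimator (sklearn.mixture.BayesianGaussianMixture with a Dirichlet-process weight prior; as far as I know its mean and covariance priors default to data-dependent values, so no manual scaling should be needed):

import numpy as np
from sklearn.mixture import BayesianGaussianMixture

# same unscaled data as in the original report
X = np.concatenate([np.random.normal(1000, 500, 1000),
                    np.random.normal(2000, 600, 1000)]).reshape(-1, 1)

model = BayesianGaussianMixture(
    n_components=10,
    weight_concentration_prior_type='dirichlet_process',
    weight_concentration_prior=10.0,  # analogous to DPGMM's alpha
    max_iter=1000,
)
model.fit(X)
print(model.means_)    # component means, now on the original scale
print(model.weights_)  # superfluous components should get ~zero weight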

@ogrisel closed this as completed Sep 10, 2016