lnprob of DPGMM covariance_type='full' depends on number of items #7371

Closed
@rainwoodman

Description

The log-probability (lnprob) returned by DPGMM with covariance_type='full' depends on the number of items used to fit the model.

Steps/Code to Reproduce

import numpy as np
import matplotlib.pyplot as plt
from sklearn import mixture

np.random.seed(1234)
b = np.linspace(-5, 5)  # evaluation grid (50 points by default)

# First fit: 2000 samples from a standard normal.
X = np.random.normal(size=(2000, 1))
print(X.shape)
m = mixture.DPGMM(1, covariance_type="full")
m.fit(X)
print(m.__dict__)
# In 0.17.1, score() returns per-sample log-probabilities; exponentiate to get the density.
plt.plot(b, np.exp(m.score(b.reshape(-1, 1))).ravel())

# Second fit: 3000 samples from the same distribution.
X = np.random.normal(size=(3000, 1))
m = mixture.DPGMM(1, covariance_type="full")
m.fit(X)
print(m.__dict__)
p = np.exp(m.score(b.reshape(-1, 1)))
print(p.max() - np.log(np.sqrt(2)))
plt.plot(b, p.ravel())
plt.show()

Expected Results

The two curves should overlap, since both samples are drawn from the same distribution.
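To make the expectation concrete, here is a minimal standard-library sketch of what "overlap" means. It does not use DPGMM at all: as a stand-in, it fits a single Gaussian by maximum likelihood to 2000 and to 3000 standard-normal samples and compares the two densities on the same grid. The helper name `fitted_normal_pdf` and the choice of stand-in are my assumptions, not part of scikit-learn.

```python
import math
import random
import statistics


def fitted_normal_pdf(samples):
    """Return the pdf of a Gaussian fitted to `samples` by maximum likelihood.

    Hypothetical helper for illustration only; it stands in for a
    one-component mixture model.
    """
    mu = statistics.mean(samples)
    sigma = statistics.pstdev(samples, mu)

    def pdf(x):
        z = (x - mu) / sigma
        return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

    return pdf


rng = random.Random(1234)
small = [rng.gauss(0.0, 1.0) for _ in range(2000)]
large = [rng.gauss(0.0, 1.0) for _ in range(3000)]

pdf_small = fitted_normal_pdf(small)
pdf_large = fitted_normal_pdf(large)

# Same grid as linspace(-5, 5): 50 evenly spaced points.
grid = [-5.0 + 10.0 * i / 49 for i in range(50)]

# Largest vertical distance between the two fitted curves.
max_gap = max(abs(pdf_small(x) - pdf_large(x)) for x in grid)

# Peak height of a standard normal density, 1 / sqrt(2 * pi).
expected_peak = 1.0 / math.sqrt(2.0 * math.pi)

print("max gap between curves:", max_gap)
print("peaks:", pdf_small(0.0), pdf_large(0.0), "expected:", expected_peak)
```

For a correctly normalized model, the gap should be small and both peaks should sit close to 1/sqrt(2*pi), regardless of whether 2000 or 3000 samples were used.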

Actual Results

In 0.17.1 these two lines are not overlapping; one is significantly higher than the other, indicating a normalization error.

I understand there is a massive overhaul of mixture.py in 0.18, so this is probably not very relevant. Please consider adding this as a test case for the full covariance model in the rewrite. Thanks.
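As a rough idea of what such a test could look like, here is a sketch written against `BayesianGaussianMixture`, the class that replaced `DPGMM` from 0.18 onward. The function names, the grid, and the tolerances are my assumptions, chosen loosely to absorb sampling noise; this is not an existing scikit-learn test.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def fitted_density(n_samples, grid, seed):
    """Fit a one-component full-covariance model and return its density on `grid`."""
    rng = np.random.RandomState(seed)
    X = rng.normal(size=(n_samples, 1))
    model = BayesianGaussianMixture(
        n_components=1,
        covariance_type="full",
        weight_concentration_prior_type="dirichlet_process",
        random_state=0,
    )
    model.fit(X)
    # score_samples() returns the per-sample log-likelihood.
    return np.exp(model.score_samples(grid.reshape(-1, 1)))


def check_density_independent_of_sample_size():
    grid = np.linspace(-8, 8, 801)
    dx = grid[1] - grid[0]
    p_2000 = fitted_density(2000, grid, seed=1234)
    p_3000 = fitted_density(3000, grid, seed=4321)

    # Both curves should be (approximately) normalized densities ...
    assert abs(p_2000.sum() * dx - 1.0) < 0.02
    assert abs(p_3000.sum() * dx - 1.0) < 0.02
    # ... and should overlap, since the data come from the same distribution.
    assert np.max(np.abs(p_2000 - p_3000)) < 0.05
    return p_2000, p_3000


p_2000, p_3000 = check_density_independent_of_sample_size()
```

The normalization check is the more direct guard against the behaviour reported here: a density whose integral changes with the number of training samples fails it even when each curve looks plausible on its own.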

Versions

Linux-2.6.32-431.17.1.el6.x86_64-x86_64-with-centos-6.8-Final
Python 3.5.1 |Anaconda custom (64-bit)| (default, Dec 7 2015, 11:16:01)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
NumPy 1.10.4
SciPy 0.17.1
Scikit-Learn 0.17.1
