Closed
Description
lnprob of DPGMM with covariance_type='full' depends on the number of samples
Steps/Code to Reproduce
import numpy as np
import matplotlib.pyplot as plt
from sklearn import mixture

np.random.seed(1234)

# First fit: 2000 samples from a standard normal
X = np.random.normal(size=(2000, 1))
print(X.shape)
m = mixture.DPGMM(1, covariance_type="full")
m.fit(X)
print(m.__dict__)
b = np.linspace(-5, 5)
# print(m.score(b.reshape(-1, 1)))
# score() returns per-sample log-likelihoods in 0.17.1, so exp() gives the density
plt.plot(b, np.exp(m.score(b.reshape(-1, 1))).ravel())

# Second fit: 3000 samples from the same distribution
X = np.random.normal(size=(3000, 1))
m = mixture.DPGMM(1, covariance_type="full")
m.fit(X)
print(m.__dict__)
b = np.linspace(-5, 5)
# print(m.score(b.reshape(-1, 1)))
p = np.exp(m.score(b.reshape(-1, 1)))
print(p.max() - np.log(np.sqrt(2)))
plt.plot(b, p.ravel())
plt.show()
Expected Results
The two curves should overlap, since both data sets are drawn from the same distribution (a standard normal).
Actual Results
In 0.17.1 the two curves do not overlap: one is significantly higher than the other, which suggests a normalization error.
I understand there is a massive overhaul of mixture.py in 0.18, so this is probably no longer very relevant. Please consider adding this as a test case for the full covariance model in the rewrite. Thanks.
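A library-independent version of the check requested above might look like the sketch below. It does not use DPGMM at all. Instead, a plain maximum-likelihood Gaussian fit stands in for the mixture fit. The helpers `full_cov_log_density` and `fit_gaussian` are hypothetical names chosen for illustration and are not scikit-learn APIs. The sketch evaluates a full-covariance normal log density, including the `d*log(2*pi) + log|cov|` normalization term that sets the curve's height. It then asserts two things: each fitted density integrates to about 1 on the plotting range, and the fits from 2000 and 3000 samples nearly overlap.

```python
import numpy as np


def full_cov_log_density(X, mean, cov):
    """Hypothetical helper: log density of a multivariate normal with full covariance.

    log p(x) = -0.5 * (d*log(2*pi) + log|cov| + (x - mean)^T cov^{-1} (x - mean))
    """
    X = np.atleast_2d(X)
    d = X.shape[1]
    diff = X - mean
    # Mahalanobis term via a linear solve instead of an explicit inverse
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def fit_gaussian(X):
    """Hypothetical stand-in for the mixture fit: ML mean and full covariance."""
    mean = X.mean(axis=0)
    cov = np.atleast_2d(np.cov(X, rowvar=False, bias=True))
    return mean, cov


rng = np.random.RandomState(1234)
grid = np.linspace(-5, 5, 1001).reshape(-1, 1)
dx = grid[1, 0] - grid[0, 0]

densities = []
for n in (2000, 3000):
    X = rng.normal(size=(n, 1))
    mean, cov = fit_gaussian(X)
    p = np.exp(full_cov_log_density(grid, mean, cov))
    # A correctly normalized density should integrate to ~1 over [-5, 5]
    print(n, p.sum() * dx)
    densities.append(p)

# Fits to samples from the same distribution should nearly overlap,
# whatever the number of samples
print(np.max(np.abs(densities[0] - densities[1])))
```

With a correct normalization term, the integral stays near 1 for both sample sizes. If that term were wrong, the result would be a vertical offset between the curves like the one described above.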
Versions
Linux-2.6.32-431.17.1.el6.x86_64-x86_64-with-centos-6.8-Final
Python 3.5.1 |Anaconda custom (64-bit)| (default, Dec 7 2015, 11:16:01)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]
NumPy 1.10.4
SciPy 0.17.1
Scikit-Learn 0.17.1