EllipticEnvelope and GraphicalLasso: inconsistent results under different setups #12127

adrinjalali · 2018-09-21T08:01:18Z

While adding examples to docstrings, I found that two models (three classes)
behave oddly: they give different results in different environments. The
differences are not random, since the random seed and random_state are fixed.
The results are deterministic within each setup, but change from setup to
setup.

For instance (observed in PR #12124), EllipticEnvelope shows the following
issue (this is a failure on Travis):

    >>> import numpy as np
    >>> from sklearn.covariance import EllipticEnvelope
    >>> real_cov = np.array([[.8, .3],
    ...                      [.3, .4]])
    >>> np.random.seed(0)
    >>> X = np.random.multivariate_normal(mean=[0, 0],
    ...                                   cov=real_cov,
    ...                                   size=300)
    >>> cov = EllipticEnvelope(random_state=0).fit(X)
    >>> cov.covariance_ # doctest: +ELLIPSIS
Expected:
    array([[0.7411..., 0.2535...],
           [0.2535..., 0.3053...]])
Got:
    array([[0.81478325, 0.28653659],
           [0.28653659, 0.30913504]])

The same issue was observed in PR #11732 for GraphicalLasso and
GraphicalLassoCV.
Please note that the results are deterministic, i.e. changing the expected
values to what Travis reports would make the test pass, as I've done in
PR #11732.

The code that triggers the issue is the following:

import numpy as np
from scipy import linalg
from sklearn.datasets import make_sparse_spd_matrix
from sklearn.covariance import GraphicalLasso, log_likelihood
n_samples = 60
n_features = 20
prng = np.random.RandomState(1)
prec = make_sparse_spd_matrix(n_features, alpha=.98,
                              smallest_coef=.4,
                              largest_coef=.7,
                              random_state=prng)
cov = linalg.inv(prec)
d = np.sqrt(np.diag(cov))
cov /= d
cov /= d[:, np.newaxis]
X = prng.multivariate_normal(np.zeros(n_features), cov, size=n_samples)
emp_cov = np.dot(X.T, X) / n_samples
model = GraphicalLasso()
loglik_est = -model.fit(X).score(X)
loglik_real = -log_likelihood(emp_cov, prec)
print("estimated negative log likelihood: %g" % loglik_est)
# here the difference between systems is: 26.1847 vs 26.1927
print("real negative log likelihood: %g" % loglik_real)
# here the difference between systems is: 28.1526 vs 28.1067


albertcthomas · 2018-09-21T11:58:44Z

For the EllipticEnvelope failure, I don't know if this helps, but

import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.covariance import MinCovDet
from numpy.testing import assert_array_equal
np.random.seed(0)
real_cov = np.array([[.8, .3], [.3, .4]])
X = np.random.multivariate_normal(mean=[0, 0], cov=real_cov, size=500)
mcd = MinCovDet(random_state=0).fit(X)
env = EllipticEnvelope(random_state=0).fit(X)
assert_array_equal(mcd.covariance_, env.covariance_)
print(mcd.covariance_)

returns

[[0.74118335 0.25357049]
 [0.25357049 0.30531502]]

albertcthomas · 2018-09-21T12:30:16Z

You put size=300 in the doctest of EllipticEnvelope but size=500 in the doctest of MinCovDet.

albertcthomas · 2018-09-24T09:54:40Z

Maybe I misunderstood the issue for EllipticEnvelope, but in

Expected:
    array([[0.7411..., 0.2535...],
           [0.2535..., 0.3053...]])
Got:
    array([[0.81478325, 0.28653659],
           [0.28653659, 0.30913504]])

the expected result is the one obtained with size=500, whereas you put size=300.

I think that if you put size=500 in the doctest:

    >>> import numpy as np
    >>> from sklearn.covariance import EllipticEnvelope
    >>> real_cov = np.array([[.8, .3],
    ...                      [.3, .4]])
    >>> np.random.seed(0)
    >>> X = np.random.multivariate_normal(mean=[0, 0],
    ...                                   cov=real_cov,
    ...                                   size=300)  # put 500 here instead of 300
    >>> cov = EllipticEnvelope(random_state=0).fit(X)
    >>> cov.covariance_ # doctest: +ELLIPSIS

the test will pass.
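
To illustrate the point about sample size, here is a minimal sketch using only NumPy. As an assumption made to keep it dependency-free, the plain empirical covariance from np.cov stands in for the robust estimate, and sample_cov is a hypothetical helper. With the same seed, size=300 and size=500 give different data sets and therefore different covariance estimates, so the expected values in a doctest have to be generated with the same size the doctest uses.

```python
import numpy as np

real_cov = np.array([[.8, .3],
                     [.3, .4]])


def sample_cov(size, seed=0):
    # Hypothetical helper: draw `size` samples with a fixed seed and
    # return their empirical covariance (not the MCD estimate).
    rng = np.random.RandomState(seed)
    X = rng.multivariate_normal(mean=[0, 0], cov=real_cov, size=size)
    return np.cov(X, rowvar=False)


cov_300 = sample_cov(300)
cov_500 = sample_cov(500)

# Same seed and same size: the estimate is reproducible.
print(np.allclose(cov_300, sample_cov(300)))
# Same seed but a different size: the estimate changes.
print(np.allclose(cov_300, cov_500))
```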

albertcthomas · 2018-09-24T09:57:29Z

And the result should be the same for MinCovDet and EllipticEnvelope.

adrinjalali · 2018-09-25T16:21:18Z

@albertcthomas, sorry, I just had a chance to test them. You're right. That was a silly mistake of mine (a copy-paste issue).

I also investigated the other issue. It's odd, but passing an integer instead of prng fixes the issue. No idea why. Thanks, closing.
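
One possible explanation, offered as an assumption rather than a confirmed diagnosis: a RandomState instance is stateful, so what a call draws from it depends on everything drawn from it before, whereas an integer seed yields a fresh generator each time. In the GraphicalLasso snippet from the first comment, prng is used first by make_sparse_spd_matrix and then by multivariate_normal, so if the first call consumed a different number of random draws in different environments, X would differ as well. The sketch below shows only the general mechanism with NumPy; draw is a hypothetical helper, not scikit-learn code.

```python
import numpy as np


def draw(random_state, size=3):
    # Hypothetical helper: reuse a RandomState instance as-is,
    # or build a fresh generator from an integer seed.
    if isinstance(random_state, np.random.RandomState):
        rng = random_state
    else:
        rng = np.random.RandomState(random_state)
    return rng.normal(size=size)


shared = np.random.RandomState(1)
first = draw(shared)
second = draw(shared)  # the shared stream has advanced since `first`

fresh_a = draw(1)
fresh_b = draw(1)      # a new generator is built for each call

# Compare consecutive draws from the shared instance.
print(np.allclose(first, second))
# Compare draws made from the same integer seed.
print(np.allclose(fresh_a, fresh_b))
```

With an integer seed, each call is independent of how many numbers earlier calls consumed, which is consistent with the observation that passing an integer made the results stable.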

albertcthomas mentioned this issue Sep 21, 2018

[MRG] DOC covariance doctest examples #12124

Merged

adrinjalali closed this as completed Sep 25, 2018
