
EllipticEnvelope and GraphicalLasso: inconsistent results under different setups #12127



Closed
adrinjalali opened this issue Sep 21, 2018 · 5 comments


@adrinjalali
Member

adrinjalali commented Sep 21, 2018

While adding examples to docstrings, I found that two models (three classes)
behave oddly: they give different results in different environments, even
though the results are not random, since both the NumPy random seed and
random_state are fixed. The results are deterministic within each setup, but
change from one setup to another.
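The "deterministic within a setup" part can be checked directly by fitting the same estimator twice on the same data with a fixed random_state and comparing the results exactly. A minimal sketch, assuming the EllipticEnvelope setup from the doctest below; `fit_twice_and_compare` is a hypothetical helper, and it uses a local `RandomState(0)` in place of `np.random.seed(0)`, which produces the same draws:

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope


def fit_twice_and_compare(seed=0, n_samples=300):
    """Hypothetical helper: fit EllipticEnvelope twice with identical seeds."""
    real_cov = np.array([[.8, .3],
                         [.3, .4]])
    # Local generator; yields the same numbers as np.random.seed(seed)
    rng = np.random.RandomState(seed)
    X = rng.multivariate_normal(mean=[0, 0], cov=real_cov, size=n_samples)
    first = EllipticEnvelope(random_state=seed).fit(X).covariance_
    second = EllipticEnvelope(random_state=seed).fit(X).covariance_
    return first, second


first, second = fit_twice_and_compare()
# Within one environment, the two fits agree bit for bit
print(np.array_equal(first, second))  # prints True
```

This only shows reproducibility inside one environment; it says nothing about whether another machine produces the same numbers.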

For instance (observed in PR #12124), EllipticEnvelope shows the following
failure on Travis:

    >>> import numpy as np
    >>> from sklearn.covariance import EllipticEnvelope
    >>> real_cov = np.array([[.8, .3],
    ...                      [.3, .4]])
    >>> np.random.seed(0)
    >>> X = np.random.multivariate_normal(mean=[0, 0],
    ...                                   cov=real_cov,
    ...                                   size=300)
    >>> cov = EllipticEnvelope(random_state=0).fit(X)
    >>> cov.covariance_ # doctest: +ELLIPSIS
Expected:
    array([[0.7411..., 0.2535...],
           [0.2535..., 0.3053...]])
Got:
    array([[0.81478325, 0.28653659],
           [0.28653659, 0.30913504]])

The same issue was observed in PR #11732 for GraphicalLasso and
GraphicalLassoCV. Note that the results are deterministic: changing the
expected values to what Travis reports makes the test pass, as I've done in
PR #11732.

The code that produces the issue is the following:

import numpy as np
from scipy import linalg
from sklearn.datasets import make_sparse_spd_matrix
from sklearn.covariance import GraphicalLasso, log_likelihood

n_samples = 60
n_features = 20
prng = np.random.RandomState(1)
# Sparse precision matrix drawn from the shared generator
prec = make_sparse_spd_matrix(n_features, alpha=.98,
                              smallest_coef=.4,
                              largest_coef=.7,
                              random_state=prng)
cov = linalg.inv(prec)
# Rescale the covariance to unit variances
d = np.sqrt(np.diag(cov))
cov /= d
cov /= d[:, np.newaxis]
X = prng.multivariate_normal(np.zeros(n_features), cov, size=n_samples)
emp_cov = np.dot(X.T, X) / n_samples

model = GraphicalLasso()
loglik_est = -model.fit(X).score(X)
loglik_real = -log_likelihood(emp_cov, prec)
print("estimated negative log likelihood: %g" % loglik_est)
# Differs between systems: 26.1847 vs 26.1927
print("real negative log likelihood: %g" % loglik_real)
# Differs between systems: 28.1526 vs 28.1067
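For context, both printed numbers are Gaussian log-likelihoods: score() evaluates sklearn.covariance.log_likelihood on the empirical covariance of the data. As far as I can tell from the scikit-learn source, that function returns the average log-likelihood 0.5 * (logdet(Theta) - trace(S @ Theta) - p * log(2 * pi)) for a precision matrix Theta, an empirical covariance S and p features. The sketch below recomputes this with plain NumPy on made-up data and compares it with the library function; `manual_log_likelihood` is a hypothetical helper, and the formula is my reading of the source rather than documented behaviour.

```python
import numpy as np
from sklearn.covariance import log_likelihood


def manual_log_likelihood(emp_cov, precision):
    """Hypothetical re-implementation of the average Gaussian log-likelihood."""
    p = precision.shape[0]
    sign, logdet = np.linalg.slogdet(precision)
    if sign <= 0:
        raise ValueError("precision must be positive definite")
    # For symmetric matrices, tr(S @ Theta) is the sum of the elementwise product
    trace_term = np.sum(emp_cov * precision)
    return 0.5 * (logdet - trace_term - p * np.log(2 * np.pi))


# Illustrative data only (not the data from this issue)
rng = np.random.RandomState(0)
A = rng.normal(size=(5, 5))
precision = A @ A.T + 5 * np.eye(5)  # positive definite by construction
X = rng.normal(size=(100, 5))
emp_cov = X.T @ X / X.shape[0]

print(np.isclose(manual_log_likelihood(emp_cov, precision),
                 log_likelihood(emp_cov, precision)))  # prints True
```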
@albertcthomas
Contributor

albertcthomas commented Sep 21, 2018

For the EllipticEnvelope failure, I don't know if this helps, but

import numpy as np
from sklearn.covariance import EllipticEnvelope
from sklearn.covariance import MinCovDet
from numpy.testing import assert_array_equal
np.random.seed(0)
real_cov = np.array([[.8, .3], [.3, .4]])
X = np.random.multivariate_normal(mean=[0, 0], cov=real_cov, size=500)
mcd = MinCovDet(random_state=0).fit(X)
env = EllipticEnvelope(random_state=0).fit(X)
assert_array_equal(mcd.covariance_, env.covariance_)
print(mcd.covariance_)

returns

[[0.74118335 0.25357049]
 [0.25357049 0.30531502]]

@albertcthomas
Contributor

You put size=300 in the EllipticEnvelope doctest but size=500 in the MinCovDet doctest.

@albertcthomas
Contributor

Maybe I misunderstood the EllipticEnvelope issue, but in

Expected:
    array([[0.7411..., 0.2535...],
           [0.2535..., 0.3053...]])
Got:
    array([[0.81478325, 0.28653659],
           [0.28653659, 0.30913504]])

the expected result is the one you get with 500 samples, whereas the doctest uses size=300.

I think that if you put size=500 in the doctest:

    >>> import numpy as np
    >>> from sklearn.covariance import EllipticEnvelope
    >>> real_cov = np.array([[.8, .3],
    ...                      [.3, .4]])
    >>> np.random.seed(0)
    >>> X = np.random.multivariate_normal(mean=[0, 0],
    ...                                   cov=real_cov,
    ...                                   size=300)  # put 500 here instead of 300
    >>> cov = EllipticEnvelope(random_state=0).fit(X)
    >>> cov.covariance_ # doctest: +ELLIPSIS

the test will pass.
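The diagnosis can be checked by running the same recipe with both sample sizes. This sketch reuses the doctest setup; `envelope_covariance` is a hypothetical helper. Each size gives its own reproducible estimate, so an expected value recorded with size=500 cannot match a run with size=300.

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope


def envelope_covariance(n_samples, seed=0):
    """Hypothetical helper mirroring the doctest for a given sample size."""
    real_cov = np.array([[.8, .3],
                         [.3, .4]])
    np.random.seed(seed)  # same global seeding as the doctest
    X = np.random.multivariate_normal(mean=[0, 0], cov=real_cov,
                                      size=n_samples)
    return EllipticEnvelope(random_state=seed).fit(X).covariance_


cov_300 = envelope_covariance(300)
cov_500 = envelope_covariance(500)
# Different data sizes -> different (but individually reproducible) estimates
print(np.round(cov_300, 4))
print(np.round(cov_500, 4))
```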

@albertcthomas
Contributor

And the result should be the same for MinCovDet and EllipticEnvelope.
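This follows from the class hierarchy: EllipticEnvelope subclasses MinCovDet, and in the scikit-learn versions I know of its fit method calls MinCovDet.fit and then only computes its outlier threshold (offset_). A small check on data generated like the snippets above, using a local `RandomState(0)` instead of the global seed:

```python
import numpy as np
from sklearn.covariance import EllipticEnvelope, MinCovDet

# EllipticEnvelope inherits the robust covariance fit from MinCovDet
print(issubclass(EllipticEnvelope, MinCovDet))  # prints True

rng = np.random.RandomState(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[.8, .3], [.3, .4]], size=500)
mcd = MinCovDet(random_state=0).fit(X)
env = EllipticEnvelope(random_state=0).fit(X)
# Same data and same random_state -> identical covariance estimates
print(np.array_equal(mcd.covariance_, env.covariance_))  # prints True
```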

@adrinjalali
Member Author

@albertcthomas, sorry, I only just had a chance to test them. You're right: that was a silly copy-paste mistake on my part.

I also investigated the other issue. Oddly, passing an integer seed instead of prng fixes it, though I don't know why. Thanks, closing.
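One way to read this workaround is to give make_sparse_spd_matrix its own integer seed instead of the shared prng, so that building the matrix no longer advances the generator used for sampling. This is a sketch under that assumption (the comment does not say exactly where the integer was passed), and it does not explain the cross-platform difference itself; `make_data` is a hypothetical helper.

```python
import numpy as np
from scipy import linalg
from sklearn.datasets import make_sparse_spd_matrix


def make_data(n_samples=60, n_features=20, seed=1):
    """Hypothetical helper: the issue's data recipe with decoupled seeds."""
    # Integer seed: the generator creates its own RandomState internally
    prec = make_sparse_spd_matrix(n_features, alpha=.98,
                                  smallest_coef=.4, largest_coef=.7,
                                  random_state=seed)
    cov = linalg.inv(prec)
    # Rescale to unit variances, as in the original snippet
    d = np.sqrt(np.diag(cov))
    cov /= d
    cov /= d[:, np.newaxis]
    # Separate generator for sampling, untouched by the matrix construction
    sampler = np.random.RandomState(seed)
    X = sampler.multivariate_normal(np.zeros(n_features), cov, size=n_samples)
    return prec, X


prec, X = make_data()
print(X.shape)  # prints (60, 20)
```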
