Skip to content

[MRG] Fix bug in graph lasso when n_features=2 #13276

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Feb 27, 2019

Conversation

bellet
Copy link
Contributor

@bellet bellet commented Feb 26, 2019

Reference Issues/PRs

Fixes a bug introduced in #9858
Related to some discussions in #12228

What does this implement/fix? Explain your changes.

This PR fixes a subtle bug in the graph lasso implementation, which was introduced in PR #9858 proposing an online way to update the matrix sub_covariance (leading to significant speed ups for large dense matrices). This bug affects both the CD and LARS solvers as it is in the outer loop.

More precisely, the problem originates from the fact that np.ascontiguousarray does not always copy the original array:

In [1]: 
   ...: import numpy as np
   ...: a = np.arange(10)
   ...: print(a)
   ...: b = np.ascontiguousarray(a)
   ...: b[0] = -1
   ...: print(a)
   ...: 
[0 1 2 3 4 5 6 7 8 9]
[-1  1  2  3  4  5  6  7  8  9]

This problem led to the matrix covariance_ being modified when sub_covariance is updated, which is not correct.

Example of incorrect behavior on a simple PSD matrix taken from #12228 (comment):

In [1]: import numpy as np 
   ...: from sklearn.covariance import graphical_lasso
   ...: 
   ...: A = np.array([[6., 2.],
   ...:               [2., 1.]])
   ...: cov, _ = graphical_lasso(A, alpha=0.1, verbose=True, max_iter=5)
   ...: print(cov)
   ...: 
[graphical_lasso] Iteration   0, cost  1.24e+01, dual gap 4.728e-01
[graphical_lasso] Iteration   1, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   2, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   3, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   4, cost  1.19e+01, dual gap -9.262e-01
/home/aurelien/Documents/research/github/scikit-learn/sklearn/covariance/graph_lasso_.py:265: ConvergenceWarning: graphical_lasso: did not converge after 5 iteration: dual gap: -9.262e-01
  % (max_iter, d_gap), ConvergenceWarning)
[[6.  1.9]
 [1.9 6. ]]

With this PR:

In [1]: import numpy as np 
   ...: from sklearn.covariance import graphical_lasso
   ...: 
   ...: A = np.array([[6., 2.],
   ...:               [2., 1.]])
   ...: cov, _ = graphical_lasso(A, alpha=0.1, verbose=True, max_iter=5)
   ...: print(cov)
   ...: 
[graphical_lasso] Iteration   0, cost  1.02e+01, dual gap 3.331e-16
[[6.  1.9]
 [1.9 1. ]]

This is consistent with the output of graph lasso as implemented in package skggm:

In [1]: import numpy as np 
   ...: from inverse_covariance import quic
   ...: 
   ...: A = np.array([[6., 2.],
   ...:               [2., 1.]])
   ...: _, cov, _, _, _, _ = quic(A, lam=0.1)
   ...: print(cov)
   ...:               
[[6.00000005 1.9       ]
 [1.9        1.00000001]]

UPDATE: the problem arises only when n_features=2, hence has small scope. This explains why the tests passed for PR #9858. The root cause of the problem can still valid for larger matrices, as seen in this example:

In [1]: import numpy as np
   ...: a = np.ones((3, 3))
   ...: print(a)
   ...: b = np.ascontiguousarray(a)
   ...: b[0] = [-1, -2, -1]
   ...: print(a)
   ...: 
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
[[-1. -2. -1.]
 [ 1.  1.  1.]
 [ 1.  1.  1.]]

However slicing the two dimensions at the same time (as done for graph lasso) forces np.ascontiguousarray to make a copy when n_features>2:

In [1]: import numpy as np
   ...: a = np.ones((3, 3))
   ...: print(a)
   ...: b = np.ascontiguousarray(a[1:, 1:])
   ...: b[0] = [-1, -2]
   ...: print(a)
   ...: 
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]
[[1. 1. 1.]
 [1. 1. 1.]
 [1. 1. 1.]]

When n_features=2, the output is a single value (which is already contiguous), hence the bug.

@bellet bellet changed the title [MRG] Fix bug in graph lasso [WIP] Fix bug in graph lasso Feb 26, 2019
@bellet
Copy link
Contributor Author

bellet commented Feb 26, 2019

I will try to add some tests

@bellet
Copy link
Contributor Author

bellet commented Feb 26, 2019

See my update in the PR text: it looks like the bug has small scope as it only occurs when n_features=2. I will add a specific test for this case.

@bellet bellet changed the title [WIP] Fix bug in graph lasso [WIP] Fix bug in graph lasso when n_features=2 Feb 26, 2019
@vene
Copy link
Member

vene commented Feb 26, 2019

Interesting but makes sense that it only happens when n_features=2; because it ends up with a size-1 tensor, right? This is the only case when such a slicing could yield a contiguous view; otherwise you can't avoid crossing some strides. Neat, I guess... 😵

the test is a good idea

@bellet
Copy link
Contributor Author

bellet commented Feb 26, 2019

Interesting but makes sense that it only happens when n_features=2; because it ends up with a size-1 tensor, right? This is the only case when such a slicing could yield a contiguous view; otherwise you can't avoid crossing some strides. Neat, I guess... dizzy_face

Yes, this is exactly what happens

@bellet
Copy link
Contributor Author

bellet commented Feb 26, 2019

I added a test for the case n_features=2, which failed before and passes now
This should now be ready to merge

@bellet bellet changed the title [WIP] Fix bug in graph lasso when n_features=2 [MRG] Fix bug in graph lasso when n_features=2 Feb 26, 2019
@jnothman jnothman added this to the 0.20.3 milestone Feb 26, 2019
Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this have your approval @vene? At a glance it looks good to me, but I don't feel I've looked carefully at the context.

Please add an entry to the change log at doc/whats_new/v0.20.rst under version 0.20.3. Like the other entries there, please reference this pull request with :issue: and credit yourself (and other contributors if applicable) with :user:

@vene
Copy link
Member

vene commented Feb 26, 2019

@jnothman the change itself has my approval, yes.

Since I'm out of date with our deprecation policies, i was waiting for someone else to weigh in on whether there should be an added test for the deprecated graph_lasso function which is just an alias for graphical_lasso or not.

@bellet
Copy link
Contributor Author

bellet commented Feb 26, 2019

All done !

Copy link
Member

@vene vene left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, thanks @bellet!

@jnothman
Copy link
Member

Since I'm out of date with our deprecation policies, i was waiting for someone else to weigh in on whether there should be an added test for the deprecated graph_lasso function which is just an alias for graphical_lasso or not.

My policy is that worrying about this sort of question for deprecated things is more effort than it's worth.

@agramfort agramfort merged commit adc1e59 into scikit-learn:master Feb 27, 2019
@agramfort
Copy link
Member

thx @bellet

@bellet bellet deleted the glasso_fix branch February 27, 2019 09:38
jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Feb 27, 2019
* fix bug in graph lasso

* better fix

* add test for case n_features=2

* remove test for deprecated version

* update whats new

* note that the bug was a regression
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
* fix bug in graph lasso

* better fix

* add test for case n_features=2

* remove test for deprecated version

* update whats new

* note that the bug was a regression
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
* fix bug in graph lasso

* better fix

* add test for case n_features=2

* remove test for deprecated version

* update whats new

* note that the bug was a regression
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants