Add an optional l2 regularization to the graphical Lasso #12228
I would like to work on this issue
I will do it. I already have the code that does it. I just had to write it today for my research.

Besides, how would you tackle it mathematically? There is a good way to do it that is simpler and more robust than the obvious way. It just requires a small proof that I was planning to attach to the PR.
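For reference, the obvious formulation would add a squared Frobenius penalty on the precision matrix $K$ to the usual graphical lasso objective; the sketch below is only this textbook elastic-net-style variant, not necessarily the simpler and more robust one alluded to above:

\[
\hat{K} \;=\; \operatorname*{arg\,min}_{K \succ 0}\;\; \operatorname{tr}(S K) \;-\; \log\det K \;+\; \alpha \lVert K \rVert_1 \;+\; \tfrac{\beta}{2}\,\lVert K \rVert_F^2 ,
\]

where $S$ is the empirical covariance, $\alpha$ the existing l1 penalty and $\beta$ the new, optional l2 penalty.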
Sounds good but please include a plot of how this impacts the existing examples in the PR. Have you exchanged about this l2 penalty with the authors of the glasso package? As far as I understand they do not include l2 penalty either but it would be great to have their opinion on this.
> Sounds good but please include a plot of how this impacts the existing examples in the PR.

Well, if we keep it to a tiny amount by default (i.e. 1e-10), it will not change anything.
> Have you exchanged about this l2 penalty with the authors of the glasso package? As far as I understand they do not include l2 penalty either but it would be great to have their opinion on this.

I don't think that I'll invest the time for this.

In my eyes, the benefit of this modification is that it can make the solver more stable. We are finding situations where we cannot get the solver in scikit-learn to converge to SPD matrices. We developed this variant to get it to converge.
Here is what I suggest: I can try to find a small synthetic example where our solver crashes and a little l2 regularization makes it converge. If I find one, I do a PR; if not, we just forget it.

WDYT?
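A rough sketch of what such a search could look like with the current public API; this only hunts for inputs on which the existing solver fails (checking that a small l2 term then fixes them would require the patched solver, so the loop and thresholds here are assumptions, not code from the eventual PR):

```python
import warnings

import numpy as np
from sklearn.covariance import graphical_lasso
from sklearn.exceptions import ConvergenceWarning


def fails(emp_cov, alpha=0.1, max_iter=100):
    """Return True if graphical_lasso warns or errors on this input."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        try:
            graphical_lasso(emp_cov, alpha=alpha, max_iter=max_iter)
        except (FloatingPointError, np.linalg.LinAlgError):
            return True
    return any(issubclass(w.category, ConvergenceWarning) for w in caught)


rng = np.random.RandomState(0)
for _ in range(200):
    data = rng.randn(3, 5)              # fewer samples than features
    emp_cov = np.dot(data.T, data) / 3  # rank-deficient empirical covariance
    if fails(emp_cov):
        print("Candidate ill-conditioned example:")
        print(emp_cov)
        break
```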
For our package http://github.com/metric-learn/metric-learn, we would be really interested in having a more robust Graphical Lasso too, as we use it for the SDML algorithm. I don't know if all of this is related to L2 regularization though, but this seemed a good place to add this comment. I can provide examples if needed, just tell me which ones could help: for instance, I could first provide a snippet where the new scikit-learn version does not work and the old one does?
Here is a first snippet to show the problem.

With sklearn's current version (master):
In [2]: import numpy as np
...: from sklearn.covariance import graph_lasso
...: from inverse_covariance import quic
...:
...: A = np.array([[6.,2.],
...: [2., 1.]])
...: cov, _ = graph_lasso(A, alpha=0.1, verbose=True, max_iter=10,
...: cov_init=np.eye(A.shape[0]))
...: print(cov)
...:
...: _, cov, _, _, _, _ = quic(A, lam=0.1)
...: print(cov)
...:
/home/will/Code/sklearn-forks/wdevazelhes/scikit-learn/sklearn/utils/deprecation.py:77: DeprecationWarning: Function graph_lasso is deprecated; The 'graph_lasso' was renamed to 'graphical_lasso' in version 0.20 and will be removed in 0.22.
warnings.warn(msg, category=DeprecationWarning)
[graphical_lasso] Iteration 0, cost 1.19e+01, dual gap -1.038e+00
[graphical_lasso] Iteration 1, cost 1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration 2, cost 1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration 3, cost 1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration 4, cost 1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration 5, cost 1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration 6, cost 1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration 7, cost 1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration 8, cost 1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration 9, cost 1.19e+01, dual gap -9.262e-01
/home/will/Code/sklearn-forks/wdevazelhes/scikit-learn/sklearn/covariance/graph_lasso_.py:265: ConvergenceWarning: graphical_lasso: did not converge after 10 iteration: dual gap: -9.262e-01
% (max_iter, d_gap), ConvergenceWarning)
[[6. 1.9]
[1.9 6. ]]
[[6.00000005 1.9 ]
[1.9 1.00000001]]

With sklearn's older version:
In [2]: import numpy as np
...: from sklearn.covariance import graph_lasso
...:
...: A = np.array([[6.,2.],
...: [2., 1.]])
...: cov, _ = graph_lasso(A, alpha=0.1, verbose=True, max_iter=10,
...: cov_init=np.eye(A.shape[0]))
...: print(cov)
...:
[graph_lasso] Iteration 0, cost inf, dual gap -1.510e+00
[graph_lasso] Iteration 1, cost 1.02e+01, dual gap 3.331e-16
[[ 6. 1.9]
[ 1.9 1. ]]
Maybe it is good to add skggm's quic result as a comparison.

I agree, I just updated my comment with skggm's quic output.
Actually not: the current sklearn version finds 6 instead of 1 for the lower right value.

Ah yes, that's right, I looked too fast.
Here is an example with a non-SPD matrix given as input:

In [1]: import numpy as np
...: from sklearn.covariance import graph_lasso
...: from inverse_covariance import quic
...:
...: A = np.array([[96.,12.],
...: [12., -61.]])
...:
...: cov, _ = graph_lasso(A, alpha=0.1, verbose=True, max_iter=10,
...: cov_init=np.eye(A.shape[0]))
...: print(cov)
...:
...: _, cov, _, _, _, _ = quic(A, lam=0.1)
...: print(cov)
...:
/home/will/Code/sklearn-forks/wdevazelhes/scikit-learn/sklearn/utils/deprecation.py:77: DeprecationWarning: Function graph_lasso is deprecated; The 'graph_lasso' was renamed to 'graphical_lasso' in version 0.20 and will be removed in 0.22.
warnings.warn(msg, category=DeprecationWarning)
[graphical_lasso] Iteration 0, cost 1.68e+01, dual gap -1.677e+00
[graphical_lasso] Iteration 1, cost 1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration 2, cost 1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration 3, cost 1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration 4, cost 1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration 5, cost 1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration 6, cost 1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration 7, cost 1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration 8, cost 1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration 9, cost 1.68e+01, dual gap -1.661e+00
/home/will/Code/sklearn-forks/wdevazelhes/scikit-learn/sklearn/covariance/graph_lasso_.py:265: ConvergenceWarning: graphical_lasso: did not converge after 10 iteration: dual gap: -1.661e+00
% (max_iter, d_gap), ConvergenceWarning)
[[96. 11.9]
[11.9 96. ]]
[[8.76291850e+07 4.96303739e+03]
[4.96303739e+03 2.81091333e-01]]
It's as if scikit-learn's graphical lasso estimated a covariance matrix that is similar to the original matrix, except for the bottom-right coefficient, which is set to the same value as the upper-left one (like in the last case).
Here is a slightly different example that confirms the above (I just switched the signs on the diagonal):

In [1]: import numpy as np
...: from sklearn.covariance import graph_lasso
...: from inverse_covariance import quic
...:
...: A = np.array([[-96.,12.],
...: [12., 61.]])
...:
...: cov, _ = graph_lasso(A, alpha=0.1, verbose=True, max_iter=10,
...: cov_init=np.eye(A.shape[0]))
...: print(cov)
...:
...: _, cov, _, _, _, _ = quic(A, lam=0.1)
...: print(cov)
...:
/home/will/Code/sklearn-forks/wdevazelhes/scikit-learn/sklearn/utils/deprecation.py:77: DeprecationWarning: Function graph_lasso is deprecated; The 'graph_lasso' was renamed to 'graphical_lasso' in version 0.20 and will be removed in 0.22.
warnings.warn(msg, category=DeprecationWarning)
[graphical_lasso] Iteration 0, cost 1.68e+01, dual gap -1.677e+00
[graphical_lasso] Iteration 1, cost 1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration 2, cost 1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration 3, cost 1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration 4, cost 1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration 5, cost 1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration 6, cost 1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration 7, cost 1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration 8, cost 1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration 9, cost 1.68e+01, dual gap -1.661e+00
/home/will/Code/sklearn-forks/wdevazelhes/scikit-learn/sklearn/covariance/graph_lasso_.py:265: ConvergenceWarning: graphical_lasso: did not converge after 10 iteration: dual gap: -1.661e+00
% (max_iter, d_gap), ConvergenceWarning)
[[-96. 11.9]
[ 11.9 -96. ]]
[[6.13757973e-01 2.86762444e+04]
[2.86762444e+04 1.33982777e+09]]

We see that the covariance estimated by scikit-learn is not SPD (which is problematic for SDML's case, but for a real covariance case I agree the input should be SPD).
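For completeness, a quick eigenvalue check on the matrix returned by graph_lasso just above confirms that it is not even positive semi-definite:

```python
import numpy as np

cov_sklearn = np.array([[-96. ,  11.9],
                        [ 11.9, -96. ]])  # graph_lasso output from the snippet above
print(np.linalg.eigvalsh(cov_sklearn))    # both eigenvalues are negative
```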
Also, in both of the last cases (non-SPD input matrix), old scikit-learn would return:
For the example in the code snippet #12228 (comment), sklearn v0.19.2 converges correctly while v0.20rc1 does not (it returns the same output as the current v0.21.dev0 tried in the above comment). This holds for Python 2 and 3 and for both solvers (CD and LARS), using numpy 1.15.4.
> For the example in the code snippet #12228 (comment), sklearn v0.19.2 converges correctly while v0.20rc1 does not (in the same way as the current v0.21.dev0 tried above).

It might be worth attempting a git bisect on that.
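One possible shape for that bisect, assuming the snippet referred to is the [[6., 2.], [2., 1.]] example above: a small exit-status script that `git bisect run` could call (the file name `bisect_repro.py` and the 0.5 tolerance are arbitrary choices for illustration):

```python
# bisect_repro.py -- sketch of a script for `git bisect run python bisect_repro.py`
import sys

import numpy as np
from sklearn.covariance import graph_lasso  # name available in both 0.19 and 0.20

A = np.array([[6., 2.],
              [2., 1.]])
cov, _ = graph_lasso(A, alpha=0.1, max_iter=10, cov_init=np.eye(2))

# Good versions keep the lower-right entry close to 1.; the regression drives it
# towards the upper-left value (6.). Exit 0 marks the commit as good, 1 as bad.
sys.exit(0 if abs(cov[1, 1] - 1.) < 0.5 else 1)
```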
The graphical lasso is quite unstable on very ill-conditioned matrices. It is a well-known issue that arises from the fact that it is a hard optimization problem.

Adding a small amount of l2 regularization helps a lot. On a problem in my lab, it ended up being the only way to estimate a sparse inverse covariance. It is actually a simple modification. If people agree, I can try to find the time to contribute this.
What do people think?
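Until such an option exists, a sketch of a workaround that is already possible with the public API: shrinking the empirical covariance towards a scaled identity before running the solver. Note that this regularizes the input covariance rather than adding an l2 penalty on the precision matrix, so it is only a crude stand-in for the modification proposed here, and the shrinkage value is arbitrary:

```python
import numpy as np
from sklearn.covariance import graphical_lasso, shrunk_covariance

rng = np.random.RandomState(42)
data = rng.randn(3, 5)              # fewer samples than features
emp_cov = np.dot(data.T, data) / 3  # rank-deficient, hence ill-conditioned

# Convex combination with a scaled identity makes the input well-conditioned.
emp_cov_reg = shrunk_covariance(emp_cov, shrinkage=0.1)

cov, prec = graphical_lasso(emp_cov_reg, alpha=0.1)
print(np.linalg.eigvalsh(prec))     # eigenvalues of the estimated precision matrix
```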