Add an optional l2 regularization to the graphical Lasso #12228


Open
GaelVaroquaux opened this issue Sep 30, 2018 · 17 comments

@GaelVaroquaux
Member

The graphical lasso is quite unstable on very ill-conditioned matrices. This is a well-known issue that arises from the fact that it is a very hard optimization problem.

Adding a small amount of l2-regularization helps a lot. On a problem in my lab, it ended up being the only way to estimate a sparse inverse covariance. It is actually a simple modification. If people agree, I can try to find the time to contribute this.

What do people think?
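To make the conditioning problem concrete, here is a minimal sketch (an illustration only, not the exact modification proposed here) of the simplest workaround: shrinking the empirical covariance toward a scaled identity before calling graphical_lasso. A true l2 penalty on the precision matrix (a graphical elastic net) would change the objective itself; the data and the eps value below are made up for the example:

import numpy as np
from sklearn.covariance import empirical_covariance, graphical_lasso

rng = np.random.RandomState(0)
X = rng.randn(25, 20)                    # few samples -> ill-conditioned empirical covariance
emp_cov = empirical_covariance(X)

# Ridge-like stabilization: blend the empirical covariance with a scaled identity.
eps = 0.1                                # made-up shrinkage strength
target = np.trace(emp_cov) / emp_cov.shape[0] * np.eye(emp_cov.shape[0])
shrunk_cov = (1 - eps) * emp_cov + eps * target

print(np.linalg.cond(emp_cov), np.linalg.cond(shrunk_cov))  # conditioning improves
cov_, prec_ = graphical_lasso(shrunk_cov, alpha=0.1)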

@pulkit6559

I would like to work on this issue

@GaelVaroquaux
Member Author

GaelVaroquaux commented Sep 30, 2018 via email

@ogrisel
Member

ogrisel commented Oct 1, 2018

What do people think?

Sounds good but please include a plot of how this impacts the existing examples in the PR.

Have you discussed this l2 penalty with the authors of the glasso package? As far as I understand they do not include an l2 penalty either, but it would be great to have their opinion on this.

@GaelVaroquaux
Member Author

GaelVaroquaux commented Oct 2, 2018 via email

@mnarayan

Besides, how would you tackle it mathematically? There is a good way to do it
that is simpler and more robust than the obvious way. It just requires a
small proof that I was planning to attach to the PR. - #12228 (comment)

As far as I understand they do not include an l2 penalty either, but it would be great to have their opinion on this. - #12228 (comment)

  1. Some information pertinent to the above comments.

    • You can find more mathematical justification for a graphical elastic net in section 2.3.1 of Allen's dissertation
    • Rothman also formulated a slightly different objective to ensure positive semidefiniteness in "Positive definite estimators of large covariance matrices"
    • My guess is that a graphical elastic net is often used but has not been formally implemented, due to inertia and there being many other interesting things to work on.
  2. The Friedman algorithm for solving the graphical lasso problem has a known convergence issue that some recent solvers do not. I looked into this a while back here. You might find the examples from the Mazumder paper we used to be a good test case. Note that adding a ridge penalty is not necessary to fix this particular convergence problem, though having a well-conditioned sample covariance to start with is still useful.

  3. A lot of applications that explicitly study the spectral properties of a covariance matrix could benefit from condition number regularization via nonlinear shrinkage of large and small eigenvalues. I have an internal implementation of condition number regularized MLE that I would be happy to contribute if there is interest in a more general solution. This doesn't distort the eigenstructure as much as Ledoit-Wolf shrinkage. (A rough sketch of the eigenvalue-clipping idea is given after this list.)
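As an illustration of point 3, here is a rough sketch of the eigenvalue-clipping idea behind condition number regularization (a simplified illustration, not the internal implementation mentioned above); the example matrix and the kappa_max bound are made up:

import numpy as np

def clip_condition_number(cov, kappa_max=1e3):
    # Raise the small eigenvalues so that lambda_max / lambda_min <= kappa_max.
    eigvals, eigvecs = np.linalg.eigh(cov)
    floor = eigvals.max() / kappa_max
    clipped = np.clip(eigvals, floor, None)
    return (eigvecs * clipped) @ eigvecs.T

# Made-up, badly conditioned example.
S = np.array([[1.0, 0.999],
              [0.999, 1.0]])
S_reg = clip_condition_number(S, kappa_max=100)
print(np.linalg.cond(S), np.linalg.cond(S_reg))  # ~2000 vs. 100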

@wdevazelhes
Contributor

For our package http://github.com/metric-learn/metric-learn, we would be really interested in having a more robust graphical lasso too, as we use it for the SDML algorithm. Right now, we also have an issue with graphical lasso, but skggm seems to fix it (see PR scikit-learn-contrib/metric-learn#162). Note that in our case we sometimes need to find a sparse (SPD) precision matrix whose inverse is the closest to a non-SPD "covariance-like" matrix (i.e. the "covariance" matrix we pass to graphical lasso can be non-SPD). sklearn's graphical lasso does not seem to work for such matrices, contrary to skggm (though we need to provide an SPD matrix as initialization, which seems normal).
Also, I have noticed that the current scikit-learn graphical lasso does not seem to work even when I give it an SPD matrix to reconstruct, whereas it worked with some older versions of scikit-learn.

I don't know if all of this is related to l2 regularization, but this seemed like a good place to add this comment.

I can provide examples if needed, just tell me which ones could help: for instance, I could first provide a snippet with the new scikit-learn version not working and the old one working?

@wdevazelhes
Contributor

wdevazelhes commented Jan 31, 2019

Here is a first snippet showing that, with an SPD input matrix, the new version of sklearn's graphical_lasso does not seem to work, contrary to an older version:

With sklearn's current version (master)

System:
    python: 3.7.1 (default, Dec 14 2018, 19:28:38)  [GCC 7.3.0]
executable: /home/will/anaconda3/envs/standard/bin/python
   machine: Linux-4.4.0-141-generic-x86_64-with-debian-stretch-sid

BLAS:
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /home/will/anaconda3/envs/standard/lib
cblas_libs: mkl_rt, pthread

Python deps:
       pip: 18.1
setuptools: 40.6.3
   sklearn: 0.21.dev0
     numpy: 1.15.4
     scipy: 1.2.0
    Cython: 0.29.2
    pandas: 0.24.0
In [2]: import numpy as np 
   ...: from sklearn.covariance import graph_lasso 
   ...: from inverse_covariance import quic 
   ...:  
   ...: A = np.array([[6.,2.], 
   ...:               [2., 1.]]) 
   ...: cov, _ = graph_lasso(A, alpha=0.1, verbose=True, max_iter=10, 
   ...:                    cov_init=np.eye(A.shape[0])) 
   ...: print(cov) 
   ...:  
   ...: _, cov, _, _, _, _ = quic(A, lam=0.1) 
   ...: print(cov) 
   ...:                                                                         
/home/will/Code/sklearn-forks/wdevazelhes/scikit-learn/sklearn/utils/deprecation.py:77: DeprecationWarning: Function graph_lasso is deprecated; The 'graph_lasso' was renamed to 'graphical_lasso' in version 0.20 and will be removed in 0.22.
  warnings.warn(msg, category=DeprecationWarning)
[graphical_lasso] Iteration   0, cost  1.19e+01, dual gap -1.038e+00
[graphical_lasso] Iteration   1, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   2, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   3, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   4, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   5, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   6, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   7, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   8, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   9, cost  1.19e+01, dual gap -9.262e-01
/home/will/Code/sklearn-forks/wdevazelhes/scikit-learn/sklearn/covariance/graph_lasso_.py:265: ConvergenceWarning: graphical_lasso: did not converge after 10 iteration: dual gap: -9.262e-01
  % (max_iter, d_gap), ConvergenceWarning)
[[6.  1.9]
 [1.9 6. ]]
[[6.00000005 1.9       ]
 [1.9        1.00000001]]

With sklearn's older version:

Linux-4.4.0-141-generic-x86_64-with-debian-stretch-sid
('Python', '2.7.15 |Anaconda, Inc.| (default, Dec 14 2018, 19:04:19) \n[GCC 7.3.0]')
('NumPy', '1.11.3')
('SciPy', '0.17.1')
('Scikit-Learn', '0.19.2')
In [2]: import numpy as np
   ...: from sklearn.covariance import graph_lasso
   ...: 
   ...: A = np.array([[6.,2.],
   ...:               [2., 1.]])
   ...: cov, _ = graph_lasso(A, alpha=0.1, verbose=True, max_iter=10,
   ...:                    cov_init=np.eye(A.shape[0]))
   ...: print(cov)
   ...: 
[graph_lasso] Iteration   0, cost  inf, dual gap -1.510e+00
[graph_lasso] Iteration   1, cost  1.02e+01, dual gap 3.331e-16
[[ 6.   1.9]
 [ 1.9  1. ]]

@bellet
Contributor

bellet commented Jan 31, 2019

Maybe it would be good to add skggm's quic for comparison?

@wdevazelhes
Contributor

I agree, I just updated my comment with skggm's quic (only for the new sklearn and Python versions, because I didn't manage to install skggm in the older environment; see skggm/skggm#119). I chose cov_init=np.eye(A.shape[0]) (the identity), which is the default for skggm, so that we can compare both (I use the identity and not the covariance because I will use it for the non-SPD case too, where my "covariance" will be non-SPD). I also added the estimated covariance so that we can see the results. We can see that in the end, even if the current version says it doesn't converge, the estimate is coherent with skggm's.
Now I'll add some snippets for the non-SPD case.

@bellet
Contributor

bellet commented Feb 1, 2019

We can see that in the end, even if the current version says it doesn't converge, the estimate is coherent with skggm's.

Actually not, the current sklearn version finds 6 instead of 1 for the lower right value

@wdevazelhes
Contributor

Actually not, the current sklearn version finds 6 instead of 1 for the lower right value

Ah yes that's right, I looked too fast

@wdevazelhes
Contributor

Here is an example with a non-SPD matrix given as input:

In [1]: import numpy as np 
   ...: from sklearn.covariance import graph_lasso 
   ...: from inverse_covariance import quic 
   ...:  
   ...: A = np.array([[96.,12.], 
   ...:               [12., -61.]]) 
   ...:  
   ...: cov, _ = graph_lasso(A, alpha=0.1, verbose=True, max_iter=10, 
   ...:                    cov_init=np.eye(A.shape[0])) 
   ...: print(cov) 
   ...:  
   ...: _, cov, _, _, _, _ = quic(A, lam=0.1) 
   ...: print(cov) 
   ...:                                                                                                                                                                                                             
/home/will/Code/sklearn-forks/wdevazelhes/scikit-learn/sklearn/utils/deprecation.py:77: DeprecationWarning: Function graph_lasso is deprecated; The 'graph_lasso' was renamed to 'graphical_lasso' in version 0.20 and will be removed in 0.22.
  warnings.warn(msg, category=DeprecationWarning)
[graphical_lasso] Iteration   0, cost  1.68e+01, dual gap -1.677e+00
[graphical_lasso] Iteration   1, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   2, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   3, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   4, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   5, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   6, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   7, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   8, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   9, cost  1.68e+01, dual gap -1.661e+00
/home/will/Code/sklearn-forks/wdevazelhes/scikit-learn/sklearn/covariance/graph_lasso_.py:265: ConvergenceWarning: graphical_lasso: did not converge after 10 iteration: dual gap: -1.661e+00
  % (max_iter, d_gap), ConvergenceWarning)
[[96.  11.9]
 [11.9 96. ]]
[[8.76291850e+07 4.96303739e+03]
 [4.96303739e+03 2.81091333e-01]]

@wdevazelhes
Contributor

It's as if scikit-learn's graphical lasso estimates a covariance matrix that is similar to the original matrix except for the bottom-right coefficient, which is set to the same value as the top-left one (as in the previous case).

@wdevazelhes
Contributor

Here is a slightly different example that confirms the above (I just switched the signs on the diagonal):

In [1]: import numpy as np 
   ...: from sklearn.covariance import graph_lasso 
   ...: from inverse_covariance import quic 
   ...:  
   ...: A = np.array([[-96.,12.], 
   ...:               [12., 61.]]) 
   ...:  
   ...: cov, _ = graph_lasso(A, alpha=0.1, verbose=True, max_iter=10, 
   ...:                    cov_init=np.eye(A.shape[0])) 
   ...: print(cov) 
   ...:  
   ...: _, cov, _, _, _, _ = quic(A, lam=0.1) 
   ...: print(cov) 
   ...:                                                                                                                                                                                                                                                                                    
/home/will/Code/sklearn-forks/wdevazelhes/scikit-learn/sklearn/utils/deprecation.py:77: DeprecationWarning: Function graph_lasso is deprecated; The 'graph_lasso' was renamed to 'graphical_lasso' in version 0.20 and will be removed in 0.22.
  warnings.warn(msg, category=DeprecationWarning)
[graphical_lasso] Iteration   0, cost  1.68e+01, dual gap -1.677e+00
[graphical_lasso] Iteration   1, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   2, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   3, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   4, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   5, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   6, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   7, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   8, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   9, cost  1.68e+01, dual gap -1.661e+00
/home/will/Code/sklearn-forks/wdevazelhes/scikit-learn/sklearn/covariance/graph_lasso_.py:265: ConvergenceWarning: graphical_lasso: did not converge after 10 iteration: dual gap: -1.661e+00
  % (max_iter, d_gap), ConvergenceWarning)
[[-96.   11.9]
 [ 11.9 -96. ]]
[[6.13757973e-01 2.86762444e+04]
 [2.86762444e+04 1.33982777e+09]]

We see that scikit-learn's estimated covariance is not SPD (which is problematic for SDML's case, but for a real covariance case I agree the input should be SPD).
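As a quick check (illustrative only), the eigenvalues of the covariance returned by graph_lasso above confirm this:

import numpy as np

cov = np.array([[-96. ,  11.9],
                [ 11.9, -96. ]])          # covariance returned by graph_lasso above
print(np.linalg.eigvalsh(cov))            # [-107.9  -84.1] -> not SPD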

@wdevazelhes
Contributor

Also, in both of the last cases (non-SPD input matrix), the old scikit-learn would return:

FloatingPointError: Non SPD result: the system is too ill-conditioned for this solver. The system is too ill-conditioned for this solver

@bellet
Contributor

bellet commented Feb 26, 2019

For the example in the code snippet #12228 (comment), sklearn v0.19.2 converges correctly while v0.20rc1 does not (it returns the same output as the current v0.21.dev0 tried in the comment above).

This holds for Python 2 and 3 and for both solvers (CD and LARS), using numpy 1.15.4.

@GaelVaroquaux
Member Author

GaelVaroquaux commented Feb 26, 2019 via email
