Add an optional l2 regularization to the graphical Lasso #12228


Open
GaelVaroquaux opened this issue Sep 30, 2018 · 17 comments

@GaelVaroquaux
Member

The graphical lasso is quite unstable on very ill-conditioned matrices. This is a well-known issue that arises from the fact that it is a very hard optimization problem.

Adding a small amount of l2-regularization helps a lot. On a problem in my lab, it ended up being the only way to estimate a sparse inverse covariance. It is actually a simple modification. If people agree, I can try to find the time to contribute this.

What do people think?
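To make the conditioning problem concrete, here is a minimal sketch (an illustration only, not the exact modification proposed here) of the simplest workaround: shrinking the empirical covariance toward a scaled identity before calling graphical_lasso. A true l2 penalty on the precision matrix (a graphical elastic net) would change the objective itself; the data and the eps value below are made up for the example:

import numpy as np
from sklearn.covariance import empirical_covariance, graphical_lasso

rng = np.random.RandomState(0)
X = rng.randn(25, 20)                    # few samples -> ill-conditioned empirical covariance
emp_cov = empirical_covariance(X)

# Ridge-like stabilization: blend the empirical covariance with a scaled identity.
eps = 0.1                                # made-up shrinkage strength
target = np.trace(emp_cov) / emp_cov.shape[0] * np.eye(emp_cov.shape[0])
shrunk_cov = (1 - eps) * emp_cov + eps * target

print(np.linalg.cond(emp_cov), np.linalg.cond(shrunk_cov))  # conditioning improves
cov_, prec_ = graphical_lasso(shrunk_cov, alpha=0.1)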

@pulkit6559

I would like to work on this issue

@GaelVaroquaux
Member Author

GaelVaroquaux commented Sep 30, 2018 via email

@ogrisel
Member

ogrisel commented Oct 1, 2018

What do people think?

Sounds good but please include a plot of how this impacts the existing examples in the PR.

Have you discussed this l2 penalty with the authors of the glasso package? As far as I understand they do not include an l2 penalty either, but it would be great to have their opinion on this.

@GaelVaroquaux
Member Author

GaelVaroquaux commented Oct 2, 2018 via email

@mnarayan

Besides, how would you tackle it mathematically? There is a good way to do it
that is simpler and more robust than the obvious way. It just requires a
small proof that I was planning to attach to the PR. - #12228 (comment)

As far as I understand they do not include an l2 penalty either, but it would be great to have their opinion on this. - #12228 (comment)

  1. Some information pertinent to the above comments.

    • You can find more mathematical justification for a graphical elastic net in section 2.3.1 of Allen's dissertation
    • Rothman also formulated a slightly different objective to ensure positive semidefiniteness in "Positive definite estimators of large covariance matrices"
    • My guess is that a graphical elastic net is often used but has not been formally implemented, due to inertia and there being many other interesting things to work on.
  2. The Friedman algorithm for solving the graphical lasso problem has a known convergence issue that some recent solvers do not. I looked into this a while back here. You might find the examples from the Mazumder paper we used to be a good test case. Note that adding a ridge penalty is not necessary to fix this particular convergence problem, though having a well-conditioned sample covariance to start with is still useful.

  3. A lot of applications that explicitly study the spectral properties of a covariance matrix could benefit from condition number regularization via nonlinear shrinkage of large and small eigenvalues. I have an internal implementation of condition number regularized MLE that I would be happy to contribute if there is interest in a more general solution. This doesn't distort the eigenstructure as much as Ledoit-Wolf shrinkage. (A rough sketch of the eigenvalue-clipping idea is given after this list.)
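As an illustration of point 3, here is a rough sketch of the eigenvalue-clipping idea behind condition number regularization (a simplified illustration, not the internal implementation mentioned above); the example matrix and the kappa_max bound are made up:

import numpy as np

def clip_condition_number(cov, kappa_max=1e3):
    # Raise the small eigenvalues so that lambda_max / lambda_min <= kappa_max.
    eigvals, eigvecs = np.linalg.eigh(cov)
    floor = eigvals.max() / kappa_max
    clipped = np.clip(eigvals, floor, None)
    return (eigvecs * clipped) @ eigvecs.T

# Made-up, badly conditioned example.
S = np.array([[1.0, 0.999],
              [0.999, 1.0]])
S_reg = clip_condition_number(S, kappa_max=100)
print(np.linalg.cond(S), np.linalg.cond(S_reg))  # ~2000 vs. 100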

@wdevazelhes
Contributor

For our package http://github.com/metric-learn/metric-learn, we would be really interested in having a more robust graphical lasso too, as we use it for the SDML algorithm. Right now, we also have an issue with graphical lasso, but skggm seems to fix it (see PR scikit-learn-contrib/metric-learn#162). Note that in our case we sometimes need to find a sparse (SPD) precision matrix whose inverse is the closest to a non-SPD "covariance-like" matrix (i.e. the "covariance" matrix we pass to graphical lasso can be non-SPD). sklearn's graphical lasso does not seem to work for such matrices, contrary to skggm (though we need to provide an SPD matrix as initialization, which seems normal).
Also, I have noticed that the current scikit-learn graphical lasso does not seem to work even when I give it an SPD matrix to reconstruct, whereas it worked with some older versions of scikit-learn.

I don't know if all of this is related to l2 regularization, but this seemed like a good place to add this comment.

I can provide examples if needed, just tell me which ones could help: for instance, I could first provide a snippet with the new scikit-learn version not working and the old one working?

@wdevazelhes
Contributor

wdevazelhes commented Jan 31, 2019

Here is a first snippet showing that, with an SPD input matrix, the new version of sklearn's graphical_lasso does not seem to work, contrary to an older version:

With sklearn's current version (master)

System:
    python: 3.7.1 (default, Dec 14 2018, 19:28:38)  [GCC 7.3.0]
executable: /home/will/anaconda3/envs/standard/bin/python
   machine: Linux-4.4.0-141-generic-x86_64-with-debian-stretch-sid

BLAS:
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /home/will/anaconda3/envs/standard/lib
cblas_libs: mkl_rt, pthread

Python deps:
       pip: 18.1
setuptools: 40.6.3
   sklearn: 0.21.dev0
     numpy: 1.15.4
     scipy: 1.2.0
    Cython: 0.29.2
    pandas: 0.24.0
In [2]: import numpy as np 
   ...: from sklearn.covariance import graph_lasso 
   ...: from inverse_covariance import quic 
   ...:  
   ...: A = np.array([[6.,2.], 
   ...:               [2., 1.]]) 
   ...: cov, _ = graph_lasso(A, alpha=0.1, verbose=True, max_iter=10, 
   ...:                    cov_init=np.eye(A.shape[0])) 
   ...: print(cov) 
   ...:  
   ...: _, cov, _, _, _, _ = quic(A, lam=0.1) 
   ...: print(cov) 
   ...:                                                                         
/home/will/Code/sklearn-forks/wdevazelhes/scikit-learn/sklearn/utils/deprecation.py:77: DeprecationWarning: Function graph_lasso is deprecated; The 'graph_lasso' was renamed to 'graphical_lasso' in version 0.20 and will be removed in 0.22.
  warnings.warn(msg, category=DeprecationWarning)
[graphical_lasso] Iteration   0, cost  1.19e+01, dual gap -1.038e+00
[graphical_lasso] Iteration   1, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   2, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   3, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   4, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   5, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   6, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   7, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   8, cost  1.19e+01, dual gap -9.262e-01
[graphical_lasso] Iteration   9, cost  1.19e+01, dual gap -9.262e-01
/home/will/Code/sklearn-forks/wdevazelhes/scikit-learn/sklearn/covariance/graph_lasso_.py:265: ConvergenceWarning: graphical_lasso: did not converge after 10 iteration: dual gap: -9.262e-01
  % (max_iter, d_gap), ConvergenceWarning)
[[6.  1.9]
 [1.9 6. ]]
[[6.00000005 1.9       ]
 [1.9        1.00000001]]

With sklearn's older version:

Linux-4.4.0-141-generic-x86_64-with-debian-stretch-sid
('Python', '2.7.15 |Anaconda, Inc.| (default, Dec 14 2018, 19:04:19) \n[GCC 7.3.0]')
('NumPy', '1.11.3')
('SciPy', '0.17.1')
('Scikit-Learn', '0.19.2')
In [2]: import numpy as np
   ...: from sklearn.covariance import graph_lasso
   ...: 
   ...: A = np.array([[6.,2.],
   ...:               [2., 1.]])
   ...: cov, _ = graph_lasso(A, alpha=0.1, verbose=True, max_iter=10,
   ...:                    cov_init=np.eye(A.shape[0]))
   ...: print(cov)
   ...: 
[graph_lasso] Iteration   0, cost  inf, dual gap -1.510e+00
[graph_lasso] Iteration   1, cost  1.02e+01, dual gap 3.331e-16
[[ 6.   1.9]
 [ 1.9  1. ]]

@bellet
Contributor

bellet commented Jan 31, 2019

Maybe it would be good to add skggm's quic for comparison?

@wdevazelhes
Contributor

I agree, I just updated my comment with skggm's quic (only for the new sklearn and Python versions, because I didn't manage to install skggm in the older environment; see skggm/skggm#119). I chose cov_init=np.eye(A.shape[0]) (the identity), which is the default for skggm, so that we can compare both (I use the identity and not the covariance because I will use it for the non-SPD case too, where my "covariance" will be non-SPD). I also added the estimated covariance so that we can see the results. We can see that in the end, even if the current version says it doesn't converge, the estimate is coherent with skggm's.
Now I'll add some snippets for the non-SPD case.

@bellet
Contributor

bellet commented Feb 1, 2019

We can see that in the end, even if the current version says it doesn't converge, the estimate is coherent with skggm's.

Actually not, the current sklearn version finds 6 instead of 1 for the lower right value

@wdevazelhes
Contributor

Actually not, the current sklearn version finds 6 instead of 1 for the lower right value

Ah yes that's right, I looked too fast

@wdevazelhes
Contributor

Here is an example with a non-SPD matrix given as input:

In [1]: import numpy as np 
   ...: from sklearn.covariance import graph_lasso 
   ...: from inverse_covariance import quic 
   ...:  
   ...: A = np.array([[96.,12.], 
   ...:               [12., -61.]]) 
   ...:  
   ...: cov, _ = graph_lasso(A, alpha=0.1, verbose=True, max_iter=10, 
   ...:                    cov_init=np.eye(A.shape[0])) 
   ...: print(cov) 
   ...:  
   ...: _, cov, _, _, _, _ = quic(A, lam=0.1) 
   ...: print(cov) 
   ...:                                                                                                                                                                                                             
/home/will/Code/sklearn-forks/wdevazelhes/scikit-learn/sklearn/utils/deprecation.py:77: DeprecationWarning: Function graph_lasso is deprecated; The 'graph_lasso' was renamed to 'graphical_lasso' in version 0.20 and will be removed in 0.22.
  warnings.warn(msg, category=DeprecationWarning)
[graphical_lasso] Iteration   0, cost  1.68e+01, dual gap -1.677e+00
[graphical_lasso] Iteration   1, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   2, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   3, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   4, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   5, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   6, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   7, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   8, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   9, cost  1.68e+01, dual gap -1.661e+00
/home/will/Code/sklearn-forks/wdevazelhes/scikit-learn/sklearn/covariance/graph_lasso_.py:265: ConvergenceWarning: graphical_lasso: did not converge after 10 iteration: dual gap: -1.661e+00
  % (max_iter, d_gap), ConvergenceWarning)
[[96.  11.9]
 [11.9 96. ]]
[[8.76291850e+07 4.96303739e+03]
 [4.96303739e+03 2.81091333e-01]]

@wdevazelhes
Contributor

It's as if scikit-learn's graphical lasso estimates a covariance matrix that is similar to the original matrix except for the bottom-right coefficient, which is set to the same value as the top-left one (as in the previous case).

@wdevazelhes
Contributor

Here is a slightly different example that confirms the above (I just switched the signs on the diagonal):

In [1]: import numpy as np 
   ...: from sklearn.covariance import graph_lasso 
   ...: from inverse_covariance import quic 
   ...:  
   ...: A = np.array([[-96.,12.], 
   ...:               [12., 61.]]) 
   ...:  
   ...: cov, _ = graph_lasso(A, alpha=0.1, verbose=True, max_iter=10, 
   ...:                    cov_init=np.eye(A.shape[0])) 
   ...: print(cov) 
   ...:  
   ...: _, cov, _, _, _, _ = quic(A, lam=0.1) 
   ...: print(cov) 
   ...:                                                                                                                                                                                                                                                                                    
/home/will/Code/sklearn-forks/wdevazelhes/scikit-learn/sklearn/utils/deprecation.py:77: DeprecationWarning: Function graph_lasso is deprecated; The 'graph_lasso' was renamed to 'graphical_lasso' in version 0.20 and will be removed in 0.22.
  warnings.warn(msg, category=DeprecationWarning)
[graphical_lasso] Iteration   0, cost  1.68e+01, dual gap -1.677e+00
[graphical_lasso] Iteration   1, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   2, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   3, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   4, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   5, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   6, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   7, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   8, cost  1.68e+01, dual gap -1.661e+00
[graphical_lasso] Iteration   9, cost  1.68e+01, dual gap -1.661e+00
/home/will/Code/sklearn-forks/wdevazelhes/scikit-learn/sklearn/covariance/graph_lasso_.py:265: ConvergenceWarning: graphical_lasso: did not converge after 10 iteration: dual gap: -1.661e+00
  % (max_iter, d_gap), ConvergenceWarning)
[[-96.   11.9]
 [ 11.9 -96. ]]
[[6.13757973e-01 2.86762444e+04]
 [2.86762444e+04 1.33982777e+09]]

We see that scikit-learn's estimated covariance is not SPD (which is problematic for SDML's case, but for a real covariance case I agree the input should be SPD).
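As a quick check (illustrative only), the eigenvalues of the covariance returned by graph_lasso above confirm this:

import numpy as np

cov = np.array([[-96. ,  11.9],
                [ 11.9, -96. ]])          # covariance returned by graph_lasso above
print(np.linalg.eigvalsh(cov))            # [-107.9  -84.1] -> not SPD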

@wdevazelhes
Contributor

Also, in both of the last cases (non-SPD input matrix), the old scikit-learn would return:

FloatingPointError: Non SPD result: the system is too ill-conditioned for this solver. The system is too ill-conditioned for this solver

@bellet
Contributor

bellet commented Feb 26, 2019

For the example in the code snippet #12228 (comment), sklearn v0.19.2 converges correctly while v0.20rc1 does not (it returns the same output as the current v0.21.dev0 tried in the comment above).

This holds for Python 2 and 3 and for both solvers (CD and LARS), using numpy 1.15.4.

@GaelVaroquaux
Member Author

GaelVaroquaux commented Feb 26, 2019 via email
