Skip to content

Change graph_lasso to exploit block diagonal structure #4787

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

bnaul
Copy link
Contributor

@bnaul bnaul commented May 29, 2015

I took a stab at implementing the optimization described here: the block diagonal structure of the graphical lasso solution can be identified by thresholding the sample covariance, and the exact solution is found by solving the graphical lasso for each block separately. The authors find that there is a huge speedup when the solution is very sparse; when the solution is mostly dense, the results are basically the same (or very slightly slower due to the extra thresholding step). This modification was made in the glasso R package some time ago; timing results are given in the paper above, but I also ran a test of my implementation with p=1000, n=100 and a block diagonal population covariance matrix.

Couple of questions:

  1. the glasso R package was changed to only use this algorithm, so I followed the same convention and did not allow the user to choose whether to perform the block diagonal screening procedure. It would be very easy to add this, I'm just not sure if there's a case where it would ever be desired.
  2. There's a bug in our connected_components function that was fixed a while ago in scipy (BUG: make a fully connected csgraph from an ndarray with no zeros scipy/scipy#3819). I included this in my commit, but maybe it should be a separate pull request?
  3. Does the overall logic make sense here? I only added a couple of comments but if it's not clear what's going on then I can try to clarify.

@amueller
Copy link
Member

amueller commented Jun 1, 2015

Thanks for the PR. This looks like a great addition. We have a bit of a backlog on contributions, unfortunately. Maybe @GaelVaroquaux can find time to look at this one, as he created the original estimator.

@bnaul
Copy link
Contributor Author

bnaul commented Jun 1, 2015

No problem, obviously nothing urgent.

On Mon, Jun 1, 2015 at 10:32 AM Andreas Mueller notifications@github.com
wrote:

Thanks for the PR. This looks like a great addition. We have a bit of a
backlog on contributions, unfortunately. Maybe @GaelVaroquaux
https://github.com/GaelVaroquaux can find time to look at this one, as
he created the original estimator.


Reply to this email directly or view it on GitHub
#4787 (comment)
.

@NelleV
Copy link
Member

NelleV commented Aug 9, 2016

Hi Brett,

Do you mind rebasing master onto the branch? The PR is kind of old :)

Cheers,
N

@agramfort
Copy link
Member

@bnaul can you rebase? I will have a look this week

@bnaul bnaul force-pushed the block_glasso branch 3 times, most recently from 406f6f3 to cfdb3fd Compare June 14, 2017 22:30
@bnaul
Copy link
Contributor Author

bnaul commented Jun 14, 2017

@agramfort rebased! Travis is only unhappy because of flake8 failures but those are all just lines that were already too long in the old version.

@NelleV
Copy link
Member

NelleV commented Jun 20, 2017

@agramfort any luck in reviewing this?

@agramfort
Copy link
Member

agramfort commented Jun 20, 2017 via email

The block diagonal components of the graphical lasso solution can be
identified by thresholding the same covariance matrix; the solution can
then be found by solving a subproblem corresponding to each component,
which can be much faster if the graph is very sparse. See
http://faculty.washington.edu/dwitten/Papers/jcgs.2011.pdf for details.
@stevendbrown
Copy link
Contributor

stevendbrown commented Oct 1, 2017

@bnaul Testing this modification with my completely-non-zero 1288x1288 input matrix has a runtime ~2x higher than the current master version of the function, and spends 80% of the runtime on the sub_covariance = np.ascontiguousarray(...) call. I am trying to merge this version of graph_lasso with the changes in #9858, but you will likely be able to implement the fix to memory allocation time better than I.

@cmarmo cmarmo added Needs Benchmarks A tag for the issues and PRs which require some benchmarks help wanted labels Sep 21, 2020
Base automatically changed from master to main January 22, 2021 10:48
@bnaul bnaul closed this Nov 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement help wanted module:covariance Needs Benchmarks A tag for the issues and PRs which require some benchmarks
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants