[MRG+1] Reduce runtime of graph_lasso #9858
Conversation
LGTM
Hi @TomDLT, I think the PEP8 error is now resolved; the tests looked OK on my screen. I've added said comment. It took me a little while to convince myself that this was true, but this short script can also verify it for anyone who is interested:

```python
import numpy as np

t = np.random.randint(1, 100, size=(100, 100))
indices = np.arange(t.shape[0])
subt = np.ascontiguousarray(t[1:, 1:])
for idx in indices:
    if idx > 0:
        subt[idx - 1] = t[idx - 1][indices != idx]
        subt[:, idx - 1] = t[:, idx - 1][indices != idx]
    else:
        subt[:] = t[1:, 1:]
    assert np.all(subt == t[indices != idx].T[indices != idx].T)
```
sklearn/covariance/graph_lasso_.py
Outdated
```python
for i in range(max_iter):
    for idx in range(n_features):
        sub_covariance = np.ascontiguousarray(
            covariance_[indices != idx].T[indices != idx])
        # for each idx, the sub_covariance matrix is the covariance_
```
I think this comment is good but a bit too long.
I would have gone with something less verbose, such as:
```python
# To keep the contiguous matrix `sub_covariance` equal to
# covariance_[indices != idx].T[indices != idx]
# we only need to update 1 column and 1 line when idx changes
```
Indeed, for some weird reason I don't see the commit 070fdcf on this page... Anyway, thanks for the script; this is indeed non-trivial.
@TomDLT I've replaced my version of the comment with yours. Thanks.
All right, I approved this change; let's just wait for a second reviewer.
thx @stevendbrown |
* reduce runtime of graph_lasso
* fixed line length overrun
* added comment explaining the change
* changed explanation comment
Reference Issue
None.
What does this implement/fix? Explain your changes.
For a 1288x1288 empirical covariance matrix, `sklearn.covariance.graph_lasso` currently spends 70-80% of its runtime creating `sub_covariance` on my computer. This change reduces that to ~3.5% of the function's runtime, and `np.allclose(...)` comparing both of the arrays in the result tuple returns `True` for each. Iterations have the same losses and dual gaps, and calls to `graph_lasso` end after the same number of iterations, so I don't see a functional change with my data.

Any other comments?

Just noticed that this may interact with #4787. My dataset is quite dense, so the runtime of `graph_lasso` based on that PR is almost double that of the original function.