Skip to content

[MRG+1] Reduce runtime of graph_lasso #9858

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Oct 2, 2017

Conversation

stevendbrown
Copy link
Contributor

@stevendbrown stevendbrown commented Sep 30, 2017

Reference Issue

None.

What does this implement/fix? Explain your changes.

For a 1288x1288 empirical covariance matrix, sklearn.covariance.graph_lasso currently spends 70-80% of it's runtime creating sub_covariance on my computer. This change reduces that to ~3.5% of the function's runtime, and np.allclose(...) comparing both of the arrays in the result tuple return True for each. Iterations have the same losses & dual gaps, and calls to graph_lasso end after the same number of iterations, so I don't see a functional change with my data.

Any other comments?

Just noticed that this may interact with #4787. My dataset is quite dense, so the runtime of graph_lasso based on that PR is almost double that of the original function.

@TomDLT
Copy link
Member

TomDLT commented Oct 2, 2017

LGTM

  • Maybe add a line comment to explain that we only need to update one line and one column.
  • The tests fail because your changes are not PEP8 (line too long).

@TomDLT TomDLT changed the title reduce runtime of graph_lasso [MRG] Reduce runtime of graph_lasso Oct 2, 2017
@stevendbrown
Copy link
Contributor Author

stevendbrown commented Oct 2, 2017

Hi @TomDLT, I think the PEP8 error is now resolved; the tests looked OK on my screen. I've added said comment. It took me a little while to convince myself that this was true, but this short script can also verify it for anyone who is interested:

import numpy as np

t = np.random.randint(1, 100, size=(100,100))

indices = np.arange(t.shape[0])
subt = np.ascontiguousarray(t[1:, 1:])
for idx in indices:
    if idx > 0:
        subt[idx-1] = t[idx-1][indices != idx]
        subt[:, idx-1] = t[:, idx-1][indices != idx]
    else:
        subt[:] = t[1:, 1:]
    assert np.all(subt == t[indices != idx].T[indices != idx].T)

for i in range(max_iter):
for idx in range(n_features):
sub_covariance = np.ascontiguousarray(
covariance_[indices != idx].T[indices != idx])
# for each idx, the sub_covariance matrix is the covariance_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this comment is good but a bit too long.
I would have gone with something less verbose, such as:

                # To keep the contiguous matrix `sub_covariance` equal to
                # covariance_[indices != idx].T[indices != idx]
                # we only need to update 1 column and 1 line when idx changes

@TomDLT
Copy link
Member

TomDLT commented Oct 2, 2017

I think the PEP8 error is now resolved;

Indeed, for some weird reason I don't see the commit 070fdcf in this page...

Anyway, thanks for the script, this is indeed non trivial.

@stevendbrown
Copy link
Contributor Author

@TomDLT I've replaced my version of the comment with yours. Thanks.

@TomDLT TomDLT changed the title [MRG] Reduce runtime of graph_lasso [MRG+1] Reduce runtime of graph_lasso Oct 2, 2017
@TomDLT
Copy link
Member

TomDLT commented Oct 2, 2017

All right, I approved this change, let's just wait for a second reviewer

@agramfort agramfort merged commit 692cd8b into scikit-learn:master Oct 2, 2017
@agramfort
Copy link
Member

thx @stevendbrown

maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
* reduce runtime of graph_lasso

* fixed line length overrun

* added comment explaining the change

* changed explanation comment
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
* reduce runtime of graph_lasso

* fixed line length overrun

* added comment explaining the change

* changed explanation comment
@stevendbrown stevendbrown deleted the graph_lasso_runtime branch May 29, 2018 13:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants