[MRG+1] Reduce runtime of graph_lasso #9858
Conversation
LGTM
Hi @TomDLT, I think the PEP8 error is now resolved; the tests looked OK on my screen. I've added said comment. It took me a little while to convince myself that this was true, but this short script can also verify it for anyone who is interested:

```python
import numpy as np

t = np.random.randint(1, 100, size=(100, 100))
indices = np.arange(t.shape[0])
subt = np.ascontiguousarray(t[1:, 1:])
for idx in indices:
    if idx > 0:
        subt[idx - 1] = t[idx - 1][indices != idx]
        subt[:, idx - 1] = t[:, idx - 1][indices != idx]
    else:
        subt[:] = t[1:, 1:]
    assert np.all(subt == t[indices != idx].T[indices != idx].T)
```
sklearn/covariance/graph_lasso_.py
Outdated
```python
for i in range(max_iter):
    for idx in range(n_features):
        sub_covariance = np.ascontiguousarray(
            covariance_[indices != idx].T[indices != idx])
        # for each idx, the sub_covariance matrix is the covariance_
```
I think this comment is good but a bit too long.
I would have gone with something less verbose, such as:
```python
# To keep the contiguous matrix `sub_covariance` equal to
# covariance_[indices != idx].T[indices != idx]
# we only need to update 1 column and 1 line when idx changes
```
Indeed, for some weird reason I don't see the commit 070fdcf on this page... Anyway, thanks for the script; this is indeed non-trivial.
@TomDLT I've replaced my version of the comment with yours. Thanks.
All right, I approved this change; let's just wait for a second reviewer.
thx @stevendbrown |
* reduce runtime of graph_lasso
* fixed line length overrun
* added comment explaining the change
* changed explanation comment
Reference Issue
None.
What does this implement/fix? Explain your changes.
For a 1288x1288 empirical covariance matrix, `sklearn.covariance.graph_lasso` currently spends 70-80% of its runtime creating `sub_covariance` on my computer. This change reduces that to ~3.5% of the function's runtime, and `np.allclose(...)` comparing both of the arrays in the result tuple returns `True` for each. Iterations have the same losses and dual gaps, and calls to `graph_lasso` end after the same number of iterations, so I don't see a functional change with my data.

Any other comments?

Just noticed that this may interact with #4787. My dataset is quite dense, so the runtime of `graph_lasso` based on that PR is almost double that of the original function.