[MRG+1] FIX consistency of memory layout for linear CD solver #5337
Conversation
BTW, I still get a memory error reported by valgrind when I use OpenBLAS from Ubuntu 14.04, but I think this is in OpenBLAS itself and it does not seem to cause an error at our level. There might be a bug in OpenBLAS, but I think this PR fixes the issue reported in #5013.
Travis revealed that the contiguity / ordering depends on the version of numpy. I am on it.
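To make the version dependence concrete, this is the kind of check one can run (a standalone sketch, not code from the PR):

```python
import numpy as np

a = np.arange(16, dtype=np.float64).reshape(4, 4)
mask = np.array([True, False, True, True])

# The result of chained boolean indexing is always a copy, but whether
# that copy is C- or Fortran-ordered has differed across NumPy versions.
sub = a[mask].T[mask]
print(np.__version__, sub.flags['C_CONTIGUOUS'], sub.flags['F_CONTIGUOUS'])
```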
Should be fixed. My last commit might reveal a bug in graph lasso with old versions of numpy...
Force-pushed from b503fbb to 0b49e26.
LGTM. @arthurmensch, are you ok?
I'll review and bench it this evening.
Force-pushed from 0b49e26 to c3a3e69.
check_input : bool, default True
    Skip input validation checks, including the Gram matrix when provided,
    assuming they are handled by the caller when check_input=False.
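As a usage illustration (a hypothetical sketch; it assumes the estimator's fit method exposes the same check_input flag as the path functions):

```python
import numpy as np
from sklearn.linear_model import Lasso

# Inputs prepared and validated once by the caller: float64, Fortran-ordered.
rng = np.random.RandomState(0)
X = np.asfortranarray(rng.randn(100, 20))
y = rng.randn(100)

# With check_input=False the redundant validation (dtype, contiguity,
# and Gram matrix checks) is skipped; correctness of the inputs becomes
# the caller's responsibility.
Lasso(alpha=0.1).fit(X, y, check_input=False)
```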
Ok, so I guess it is okay to present this flag to the user?
Yes, I think so: `check_input` flags are already present in several other functions / methods in the scikit-learn code base.
I see no performance regression on dictionary learning, so 👍. Looking back on

```python
if hasattr(precompute, '__array__') and (
        fit_intercept and not np.allclose(X_mean, np.zeros(n_features))
        or normalize and not np.allclose(X_std, np.ones(n_features))):
```

despite being important for performance. But that is another issue.
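For context, the reason a user-supplied Gram matrix has to be dropped when the data is centered or scaled (a minimal illustration, not code from the PR):

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(50, 5)

gram = np.dot(X.T, X)         # Gram matrix of the raw design matrix
X_c = X - X.mean(axis=0)      # what fit_intercept=True effectively fits on

# Centering (or scaling) changes the Gram matrix, so a precomputed one is
# no longer valid; this is what the np.allclose checks on X_mean / X_std
# guard against.
print(np.allclose(gram, np.dot(X_c.T, X_c)))  # False
```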
Great, thanks for the reviews; merging. I will not update what's new, as this issue was not present in 0.16.
```diff
@@ -205,7 +205,8 @@ def graph_lasso(emp_cov, alpha, cov_init=None, mode='cd', tol=1e-4,
     d_gap = np.inf
     for i in range(max_iter):
         for idx in range(n_features):
-            sub_covariance = covariance_[indices != idx].T[indices != idx]
+            sub_covariance = np.ascontiguousarray(
+                covariance_[indices != idx].T[indices != idx])
```
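For reference, the layout problem this explicit copy addresses can be reproduced like this (a minimal sketch with made-up data; whether the intermediate is contiguous depends on the NumPy version):

```python
import numpy as np

rng = np.random.RandomState(0)
n_features = 5
covariance_ = np.cov(rng.randn(20, n_features), rowvar=0)
indices = np.arange(n_features)
idx = 2

sub = covariance_[indices != idx].T[indices != idx]
# Depending on the NumPy version, `sub` may or may not be C-contiguous,
# while the Cython CD solver now requires a fixed layout.
print(sub.flags['C_CONTIGUOUS'])

# np.ascontiguousarray is a no-op when the layout is already right and
# makes a copy otherwise, so the fix is cheap in the common case.
assert np.ascontiguousarray(sub).flags['C_CONTIGUOUS']
```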
Shouldn't this be:

```python
mask = indices != idx
covariance_[np.ix_(mask, mask)]
```
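For what it's worth, both spellings select the same submatrix when the input is symmetric, as a covariance matrix is (a quick check, not from the PR):

```python
import numpy as np

rng = np.random.RandomState(0)
m = rng.randn(5, 5)
covariance_ = (m + m.T) / 2          # symmetric, like a covariance matrix
indices = np.arange(5)
idx = 2

mask = indices != idx
a = covariance_[mask].T[mask]        # current two-step indexing
b = covariance_[np.ix_(mask, mask)]  # suggested one-step indexing

# np.ix_ drops row idx and column idx in one fancy-indexing pass and
# returns a C-contiguous copy directly, with no transpose involved.
print(np.allclose(a, b), b.flags['C_CONTIGUOUS'])  # True True
```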
@MechCoder do you want to see if that is better and do a quick PR?
A deterministic regression test would be nice.
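Something along these lines would do it (a sketch only, not the test that ended up in the PR):

```python
import numpy as np
from numpy.testing import assert_array_almost_equal
from sklearn.linear_model import ElasticNet


def test_coef_independent_of_memory_layout():
    # The CD solver must return identical coefficients whether X is
    # C-ordered or Fortran-ordered; seeding makes the test deterministic.
    rng = np.random.RandomState(0)
    X = rng.randn(30, 10)
    y = rng.randn(30)

    clf_c = ElasticNet(alpha=0.01).fit(np.ascontiguousarray(X), y)
    clf_f = ElasticNet(alpha=0.01).fit(np.asfortranarray(X), y)
    assert_array_almost_equal(clf_c.coef_, clf_f.coef_)
```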
This should fix #5013.
This PR fixes several related consistency issues in the handling of memory layout in models using the linear coordinate descent solvers (with or without a precomputed Gram matrix).
Also, I made the expectations w.r.t. memory layout explicit directly in the Cython prototypes, which should prevent re-introducing such regressions in the future.
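Concretely, that means declaring the expected ordering in the buffer types, roughly like this (an illustrative signature, not the exact one in cd_fast.pyx):

```cython
cimport numpy as np

# mode='c' / mode='fortran' in the buffer declaration makes Cython raise
# a ValueError at call time if the caller passes an array with the wrong
# memory layout, instead of silently reading the wrong strides.
def enet_coordinate_descent(np.ndarray[double, ndim=1] w,
                            double alpha, double beta,
                            np.ndarray[double, ndim=2, mode='fortran'] X,
                            np.ndarray[double, ndim=1] y,
                            int max_iter, double tol):
    ...
```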
@arthurmensch, I would appreciate a review on this, as I touched a lot of the code you changed when introducing the ability to skip input checks. In particular, if you have your benchmark scripts at hand, it would be great if you could check that I do not re-introduce unwanted redundant input checks.