[MRG+1] FIX consistency of memory layout for linear CD solver #5337


Merged

Conversation

@ogrisel (Member) commented Oct 2, 2015

This should fix #5013.

This PR fixes several related consistency issues in the handling of the memory layout in models using the linear coordinate descent solvers (with or without a precomputed Gram matrix).

Also, I made the expectations w.r.t. memory layout explicit in the Cython prototypes directly, which should prevent re-introducing regressions in the future.

@arthurmensch I would appreciate a review on this, as I touched a lot of code you changed when introducing the ability to skip input checks. In particular, if you have your benchmark scripts at hand, it would be great if you could check that I do not re-introduce unwanted redundant input checks.
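As an illustration of the layout contract being made explicit, here is a minimal NumPy-level sketch (a hypothetical helper, not the PR's actual Cython prototypes, whose real expectations live in the .pyx signatures): enforce the expected order at the call boundary instead of assuming it inside the solver.

    import numpy as np

    def as_solver_input(X, order='F'):
        # Hypothetical helper: make the memory-layout expectation explicit
        # at the boundary. np.asarray with an explicit `order` is a no-op
        # when the array already matches and copies otherwise.
        X = np.asarray(X, dtype=np.float64, order=order)
        flag = 'F_CONTIGUOUS' if order == 'F' else 'C_CONTIGUOUS'
        assert X.flags[flag]
        return X

    X = np.random.rand(10, 3)            # C-ordered by default
    X_f = as_solver_input(X, order='F')  # copied into Fortran order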

@ogrisel ogrisel added this to the 0.17 milestone Oct 2, 2015
@ogrisel (Member, Author) commented Oct 2, 2015

BTW, I still get a memory error reported by valgrind when I use OpenBLAS from Ubuntu 14.04, but I think this is in OpenBLAS itself, and it does not seem to cause an error at our level. There might be a bug in OpenBLAS, but I think this PR fixes the issue reported in #5013.

@ogrisel (Member, Author) commented Oct 2, 2015

Travis revealed that the contiguity / ordering depends on the version of numpy. I am on it.
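For context, a small sketch of the pattern in question: whether the result of this kind of double fancy indexing is C-contiguous has not been stable across numpy versions, so the robust move is to test the flag (or force a copy) rather than assume it.

    import numpy as np

    a = np.arange(16, dtype=np.float64).reshape(4, 4)
    mask = np.arange(4) != 2
    sub = a[mask].T[mask]             # double fancy indexing, as in graph_lasso
    print(sub.flags['C_CONTIGUOUS'])  # True or False depending on numpy version
    sub = np.ascontiguousarray(sub)   # robust: copies only when needed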

@ogrisel (Member, Author) commented Oct 2, 2015

Should be fixed. My last commit might reveal a bug in graph lasso with old versions of numpy...

@agramfort (Member)

LGTM

@arthurmensch are you ok?

@arthurmensch (Contributor)

I'll review and bench it this evening.

@ogrisel ogrisel changed the title [MRG] FIX consistency of memory layout for linear CD solver [MRG+1] FIX consistency of memory layout for linear CD solver Oct 4, 2015
@ogrisel ogrisel force-pushed the fix-coordinate-descent-memory-layout branch from 0b49e26 to c3a3e69 on October 4, 2015 16:38
check_input : bool, default True
    Skip input validation checks, including on the Gram matrix when provided,
    assuming they are handled by the caller when check_input=False.

Contributor:

OK, so I guess it is okay to present this flag to the user?

Member (Author):

Yes, I think so: check_input flags are already present in several other functions/methods in the scikit-learn code base.
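For readers unfamiliar with the pattern, a minimal sketch of how such a flag is typically used (a hypothetical toy solver, not scikit-learn's actual API): validate once up front, then skip the per-call checks in a hot loop.

    import numpy as np
    from sklearn.utils import check_array

    def toy_solver(X, y, check_input=True):
        # Hypothetical solver: validation is skipped when the caller
        # guarantees the inputs are already checked.
        if check_input:
            X = check_array(X, dtype=np.float64, order='F')
            y = check_array(y, ensure_2d=False, dtype=np.float64)
        return np.linalg.lstsq(X, y, rcond=None)[0]

    rng = np.random.RandomState(0)
    X = check_array(rng.rand(20, 3), dtype=np.float64, order='F')  # checked once
    y = rng.rand(20)
    for _ in range(10):                          # hot loop: no redundant checks
        w = toy_solver(X, y, check_input=False)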

@arthurmensch (Contributor)

I see no performance regression on dictionary learning, so 👍

Looking back on _pre_fit, I feel that this line is not very clean

    if hasattr(precompute, '__array__') and (
            fit_intercept and not np.allclose(X_mean, np.zeros(n_features))
            or normalize and not np.allclose(X_std, np.ones(n_features))):

despite being important for performance. But that is another issue.
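One possible cleaner spelling of the same test (a sketch of a refactor, not merged code): the user-supplied Gram matrix is only reusable when X was not re-centered or re-scaled, so naming those conditions makes the intent explicit.

    import numpy as np

    def gram_is_stale(precompute, X_mean, X_std, fit_intercept, normalize):
        # A precomputed Gram matrix no longer matches X once X has been
        # centered (fit_intercept) or scaled (normalize) during _pre_fit.
        is_gram = hasattr(precompute, '__array__')
        recentered = fit_intercept and not np.allclose(X_mean, 0.0)
        rescaled = normalize and not np.allclose(X_std, 1.0)
        return is_gram and (recentered or rescaled)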

@ogrisel (Member, Author) commented Oct 5, 2015

Great, thanks for the reviews; merging. I will not update what's new, as this issue was not present in 0.16.

ogrisel added a commit that referenced this pull request Oct 5, 2015

[MRG+1] FIX consistency of memory layout for linear CD solver
@ogrisel ogrisel merged commit 1025982 into scikit-learn:master Oct 5, 2015
@@ -205,7 +205,8 @@ def graph_lasso(emp_cov, alpha, cov_init=None, mode='cd', tol=1e-4,
     d_gap = np.inf
     for i in range(max_iter):
         for idx in range(n_features):
-            sub_covariance = covariance_[indices != idx].T[indices != idx]
+            sub_covariance = np.ascontiguousarray(
+                covariance_[indices != idx].T[indices != idx])
Member:

Shouldn't this be

mask = indices != idx
covariance_[np.ix_(mask, mask)]
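A quick sanity check (a sketch) that the two spellings pick the same submatrix for a symmetric matrix, and that np.ix_ already yields a C-contiguous copy in a single indexing operation:

    import numpy as np

    rng = np.random.RandomState(0)
    A = rng.rand(5, 5)
    covariance_ = A + A.T                  # covariance matrices are symmetric
    indices = np.arange(5)
    idx = 2
    mask = indices != idx
    old = np.ascontiguousarray(covariance_[mask].T[mask])
    new = covariance_[np.ix_(mask, mask)]  # one indexing op, contiguous result
    assert np.allclose(old, new)
    assert new.flags['C_CONTIGUOUS']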

Member:
@MechCoder do you want to see if that is better and do a quick PR?

@amueller (Member)

A deterministic regression test would be nice.
Thanks so much for fixing this!
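For instance, a test along these lines could pin the behavior down (a sketch of the idea, not the test ultimately added): fit the same data in C and Fortran order and require identical coefficients.

    import numpy as np
    from sklearn.linear_model import Lasso

    def test_memory_layout_invariance():
        rng = np.random.RandomState(42)
        X = rng.rand(30, 4)
        y = rng.rand(30)
        # The fitted coefficients must not depend on the input layout.
        coef_c = Lasso(alpha=0.1).fit(np.ascontiguousarray(X), y).coef_
        coef_f = Lasso(alpha=0.1).fit(np.asfortranarray(X), y).coef_
        np.testing.assert_allclose(coef_c, coef_f)

    test_memory_layout_invariance()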

Development

Successfully merging this pull request may close these issues.

Random segfault under windows in sklearn.decomposition.tests.test_sparse_pca.test_fit_transform
5 participants