Skip to content

[MRG] FIX SparseCoder with readonly parallel mmap #11346

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 23, 2018

Conversation

ogrisel
Copy link
Member

@ogrisel ogrisel commented Jun 22, 2018

Fixes #5956.

I reused a faster version of the non-regression test of #5998 but the fix is more minimal.

I will directly merge if CI is green.

# if they are larger than 1MB. The 4 accounts for float32
# data type
n_samples = int(2e6) // (4 * n_features)
data = np.random.rand(n_samples, n_features).astype(np.float32)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternatively can just do data.setflags(write=False). That way having a lot of data is not necessary to generate the same problem. Did something similar in PR ( #11342 ) in case it helps.

Copy link
Member Author

@ogrisel ogrisel Jun 23, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, I believe it's not possible to use this as in this particular case it's an internal data structure Xy (inderectly derived from the data) that is being mutated.

We could write a unittest for the private function _gram_omp tough.

@ogrisel
Copy link
Member Author

ogrisel commented Jun 23, 2018

I have added a short unittest based on @jakirkham's suggestion to use readonly flags. I am not sure if we should keep the test_sparse_coder_parallel_mmap.

On the + side, it's a good integration test that actually reflects the original problem from the user's point of view.

On the - side, it's a bit slow to run (2.26s on my laptop) and I don't think we can easily make it run faster while still reproducing the original problem.

@ogrisel ogrisel merged commit 174f4ae into scikit-learn:master Jun 23, 2018
@ogrisel
Copy link
Member Author

ogrisel commented Jun 23, 2018

CI was green everywhere, I merged.

@ogrisel ogrisel deleted the fix-omp-readonly branch June 23, 2018 10:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ValueError: assignment destination is read-only, when paralleling with n_jobs > 1
2 participants