-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] FIX segmentation fault on memory mapped contiguous memoryview #21654
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] FIX segmentation fault on memory mapped contiguous memoryview #21654
Conversation
This confirms it: The combination of Ubuntu 1804 (or some particular package combination it the CI) and joblib memory mapped read-only arrays passed to contiguous Cython memoryviews (e.g. I'm very sorry to be the one who somehow introduced this bug in scikit-learn. @jjerphan @TomDLT @thomasjpfan @ogrisel @jeremiedbb @lesteve @da-woods friendly ping |
With |
Can I propose a slightly modified version of from cpython.ref cimport Py_INCREF
... # other cimports as before
cdef class ReadonlyArrayWrapper:
cdef object wraps
def __init__(self, wraps):
self.wraps = wraps
def __getbuffer__(self, Py_buffer *buffer, int flags):
request_for_writeable = False
if flags & PyBUF_WRITABLE:
flags ^= PyBUF_WRITABLE
request_for_writeable = True
PyObject_GetBuffer(self.wraps, buffer, flags)
if request_for_writeable:
# The following is a lie when self.wraps is readonly!
buffer.readonly = False
buffer.obj = self
def __releasebuffer__(self, Py_buffer *buffer):
# restore the state when the buffer was created
Py_INCREF(self) # because reassigning buffer.obj decrefs self, and
# the specification of __releasebuffer__ ways we shouldn't do that
buffer.obj = self.wraps
buffer.readonly = True
PyBuffer_Release(buffer) Essentially what I'm doing is
My thought is that we should give back exactly the same buffer as we got - maybe the mmap is keeping a cache of buffers that it's previously provided, and returning a modified buffer is messing with the cache. I don't quite know how the contiguousness fits with that. To re-iterate - this is an untested theory but it seems worth a try! (I've given the code a quick test with Numpy and it seems to work, but I don't have joblib installed and have never used it so can't easily test it there) |
This reverts commit d9f351e.
@da-woods Unfortunately, this does not fix it. It still gives |
That's a shame. Not a huge surprise since it was a wild guess though. I've tried to reproduce the test locally (in a cutdown version independent from scikit-learn) and I've failed to reproduce it. My next suggestion for things to test: what happens if you create writeable mmap backed data, and skip creating the array wrapper. Something like: @pytest.mark.parametrize("dtype", [np.float32, np.float64, np.int32, np.int64])
def test_contig_mmapped(dtype):
x = np.arange(10).astype(dtype)
sum_origin = _test_sum(x)
x_mmap = create_memmap_backed_data(x, mmap_mode="w")
sum_mmap = _test_sum(x_readonly)
assert sum_mmap== pytest.approx(sum_origin, rel=1e-11) That might test you if the issue is creating a contiguous memoryview of (possibly misaligned) mmap-backed data, or the |
Looks like the crash happens in I wonder if this is something that Cython should error/warn about in it's memoryview initialization? I'll create an issue there to suggest it (although that doesn't help you fix it here of course...) |
60960ca
to
c2bbd1d
Compare
c2bbd1d
to
6ea9106
Compare
@da-woods 6ea9106 is confirmation that It's really nice working with you. I hope the next time it's about some uplifting topic and not segfault hunting:smile: Some details on CI system config and compiler options
|
Your help is very valuable, @da-woods: thank you.
I think it's fine, @lorentzenchr: this is really a problem in a sole configuration and I also have approved this change. I do not have time to help with this PR because I have more urgent tasks to do, but I will follow it in the meantime and review it then. |
Would be nice to open a dedicated issue :) |
there's a dedicated PR :) #21662 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 for merging this PR. The new test_memmap_on_contiguous_data
test makes it easier to try to reproduce a minimal version of the problem found in #20567 just by setting aligned=False
.
@da-woods any idea if memory aligned data is a general requirement for doing for loops on C-contiguous arrays? It might be related to SIMD vectorization that indeed requires memory aligned but I thought a C-compiler would be clever and generate loopy code that would start the loop with non-aligned arithmetic operations with scalar instructions and then proceed with vector instructions on memory aligned array chunks. |
Not that I know of. I had a look at the C code it generates and I don't see why it'd end up with different alignment requirements than the non-contiguous case (it's obviously a bit easier to optimize since it knows the strides, but it's simple pointer arithmetic in both cases) I failed to reproduce it on my PC (including when using all the sse flags that looked to be set on the pipeline). It might be interesting to get the compiled module from the tests that crash to try it locally with a debugger and/or to look at the disassembly. But I don't know if that's particularly easy. |
I launched a wheel build in #21677:
|
I am not sure whether or not working with unaligned memory mapped arrays from Cython code should always work or not but given that this is causing hard to debug problems also on pytorch (pytorch/pytorch#35151), maybe it's best to try to re-priorize a fix for joblib/joblib#563 upstream in joblib by padding the generated pickle files to make sure that the loaded arrays are always aligned (based on their dtypes). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for digging in this issue, @lorentzenchr
Should TODO
note be added to think of removing those fixtures once joblib/joblib#563 is fixed?
This review proposes to add such notes and comes with a remark regarding buffer flags.
def _create_memmap_backed_data(data): | ||
return create_memmap_backed_data( | ||
data, mmap_mode="r", return_folder=False, aligned=True | ||
) | ||
|
||
|
||
@pytest.mark.parametrize("readonly", [_readonly_array_copy, _create_memmap_backed_data]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick. Note that this change might not be black
-compliant.
def _create_memmap_backed_data(data): | |
return create_memmap_backed_data( | |
data, mmap_mode="r", return_folder=False, aligned=True | |
) | |
@pytest.mark.parametrize("readonly", [_readonly_array_copy, _create_memmap_backed_data]) | |
# TODO: remove this fixture and its associated parametrization | |
# once https://github.com/joblib/joblib/issues/563 is fixed. | |
def _create_aligned_memmap_backed_data(data): | |
return create_memmap_backed_data( | |
data, mmap_mode="r", return_folder=False, aligned=True | |
) | |
@pytest.mark.parametrize("readonly", [_readonly_array_copy, _create_aligned_memmap_backed_data]) |
|
||
@pytest.mark.parametrize("dtype", [np.float32, np.float64, np.int32, np.int64]) | ||
def test_memmap_on_contiguous_data(dtype): | ||
"""Test memory mapped array on contigous memoryview.""" | ||
x = np.arange(10).astype(dtype) | ||
assert x.flags["C_CONTIGUOUS"] | ||
assert x.flags["ALIGNED"] | ||
|
||
# _test_sum consumes contiguous arrays | ||
# def _test_sum(NUM_TYPES[::1] x): | ||
sum_origin = _test_sum(x) | ||
|
||
# now on memory mapped data | ||
# aligned=True so avoid https://github.com/joblib/joblib/issues/563 | ||
# without alignment, this can produce segmentation faults, see | ||
# https://github.com/scikit-learn/scikit-learn/pull/21654 | ||
x_mmap = create_memmap_backed_data(x, mmap_mode="r+", aligned=True) | ||
sum_mmap = _test_sum(x_mmap) | ||
assert sum_mmap == pytest.approx(sum_origin, rel=1e-11) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pytest.mark.parametrize("dtype", [np.float32, np.float64, np.int32, np.int64]) | |
def test_memmap_on_contiguous_data(dtype): | |
"""Test memory mapped array on contigous memoryview.""" | |
x = np.arange(10).astype(dtype) | |
assert x.flags["C_CONTIGUOUS"] | |
assert x.flags["ALIGNED"] | |
# _test_sum consumes contiguous arrays | |
# def _test_sum(NUM_TYPES[::1] x): | |
sum_origin = _test_sum(x) | |
# now on memory mapped data | |
# aligned=True so avoid https://github.com/joblib/joblib/issues/563 | |
# without alignment, this can produce segmentation faults, see | |
# https://github.com/scikit-learn/scikit-learn/pull/21654 | |
x_mmap = create_memmap_backed_data(x, mmap_mode="r+", aligned=True) | |
sum_mmap = _test_sum(x_mmap) | |
assert sum_mmap == pytest.approx(sum_origin, rel=1e-11) | |
# TODO: remove once https://github.com/joblib/joblib/issues/563 is fixed. | |
@pytest.mark.parametrize("dtype", [np.float32, np.float64, np.int32, np.int64]) | |
def test_memmap_on_contiguous_data(dtype): | |
"""Test memory mapped array on contigous memoryview.""" | |
x = np.arange(10).astype(dtype) | |
assert x.flags["C_CONTIGUOUS"] | |
assert x.flags["ALIGNED"] | |
# _test_sum consumes contiguous arrays | |
# def _test_sum(NUM_TYPES[::1] x): | |
sum_origin = _test_sum(x) | |
# now on memory mapped data | |
# aligned=True so avoid https://github.com/joblib/joblib/issues/563 | |
# without alignment, this can produce segmentation faults, see | |
# https://github.com/scikit-learn/scikit-learn/pull/21654 | |
x_mmap = create_memmap_backed_data(x, mmap_mode="r+", aligned=True) | |
sum_mmap = _test_sum(x_mmap) | |
assert sum_mmap == pytest.approx(sum_origin, rel=1e-11) |
if aligned: | ||
if isinstance(data, np.ndarray) and data.flags.aligned: | ||
# https://numpy.org/doc/stable/reference/generated/numpy.memmap.html | ||
filename = op.join(temp_folder, "data.dat") | ||
fp = np.memmap(filename, dtype=data.dtype, mode="w+", shape=data.shape) | ||
fp[:] = data[:] # write data to memmap array | ||
fp.flush() | ||
memmap_backed_data = np.memmap( | ||
filename, dtype=data.dtype, mode=mmap_mode, shape=data.shape | ||
) | ||
else: | ||
raise ValueError("If aligned=True, input must be a single numpy array.") | ||
else: | ||
filename = op.join(temp_folder, "data.pkl") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if aligned: | |
if isinstance(data, np.ndarray) and data.flags.aligned: | |
# https://numpy.org/doc/stable/reference/generated/numpy.memmap.html | |
filename = op.join(temp_folder, "data.dat") | |
fp = np.memmap(filename, dtype=data.dtype, mode="w+", shape=data.shape) | |
fp[:] = data[:] # write data to memmap array | |
fp.flush() | |
memmap_backed_data = np.memmap( | |
filename, dtype=data.dtype, mode=mmap_mode, shape=data.shape | |
) | |
else: | |
raise ValueError("If aligned=True, input must be a single numpy array.") | |
else: | |
filename = op.join(temp_folder, "data.pkl") | |
# TODO: remove the aligned case once https://github.com/joblib/joblib/issues/563 | |
# is fixed. | |
if aligned: | |
if isinstance(data, np.ndarray) and data.flags.aligned: | |
# https://numpy.org/doc/stable/reference/generated/numpy.memmap.html | |
filename = op.join(temp_folder, "data.dat") | |
fp = np.memmap(filename, dtype=data.dtype, mode="w+", shape=data.shape) | |
fp[:] = data[:] # write data to memmap array | |
fp.flush() | |
memmap_backed_data = np.memmap( | |
filename, dtype=data.dtype, mode=mmap_mode, shape=data.shape | |
) | |
else: | |
raise ValueError("If aligned=True, input must be a single numpy array.") | |
else: | |
filename = op.join(temp_folder, "data.pkl") |
aligned : bool, default=False | ||
If True, if input is a single numpy array and if the input array is aligned, | ||
the memory mapped array will also be aligned. This is a workaround for | ||
https://github.com/joblib/joblib/issues/563. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aligned : bool, default=False | |
If True, if input is a single numpy array and if the input array is aligned, | |
the memory mapped array will also be aligned. This is a workaround for | |
https://github.com/joblib/joblib/issues/563. | |
aligned : bool, default=False | |
If True, if input is a single numpy array and if the input array is aligned, | |
the memory mapped array will also be aligned. This is a workaround for | |
https://github.com/joblib/joblib/issues/563 . |
@@ -680,30 +681,59 @@ def test_tempmemmap(monkeypatch): | |||
assert registration_counter.nb_calls == 2 | |||
|
|||
|
|||
def test_create_memmap_backed_data(monkeypatch): | |||
@pytest.mark.parametrize("aligned", [False, True]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pytest.mark.parametrize("aligned", [False, True]) | |
# TODO: remove this test parametrization on aligned once | |
# https://github.com/joblib/joblib/issues/563 is fixed. | |
@pytest.mark.parametrize("aligned", [False, True]) |
@lorentzenchr what do you think of @jjerphan's suggestions above? I would like to merge this PR asap because it would be useful to have a low-maintenance fix for some randomly segfaulting tests on NuSVC on our CI (#21702 (review) about fixing #21361). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To move forward, I propose leaving my personal unimportant nitpick. LGTM!
I think it would be worth adding back the suggested references to joblib/joblib#563 as part of #21702. |
Shall I open a dedicated PR for those comments? Sorry, that I did not have the time to respond that fast. |
…ryview (scikit-learn#21654)" This reverts commit 3f717cc.
…iguous memoryview (scikit-learn#21654)" This reverts commit 3f717cc.
I'm getting this issue on a newly fresh conda environnement, the default python version is |
@bitsnaps Could you update to the latest release? |
I'm not sure if I can do that without breaking anything! when running |
The latest release on PyPI, conda-forge and the defaults channel of Anaconda is 1.2.2 and we might have the solve the issue there. You can try on a new minimal environment to check that we actually solve the problem that you pointed out in this version. Otherwise, you can open a new issue with the example and information regarding your environment. |
There is something weird here, I tried to create a new env with exactly the same packages/versions, and everything is working fine, even with the old version scikit (v |
What does this implement/fix? Explain your changes.
Hunting for the segmentation fault in #20567 (comment).
Edit: The fix is to not use joblib for creating readonly (memory mapped) arrays, but to use
np.memmap
instead, at least where aligned arrays are required.Any other comments?
A segmentation fault happens on
Ubuntu_Bionic py37_conda_forge_openblas_ubuntu_1804
when a memory mapped readonly array (via joblib) is passed to a contiguous memoryview in Cython.