
[MRG] FIX segmentation fault on memory mapped contiguous memoryview #21654


Merged · 10 commits · Nov 18, 2021

Conversation

lorentzenchr
Member

@lorentzenchr lorentzenchr commented Nov 13, 2021

What does this implement/fix? Explain your changes.

Hunting for the segmentation fault in #20567 (comment).

Edit: The fix is to not use joblib for creating readonly (memory mapped) arrays, but to use np.memmap instead, at least where aligned arrays are required.

Any other comments?

A segmentation fault happens on Ubuntu_Bionic py37_conda_forge_openblas_ubuntu_1804 when a memory mapped readonly array (via joblib) is passed to a contiguous memoryview in Cython.
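The workaround can be sketched as follows (my own minimal sketch, not the PR's diff): write the array once through a `w+` memmap, then reopen it read-only. `np.memmap` keeps the data aligned for its dtype, which joblib's pickle-based memmapping does not guarantee (joblib/joblib#563).

```python
import os
import tempfile

import numpy as np

data = np.arange(10, dtype=np.float64)

with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, "data.dat")

    # Write pass: create the backing file and copy the data in.
    fp = np.memmap(filename, dtype=data.dtype, mode="w+", shape=data.shape)
    fp[:] = data
    fp.flush()

    # Read pass: reopen read-only; the result is aligned and not writeable.
    mm = np.memmap(filename, dtype=data.dtype, mode="r", shape=data.shape)
    is_aligned = bool(mm.flags["ALIGNED"])
    is_writeable = bool(mm.flags["WRITEABLE"])
    round_trip_ok = bool(np.array_equal(mm, data))

    # Release the memory maps before the temporary directory is removed.
    del fp, mm
```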

@lorentzenchr
Member Author

lorentzenchr commented Nov 13, 2021

This confirms it: the combination of Ubuntu 1804 (or some particular package combination in the CI), joblib memory-mapped read-only arrays passed to contiguous Cython memoryviews (e.g. double x[::1]), and the ReadonlyArrayWrapper from #20903 causes a segmentation fault, see CI buildid=34976.

I'm very sorry to be the one who somehow introduced this bug in scikit-learn.

@jjerphan @TomDLT @thomasjpfan @ogrisel @jeremiedbb @lesteve @da-woods friendly ping

@lorentzenchr lorentzenchr changed the title TST use contiguous memoryview TST ReadonlyArrayWrapper with mmap contiguous memoryview Nov 13, 2021
@lorentzenchr
Member Author

With boundscheck=True, test_readonly_array_wrapper passes successfully, but a lot of kmeans tests fail for various reasons.

@da-woods

da-woods commented Nov 14, 2021

Can I propose a slightly modified version of ReadonlyArrayWrapper? This is just for you to test - I'm taking a wild guess at the problem

from cpython.ref cimport Py_INCREF
...  # other cimports as before

cdef class ReadonlyArrayWrapper:
    cdef object wraps

    def __init__(self, wraps):
        self.wraps = wraps

    def __getbuffer__(self, Py_buffer *buffer, int flags):
        request_for_writeable = False
        if flags & PyBUF_WRITABLE:
            flags ^= PyBUF_WRITABLE
            request_for_writeable = True
        PyObject_GetBuffer(self.wraps, buffer, flags)
        if request_for_writeable:
            # The following is a lie when self.wraps is readonly!
            buffer.readonly = False
            buffer.obj = self

    def __releasebuffer__(self, Py_buffer *buffer):
        # restore the state when the buffer was created
        Py_INCREF(self)  # because reassigning buffer.obj decrefs self, and
                         # the specification of __releasebuffer__ says we shouldn't do that
        buffer.obj = self.wraps
        buffer.readonly = True
        PyBuffer_Release(buffer)

Essentially what I'm doing is

  1. Reassigning the buffer obj to be self (so that when the buffer is released it calls this class rather than wraps). This is a switch from the "rexport" to "redirect" scheme in https://docs.python.org/3/c-api/typeobj.html#c.PyBufferProcs.bf_getbuffer
  2. In the __releasebuffer__ function I'm restoring the state to how it was before we started messing with it.

My thought is that we should give back exactly the same buffer as we got - maybe the mmap is keeping a cache of buffers that it's previously provided, and returning a modified buffer is messing with the cache. I don't quite know how the contiguousness fits with that.

To re-iterate - this is an untested theory but it seems worth a try! (I've given the code a quick test with Numpy and it seems to work, but I don't have joblib installed and have never used it so can't easily test it there)
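As background for why a wrapper is needed at all, here is a pure-Python sketch (an assumption-level illustration, not code from this PR): a read-only NumPy array exports a read-only buffer and rejects writes, which is exactly the request a non-const Cython memoryview makes.

```python
import numpy as np

a = np.arange(5, dtype=np.float64)
a.flags.writeable = False  # simulate a readonly (e.g. memory mapped) array

# The exported buffer is marked read-only ...
is_readonly = memoryview(a).readonly

# ... and writing raises, mirroring the failure a writable Cython
# memoryview would hit without something like ReadonlyArrayWrapper.
try:
    a[0] = 1.0
    write_failed = False
except ValueError:
    write_failed = True
```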

@lorentzenchr
Member Author

@da-woods Unfortunately, this does not fix it. It still gives Fatal Python error: Segmentation fault. It's hard to judge which component is broken: ReadonlyArrayWrapper, joblib, Cython or some other software component of the Ubuntu 1804 CI setup.

@lorentzenchr lorentzenchr changed the title TST ReadonlyArrayWrapper with mmap contiguous memoryview [WIP] TST ReadonlyArrayWrapper with mmap contiguous memoryview Nov 14, 2021
@da-woods

da-woods commented Nov 14, 2021

Unfortunately, this does not fix it.

That's a shame. Not a huge surprise since it was a wild guess though.


I've tried to reproduce the test locally (in a cut-down version independent of scikit-learn) and I've failed to reproduce it.

My next suggestion for things to test: what happens if you create writeable mmap backed data, and skip creating the array wrapper. Something like:

@pytest.mark.parametrize("dtype", [np.float32, np.float64, np.int32, np.int64])
def test_contig_mmapped(dtype):
    x = np.arange(10).astype(dtype)
    sum_origin = _test_sum(x)
    x_mmap = create_memmap_backed_data(x, mmap_mode="w+")
    sum_mmap = _test_sum(x_mmap)
    assert sum_mmap == pytest.approx(sum_origin, rel=1e-11)

That might tell you whether the issue is creating a contiguous memoryview of (possibly misaligned) mmap-backed data, or the ReadonlyArrayWrapper itself. (It's possible that this is tested elsewhere, of course, and I just don't know about it.)

@da-woods

Looks like the crash happens in test_contig_mmapped and so isn't related to the readonly wrapper. That suggests to me that it's just related to the alignment issue discussed in the other thread (although I have no idea what exact combination of compiler and CFLAGS cause it).

I wonder if this is something that Cython should error/warn about in its memoryview initialization? I'll create an issue there to suggest it (although that doesn't help you fix it here, of course...)

@lorentzenchr
Member Author

lorentzenchr commented Nov 14, 2021

@da-woods To be sure, I commented out the complete test_readonly_array_wrapper in 6ea9106. Let's see if there is still a segfault.

@lorentzenchr
Member Author

lorentzenchr commented Nov 14, 2021

@da-woods 6ea9106 is confirmation that test_contig_mmapped throws a segfault. So it's not caused by ReadonlyArrayWrapper 😅

It's really nice working with you. I hope next time it's about some uplifting topic and not segfault hunting :smile:

Some details on CI system config and compiler options

Starting: Test Library
==============================================================================
Task         : Command line
Description  : Run a command line script using Bash on Linux and macOS and cmd.exe on Windows
Version      : 2.182.0
Author       : Microsoft Corporation
Help         : https://docs.microsoft.com/azure/devops/pipelines/tasks/utility/command-line
==============================================================================
Generating script.
Script contents:
exec build_tools/azure/test_script.sh
========================== Starting Command Output ===========================
/bin/bash --noprofile --norc /home/vsts/work/_temp/dd2f19b3-8708-47ff-bb67-5d093e3c559d.sh

System:
    python: 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:08:53)  [GCC 9.4.0]
executable: /usr/share/miniconda/envs/testvenv/bin/python
   machine: Linux-5.4.0-1062-azure-x86_64-with-debian-buster-sid

Python dependencies:
          pip: 21.3.1
   setuptools: 58.5.3
      sklearn: 1.1.dev0
        numpy: 1.21.4
        scipy: 1.7.2
       Cython: 0.29.24
       pandas: 1.3.4
   matplotlib: 3.4.3
       joblib: 1.1.0
threadpoolctl: 3.0.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
         prefix: libomp
       filepath: /usr/share/miniconda/envs/testvenv/lib/libomp.so
        version: None
    num_threads: 2

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /usr/share/miniconda/envs/testvenv/lib/libopenblasp-r0.3.18.so
        version: 0.3.18
threading_layer: pthreads
   architecture: SkylakeX
    num_threads: 2

# packages in environment at /usr/share/miniconda/envs/testvenv:
#
# Name                    Version                   Build  Channel
_libgcc_mutex             0.1                 conda_forge    conda-forge
numpy                     1.21.4           py37h31617e3_0    conda-forge
olefile                   0.46               pyh9f0ad1d_1    conda-forge
openblas                  0.3.18          pthreads_h4748800_0    conda-forge
openjpeg                  2.4.0                hb52868f_1    conda-forge
openssl                   1.1.1l               h7f98852_0    conda-forge
packaging                 21.2                     pypi_0    pypi
pandas                    1.3.4            py37he8f5f7f_1    conda-forge
pcre                      8.45                 h9c3ff4c_0    conda-forge
pillow                    8.4.0            py37h0f21c89_0    conda-forge
pip                       21.3.1             pyhd8ed1ab_0    conda-forge
pluggy                    1.0.0                    pypi_0    pypi
pthread-stubs             0.4               h36c2ea0_1001    conda-forge
py                        1.11.0                   pypi_0    pypi
pyamg                     4.1.0            py37h796e4cb_1    conda-forge
pyparsing                 2.4.7                    pypi_0    pypi
pyqt                      5.12.3           py37h89c1867_8    conda-forge
pyqt-impl                 5.12.3           py37hac37412_8    conda-forge
pyqt5-sip                 4.19.18          py37hcd2ae1e_8    conda-forge
pyqtchart                 5.12             py37he336c9b_8    conda-forge
pyqtwebengine             5.12.1           py37he336c9b_8    conda-forge
pytest                    6.2.5                    pypi_0    pypi
pytest-forked             1.3.0                    pypi_0    pypi
pytest-xdist              2.4.0                    pypi_0    pypi
python                    3.7.12          hb7a2778_100_cpython    conda-forge
python-dateutil           2.8.2              pyhd8ed1ab_0    conda-forge
python_abi                3.7                     2_cp37m    conda-forge
pytz                      2021.3             pyhd8ed1ab_0    conda-forge
qt                        5.12.9               hda022c4_4    conda-forge
readline                  8.1                  h46c0cb4_0    conda-forge
scikit-learn              1.1.dev0                  dev_0    <develop>
scipy                     1.7.2            py37hf2a6cf1_0    conda-forge
setuptools                58.5.3           py37h89c1867_0    conda-forge
six                       1.16.0             pyh6c4a22f_0    conda-forge
sqlite                    3.36.0               h9cd32fc_2    conda-forge
threadpoolctl             3.0.0                    pypi_0    pypi
tk                        8.6.11               h27826a3_1    conda-forge
toml                      0.10.2                   pypi_0    pypi
tornado                   6.1              py37h5e8e339_2    conda-forge
typing-extensions         3.10.0.2                 pypi_0    pypi
wheel                     0.37.0             pyhd8ed1ab_1    conda-forge
xorg-libxau               1.0.9                h7f98852_0    conda-forge
xorg-libxdmcp             1.1.3                h7f98852_0    conda-forge
xz                        5.2.5                h516909a_1    conda-forge
zipp                      3.6.0                    pypi_0    pypi
zlib                      1.2.11            h36c2ea0_1013    conda-forge
zstd                      1.5.0                ha95c52a_0    conda-forge
building 'sklearn.utils._readonly_array_wrapper' extension
compiling C sources
C compiler: gcc -pthread -B /usr/share/miniconda/envs/testvenv/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC

compile options: '-I/usr/share/miniconda/envs/testvenv/lib/python3.7/site-packages/numpy/core/include -Ibuild/src.linux-x86_64-3.7/numpy/distutils/include -I/usr/share/miniconda/envs/testvenv/include/python3.7m -I/usr/share/miniconda/envs/testvenv/include/python3.7m -c'
extra options: '-fopenmp -msse -msse2 -msse3'
In file included from /usr/share/miniconda/envs/testvenv/lib/python3.7/site-packages/numpy/core/include/numpy/ndarraytypes.h:1969:0,
                 from /usr/share/miniconda/envs/testvenv/lib/python3.7/site-packages/numpy/core/include/numpy/ndarrayobject.h:12,
                 from /usr/share/miniconda/envs/testvenv/lib/python3.7/site-packages/numpy/core/include/numpy/arrayobject.h:4,
                 from sklearn/utils/_weight_vector.c:645:
/usr/share/miniconda/envs/testvenv/lib/python3.7/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: #warning "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-Wcpp]
 #warning "Using deprecated NumPy API, disable it with " \
  ^~~~~~~
gcc -pthread -shared -B /usr/share/miniconda/envs/testvenv/compiler_compat -L/usr/share/miniconda/envs/testvenv/lib -Wl,-rpath=/usr/share/miniconda/envs/testvenv/lib -Wl,--no-as-needed -Wl,--sysroot=/ -Wl,--sysroot=/ build/temp.linux-x86_64-3.7/sklearn/utils/_weight_vector.o -Lbuild/temp.linux-x86_64-3.7 -lm -o sklearn/utils/_weight_vector.cpython-37m-x86_64-linux-gnu.so -fopenmp
gcc: sklearn/utils/_readonly_array_wrapper.c

@lorentzenchr lorentzenchr changed the title [WIP] TST ReadonlyArrayWrapper with mmap contiguous memoryview [WIP] DEBUG segmentation fault on memory mapped contiguous memoryview Nov 14, 2021
@jjerphan
Member

Your help is very valuable, @da-woods: thank you.

I'm very sorry to be the one who somehow introduced this bug in scikit-learn.

I think it's fine, @lorentzenchr: this is really a problem in a sole configuration and I also have approved this change.

I do not have time to help with this PR because I have more urgent tasks to do, but I will follow it in the meantime and review it then.

@ogrisel
Member

ogrisel commented Nov 15, 2021

With boundscheck=True, test_readonly_array_wrapper passes successfully, but a lot of kmeans tests fail for various reasons.

Would be nice to open a dedicated issue :)

@jeremiedbb
Member

Would be nice to open a dedicated issue :)

there's a dedicated PR :) #21662


@ogrisel ogrisel left a comment


+1 for merging this PR. The new test_memmap_on_contiguous_data test makes it easier to try to reproduce a minimal version of the problem found in #20567 just by setting aligned=False.

@ogrisel
Copy link
Member

ogrisel commented Nov 15, 2021

@da-woods any idea if memory-aligned data is a general requirement for doing for loops on C-contiguous arrays? It might be related to SIMD vectorization, which indeed requires aligned memory, but I thought a C compiler would be clever and generate loop code that starts with non-aligned arithmetic operations using scalar instructions and then proceeds with vector instructions on memory-aligned array chunks.

@da-woods

@da-woods any idea if memory-aligned data is a general requirement for doing for loops on C-contiguous arrays? It might be related to SIMD vectorization, which indeed requires aligned memory, but I thought a C compiler would be clever and generate loop code that starts with non-aligned arithmetic operations using scalar instructions and then proceeds with vector instructions once safe.

Not that I know of. I had a look at the C code it generates and I don't see why it'd end up with different alignment requirements than the non-contiguous case (it's obviously a bit easier to optimize since it knows the strides, but it's simple pointer arithmetic in both cases)

I failed to reproduce it on my PC (including when using all the sse flags that looked to be set on the pipeline).

It might be interesting to get the compiled module from the tests that crash to try it locally with a debugger and/or to look at the disassembly. But I don't know if that's particularly easy.
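For illustration (my own sketch, not from this PR), the misalignment joblib can produce can be mimicked in pure NumPy by viewing a byte buffer at a 1-byte offset; NumPy tracks the problem in the ALIGNED flag, and np.require can make an aligned copy.

```python
import numpy as np

# A heap-allocated buffer is at least 8-byte aligned on mainstream platforms,
# so a float64 view at offset 0 is aligned while one at offset 1 is not.
buf = bytearray(8 * 10 + 1)
aligned_view = np.frombuffer(buf, dtype=np.float64, count=10, offset=0)
unaligned_view = np.frombuffer(buf, dtype=np.float64, count=10, offset=1)

flag_aligned = bool(aligned_view.flags["ALIGNED"])
flag_unaligned = bool(unaligned_view.flags["ALIGNED"])

# np.require copies to an aligned buffer when needed.
realigned = np.require(unaligned_view, requirements=["ALIGNED"])
flag_realigned = bool(realigned.flags["ALIGNED"])
```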

@ogrisel
Member

ogrisel commented Nov 15, 2021

I launched a wheel build in #21677:

  • to see if the compiler which we use to generate the wheel is impacted by this problem;
  • maybe to get the resulting wheel on a platform that fails (I think they are archived as build artifacts, but maybe not in case of failure...)

@ogrisel
Member

ogrisel commented Nov 16, 2021

I am not sure whether working with unaligned memory-mapped arrays from Cython code should always work, but given that this is causing hard-to-debug problems on PyTorch as well (pytorch/pytorch#35151), maybe it's best to try to re-prioritize a fix for joblib/joblib#563 upstream in joblib by padding the generated pickle files to make sure that the loaded arrays are always aligned (based on their dtypes).
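The padding idea can be sketched like this (a hypothetical helper, not joblib's actual code): before writing an array's raw bytes, emit filler bytes so the data offset becomes a multiple of the dtype's alignment.

```python
import numpy as np

def padding_for_alignment(offset, dtype):
    """Bytes of filler needed so data starting at `offset` is aligned for `dtype`."""
    alignment = np.dtype(dtype).alignment
    remainder = offset % alignment
    return 0 if remainder == 0 else alignment - remainder

# e.g. a pickle header ending at byte 13 needs 3 filler bytes for float64
pad = padding_for_alignment(13, np.float64)
```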


@jjerphan jjerphan left a comment


Thank you for digging into this issue, @lorentzenchr.

Should a TODO note be added as a reminder to remove those fixtures once joblib/joblib#563 is fixed?

This review proposes to add such notes and comes with a remark regarding buffer flags.

Comment on lines +16 to +22
def _create_memmap_backed_data(data):
    return create_memmap_backed_data(
        data, mmap_mode="r", return_folder=False, aligned=True
    )


@pytest.mark.parametrize("readonly", [_readonly_array_copy, _create_memmap_backed_data])

Nitpick. Note that this change might not be black-compliant.

Suggested change (rename the helper and note the upstream issue):

# TODO: remove this fixture and its associated parametrization
# once https://github.com/joblib/joblib/issues/563 is fixed.
def _create_aligned_memmap_backed_data(data):
    return create_memmap_backed_data(
        data, mmap_mode="r", return_folder=False, aligned=True
    )


@pytest.mark.parametrize("readonly", [_readonly_array_copy, _create_aligned_memmap_backed_data])

Comment on lines +718 to +736

@pytest.mark.parametrize("dtype", [np.float32, np.float64, np.int32, np.int64])
def test_memmap_on_contiguous_data(dtype):
    """Test memory mapped array on contiguous memoryview."""
    x = np.arange(10).astype(dtype)
    assert x.flags["C_CONTIGUOUS"]
    assert x.flags["ALIGNED"]

    # _test_sum consumes contiguous arrays
    # def _test_sum(NUM_TYPES[::1] x):
    sum_origin = _test_sum(x)

    # now on memory mapped data
    # aligned=True to avoid https://github.com/joblib/joblib/issues/563
    # without alignment, this can produce segmentation faults, see
    # https://github.com/scikit-learn/scikit-learn/pull/21654
    x_mmap = create_memmap_backed_data(x, mmap_mode="r+", aligned=True)
    sum_mmap = _test_sum(x_mmap)
    assert sum_mmap == pytest.approx(sum_origin, rel=1e-11)

Suggested change (add a TODO note above the test):

# TODO: remove once https://github.com/joblib/joblib/issues/563 is fixed.
@pytest.mark.parametrize("dtype", [np.float32, np.float64, np.int32, np.int64])
def test_memmap_on_contiguous_data(dtype):
    """Test memory mapped array on contiguous memoryview."""
    x = np.arange(10).astype(dtype)
    assert x.flags["C_CONTIGUOUS"]
    assert x.flags["ALIGNED"]
    # _test_sum consumes contiguous arrays
    # def _test_sum(NUM_TYPES[::1] x):
    sum_origin = _test_sum(x)
    # now on memory mapped data
    # aligned=True to avoid https://github.com/joblib/joblib/issues/563
    # without alignment, this can produce segmentation faults, see
    # https://github.com/scikit-learn/scikit-learn/pull/21654
    x_mmap = create_memmap_backed_data(x, mmap_mode="r+", aligned=True)
    sum_mmap = _test_sum(x_mmap)
    assert sum_mmap == pytest.approx(sum_origin, rel=1e-11)

Comment on lines +537 to +550
if aligned:
    if isinstance(data, np.ndarray) and data.flags.aligned:
        # https://numpy.org/doc/stable/reference/generated/numpy.memmap.html
        filename = op.join(temp_folder, "data.dat")
        fp = np.memmap(filename, dtype=data.dtype, mode="w+", shape=data.shape)
        fp[:] = data[:]  # write data to memmap array
        fp.flush()
        memmap_backed_data = np.memmap(
            filename, dtype=data.dtype, mode=mmap_mode, shape=data.shape
        )
    else:
        raise ValueError("If aligned=True, input must be a single numpy array.")
else:
    filename = op.join(temp_folder, "data.pkl")

Suggested change (note the upstream issue above the aligned branch):

# TODO: remove the aligned case once https://github.com/joblib/joblib/issues/563
# is fixed.
if aligned:
    if isinstance(data, np.ndarray) and data.flags.aligned:
        # https://numpy.org/doc/stable/reference/generated/numpy.memmap.html
        filename = op.join(temp_folder, "data.dat")
        fp = np.memmap(filename, dtype=data.dtype, mode="w+", shape=data.shape)
        fp[:] = data[:]  # write data to memmap array
        fp.flush()
        memmap_backed_data = np.memmap(
            filename, dtype=data.dtype, mode=mmap_mode, shape=data.shape
        )
    else:
        raise ValueError("If aligned=True, input must be a single numpy array.")
else:
    filename = op.join(temp_folder, "data.pkl")

Comment on lines +530 to +533
aligned : bool, default=False
    If True, if input is a single numpy array and if the input array is aligned,
    the memory mapped array will also be aligned. This is a workaround for
    https://github.com/joblib/joblib/issues/563.

Suggested change (add a space before the period so the URL renders as a link):

aligned : bool, default=False
    If True, if input is a single numpy array and if the input array is aligned,
    the memory mapped array will also be aligned. This is a workaround for
    https://github.com/joblib/joblib/issues/563 .

@@ -680,30 +681,59 @@ def test_tempmemmap(monkeypatch):
assert registration_counter.nb_calls == 2


def test_create_memmap_backed_data(monkeypatch):
@pytest.mark.parametrize("aligned", [False, True])

Suggested change (add a TODO note above the parametrization):

# TODO: remove this test parametrization on aligned once
# https://github.com/joblib/joblib/issues/563 is fixed.
@pytest.mark.parametrize("aligned", [False, True])

@ogrisel
Member

ogrisel commented Nov 18, 2021

@lorentzenchr what do you think of @jjerphan's suggestions above? I would like to merge this PR asap because it would be useful to have a low-maintenance fix for some randomly segfaulting tests on NuSVC on our CI (#21702 (review) about fixing #21361).


@jjerphan jjerphan left a comment


To move forward, I propose setting aside my unimportant personal nitpick. LGTM!

@jjerphan jjerphan merged commit 3f717cc into scikit-learn:main Nov 18, 2021
@ogrisel
Member

ogrisel commented Nov 18, 2021

I propose setting aside my unimportant personal nitpick. LGTM!

I think it would be worth adding back the suggested references to joblib/joblib#563 as part of #21702.

@lorentzenchr
Member Author

Shall I open a dedicated PR for those comments? Sorry that I did not have the time to respond faster.

@lorentzenchr lorentzenchr deleted the debug_mmap_segfault branch November 19, 2021 08:50
lesteve added a commit to lesteve/scikit-learn that referenced this pull request Feb 25, 2022
lesteve added a commit to lesteve/scikit-learn that referenced this pull request Feb 25, 2022
@bitsnaps

bitsnaps commented May 2, 2023

I'm getting this issue in a freshly created conda environment. The default Python version is 3.11, and I also tried a new env with Python 3.7; both cause the issue. The guilty line is KMeans.fit(X) in a simple KMeans example that worked in a previous version.
OS: macOS 10.14.6
conda: 23.3.1
scikit-learn: 1.2.1
The kernel keeps dying in Jupyter, and running on the terminal ends with Segmentation fault: 11. @lorentzenchr any idea?

@glemaitre
Member

@bitsnaps Could you update to the latest release?

@bitsnaps

bitsnaps commented May 2, 2023

@bitsnaps Could you update to the latest release?

I'm not sure I can do that without breaking anything! Running conda update --all says there is nothing to update, and if I force an upgrade to the latest version with pip I would probably break something else (like I did before). Should I try in a new environment?

@glemaitre
Member

The latest release on PyPI, conda-forge, and the defaults channel of Anaconda is 1.2.2, and we might have solved the issue there. You can try it in a new minimal environment to check whether this version actually solves the problem you pointed out.

Otherwise, you can open a new issue with the example and information regarding your environment.

@bitsnaps

bitsnaps commented May 3, 2023

There is something weird here: I created a new env with exactly the same packages/versions, and everything works fine, even with the old scikit-learn version (1.0.2). The issue seems to be related to my environment. Thank you @glemaitre!

7 participants