Skip to content

Conversation

lesteve
Copy link
Member

@lesteve lesteve commented Jul 25, 2022

This is a potential fix for #23786 where on some runs the architecture detected by OpenBLAS is Prescott.

To reproduce you can do:

OPENBLAS_CORETYPE=Prescott pytest sklearn/metrics/tests/test_pairwise_distances_reduction.py -k memmap

which yields a segmentation fault on main. The gdb info points towards ddot_k_PRESCOTT as #21361. Also see OpenMathLib/OpenBLAS#3453.

Also it seems better to avoid this issue in a centralized manner in create_memmap_backed_data when OpenBLAS detects a Prescott architecture rather than a specific fix for the common tests as was done before.

gdb segmentation fault info
0x00007ffff41bd67e in ddot_k_PRESCOTT () from /home/local/lesteve/miniconda3/lib/python3.9/site-packages/numpy/core/../../../../libcblas.so.3
(gdb) bt
#0  0x00007ffff41bd67e in ddot_k_PRESCOTT () from /home/local/lesteve/miniconda3/lib/python3.9/site-packages/numpy/core/../../../../libcblas.so.3
#1  0x00007fffd7de7f29 in ddotwrp (ret=0, n=<optimized out>, dx=..., incx=<optimized out>, dy=..., incy=<optimized out>)
    at scipy/linalg/_blas_subroutine_wrappers.f:108
#2  0x00007fffd7dcb2d1 in __pyx_f_5scipy_6linalg_11cython_blas_ddot ()
   from /home/local/lesteve/miniconda3/lib/python3.9/site-packages/scipy/linalg/cython_blas.cpython-39-x86_64-linux-gnu.so
#3  0x00007fffd450ab1a in __pyx_fuse_1__pyx_f_7sklearn_5utils_12_cython_blas__dot ()
   from /home/local/lesteve/dev/scikit-learn/sklearn/utils/_cython_blas.cpython-39-x86_64-linux-gnu.so
#4  0x00007fffd459d6ca in __pyx_f_7sklearn_7metrics_29_pairwise_distances_reduction_5_base__sqeuclidean_row_norms64(__Pyx_memviewslice, long, int) [clone ._omp_fn.0] () from /home/local/lesteve/dev/scikit-learn/sklearn/metrics/_pairwise_distances_reduction/_base.cpython-39-x86_64-linux-gnu.so
#5  0x00007fffd5ef97f0 in gomp_thread_start (xdata=<optimized out>) at ../../../libgomp/team.c:129
#6  0x00007ffff7d6d609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#7  0x00007ffff7b2c133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for looking into this issue. The bug was quite subtle. I'm happy with the fix in this PR. LGTM

@lesteve
Copy link
Member Author

lesteve commented Jul 26, 2022

In my last commit, I switched to require a sequence of arrays rather than an iterable since:

  • a list is used in all cases
  • there was a bug previously with iterable of arrays, the all in the if condition consumes the iterable and so the returned result was always an empty list ...

The code looked like this:

    if isinstance(data, Iterable) and all(
        isinstance(each, np.ndarray) for each in data
    ):
    return [... for each in data]

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @lesteve.

Waiting for @thomasjpfan to merge as the PR have changed since his last review.

@thomasjpfan thomasjpfan merged commit 79c21c5 into scikit-learn:main Jul 26, 2022
@lesteve lesteve deleted the memmap-prescott-work-around branch July 26, 2022 14:38
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Aug 4, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants