BUG: ArgKmin64 on Windows with scipy 1.13rc1 or 1.14.dev times out #28625

larsoner · 2024-03-13T15:36:08Z

In MNE-Python our Windows pip-pre job on Azure has started reliably timing out (and a second example):

mne/preprocessing/tests/test_interpolate.py::test_find_centroid PASSED   [ 38%]
##[error]The Operation will be canceled. The next steps may not contain expected logs.
Fatal Python error: PyThreadState_Get: the function must be called with the GIL held, but the GIL is released (the current Python thread state is NULL)
Python runtime state: initialized

...
Thread 0x000014fc (most recent call first):
  File "C:\hostedtoolcache\windows\Python\3.11.8\x64\Lib\site-packages\sklearn\metrics\_pairwise_distances_reduction\_dispatcher.py", line 278 in compute
  File "C:\hostedtoolcache\windows\Python\3.11.8\x64\Lib\site-packages\sklearn\neighbors\_base.py", line 850 in kneighbors
  File "C:\hostedtoolcache\windows\Python\3.11.8\x64\Lib\site-packages\sklearn\neighbors\_lof.py", line 291 in fit
  File "C:\hostedtoolcache\windows\Python\3.11.8\x64\Lib\site-packages\sklearn\base.py", line 1474 in wrapper
  File "C:\hostedtoolcache\windows\Python\3.11.8\x64\Lib\site-packages\sklearn\neighbors\_lof.py", line 256 in fit_predict
  File "D:\a\1\s\mne\preprocessing\_lof.py", line 89 in find_bad_channels_lof
  File "<decorator-gen-627>", line 12 in find_bad_channels_lof
  File "D:\a\1\s\mne\preprocessing\tests\test_lof.py", line 31 in test_lof
...

Our code just calls the following (and hasn't been changed):

    clf = LocalOutlierFactor(n_neighbors=n_neighbors, metric=metric)
    clf.fit_predict(data)

which eventually in the traceback points to the line:

scikit-learn/sklearn/metrics/_pairwise_distances_reduction/_dispatcher.py

Line 278 in e5ce4bc

return ArgKmin64.compute(

18 hours ago all our tests passed in 40 minutes, then 3 hours ago it started failing 38% through the tests with a 70 minute timeout, and gets to the point only ~27 minutes into the build:

This suggests that the latest scientific-python-nightly-wheels upload of scikit-learn (and/or NumPy) 11 hours ago caused something in here to hang, so probably some recent PR to sklearn or NumPy is the culprit.

Not exactly a MWE -- I'm not on Windows at the moment but could switch at some point -- but maybe someone has an idea about why it's happening...?

The text was updated successfully, but these errors were encountered:

glemaitre · 2024-03-13T16:32:20Z

We can make a PR and trigger the nightly build to see if something changed.

larsoner · 2024-03-13T16:33:03Z

Actually from looking at our outputs the NumPy version that ends up being installed is NumPy 1.26.4 (OpenBLAS 0.3.23.dev with 2 threads) so it's maybe only a NumPy 2.0 compat issue in that you (presumably) compile your wheels against NumPy 2.0 dev or beta

glemaitre · 2024-03-13T16:35:23Z

On the nightly build, we compile against the 2.0 indeed and we have the normal CI running on 1.26. But this is an interesting problem because it means that compiling against 2.0 and having 1.26 could trigger a huge regression. I might try this configuration locally.

larsoner · 2024-03-13T16:37:53Z

... we also have scipy 1.13.0.dev0+1500.8c5c4fa on that run if it matters (see the print config step from any of our failing CI runs), but I assume sklearn has its own code for computing pairwise distances as opposed to using something like cdist / pdist (otherwise it could be SciPy's problem!).

glemaitre · 2024-03-13T16:39:49Z

Arff this is on Windows actually. I'll have to go through the CI then.

larsoner · 2024-03-13T18:19:54Z

Yes it's a pain :) Or you could use https://developer.microsoft.com/en-us/windows/downloads/virtual-machines/ if you want (it works for about a month)

lesteve · 2024-03-18T12:28:36Z

@larsoner is this still happening in the MNE CI and do you have a stand-alone snippet reproducing the issue?

I have a Windows VM so I could try to reproduce but I would welcome any help in order to put me on the right track.

larsoner · 2024-03-18T14:39:23Z

Not minimal by any means but I just used something like this to try to reproduce the same issue on a Windows 10 VM with some stable numpy and scipy releases:

git clone git@github.com:/larsoner/mne-python.git
cd mne-python
git checkout -b check origin/check  # branch with the problematic test reenabled
pytest mne\preprocessing\tests\test_lof.py  # works
pip install --index-url "https://pypi.anaconda.org/scientific-python-nightly-wheels/simple" "scikit-learn>=1.5.dev0"
pytest mne\preprocessing\tests\test_lof.py  # still works!
pip install --index-url "https://pypi.anaconda.org/scientific-python-nightly-wheels/simple" "scipy>=1.14.0.dev0"

This last one hits a SciPy import error (seen here in CIs just now) due to scipy/scipy#20268 about the C header size change. But going back to a SciPy that works:

pip install "scipy==1.12.0"
pytest mne\preprocessing\tests\test_lof.py

things are still okay. I then told the CIs in MNE-Python to use scipy==1.12.0 and things are okay there, too. So I'm guessing it's probably no longer an issue, and thus closing -- but I'll reopen if it pops back up once the NumPy<->SciPy compat issue is sorted out and I re-enable SciPy dev!

lesteve · 2024-03-18T14:52:21Z

OK thanks for the details, let me know if the issue comes back!

larsoner · 2024-03-19T15:11:48Z

😭 it's back -- from here the failing config is:

├☑ numpy                1.26.4 (OpenBLAS 0.3.23.dev with 2 threads)
├☑ scipy                1.14.0.dev0+577.d891e40
├☑ sklearn              1.5.dev0

Locally in my VM installing these and doing:

$ pip install --extra-index-url "https://pypi.anaconda.org/scientific-python-nightly-wheels/simple" "scikit-learn>=1.5.dev0" "scipy>=1.14.0.dev0" "numpy==1.26.4"
$ pytest mne/preprocessing/tests/test_lof.py

I can reproduce the hang!
Doing pip install scipy==1.12.0 fixes it
It doesn't matter which numpy I have installed, same hang with both 1.26.4 and 2.1.0.dev0

So I'm guessing it must have something to do with sklearn's internal use of some SciPy functions. Perhaps it's related to BLAS/LAPACK -- I know some infrastructure there has changed lately for SciPy...

I also boiled it down to a one-liner for you @lesteve , hopefully it reproduces for you like it does for me:

python -c "import numpy as np; from sklearn.neighbors import LocalOutlierFactor; LocalOutlierFactor(metric='euclidean').fit_predict(np.zeros((20, 14000)))"

larsoner · 2024-03-19T15:14:06Z

... and oddly enough if I reduce the dimensionality to something like (20, 1400) it does not hang. So there's something about the size being big enough like (20, 14000) that triggers the issue (maybe you have "big" and "small" code paths?).

lesteve · 2024-03-19T15:57:53Z

I can reproduce the hang, the fact that it still happens with scipy-dev and scikit-learn 1.4.1.post1 would seem to point towards a scipy dev change, but I will try to put together a snippet only using scipy to be 100% sure.

lesteve · 2024-03-19T16:14:15Z

Setting OPENBLAS_NUM_THREADS=1 or OMP_NUM_THREADS=1 seems to get rid of the issue, maybe a reasonable work-around in the short term (using threadpoolctl in a single test is an alternative if setting an environment variable is not convenient).

I am wondering if this is not due to nested parallelism OpenBLAS withing OpenMP in our neighbors code, not sure how this would be related to scipy-dev though.

larsoner · 2024-03-19T16:16:05Z

There was a bump to OpenBLAS 0.326 in scipy/scipy#20215 maybe that's it?

lesteve · 2024-03-19T16:38:09Z

There was a bump to OpenBLAS 0.326 in scipy/scipy#20215 maybe that's it?

Hmmm maybe hard to tell ... I am putting below what I learned so far, to be continued.

Here is a simpler scikit-learn snippet reproducing the hang, this seems to be related to ArgKmin in pairwise distances reductions. cc @jeremiedbb and @jjerphan in case they have some insights into this.

from sklearn.metrics._pairwise_distances_reduction import ArgKmin
import numpy as np
import threadpoolctl

# Uncommenting the next line fixes it, a similar line with OpenMP fixes it as well I think
# threadpoolctl.threadpool_limits(limits=1, user_api='blas')
X = np.zeros((20, 14000))
ArgKmin.compute(X=X, Y=X, k=10, metric='euclidean')

The sklearn show_versions info.

System:
    python: 3.12.2 | packaged by conda-forge | (main, Feb 16 2024, 20:42:31) [MSC v.1937 64 bit (AMD64)]
executable: C:\Users\rjxQE\AppData\Local\miniforge3\envs\lof-issue\python.exe
   machine: Windows-10-10.0.19045-SP0

Python dependencies:
      sklearn: 1.4.1.post1
          pip: 24.0
   setuptools: 69.2.0
        numpy: 1.26.4
        scipy: 1.14.0.dev0
       Cython: None
       pandas: None
   matplotlib: None
       joblib: 1.3.2
threadpoolctl: 3.3.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
    num_threads: 4
         prefix: vcomp
       filepath: C:\Users\rjxQE\AppData\Local\miniforge3\envs\lof-issue\Lib\site-packages\sklearn\.libs\vcomp140.dll
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 4
         prefix: libopenblas
       filepath: C:\Users\rjxQE\AppData\Local\miniforge3\envs\lof-issue\Lib\site-packages\numpy.libs\libopenblas64__v
0.3.23-293-gc2f4bdbb-gcc_10_3_0-2bde3a66a51006b2b53eb373ff767a3f.dll
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: Haswell

       user_api: blas
   internal_api: openblas
    num_threads: 4
         prefix: libopenblas
       filepath: C:\Users\rjxQE\AppData\Local\miniforge3\envs\lof-issue\Lib\site-packages\scipy.libs\libopenblas_v0.3
.26-gcc_10_3_0-75ebbb8345f75277878db24d649d8b7e.dll
        version: 0.3.26
threading_layer: pthreads
   architecture: Haswell

jjerphan · 2024-03-19T16:54:50Z

Based on #28625 (comment), libopenblas seems to be loaded twice. If this is the case, this leads to undefined behavior.

I am afraid that there is not much we can do apart from recommending using conda packages instead of wheels. #23574 is a similar issue in this regards whose discussions provide some background on this situation.

@larsoner: could you report the runtime of the stack which are loaded using threadpoolctl for this case?

jeremiedbb · 2024-03-19T18:32:47Z

maybe related to scipy/scipy#20271

larsoner · 2024-03-19T18:39:31Z

In my VM:

$ python -c "import numpy; import scipy.linalg; import sklearn.neighbors; from threadpoolctl import threadpool_info; from pprint import pprint; pprint(threadpool_info())"
[{'architecture': 'Haswell',
  'filepath': 'C:\\Users\\tester\\mne-python\\1.6.1_0\\Lib\\site-packages\\numpy.libs\\libopenblas64__v0.3.23-293-gc2f4bdbb-gcc_10_3_0-2bde3a66a51006b2b53eb373ff767a3f.dll',
  'internal_api': 'openblas',
  'num_threads': 4,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.23.dev'},
 {'architecture': 'Haswell',
  'filepath': 'C:\\Users\\tester\\mne-python\\1.6.1_0\\Lib\\site-packages\\scipy.libs\\libopenblas_v0.3.26-gcc_10_3_0-75ebbb8345f75277878db24d649d8b7e.dll',
  'internal_api': 'openblas',
  'num_threads': 4,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.26'},
 {'filepath': 'C:\\Users\\tester\\mne-python\\1.6.1_0\\Lib\\site-packages\\sklearn\\.libs\\vcomp140.dll',
  'internal_api': 'openmp',
  'num_threads': 4,
  'prefix': 'vcomp',
  'user_api': 'openmp',
  'version': None}]

Azure:

[{'architecture': 'SkylakeX',
  'filepath': 'C:\\hostedtoolcache\\windows\\Python\\3.11.8\\x64\\Lib\\site-packages\\scipy.libs\\libopenblas_v0.3.26-gcc_10_3_0-75ebbb8345f75277878db24d649d8b7e.dll',
  'internal_api': 'openblas',
  'num_threads': 2,
  'prefix': 'libopenblas',
  'threading_layer': 'pthreads',
  'user_api': 'blas',
  'version': '0.3.26'},
 {'filepath': 'C:\\hostedtoolcache\\windows\\Python\\3.11.8\\x64\\Lib\\site-packages\\sklearn\\.libs\\vcomp140.dll',
  'internal_api': 'openmp',
  'num_threads': 2,
  'prefix': 'vcomp',
  'user_api': 'openmp',
  'version': None}]

And I can confirm OPENBLAS_NUM_THREADS=1 fixes it locally at least.

lesteve · 2024-03-20T09:39:37Z

I opened scipy/scipy#20294 to get insights from Scipy developers. I also realised the issue is in the scipy 1.13 release candidate.

Based on #28625 (comment), libopenblas seems to be loaded twice. If this is the case, this leads to undefined behavior.

I am afraid that there is not much we can do apart from recommending using conda packages instead of wheels.

This will always be the case with wheels, right? The OpenBLAS shipped with the numpy wheel and the OpenBLAS shipped with the scipy wheel will be different. This is a super common use case, we need to make sure it works ...

lesteve · 2024-03-20T10:41:11Z

Actually debugging a bit further it seems like this is due to OpenBLAS 0.3.26, I can reproduce the hang with conda-forge packages, see scipy/scipy#20294 (comment)

I guess this will need to be reported to OpenBLAS, although putting together some kind of reproducer will be a bit of work.

mattip · 2024-03-20T10:44:15Z

Hi. NumPy developer and OpenBLAS packager here. There should be no conflict using those versions of NumPy and SciPy since NumPy is building with the 64bit ILP interfaces. Using OPENBLAS_NUM_THREADS=1 points to some problem with threading. Do you know how many concurrent tasks are created when calling the reproducer?

python -c "import numpy as np; from sklearn.neighbors import LocalOutlierFactor; \
     LocalOutlierFactor(metric='euclidean').fit_predict(np.zeros((20, 14000)))"

jjerphan · 2024-03-20T10:59:47Z

Thank you for your help, @mattip.

There are generally as many threads as physical cores. Here are some pointers:

scikit-learn/sklearn/metrics/_pairwise_distances_reduction/_base.pyx.tp

Lines 200 to 204 in 5673880

    
           # Not using less, not using more. 
        
           self.chunks_n_threads = min( 
        
               self.Y_n_chunks if self.execute_in_parallel_on_Y else self.X_n_chunks, 
        
               self.effective_n_threads, 
        
           )

scikit-learn/sklearn/metrics/_pairwise_distances_reduction/_base.pyx.tp

Line 146 in 5673880

self.effective_n_threads = _openmp_effective_n_threads()

scikit-learn/sklearn/utils/_openmp_helpers.pyx

Lines 21 to 77 in 5673880

    
           cpdef _openmp_effective_n_threads(n_threads=None, only_physical_cores=True): 
        
               """Determine the effective number of threads to be used for OpenMP calls 
        
               - For ``n_threads = None``, 
        
                 - if the ``OMP_NUM_THREADS`` environment variable is set, return 
        
                   ``openmp.omp_get_max_threads()`` 
        
                 - otherwise, return the minimum between ``openmp.omp_get_max_threads()`` 
        
                   and the number of cpus, taking cgroups quotas into account. Cgroups 
        
                   quotas can typically be set by tools such as Docker. 
        
                 The result of ``omp_get_max_threads`` can be influenced by environment 
        
                 variable ``OMP_NUM_THREADS`` or at runtime by ``omp_set_num_threads``. 
        
               - For ``n_threads > 0``, return this as the maximal number of threads for 
        
                 parallel OpenMP calls. 
        
               - For ``n_threads < 0``, return the maximal number of threads minus 
        
                 ``|n_threads + 1|``. In particular ``n_threads = -1`` will use as many 
        
                 threads as there are available cores on the machine. 
        
               - Raise a ValueError for ``n_threads = 0``. 
        
               Passing the `only_physical_cores=False` flag makes it possible to use extra 
        
               threads for SMT/HyperThreading logical cores. It has been empirically 
        
               observed that using as many threads as available SMT cores can slightly 
        
               improve the performance in some cases, but can severely degrade 
        
               performance other times. Therefore it is recommended to use 
        
               `only_physical_cores=True` unless an empirical study has been conducted to 
        
               assess the impact of SMT on a case-by-case basis (using various input data 
        
               shapes, in particular small data shapes). 
        
               If scikit-learn is built without OpenMP support, always return 1. 
        
               """ 
        
               if n_threads == 0: 
        
                   raise ValueError("n_threads = 0 is invalid") 
        
               if not SKLEARN_OPENMP_PARALLELISM_ENABLED: 
        
                   # OpenMP disabled at build-time => sequential mode 
        
                   return 1 
        
               if os.getenv("OMP_NUM_THREADS"): 
        
                   # Fall back to user provided number of threads making it possible 
        
                   # to exceed the number of cpus. 
        
                   max_n_threads = omp_get_max_threads() 
        
               else: 
        
                   try: 
        
                       n_cpus = _CPU_COUNTS[only_physical_cores] 
        
                   except KeyError: 
        
                       n_cpus = cpu_count(only_physical_cores=only_physical_cores) 
        
                       _CPU_COUNTS[only_physical_cores] = n_cpus 
        
                   max_n_threads = min(omp_get_max_threads(), n_cpus) 
        
               if n_threads is None: 
        
                   return max_n_threads 
        
               elif n_threads < 0: 
        
                   return max(1, max_n_threads + n_threads + 1) 
        
               return n_threads

larsoner · 2024-03-20T12:01:21Z

On my failing CI I had explicitly set OPENBLAS_NUM_THREADS to 2 if that matters. I know it does for OpenBLAS but not sure if sklearn uses it in ArgKmin. Happy to test whatever might help in my VM. Maybe setting OMP_NUM_THREADS=2, since it looks like that would be used by both OpenBLAS and sklearn?

larsoner · 2024-03-20T13:35:14Z

Okay ran some tests using OPENMP_NUM_THREADS in range(1, 6) and OMP_NUM_THREADS in range(1, 6) to test messing with OpenBLAS and sklearn parallelization, respectively (I think!), simultaneously with:

OPENBLAS_NUM_THREADS=1 OMP_NUM_THREADS=1 python -c "import numpy as np; from sklearn.neighbors import LocalOutlierFactor; LocalOutlierFactor(metric='euclidean').fit_predict(np.zeros((20, 14000)))"

This table lists only the combinations out of the 25 tested that hung:

OPENBLAS_NUM_THREADS (OpenBLAS)	OMP_NUM_THREADS (sklearn)
2	2
--	--
3	3
--	--
4	2
4	3
4	4
4	5
--	--
5	2
5	3
5	4
5	5

I guess the TL;DR is that there are problems when OPENBLAS_NUM_THREADS == OMP_NUM_THREADS != 1 or when OPENBLAS_NUM_THREADS>=4 and OMP_NUM_THREADS>=2 (though I didn't test over 5, spot checking 7, 8 also hung).

github-actions bot added the Needs Triage Issue requires triage label Mar 13, 2024

This was referenced Mar 13, 2024

[pre-commit.ci] pre-commit autoupdate mne-tools/mne-python#12492

Merged

MAINT: Work around sklearn bug mne-tools/mne-python#12496

Merged

glemaitre removed the Needs Triage Issue requires triage label Mar 13, 2024

ogrisel added the cython label Mar 14, 2024

larsoner closed this as completed Mar 18, 2024

larsoner reopened this Mar 19, 2024

larsoner mentioned this issue Mar 19, 2024

MAINT: Reenable pre scipy and numpy mne-tools/mne-python#12506

Merged

lesteve mentioned this issue Mar 20, 2024

BUG: Hang on Windows in scikit-learn with 1.13rc1 and 1.14.dev (maybe due to OpenBLAS 0.3.26 update?) scipy/scipy#20294

Closed

lesteve changed the title ~~BUG: ArgKmin64 on Windows with NumPy 2.0 times out~~ BUG: ArgKmin64 on Windows with scipy 1.13rc1 or 1.14.dev times out Mar 20, 2024

lesteve mentioned this issue Mar 25, 2024

FIX Avoid dead-lock with OpenBLAS 0.3.26 on Windows inside pairwise distances calculation #28692

Merged

jeremiedbb closed this as completed in #28692 Mar 26, 2024

lesteve mentioned this issue Mar 26, 2024

Dead-lock on 0.3.26 on Windows in OpenBLAS multi-threading nested within OpenMP multi-threading OpenMathLib/OpenBLAS#4582

Closed

larsoner mentioned this issue Aug 2, 2024

dipy==1.9.0 is causing our macOS GitHub Actions runners to stall on unrelated torch tests dipy/dipy#3125

Open

6 tasks

lesteve mentioned this issue Jan 4, 2025

scikit-learn 1.6 changed behavior of growing trees #30554

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: ArgKmin64 on Windows with scipy 1.13rc1 or 1.14.dev times out #28625

BUG: ArgKmin64 on Windows with scipy 1.13rc1 or 1.14.dev times out #28625

larsoner commented Mar 13, 2024

glemaitre commented Mar 13, 2024

larsoner commented Mar 13, 2024

glemaitre commented Mar 13, 2024

larsoner commented Mar 13, 2024

glemaitre commented Mar 13, 2024

larsoner commented Mar 13, 2024

lesteve commented Mar 18, 2024

larsoner commented Mar 18, 2024

lesteve commented Mar 18, 2024

larsoner commented Mar 19, 2024 •

edited

Loading

larsoner commented Mar 19, 2024

lesteve commented Mar 19, 2024

lesteve commented Mar 19, 2024 •

edited

Loading

larsoner commented Mar 19, 2024

lesteve commented Mar 19, 2024

jjerphan commented Mar 19, 2024 •

edited

Loading

jeremiedbb commented Mar 19, 2024

larsoner commented Mar 19, 2024

lesteve commented Mar 20, 2024

lesteve commented Mar 20, 2024

mattip commented Mar 20, 2024

jjerphan commented Mar 20, 2024 •

edited

Loading

larsoner commented Mar 20, 2024

larsoner commented Mar 20, 2024 •

edited

Loading

BUG: ArgKmin64 on Windows with scipy 1.13rc1 or 1.14.dev times out #28625

BUG: ArgKmin64 on Windows with scipy 1.13rc1 or 1.14.dev times out #28625

Comments

larsoner commented Mar 13, 2024

glemaitre commented Mar 13, 2024

larsoner commented Mar 13, 2024

glemaitre commented Mar 13, 2024

larsoner commented Mar 13, 2024

glemaitre commented Mar 13, 2024

larsoner commented Mar 13, 2024

lesteve commented Mar 18, 2024

larsoner commented Mar 18, 2024

lesteve commented Mar 18, 2024

larsoner commented Mar 19, 2024 • edited Loading

larsoner commented Mar 19, 2024

lesteve commented Mar 19, 2024

lesteve commented Mar 19, 2024 • edited Loading

larsoner commented Mar 19, 2024

lesteve commented Mar 19, 2024

jjerphan commented Mar 19, 2024 • edited Loading

jeremiedbb commented Mar 19, 2024

larsoner commented Mar 19, 2024

lesteve commented Mar 20, 2024

lesteve commented Mar 20, 2024

mattip commented Mar 20, 2024

jjerphan commented Mar 20, 2024 • edited Loading

larsoner commented Mar 20, 2024

larsoner commented Mar 20, 2024 • edited Loading

larsoner commented Mar 19, 2024 •

edited

Loading

lesteve commented Mar 19, 2024 •

edited

Loading

jjerphan commented Mar 19, 2024 •

edited

Loading

jjerphan commented Mar 20, 2024 •

edited

Loading

larsoner commented Mar 20, 2024 •

edited

Loading