⚠️ CI failed on Linux_nogil.pylatest_pip_nogil ⚠️ #23786

Closed
scikit-learn-bot opened this issue Jun 29, 2022 · 6 comments

Comments

@scikit-learn-bot
Contributor

scikit-learn-bot commented Jun 29, 2022

CI is still failing on Linux_nogil.pylatest_pip_nogil (Jul 25, 2022)

  • Test Collection Failure
github-actions bot added the Needs Triage label Jun 29, 2022
@lesteve
Member

lesteve commented Jun 29, 2022

The nogil segfault was already noticed in #23762 (comment).

______________ metrics/tests/test_pairwise_distances_reduction.py ______________
[gw0] linux -- Python 3.9.10 /home/vsts/work/1/s/testvenv/bin/python
worker 'gw0' crashed while running 'metrics/tests/test_pairwise_distances_reduction.py::test_memmap_backed_data[euclidean-PairwiseDistancesArgKmin]'
______________ metrics/tests/test_pairwise_distances_reduction.py ______________
[gw2] linux -- Python 3.9.10 /home/vsts/work/1/s/testvenv/bin/python
worker 'gw2' crashed while running 'metrics/tests/test_pairwise_distances_reduction.py::test_memmap_backed_data[euclidean-PairwiseDistancesRadiusNeighborhood]'

This is not deterministic and we were not able to reproduce it locally, so I am not sure what to do about it ...
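One way to chase a crash like this locally is to rerun the single failing test in a loop until the interpreter dies. The following is only a minimal sketch under assumptions: the helper names `run_until_failure` and `pytest_command` are hypothetical, the test id is copied from the log above and its path may need adjusting to your checkout, and nothing in this thread says the crash is reproducible this way.

```python
import subprocess
import sys

# Test id copied verbatim from the crash log; the relative path is an
# assumption about where pytest is invoked from.
TEST_ID = (
    "metrics/tests/test_pairwise_distances_reduction.py"
    "::test_memmap_backed_data[euclidean-PairwiseDistancesArgKmin]"
)


def pytest_command(test_id):
    """Build a command that runs one test with faulthandler enabled.

    `-X faulthandler` asks CPython to dump the Python traceback when the
    process receives a fatal signal such as SIGSEGV.
    """
    return [sys.executable, "-X", "faulthandler", "-m", "pytest", "-x", test_id]


def run_until_failure(cmd, max_runs=50):
    """Run `cmd` up to `max_runs` times.

    Returns (run_number, returncode) for the first run that exits with a
    non-zero status, or None if every run succeeded. On POSIX, a negative
    return code means the process was killed by that signal number.
    """
    for run in range(1, max_runs + 1):
        completed = subprocess.run(cmd)
        if completed.returncode != 0:
            return run, completed.returncode
    return None
```

Calling `run_until_failure(pytest_command(TEST_ID))` from the directory where the test path resolves would stop at the first failing run; a `None` result after all runs only means the crash did not show up, not that it is gone.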

lesteve added the Build / CI label and removed the Needs Triage label Jun 29, 2022
@scikit-learn-bot
Contributor Author

scikit-learn-bot commented Jul 1, 2022

CI is no longer failing! ✅

Successful run on Jul 26, 2022

@thomasjpfan
Member

The last time this job failed was 7 days ago. Since then the test has been passing.

@lesteve
Member

lesteve commented Jul 22, 2022

The failure still happens from time to time, e.g. on July 15 and July 22 (visible by clicking on "edited" in the top post) ...

@lesteve
Member

lesteve commented Jul 25, 2022

Looking a bit closer, it seems that all the failing builds have one thing in common: the architecture is SkylakeX, but the OpenBLAS shipped with scipy (quite old 3.3 for some reason) detects the architecture as Prescott. On the successful builds, the detected architecture is Haswell. Maybe this is loosely related to #21361.

Failing build

(e.g. 1, 2, 3, 4)

threadpoolctl info:
       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/vsts/work/1/s/testvenv/lib/python3.9/site-packages/numpy.libs/libopenblas64_p-r0-7cf8cb25.3.18.so
        version: 0.3.18
threading_layer: pthreads
   architecture: SkylakeX
    num_threads: 2

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/vsts/work/1/s/testvenv/lib/python3.9/site-packages/scipy.libs/libopenblas-r0-f650aae0.3.3.so
        version: None
threading_layer: disabled
   architecture: **Prescott**
    num_threads: 1

       user_api: openmp
   internal_api: openmp
         prefix: libgomp
       filepath: /usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0
        version: None
    num_threads: 2

Succeeding build:

threadpoolctl info:
       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/vsts/work/1/s/testvenv/lib/python3.9/site-packages/numpy.libs/libopenblas64_p-r0-7cf8cb25.3.18.so
        version: 0.3.18
threading_layer: pthreads
   architecture: SkylakeX
    num_threads: 2

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/vsts/work/1/s/testvenv/lib/python3.9/site-packages/scipy.libs/libopenblas-r0-f650aae0.3.3.so
        version: None
threading_layer: disabled
   architecture: **Haswell**
    num_threads: 1
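The comparison between the two dumps above can be automated. The sketch below is an illustration, not something used in this thread: the helper name `openblas_architectures` is hypothetical, and the sample data is trimmed by hand from the outputs above. It assumes the input has the shape returned by `threadpoolctl.threadpool_info()`, i.e. a list of dicts with `internal_api`, `filepath` and (for BLAS libraries) `architecture` keys.

```python
def openblas_architectures(modules):
    """Map each loaded OpenBLAS library path to the architecture it detected.

    `modules` is a list of dicts shaped like threadpoolctl.threadpool_info()
    output. Entries for other libraries (e.g. OpenMP) are skipped.
    """
    return {
        m["filepath"]: m.get("architecture")
        for m in modules
        if m.get("internal_api") == "openblas"
    }


SITE = "/home/vsts/work/1/s/testvenv/lib/python3.9/site-packages"
NUMPY_BLAS = SITE + "/numpy.libs/libopenblas64_p-r0-7cf8cb25.3.18.so"
SCIPY_BLAS = SITE + "/scipy.libs/libopenblas-r0-f650aae0.3.3.so"

# Trimmed copies of the two dumps above (only the keys used here).
failing_build = [
    {"internal_api": "openblas", "filepath": NUMPY_BLAS, "architecture": "SkylakeX"},
    {"internal_api": "openblas", "filepath": SCIPY_BLAS, "architecture": "Prescott"},
    {"internal_api": "openmp", "filepath": "/usr/lib/x86_64-linux-gnu/libgomp.so.1.0.0"},
]
succeeding_build = [
    {"internal_api": "openblas", "filepath": NUMPY_BLAS, "architecture": "SkylakeX"},
    {"internal_api": "openblas", "filepath": SCIPY_BLAS, "architecture": "Haswell"},
]

# The suspicious pattern: some OpenBLAS copy falls back to Prescott.
failing_archs = openblas_architectures(failing_build)
succeeding_archs = openblas_architectures(succeeding_build)
print("Prescott" in failing_archs.values())
print("Prescott" in succeeding_archs.values())
```

To inspect a live process instead, one could pass `threadpoolctl.threadpool_info()` to the same helper after importing numpy and scipy, since only libraries that are already loaded are reported.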

@thomasjpfan
Member

I'm closing this given #23994 is merged and colesbury/nogil-install#2 is fixed.

The CI will open another issue if this happens again.

3 participants