
⚠️ CI failed on Ubuntu_Atlas.ubuntu_atlas ⚠️ #27126

Closed
scikit-learn-bot opened this issue Aug 22, 2023 · 13 comments · Fixed by #27281

Comments

@scikit-learn-bot
Contributor

scikit-learn-bot commented Aug 22, 2023

CI is still failing on Ubuntu_Atlas.ubuntu_atlas (Sep 28, 2023)

  • test_pairwise_distances_argkmin[45-csr_matrix-float32-parallel_on_X-cityblock-0-500]
  • test_pairwise_distances_argkmin[45-csr_matrix-float32-parallel_on_Y-cityblock-0-500]
  • test_pairwise_distances_argkmin[45-csr_array-float32-parallel_on_X-cityblock-0-500]
  • test_pairwise_distances_argkmin[45-csr_array-float32-parallel_on_Y-cityblock-0-500]
github-actions bot added the Needs Triage label Aug 22, 2023
@betatim
Member

betatim commented Aug 22, 2023

@jjerphan or @Micky774 do you have a moment to take a look at these failures? Seeing "pairwise distances" makes me think of you :D

@jjerphan
Member

I do not have time now unfortunately.

@Micky774
Contributor

I'll take a look at this soon!

@jjerphan
Member

I think this test is flaky for this configuration: whether it fails depends on the random seed for this translation (the 1000000.0 value here).

@ogrisel
Member

ogrisel commented Aug 22, 2023

#27122 seems like a duplicate, with a different seed in a different env.

@ogrisel
Member

ogrisel commented Aug 22, 2023

I confirm seed 49 and 52 reproduce the problem on my machine:

SKLEARN_TESTS_GLOBAL_RANDOM_SEED="all" pytest -k "braycurtis and test_pairwise_distances_argkmin" -v sklearn/metrics/tests/test_pairwise_distances_reduction.py
FAILED sklearn/metrics/tests/test_pairwise_distances_reduction.py::test_pairwise_distances_argkmin[49-float32-parallel_on_Y-braycurtis-0-50] - AssertionError: Neighbors indices for query 75 are not matching when rounding distances at 3 significant digits derived from rtol=1.0e-04
FAILED sklearn/metrics/tests/test_pairwise_distances_reduction.py::test_pairwise_distances_argkmin[49-float32-parallel_on_X-braycurtis-0-50] - AssertionError: Neighbors indices for query 75 are not matching when rounding distances at 3 significant digits derived from rtol=1.0e-04
FAILED sklearn/metrics/tests/test_pairwise_distances_reduction.py::test_pairwise_distances_argkmin[52-float32-parallel_on_X-braycurtis-1000000.0-500] - AssertionError: Neighbors indices for query 61 are not matching when rounding distances at 3 significant digits derived from rtol=1.0e-04
FAILED sklearn/metrics/tests/test_pairwise_distances_reduction.py::test_pairwise_distances_argkmin[52-float32-parallel_on_Y-braycurtis-1000000.0-500] - AssertionError: Neighbors indices for query 61 are not matching when rounding distances at 3 significant digits derived from rtol=1.0e-04

All other seed/shape combinations pass. Since my env is very different from the CI, I suspect that this test is too seed sensitive. Maybe there are tied values for those combinations of metric, seed and shape?
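To make the tie hypothesis concrete, here is a minimal sketch in plain Python. It is not the helper used in the scikit-learn test suite; the function name `neighbors_by_distance` and the data are made up for illustration. It shows that when two candidates are at exactly the same distance from a query, two correct implementations may return them in a different order, so a strict comparison of indices fails while a comparison that groups indices by distance passes.

```python
from collections import defaultdict


def neighbors_by_distance(indices, distances):
    """Group neighbor indices by distance (hypothetical helper)."""
    groups = defaultdict(set)
    for idx, dist in zip(indices, distances):
        groups[dist].add(idx)
    return dict(groups)


# Made-up query result: candidates 1 and 2 are equally far from the query,
# so both orderings below are valid 3-nearest-neighbor answers.
ref_indices, ref_distances = [1, 2, 0], [0.25, 0.25, 0.5]
alt_indices, alt_distances = [2, 1, 0], [0.25, 0.25, 0.5]

# Comparing indices position by position treats the tie as a mismatch.
strict_match = ref_indices == alt_indices

# Comparing the set of indices found at each distance tolerates the tie.
tie_aware_match = neighbors_by_distance(
    ref_indices, ref_distances
) == neighbors_by_distance(alt_indices, alt_distances)

print(strict_match, tie_aware_match)
```

Whether ties are actually the cause of the failures reported here is still an open question at this point in the thread.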

@ogrisel
Member

ogrisel commented Aug 22, 2023

There is also a problem with the cityblock metric: #27080:

    test_pairwise_distances_argkmin[45-float32-parallel_on_X-cityblock-0-500]
    test_pairwise_distances_argkmin[45-float32-parallel_on_Y-cityblock-0-500]

I am re-running this test locally for all seeds and all metrics.

EDIT: here is the full list of failures on my local machine:

FAILED sklearn/metrics/tests/test_pairwise_distances_reduction.py::test_pairwise_distances_argkmin[52-float32-parallel_on_Y-braycurtis-1000000.0-500] - AssertionError: Neighbors indices for query 61 are not matching when rounding distances at 3 significant digits derived from rtol=1.0e-04
FAILED sklearn/metrics/tests/test_pairwise_distances_reduction.py::test_pairwise_distances_argkmin[52-float32-parallel_on_X-braycurtis-1000000.0-500] - AssertionError: Neighbors indices for query 61 are not matching when rounding distances at 3 significant digits derived from rtol=1.0e-04
FAILED sklearn/metrics/tests/test_pairwise_distances_reduction.py::test_pairwise_distances_argkmin[45-float32-parallel_on_X-cityblock-0-500] - AssertionError: Neighbors indices for query 61 are not matching when rounding distances at 3 significant digits derived from rtol=1.0e-04
FAILED sklearn/metrics/tests/test_pairwise_distances_reduction.py::test_pairwise_distances_argkmin[45-float32-parallel_on_Y-cityblock-0-500] - AssertionError: Neighbors indices for query 61 are not matching when rounding distances at 3 significant digits derived from rtol=1.0e-04
FAILED sklearn/metrics/tests/test_pairwise_distances_reduction.py::test_pairwise_distances_argkmin[49-float32-parallel_on_X-braycurtis-0-50] - AssertionError: Neighbors indices for query 75 are not matching when rounding distances at 3 significant digits derived from rtol=1.0e-04
FAILED sklearn/metrics/tests/test_pairwise_distances_reduction.py::test_pairwise_distances_argkmin[49-float32-parallel_on_Y-braycurtis-0-50] - AssertionError: Neighbors indices for query 75 are not matching when rounding distances at 3 significant digits derived from rtol=1.0e-04

Since each failure happens for both parallel_on_X and parallel_on_Y, only 3 metric/shape/seed combinations actually fail. All of them were first reported by the linked CI issues.

@ogrisel
Member

ogrisel commented Aug 22, 2023

Maybe we could improve the assertion error message to give more information about:

  • the index of the query vector (already present) but also its components (or at least the first 5 or so)
  • the distance computed by our Cython code for the first non-matching neighbor vector and the components of that vector,
  • the distance computed by the reference code for the first non-matching neighbor vector and the components of that vector.
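As a rough illustration of the three points above, the sketch below builds such a message in plain Python. The function `format_neighbor_mismatch`, its arguments, and all the values passed to it are hypothetical; this is not the code from scikit-learn or from any pull request.

```python
def format_neighbor_mismatch(query_idx, query_vec, ref, got, n_components=5):
    """Build a detailed message for the first non-matching neighbor.

    `ref` and `got` are (neighbor_index, distance, neighbor_vector) tuples
    from the reference code and from the code under test, respectively.
    """

    def head(vec):
        # Only show the first few components to keep the message readable.
        shown = ", ".join(f"{v:.6g}" for v in vec[:n_components])
        suffix = ", ..." if len(vec) > n_components else ""
        return f"[{shown}{suffix}]"

    lines = [
        f"Neighbors indices for query {query_idx} are not matching.",
        f"  query vector: {head(query_vec)}",
    ]
    for label, (idx, dist, vec) in (("reference", ref), ("computed", got)):
        lines.append(
            f"  {label}: neighbor {idx} at distance {dist:.9g}, "
            f"vector: {head(vec)}"
        )
    return "\n".join(lines)


# Made-up values, for illustration only.
message = format_neighbor_mismatch(
    query_idx=75,
    query_vec=[0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    ref=(12, 0.25, [0.1, 0.2, 0.3]),
    got=(31, 0.25000003, [0.3, 0.2, 0.1]),
)
print(message)
```

Printing the two distances with many significant digits makes it easier to tell a genuine ordering bug from a tie or a near-tie.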

@scikit-learn-bot
Contributor Author

scikit-learn-bot commented Aug 23, 2023

CI is no longer failing! ✅

Successful run on Sep 27, 2023

@jjerphan
Member

jjerphan commented Sep 3, 2023

I can't reproduce the failure locally on my machine. 🤔

Which implementations of OpenBLAS and OpenMP are you using? Here are mine:

python -m threadpoolctl -i scipy -i numpy -i sklearn
[
  {
    "user_api": "openmp",
    "internal_api": "openmp",
    "num_threads": 8,
    "prefix": "libgomp",
    "filepath": "/home/jjerphan/.local/share/mambaforge/envs/sk/lib/libgomp.so.1.0.0",
    "version": null
  },
  {
    "user_api": "blas",
    "internal_api": "openblas",
    "num_threads": 8,
    "prefix": "libopenblas",
    "filepath": "/home/jjerphan/.local/share/mambaforge/envs/sk/lib/libopenblasp-r0.3.23.so",
    "version": "0.3.23",
    "threading_layer": "pthreads",
    "architecture": "Haswell"
  }
]

@jjerphan
Member

jjerphan commented Sep 3, 2023

#27281 has been opened to improve the error message.

@jeremiedbb
Member

+1 for improving the error message, but we also need to do something about the failures. Should we disable the global_random_seed parametrization?

@jjerphan
Member

jjerphan commented Sep 5, 2023

In the short term, I would mark those configurations as xfail.

What do you think?
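One possible way to do this, sketched with pytest's `pytest.param` and `pytest.mark.xfail`. The `KNOWN_FLAKY` set, the `make_params` helper, and the parameter names `seed`, `metric`, `translation` and `size` are assumptions made for this example; the real test is parametrized differently and takes more arguments (dtype, strategy). Only the three failing combinations come from this thread.

```python
import itertools

import pytest

# (seed, metric, translation, size) combinations reported as failing above.
KNOWN_FLAKY = {
    (45, "cityblock", 0, 500),
    (49, "braycurtis", 0, 50),
    (52, "braycurtis", 1e6, 500),
}


def make_params(seeds, metrics, translations, sizes):
    """Build the parameter list, marking known flaky combinations as xfail."""
    params = []
    for combo in itertools.product(seeds, metrics, translations, sizes):
        if combo in KNOWN_FLAKY:
            # strict=False: an unexpected pass is reported but does not fail CI.
            marks = pytest.mark.xfail(reason="seed-sensitive, see this issue", strict=False)
        else:
            marks = ()
        params.append(pytest.param(*combo, marks=marks))
    return params


params = make_params(
    seeds=[45, 49, 52],
    metrics=["cityblock", "braycurtis"],
    translations=[0, 1e6],
    sizes=[50, 500],
)
n_xfail = sum(1 for p in params if p.marks)
print(len(params), n_xfail)
```

The resulting list could then be passed to `@pytest.mark.parametrize("seed, metric, translation, size", params)`. This only hides the known failures; it does not explain why those seeds fail.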
