Skip to content

BLD Add missing OpenMP dependencies in relevant meson.build #29694

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 29, 2024

Conversation

lesteve
Copy link
Member

@lesteve lesteve commented Aug 20, 2024

Fix #29665

This may be a good enough reason for a 1.5.2?

I checked that the OpenMP dependencies were added for the relevant meson.build with this grep pattern, let me know if you have a better idea:

❯ git grep -Pl 'cython.*parallel' -- **/*.pyx*
sklearn/_loss/_loss.pyx.tp
sklearn/cluster/_k_means_common.pyx
sklearn/cluster/_k_means_elkan.pyx
sklearn/cluster/_k_means_lloyd.pyx
sklearn/cluster/_k_means_minibatch.pyx
sklearn/ensemble/_hist_gradient_boosting/_binning.pyx
sklearn/ensemble/_hist_gradient_boosting/_gradient_boosting.pyx
sklearn/ensemble/_hist_gradient_boosting/_predictor.pyx
sklearn/ensemble/_hist_gradient_boosting/histogram.pyx
sklearn/ensemble/_hist_gradient_boosting/splitting.pyx
sklearn/manifold/_barnes_hut_tsne.pyx
sklearn/metrics/_pairwise_distances_reduction/_argkmin.pyx.tp
sklearn/metrics/_pairwise_distances_reduction/_argkmin_classmode.pyx.tp
sklearn/metrics/_pairwise_distances_reduction/_base.pyx.tp
sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors.pyx.tp
sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors_classmode.pyx.tp
sklearn/metrics/_pairwise_fast.pyx
sklearn/utils/arrayfuncs.pyx

I also checked whether omp was present in the ldd output of 1.4.1 .so files (1.4.1 wheels have been built with setuptools) and compare with the 1.5.1 one (1.5.1 wheels have been built with meson). This does match the missing openmp_dep.

Copy link

github-actions bot commented Aug 20, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: a9d28be. Link to the linter CI: here

@adrinjalali
Copy link
Member

Could we not add openmp as a default dependency to everything?

@lesteve
Copy link
Member Author

lesteve commented Aug 20, 2024

Could we not add openmp as a default dependency to everything?

Right now, if we want to go down that route, I don't know any other way than adding the openmp_dep to all the extension module one by one ... Feel free to browse the Meson documentation because I may have missed something, for example starting in https://mesonbuild.com/Dependencies.html.

Full disclosure: I find this route not not very natural but I guess it has the advantage (as with our previous setuptools setup) of "I don't want to have to think about this".

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may be a good enough reason for a 1.5.2?

I agree this should be in 1.5.2. Can we add a changelog entry for 1.5.2?

@thomasjpfan thomasjpfan added this to the 1.5.2 milestone Aug 20, 2024
@thomasjpfan thomasjpfan added the To backport PR merged in master that need a backport to a release branch defined based on the milestone. label Aug 20, 2024
Copy link
Member

@adam2392 adam2392 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My 2c: I do like the explicit verbose dependencies of meson. IIUC, it is a "feature".

@adrinjalali
Copy link
Member

From what I understand, adding -fopenmp to default_options on the project level's meson.build would add it to all compiler commands. Wouldn't that be more what we want? I'm uncomfortable with the idea of forgetting it in yet another folder where we might try to use openmp in the future.

@lesteve
Copy link
Member Author

lesteve commented Aug 22, 2024

I added a changelog. Improvements more than welcome!

From what I understand, adding -fopenmp to default_options on the project level's meson.build would add it to all compiler commands. Wouldn't that be more what we want?

Except that this is not always -fopenmp, for example IIRC this is /openmp on Windows, and even more complicated on OSX with system clang and OpenMP from brew. In my basic understanding of Meson, dependencies tackle this kind of compiler-dependent flags. I haven't found something that indicates that you can add a dependency globally.

In an ideal world i.e. not necessarily this PR, we would have a build error (or maybe an error at import-time if easier) if somehow OpenMP is enabled in reference module (probably https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/_openmp_helpers.pxd) and not the current one. Is it possible? maybe? Am I likely to work on it any time soon? Probably not 😅.

I'm uncomfortable with the idea of forgetting it in yet another folder where we might try to use openmp in the future.

Yeah I can understand this. IMO adding a new Cython file does not happen that often and if it does, people will likely copy from somewhere else a meson.build snippet. There is indeed a small chance that they copy a snippet without OpenMP dependency and that their code rely on OpenMP, but I would hope that in this case some local testing would indicate that OpenMP is not enabled e.g. htop or time output or changing OMP_NUM_THREADS not having any effect.

In other words this happened for the setuptools -> meson transition (sorry about this), but I would hope that it is a lot less likely to happen now.

@adrinjalali
Copy link
Member

Then could we add it to every single meson.build file? It won't hurt, since it just adds options to the compiler.

@lesteve
Copy link
Member Author

lesteve commented Aug 23, 2024

I find it would make the meson.build more confusing. There are currently 13 .pyx that uses OpenMP and 114 .pyx in total.

I have been quite confused a few times in the past with setuptools when seeing that OpenMP flags were used for all the extensions and also when looking at the ldd output of particular .so files where omp always appears and thought surely this particular module does not use OpenMP (full disclosure this does not happen for releases somehow, I am guessing there is some kind of optimisation that removes a .so if symbols are not used?).

I do understand the failure case looks bad (OpenMP silently unused whereas everything looks completely fine) but as I tried to argue I do think it is quite unlikely that it will happen again. Of course, more than happy to hear other opinions on this!

I spent some time finding a way to have an import-time error if OpenMP support is enabled in a reference module (`sklearn.utils._openmp_helpers) and not the current one (or the other way around but less likely).

Each .pyx that needs it would need to add something like this:

from ..utils._openmp_helpers cimport check_openmp_support
check_openmp_support()

This can be forgotten but if deemed necessary, we could add some kind of check in build_tools/linting.sh to make sure that if a pyx uses OpenMP (with a reasonable git grep pattern) it also calls check_openmp_support().

The error would happen at import-time and look like this:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/lesteve/dev/scikit-learn/sklearn/manifold/__init__.py", line 10, in <module>
    from ._t_sne import TSNE, trustworthiness
  File "/home/lesteve/dev/scikit-learn/sklearn/manifold/_t_sne.py", line 34, in <module>
    from . import _barnes_hut_tsne, _utils  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "_barnes_hut_tsne.pyx", line 24, in init sklearn.manifold._barnes_hut_tsne
  File "_openmp_helpers.pxd", line 42, in sklearn.utils._openmp_helpers.check_openmp_support
RuntimeError: OpenMP support discrepancy, you probably need to add openmp_dep to the extension module dependencies in the relevant meson.build file
OpenMP enabled for sklearn.utils._openmp_helpers (reference): True
OpenMP enabled for sklearn.manifold._barnes_hut_tsne: False

The diff looks like this:

diff --git a/sklearn/manifold/_barnes_hut_tsne.pyx b/sklearn/manifold/_barnes_hut_tsne.pyx
index f0906fbf2b..9bf8ee354f 100644
--- a/sklearn/manifold/_barnes_hut_tsne.pyx
+++ b/sklearn/manifold/_barnes_hut_tsne.pyx
@@ -20,6 +20,9 @@ cnp.import_array()
 
 cdef char* EMPTY_STRING = ""
 
+from ..utils._openmp_helpers cimport check_openmp_support
+check_openmp_support()
+
 # Smallest strictly positive value that can be represented by floating
 # point numbers for different precision levels. This is useful to avoid
 # taking the log of zero when computing the KL divergence.
diff --git a/sklearn/utils/_openmp_helpers.pxd b/sklearn/utils/_openmp_helpers.pxd
index a7694d0be2..c2adc957db 100644
--- a/sklearn/utils/_openmp_helpers.pxd
+++ b/sklearn/utils/_openmp_helpers.pxd
@@ -31,3 +31,18 @@ cdef extern from *:
     void omp_unset_lock(omp_lock_t*) noexcept nogil
     int omp_get_thread_num() noexcept nogil
     int omp_get_max_threads() noexcept nogil
+
+cdef inline check_openmp_support():
+    # Need absolute rather than relative import here. Relative import would be
+    # relative to the module where check_openmp_support is cimported
+    from sklearn.utils._openmp_helpers import _openmp_parallelism_enabled
+    reference_openmp_parallelism_enabled =_openmp_parallelism_enabled()
+    print(f'{SKLEARN_OPENMP_PARALLELISM_ENABLED=}')
+    print(f'{reference_openmp_parallelism_enabled=}')
+    if SKLEARN_OPENMP_PARALLELISM_ENABLED != reference_openmp_parallelism_enabled:
+        raise ValueError(
+            f"OpenMP support discrepancy, you probably need to add openmp_dep to the extension module dependencies "
+            "in the relevant meson.build file\n"
+            f"OpenMP enabled for sklearn.utils._openmp_helpers (reference): {reference_openmp_parallelism_enabled}\n"
+            f"OpenMP enabled for {__name__}: {SKLEARN_OPENMP_PARALLELISM_ENABLED}"
+        )

Edit: another alternative is meson introspect build/cp312 --targets (build dir needs to be adapted it's not always build/cp312) which dumps a json and probably has a way to figure out which modules are enabled for OpenMP and do some sanity check compared to what we expect based on a git grep regexp.

@adrinjalali
Copy link
Member

This looks reasonable to me.

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merging as is, since there is no real reason to block this. I'm very unhappy that a module could start using openmp and we silently would just not use that in compile time, and that I can't really check whether now we actually have added openmp everywhere that we should.

So this is an improvement, but I can't double check whether we still have a regression compared to pre-meson or not.

@adrinjalali adrinjalali merged commit 602aaaa into scikit-learn:main Aug 29, 2024
30 checks passed
@lesteve lesteve deleted the meson-openmp branch August 30, 2024 13:49
MarcBresson pushed a commit to MarcBresson/scikit-learn that referenced this pull request Sep 2, 2024
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Sep 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:manifold module:metrics module:utils To backport PR merged in master that need a backport to a release branch defined based on the milestone.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TSNE performance regression in 1.5
4 participants