This is a follow-up to gh-15050, which describes a performance issue on arm64 macOS due to having two OpenBLAS libraries present. In the arm64 wheels for SciPy 1.7.3, that was fixed by setting a single environment variable (`OPENBLAS_THREAD_TIMEOUT`) in `_distributor_init.py`.
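For reference, a minimal sketch of what such a `_distributor_init.py` workaround can look like (the exact value and the `setdefault` guard are assumptions here, not necessarily what the released wheels do):

```python
import os

# Tell OpenBLAS to put idle worker threads to sleep almost immediately instead
# of busy-waiting, which avoids the contention seen when two OpenBLAS copies
# (one from numpy, one from scipy) live in the same process.
# setdefault() keeps any value the user has already chosen.
os.environ.setdefault("OPENBLAS_THREAD_TIMEOUT", "1")
```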
That may not be the optimal long-term fix, and scikit-learn has seen a similar issue on Linux when two OpenMP runtimes are present. So this issue is for tracking another potential solution. Starting from #15050 (comment), the comment by @ogrisel is copied here:
> Opened MacPython/scipy-wheels#143 for the environment variable solution. I'd like to go with that for 1.7.3. To be discussed I'd say if we keep that or if we want to rebuild OpenBLAS for 1.8.0.

Indeed for a minor release, a minimal fix only for the macos m1 problem is best.
For the more medium-term plan, maybe we could do this series of changes more progressively. Here are some suggestions:
- Building OpenBLAS with `THREAD_TIMEOUT=1` for the wheels will probably help in the short to medium term, by making it possible to work around problems such as scikit-learn/scikit-learn#20642 (poor performance of `sklearn.cluster.KMeans` for numpy >= 1.19.0) when there are several threading runtimes linked to the same Python process.
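As an illustration of the "several threading runtimes in one process" situation, the loaded runtimes can be listed with threadpoolctl (a sketch only; threadpoolctl is assumed to be installed and is not part of the proposal above):

```python
from pprint import pprint

from threadpoolctl import threadpool_info

# Importing numpy and scipy pulls in their bundled BLAS libraries.
import numpy  # noqa: F401
import scipy.linalg  # noqa: F401

# One dict per detected runtime (OpenBLAS, OpenMP, MKL, ...).
# Two "openblas" entries means two copies are competing for the same cores.
pprint(threadpool_info())
```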
For the even longer term, we could plan changes to completely avoid redundant copies of OpenBLAS and OpenMP in the scipy stack. First, for OpenBLAS itself:
- Expose the Cython BLAS API in `numpy.linalg` the same way `scipy.linalg` does it (see the sketch after this list);
- Make scipy use the BLAS implementation of numpy and stop shipping it (while continuing to expose it via the Cython BLAS API replicated in scipy for backward-compat reasons). This means that PyPI will stop shipping two copies of OpenBLAS for the scipy stack, which should help with maintenance in the long run (and reduce the scipy wheel sizes a bit, which is always nice).
- This way numpy would be the official BLAS implementation provider for the full scipy stack.
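For concreteness, a sketch of the BLAS entry point SciPy exposes today on top of its bundled OpenBLAS; the proposal above would provide an equivalent (plus the Cython-level `cython_blas` API) from `numpy.linalg` instead:

```python
import numpy as np
from scipy.linalg.blas import dgemm  # dispatches to the OpenBLAS shipped in the scipy wheel

a = np.random.rand(3, 4)
b = np.random.rand(4, 2)

# C = alpha * A @ B, computed by the bundled BLAS.
c = dgemm(alpha=1.0, a=a, b=b)
print(c.shape)  # (3, 2)
```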
About the switch from native OpenBLAS threads to OpenMP: this would be nice for scikit-learn, but libgomp is still not fork-safe. conda-forge typically replaces libgomp with llvm-openmp by default for this reason. Note that on Linux, gcc is still the recommended compiler for conda-forge, which means that the library is built against libgomp but the OpenMP runtime is replaced afterwards.
Also note that, for some reason, conda-forge decided to use OpenBLAS native threads by default on Linux, while it's also possible to make it use OpenMP. See the 2020-07-17 entry of the conda-forge announcements. Maybe @isuruf can explain why LLVM OpenMP isn't used by default on Linux?
More details on OpenMP in conda-forge are at https://conda-forge.org/docs/maintainer/knowledge_base.html#openmp
Maybe the topic of using OpenMP in the scipy stack deserves a Scientific Python SPEC.
My main reply:
> This way numpy would be the official BLAS implementation provider for the full scipy stack.

This sounds less than ideal to me: we'd have to duplicate the Cython APIs, plus NumPy has far fewer BLAS/LAPACK needs than SciPy and is still on an older minimum required BLAS/LAPACK version than SciPy (that's why, for example, Accelerate is re-enabled in NumPy but cannot be in SciPy).
My first reaction is that shipping a standalone wheel would be preferable to this solution.
> Maybe the topic of using OpenMP in the scipy stack deserves a Scientific Python SPEC.

Yes, I think it does. And it's broader than just packaging; there's an important UX question there as well that deserves thinking about: are we continuing to default to single-threaded everywhere, except for BLAS/LAPACK calls?
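For context on that question: today SciPy defaults to serial execution and exposes parallelism as an explicit opt-in, e.g. via the `workers` argument (illustrative only):

```python
import numpy as np
import scipy.fft

x = np.random.rand(2**20)

y_default = scipy.fft.fft(x)              # default: single-threaded
y_parallel = scipy.fft.fft(x, workers=4)  # explicit opt-in to 4 worker threads
```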
And @ogrisel's reply:
> My first reaction is that shipping a standalone wheel would be preferable to this solution.

Fine with me as well.
> are we continuing to default to single-threaded everywhere, except for BLAS/LAPACK calls?

I cannot say for scipy but for scikit-learn we have already started to use OpenMP multithreaded Cython code by default for some components and are progressively generalizing this approach in the code base.
xref gh-10239 for the issues around OpenMP, and why we don't use it in SciPy.
For the main topic of this issue: shipping separate OpenBLAS and OpenMP wheels isn't great either, and not something I'd like us to be responsible for. But it's probably either that, or stay with the status quo - which perhaps isn't very healthy if scikit-learn continues to use more OpenMP in more of its API.
The issue is specific to PyPI/wheels, because any other binary install mechanism tends to have a proper package manager which can guarantee only a single OpenBLAS/OpenMP library is used. Shipping native dependencies like these as separate wheels is in general an unhealthy idea for multiple reasons, but if it's just limited to OpenBLAS and OpenMP and we can scope it to just the NumPy/SciPy stack for those libraries, then it may be okay to do.