
BLD: replace uses of openblas_support with openblas wheels [wheel build] #25505


Merged

merged 10 commits from the replace-openblas_support branch into numpy:main on Feb 10, 2024

Conversation

mattip
Member

@mattip mattip commented Dec 27, 2023

Use the openblas wheels instead of tools/openblas_support.py in

  • wheel building
  • nightly/stable openblas32 CI runs
  • sdist

This will fix #25503 (unpin cibuildwheel)

@github-actions github-actions bot added the 36 - Build (Build related PR) label on Dec 27, 2023
@rgommers
Member

Thanks Matti. This looks about right; the failures on 32-bit Windows, musllinux and macOS arm64 are all the same (per platform) - you may want to fix those on a single wheel build job on your own fork before running the whole battery of builds here.

@andyfaff
Member

Check out https://github.com/numpy/numpy/wiki/Debugging-CI-guidelines for some suggestions on how to debug Cirrus locally (won't help with macOS arm64 if you don't have your own Mac).

@mattip
Member Author

mattip commented Dec 31, 2023

Mess. The scipy-openblas wheels differ from the tarballs. The tarballs we currently use have, in usr/local/lib, symlinks like

libscipy_openblas64_.a -> libscipy_openblas64_p-r0.3.24.dev.a
libscipy_openblas64_.so -> libscipy_openblas64_p-r0.3.24.dev.so
libscipy_openblas64_.so.0 -> libscipy_openblas64_p-r0.3.24.dev.so
libscipy_openblas64_p-r0.3.24.dev.so

But the real file libscipy_openblas64_p-r0.3.24.dev.so has a SONAME of libscipy_openblas64_.so.0, so after linking _multiarray_umath.so with the shared object it expects to find libscipy_openblas64_.so.0 in the LD_LIBRARY_PATH (the names are changed but the idea is the same on macOS). This makes dynamically linking the artifacts in the wheel useless.
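
A minimal way to confirm the mismatch on Linux, assuming the tarball layout quoted above (readelf and patchelf are standard ELF tools; the expected output line is illustrative):

# check the SONAME baked into the real shared object
readelf -d usr/local/lib/libscipy_openblas64_p-r0.3.24.dev.so | grep SONAME
# expected to show: Library soname: [libscipy_openblas64_.so.0]
# roughly the kind of rename/re-tag the wheel rework would need
cp libscipy_openblas64_p-r0.3.24.dev.so libscipy_openblas64_.so
patchelf --set-soname libscipy_openblas64_.so libscipy_openblas64_.so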

I think we have to rework the scipy-openblas wheel build before we can use it here to

  • ship libscipy_openblas64_.so not libscipy_openblas64_p-r0.3.24.dev.so, and
  • change the SONAME to libscipy_openblas64_.so

@rgommers
Member

rgommers commented Jan 1, 2024

That seems like the right change to make. There's no need for names like p-r0.3.24 at all, and symlinks in wheels are bad.

@mattip
Member Author

mattip commented Feb 2, 2024

Updated to use the newer 0.3.26.0.2 wheels with a single shared object

@rgommers
Member

rgommers commented Feb 2, 2024

Diff looks reasonable - just some wheel build jobs to make happy

@mattip
Member Author

mattip commented Feb 4, 2024

@andyfaff do you know why the cirrus build sequence for wheels first builds NumPy, then calls cibuildwheel to build it again? It is failing since the cibw_before_build.sh does not overwrite $PWD/.openblas the second time it is called.

@mattip
Member Author

mattip commented Feb 4, 2024

macOS with Python 3.11, 3.12 and PyPy 3.9 is failing, but Python 3.9 and 3.10 are passing. The problem seems to be that delocate is not seeing the DYLD_LIBRARY_PATH environment variable, so it does not find the scipy_openblas shared object. Did something change in the way delocate processes the environment between versions?
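
For the record, a quick way to inspect what a built wheel actually links against is delocate's delocate-listdeps command (the wheel path below is just an example):

# list the libraries the extension modules in the wheel depend on
delocate-listdeps wheelhouse/numpy-*-macosx_*_arm64.whl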

@andyfaff
Member

andyfaff commented Feb 4, 2024

do you know why the cirrus build sequence for wheels first builds NumPy, then calls cibuildwheel to build it again?

It shouldn't. Can you link to a build log that shows this? When I look at e.g. the cron job on Cirrus I can't see the two builds.
I did notice something wrong with the Cirrus macos_arm64 build though: it shouldn't be installing gfortran from cibw_before_build, it should be using the gfortran compiler installed in the mamba environment.

A quick reminder: there is a matrix of images that the wheel building occurs on, once for all the Pythons on Sonoma and once on Monterey. The wheels on Sonoma should be using Accelerate for BLAS; those on Monterey are supposed to still use OpenBLAS.
So that's a second issue: the Sonoma builds for the last cron job are finding and using scipy-openblas when they should be using Accelerate. That needs to be fixed; the whole point of Accelerate is to reduce the size of the wheels. It's important to fix this before the next release happens! (@charris, @rgommers). I don't actually know where scipy-openblas is being installed for that wheel build job.

There's no comment explaining why delocate is pinned in the cibw_before_build script. I'll have a look into these issues in the next few days. Response times will be slow for me because I'm busy at work this week.

@mattip
Member Author

mattip commented Feb 4, 2024

Can you link to a build log that shows this.

Here is a build log with 4 steps: clone, build, install_cibuildwheel, cibuildwheel.

the Sonoma builds for the last cron job are finding and using scipy-openblas

Those builds are probably not setting INSTALL_OPENBLAS=false, so it is being set by default here

if [ -z $INSTALL_OPENBLAS ]; then
    export INSTALL_OPENBLAS=true
fi

which then triggers the use of OpenBLAS
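
So a minimal fix, assuming those jobs keep running this script, would be to opt out explicitly in the Accelerate (Sonoma) job environment:

# hypothetical override for the Accelerate wheel jobs
export INSTALL_OPENBLAS=false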

@andyfaff
Member

andyfaff commented Feb 4, 2024

clone, build, install_cibuildwheel, cibuildwheel

The naming of those build steps is a little cryptic. The first 'build' step is actually firing off all the other steps, not building numpy itself; it could be called something like 'initial setup'. I'm not sure why the cibw_before_build script is executed in that step, possibly cruft that can be cleaned up. Debugging all of this is probably better done locally, using the wiki instructions I wrote.

@rgommers
Member

rgommers commented Feb 5, 2024

Those builds are probably not setting INSTALL_OPENBLAS=false, so it is being set by default here

The Accelerate jobs should not have OpenBLAS installed, and that was certainly working until recently. Maybe the requirements files reshuffle broke it?

@andyfaff
Member

andyfaff commented Feb 5, 2024

Should be fixed by #25763

@mattip
Member Author

mattip commented Feb 5, 2024

Debugging all of this is probably better done locally, using the wiki instructions I wrote.

Thanks, I will try out these instructions on a Mac machine without a VM. Or did you mean the containerized one?

@andyfaff
Member

andyfaff commented Feb 5, 2024

If you have an M-series macOS machine I would try https://github.com/numpy/numpy/wiki/Debugging-CI-guidelines#local-build-of-macos-wheels-using-cibuildwheel-in-a-macos-tart-vm with the entries that you've seen fail.

@mattip mattip force-pushed the replace-openblas_support branch from d5c4981 to fcc7ac6 on February 5, 2024 22:07
@mattip
Member Author

mattip commented Feb 5, 2024

Rebased on top of #25763 and added a fix for cibuildwheel not setting the macOS repair-wheel environment variables as specified in pyproject.toml.


source $PROJECT_DIR/tools/wheels/gfortran_utils.sh
install_gfortran
pip install "delocate==0.10.4"
Member Author

@mattip mattip Feb 8, 2024

For reviewers: this whole stanza has been removed; gfortran will not be installed on macOS. The scipy-openblas wheels do not need it. This means the build is faster, but also that we cannot test f2py in these builds; I think the tests can use the gfortran from the conda install.

Member

but also that we cannot test f2py in these builds.

That's fine; those are way too slow and need their own CI jobs instead. There's already an open issue for that somewhere.

@mattip
Member Author

mattip commented Feb 8, 2024

I think this is ready for review. Some follow-up that I will do together with fixes from the review:

  • delete the tools/openblas_support.py script
  • think about compacting the build matrix for cirrus arm64 builds. Even the double cp310 cp311 build is finishing in ~7 minutes.

@mattip
Member Author

mattip commented Feb 8, 2024

think about compacting the build matrix for cirrus arm64 builds. Even the double cp310 cp311 build is finishing in ~7 minutes.

That seems very fast: it is an order of magnitude faster than the cirrus aarch64 builds and twice as fast as the linux x86_64 cp311 build. Maybe something is wrong?

@andyfaff
Member

andyfaff commented Feb 8, 2024

By default each Cirrus Mac runner has several cores, so compilation and testing are fast. That seems like reasonable timing. If I do a clean build of numpy (with Accelerate) the build step takes 17 s on an M1 Mac, and the testing takes about 2.5 minutes.

@mattip
Member Author

mattip commented Feb 8, 2024

One wheel build timed out in testing. Everything else passed. I don't think the failure is connected to this PR, but maybe?

@mattip
Member Author

mattip commented Feb 8, 2024

Rerunning the timed-out musllinux wheel build, it now passed. CI is green.

@andyfaff
Member

andyfaff commented Feb 9, 2024

A question before I review. The TL;DR of the PR seems to be: install scipy_openblasX, then create a PKG_CONFIG_PATH directory with a scipy-openblas.pc file that is sourced from the scipy-openblasX wheel.

It's probably been discussed before, but it's not clear to me why the build (i.e. meson) can't be scripted so that, if it knows it's going to use the scipy-openblasX wheel to source openblas, it queries the wheel to figure out where the pc file is and where the shared library is. I.e. why does this PR have to jump through hoops to move the pc file into a PKG_CONFIG_PATH?

E.g. on a Mac the spin build command appears to work as follows (I won't have all the details correct here):

  1. If no BLAS is specified, find and use Accelerate
  2. If you use the -use-scipy-openblas flag, use openblas from the wheel
  3. Use pkg-config to figure out the relevant BLAS, possibly using flag settings to change which BLAS is desired, and PKG_CONFIG_PATH to specify where the pc file is.

My point is that in step 2 the user doesn't have to fiddle about with setting PKG_CONFIG_PATH or creating files; everything is taken care of behind the scenes. Why can't this happen in this PR? Why the manual intervention of setting paths beforehand?

- name: Build a wheel via an sdist
  env:
    PKG_CONFIG_PATH: ${{ github.workspace }}/.openblas
Member

This is set twice, probably because environment variables aren't persistent between steps? Would it be better to use GITHUB_ENV?

Member

Like you set here

Member Author

Good point.

@@ -72,26 +72,32 @@ jobs:
        python-version: '3.11'

    - name: Install dependencies
      env:
        # Need to do this in every step since setup-python overrides PKG_CONFIG_PATH
        PKG_CONFIG_PATH: ${{ github.workspace }}/.openblas
Member

Write to GITHUB_ENV to prevent having to set it repeatedly?
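
Something like this in an early step would make it stick for the rest of the job (standard GitHub Actions mechanism; the path matches the one used above):

# persist PKG_CONFIG_PATH for all subsequent steps in this job
echo "PKG_CONFIG_PATH=${GITHUB_WORKSPACE}/.openblas" >> "$GITHUB_ENV"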

PKG_CONFIG_PATH=$PROJECT_DIR/.openblas
rm -rf $PKG_CONFIG_PATH
mkdir -p $PKG_CONFIG_PATH
python -m pip install scipy-openblas64
Member

For maintenance branches, how do we choose what version of scipy-openblas gets installed? E.g. for a certain maintenance branch we may want to stick with a certain (older) version, and bump it when we do a major release. At the moment we just get the latest version.

Member Author

This should use the ci_requirements.txt file.
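
I.e. pin it per maintenance branch in a requirements file and install from that instead of the latest release; a sketch with an assumed file name and example versions:

# requirements/ci_requirements.txt (per-branch pins; versions are examples)
#   scipy-openblas32==0.3.26.0.2
#   scipy-openblas64==0.3.26.0.2
python -m pip install -r requirements/ci_requirements.txt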

elif [[ $RUNNER_OS == "Windows" ]]; then
    # Install Openblas from scipy-openblas64
    if [[ "$INSTALL_OPENBLAS" = "true" ]] ; then
        echo PKG_CONFIG_PATH $PKG_CONFIG_PATH
Member

So we have to install scipy-openblas here - understood. However, why is the rest needed? Can't the meson build process do the querying of scipy-openblas to figure out all the necessary info?

Member Author

Meson could run python and try to import scipy_openblas32, then try to import scipy_openblas64. If either of those succeeds, it could use get_include() and get_library() to add the proper flags to the build. @rgommers, thoughts?
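
For reference, what the before-build script effectively does today is roughly this (assuming the wheel exposes a get_pkg_config() helper; the .openblas path matches the one used elsewhere in this PR):

# generate a pc file from the installed scipy-openblas wheel
mkdir -p $PWD/.openblas
python -c "import scipy_openblas64; print(scipy_openblas64.get_pkg_config())" \
    > $PWD/.openblas/scipy-openblas.pc
export PKG_CONFIG_PATH=$PWD/.openblas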

@mattip
Member Author

mattip commented Feb 9, 2024

Why can't this happen in this PR, why the manual intervention

The cibw_before_build.sh script does exactly what spin --with-open-blas does: create a pc file. Using spin during the build would not be a 1:1 replacement. In wheel building, the wheel packaging step (delocate/auditwheel/delvewheel) needs to know where to find the shared objects, which must be done outside the controlled spin environment. I guess this too could be moved inside spin with a dedicated spin repair-wheel command to abstract out the differences between the three tools and to add the proper path to the shared objects.

Is that what you were thinking?
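
Concretely, on Linux the repair step needs something along these lines before auditwheel runs (get_lib_dir() is the helper I believe the wheel exposes; paths are illustrative):

# make the scipy-openblas shared object visible to the wheel repair tool
LIB_DIR=$(python -c "import scipy_openblas64; print(scipy_openblas64.get_lib_dir())")
export LD_LIBRARY_PATH=$LIB_DIR:$LD_LIBRARY_PATH
auditwheel repair -w ./wheelhouse dist/numpy-*.whl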

@mattip
Member Author

mattip commented Feb 9, 2024

the spin cmd for build appears to work as follows ...

Just to be more accurate: that is not spin logic, it is logic inside numpy/meson.build, which has branching to handle the various BLAS options. It could call incantations of run_command(py, ...) to replace parts of the cibw_before_build.sh script and the spin --with-scipy-openblas logic with calls to scipy_openblas32 or scipy_openblas64. I am not sure where that "magic" should happen.

Outside meson.build: meson tries very hard to not incorporate too much logic and magic inside it, and depends on the user to properly set up pkg-config or cmake files for dependency detection.

Inside meson.build: there is some merit to not duplicating the pkg-config file creation, and to eliminating the need to externally set a PKG_CONFIG_PATH environment variable.

@andyfaff
Member

andyfaff commented Feb 9, 2024

OK, so you're saying that the main purpose is to let delocate/auditwheel/delvewheel know where the libraries are? I don't know why they would need to be told that. For example, if I use otool on a recently built scipy dylib:

(dev3) 192-168-1-107:linalg andrew$ otool -L cython_lapack.cpython-310-darwin.so 
cython_lapack.cpython-310-darwin.so:
	/opt/arm64-builds/lib/libopenblas.0.dylib (compatibility version 0.0.0, current version 0.0.0)

It tells me where the openblas library is! They should just go and grab it.

Perhaps I'm being too nitpicky. What we have so far is obviously working, so perhaps we should merge instead of making things too magical.

@mattip
Member Author

mattip commented Feb 9, 2024

I don't know why they would need to be told that.

The tarball used by openblas_support is built with absolute paths burned into the shared objects (RPATH and friends) on Linux and macOS. Using relative paths instead, as the self-contained scipy-openblas wheel build does, has implications. You can see in the commit history that I tried many iterations of things until I got to this point where everything JustWorks and CI is green.

Edit: with that, I am prepared to rethink the work here and discuss alternatives.

Edit2: I think you are looking at the results of a local build of scipy. When installing scipy into a virtual environment from a wheel it looks different:

% otool -L /tmp/venv3python312/lib/python3.12/site-packages/scipy//linalg/cython_lapack.cpython-312-darwin.so
/tmp/venv3python312/lib/python3.12/site-packages/scipy//linalg/cython_lapack.cpython-312-darwin.so:
	@loader_path/../.dylibs/libopenblas.0.dylib (compatibility version 0.0.0, current version 0.0.0)
	/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.100.1)

@andyfaff
Member

andyfaff commented Feb 9, 2024

Ah ok. Well then since it's green we may as well merge, modulo the things that I suggested that you think should be added.

@mattip
Member Author

mattip commented Feb 9, 2024

I pushed a fix commit:

  • removed the tools/openblas_support.py script
  • reviewed all the places PKG_CONFIG_PATH is set and tried to compact it to once per CI file
  • used spin config-openblas where possible
  • added a comment about /project as a path
  • compacted the cirrus wheel build jobs from 3 to 2. I think it could even be 1 job and still not exceed an hour.

@mattip
Member Author

mattip commented Feb 9, 2024

spin config-openblas cannot be used in a wheel build, since it prepares the build to use the scipy-openblas wheel at runtime; in the wheel-build case we want a stand-alone wheel that packages the shared objects into the wheel rather than getting them from the scipy-openblas wheel.

@mattip
Member Author

mattip commented Feb 9, 2024

CI is green after the changes.

@andyfaff
Member

Looks good to me. A good block of work, thanks @mattip

@andyfaff andyfaff merged commit 673d604 into numpy:main Feb 10, 2024
@rgommers rgommers added this to the 2.0.0 release milestone Feb 10, 2024
before-build = "bash {project}/tools/wheels/cibw_before_build.sh {project}"
# The build will use openblas64 everywhere, except on arm64 macOS >=14.0 (uses Accelerate)
config-settings = "setup-args=-Duse-ilp64=true setup-args=-Dallow-noblas=false build-dir=build"
Member

@mattip why did you remove use-ilp64 here? It looks like it switched all the wheels to LP64. Log extracts:

# manylinux x86-64:
Run-time dependency scipy-openblas found: YES 0.3.26
Message: BLAS symbol suffix:

# Accelerate:
Run-time dependency accelerate found: YES
Message: BLAS symbol suffix: $NEWLAPACK

Member Author

I moved the macros to the pkg-config file here, so it is not needed for the scipy-openblas build. I guess it is important to leave it in for diagnostics and for other BLAS builds though, right?
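
A quick sanity check that the defines really land in the build, assuming the generated pc file is named as above (the exact macro names are from memory and worth double-checking):

# the generated pc file should carry the 64-bit suffix / ILP64 defines
PKG_CONFIG_PATH=$PWD/.openblas pkg-config --cflags scipy-openblas
# expected to include something like -DBLAS_SYMBOL_SUFFIX=64_ -DHAVE_BLAS_ILP64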

Member

yes indeed

@rgommers
Member

No more tarballs or openblas_support script, nice! Thanks for getting this in. Looks good overall, modulo my one comment.
