Default to Accelerate on macOS and add wheels for macOS >=14 #24905
Addresses the arm64 part of numpygh-24905.
The most important part of this, for arm64, was done in gh-25255. In case GitHub Actions releases macOS 14 images in time, we can also get x86-64 wheels in for 2.0. But if not, then not.
Hi @rgommers, is the smaller wheel massively important given the scales we're talking about here? I'm asking because OpenBLAS builds a lot of different CPU variants for dynamic dispatch, and I could look into shrinking that based on NumPy's expected users or on overlapping cores. (https://github.com/OpenMathLib/OpenBLAS/blob/67779177b94f03493a44888f6a682065d5f59618/Makefile.system#L676-L700)
I'd say yes, that matters a lot, for multiple reasons:
That said, I'd say the performance gain of Accelerate is still more important than the size.
The OpenBLAS we vendor in wheels is built in https://github.com/MacPython/openblas-libs. I'm not sure how much we've optimized the settings or whether there's much to gain. Maybe @mattip knows?
We set baseline CPU targets in the build script, and OpenBLAS then fills in any optional dynamic dispatch variants above those minima. I guess we could try increasing that minimum, although we do occasionally get complaints that even
Unless I'm missing something, you will not need DYNAMIC_ARCH for Mac/aarch64, as there's only the one target for M1 and newer (moot if you're switching to Accelerate for its proprietary AMX support). For the other architectures, you can define your own DYNAMIC_LIST with the subset of CPUs you deem relevant.
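DYNAMIC_ARCH, TARGET and DYNAMIC_LIST are OpenBLAS make variables. A minimal sketch of assembling build arguments along the lines suggested above, assuming VORTEX is the OpenBLAS target name covering Apple M1 and newer; the helper name openblas_make_args and the example x86-64 core list are hypothetical, and NumPy's real build settings live in MacPython/openblas-libs.

```python
def openblas_make_args(arch, dynamic_list=None):
    """Return `make` variable assignments for an OpenBLAS build (sketch).

    arch: "arm64" or "x86_64" (the two macOS wheel architectures).
    dynamic_list: optional iterable of OpenBLAS core names to keep for
    runtime dispatch; None means "build every variant DYNAMIC_ARCH knows".
    """
    if arch == "arm64":
        # Assumption: a single target covers M1 and newer, so no
        # runtime dispatch (DYNAMIC_ARCH) is needed at all.
        return ["TARGET=VORTEX"]
    if arch == "x86_64":
        args = ["DYNAMIC_ARCH=1"]
        if dynamic_list:
            # Restrict dispatch to a chosen subset of cores to shrink the
            # library; DYNAMIC_LIST takes space-separated core names.
            args.append("DYNAMIC_LIST=" + " ".join(dynamic_list))
        return args
    raise ValueError(f"unsupported arch: {arch!r}")
```

From an OpenBLAS checkout this could be used as `subprocess.run(["make", *openblas_make_args("x86_64", ["HASWELL", "SKYLAKEX"])], check=True)`. Passing each assignment as a single argv element avoids shell quoting of the space-separated list; check Makefile.system for how the baseline core is combined with DYNAMIC_LIST.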
Good news finally on the GitHub Actions front:
Okay, I think I've got this working. There was a fairly large hiccup: the macOS 14 image is only supported for arm64. So when trying the
because the wheel has to target macOS 14.0 and hence cannot be tested on the platform on which it's built. I considered setting

Here is a first successful run with a wheel that can be downloaded, if anyone wants to try on an Intel Mac: https://github.com/rgommers/numpy/actions/runs/8163109512
To be honest I hadn't even thought about making
Looking at your commits, I don't think it's necessary to have a

My plans for macOS wheel building were to move the arm64 wheel build from Cirrus to GHA, but I was thinking of doing that after NumPy 2.0 was released. Do we want to go ahead with that transition now? The transition could include the kinds of changes you linked to.
Because it was the only option: I wanted to test Accelerate, and there was no macos-14 image. It's also fine for regular CI. The problem with wheels is a flaw in
Agreed. Those
We may as well do that I think, if CI is green. It's a lot easier to retrigger nightly builds in GHA, or debug a single job on your own fork.
That is the key question, yes. I had always planned on doing this, although it would have been more clear-cut if GHA had macos-14 images for x86-64. MacBooks can last a long time, and wheels that are >2x smaller and up to 8x faster on some key linear algebra functions like

In case it becomes a burden in the future, these wheels are still safe to drop in a later release. I think it'll be fine though. I expect the real burden for macOS x86-64 to show up once GHA drops native hardware support completely. According to the roadmap in actions/runner-images#9255 we're good for 2024, but will only have
Wait, you're seeing an 8x speedup for linalg.svd using Accelerate on aging Intel Macs too? There the M-series AMX tile plays no role at all, and I'm used to OpenBLAS's speed penalty relative to MKL being within about ten percent.
Ehhh no, apologies, that is my tired brain not working at all. For x86-64 the binary size reduction is of interest; the performance isn't. At least, we do not have new benchmark results that say otherwise. The x86-64 implementations in Accelerate also got upgraded, so they are available with NEWLAPACK symbols and should be more robust, but there is no indication that they are much faster. It's also not easy to find older benchmarks (things like https://blog.kiyohiroyabuuchi.com/en/c-cpp-en/comparison-between-mac-accelerate-framework-and-intel-mkl/ hint at "roughly on par with MKL and OpenBLAS"). The binary size gain is actually larger than 2x (e.g., from this CI run):
So it's about 3x for x86-64 and 2.6x for arm64. I no longer have an Intel Mac, so I have no easy way to check performance.
Thanks for the clarification :)
With gh-25945 this is now complete; let's see how it goes 🤞🏼. Thanks all!
As a clarification for future readers: the order of auto-detection now is:
Wheels are still all built with OpenBLAS, except for macOS >=14.0 (Accelerate) and 32-bit Windows (no BLAS).
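The wheel rule just stated can be encoded as a small lookup. This is a minimal sketch that captures only that sentence; the function name expected_wheel_blas and its parameters are hypothetical and not part of NumPy.

```python
def expected_wheel_blas(system, is_64bit, macos_version=None):
    """Return the BLAS a NumPy wheel for this platform is expected to use.

    Encodes only the stated rule: OpenBLAS everywhere, except Accelerate
    on macOS >= 14.0 and no BLAS at all on 32-bit Windows.

    system: a platform.system()-style name ("Darwin", "Windows", "Linux").
    is_64bit: whether the Python interpreter is 64-bit.
    macos_version: (major, minor) tuple; only consulted on macOS.
    """
    if system == "Darwin" and macos_version is not None and macos_version >= (14, 0):
        return "accelerate"
    if system == "Windows" and not is_64bit:
        return None  # 32-bit Windows wheels are built without BLAS
    return "openblas"
```

This only predicts what a wheel should contain; on an actual install, `numpy.show_config()` reports the BLAS the build really detected and is the authoritative answer.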
The benefits of using the updated Accelerate library are (1) performance (xref gh-24053 for benchmarks) and (2) smaller wheels because we're no longer vendoring OpenBLAS.
The first part of this is done: after gh-24893 we are defaulting to the new Accelerate on macOS >=13.3 when building from source.
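The macOS >=13.3 cutoff is a plain version comparison. A minimal sketch of that gate at runtime, assuming the release string comes from `platform.mac_ver()` (empty on non-macOS systems); the helper names are hypothetical, and NumPy's actual detection happens at build time in its Meson configuration, not in Python like this.

```python
import platform

# Minimum macOS release with the updated Accelerate (NEWLAPACK symbols).
NEW_ACCELERATE_MIN = (13, 3)


def parse_macos_release(release):
    """Turn a release string such as '13.3.1' into a (major, minor) tuple."""
    parts = release.split(".")
    major = int(parts[0])
    minor = int(parts[1]) if len(parts) > 1 else 0
    return (major, minor)


def has_new_accelerate(release=None):
    """Return True if the given (or running) macOS release is >= 13.3.

    release: macOS version string; defaults to platform.mac_ver()[0],
    which is '' when not running on macOS.
    """
    if release is None:
        release = platform.mac_ver()[0]
    if not release:
        return False  # not macOS, so no Accelerate at all
    return parse_macos_release(release) >= NEW_ACCELERATE_MIN
```

Comparing tuples rather than strings avoids the classic pitfall where "13.10" would sort before "13.3".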
The second part is adding a new set of wheel builds that use Accelerate rather than OpenBLAS on macOS >=14. Note that this can only be done from macOS 14 rather than 13.3, because the packaging library (and hence pip) does not take the macOS minor version into account, so using 13_3 in wheel names will lead to pip installing those wheels also on 13.0-13.2, where they will then crash with missing symbols at runtime.

We need a macOS 14 image to build wheels (in a pinch, we could rename wheels manually). The status of those CI images is:

macos-sonoma-xcode (see here for all images)

So we can do this now for arm64 wheels, but for x86_64 we have to wait. I'd expect support to materialize well before the 2.0 release date.
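Why a 13.3 target has to become a 14.0 wheel can be shown with a simplified model of tag matching. This is a sketch under stated assumptions: it mirrors the macOS 11+ branch of `packaging.tags.mac_platforms`, which only generates tags with minor version 0, and omits the universal2 and legacy 10.x tags the real function also emits. The helper names are hypothetical.

```python
def macos_platform_tags(major, minor, arch):
    """Simplified model of the macOS platform tags an installer accepts.

    For macOS 11 and later, only minor-version-0 tags are generated, one
    per major release from the running one down to 11. `minor` is
    accepted but ignored, which is exactly the problem described above.
    """
    if major < 11:
        raise ValueError("this sketch only models macOS 11 and later")
    return [f"macosx_{m}_0_{arch}" for m in range(major, 10, -1)]


def wheel_tag_for_target(major, minor, arch):
    """Smallest wheel tag that is safe for a deployment target major.minor.

    Tags cannot express a minor version, so a target such as 13.3 is
    rounded up to the next major release to keep 13.0-13.2 from matching.
    """
    if minor > 0:
        major += 1
    return f"macosx_{major}_0_{arch}"
```

In this model macOS 13.0 and 13.3 accept identical tag sets, so the only tag that excludes 13.0-13.2 for an Accelerate-NEWLAPACK build is macosx_14_0.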