Skip to content

ENH: add new wheel builds using Accelerate on macOS >=14 #25255

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Dec 21, 2023

Conversation

andyfaff
Copy link
Member

@andyfaff andyfaff commented Nov 26, 2023

@rgommers alternative to #25012.

EDIT: the new Accelerate-based wheels weigh in at about 5.0 MB, which is impressively small compared to the OpenBLAS arm64 wheels at ~14 MB and the OpenBLAS x86-64 wheels at ~21 MB.

@github-actions github-actions bot added the 36 - Build Build related PR label Nov 26, 2023
@andyfaff
Copy link
Member Author

This should work now. There are two cpN wheels produced for macos_arm64. Those suitable for macOS < 14 and those for macOS >= 14.0.
The linux_aarch64 is currently disabled, and should be renabled before merge.

@rgommers
Copy link
Member

rgommers commented Dec 6, 2023

Thanks @andyfaff! This looks pretty good to me, however it's not a complete replacement for gh-25012 and lost a few things from that PR. How about I resubmit the commits that were good to go from that PR separately, we merge those, and then we rebase this PR on top? That will then be a slightly smaller diff here with only the essential changes for Accelerate wheels.

@rgommers rgommers changed the title BLD: modify macos wheel build to use accelerate [wheel build][skip ac… BLD: modify macOS wheel build to use Accelerate on macOS >=14 Dec 21, 2023
@rgommers rgommers changed the title BLD: modify macOS wheel build to use Accelerate on macOS >=14 ENH: add new wheel builds using Accelerate on macOS >=14 Dec 21, 2023
@rgommers rgommers added this to the 2.0.0 release milestone Dec 21, 2023
@rgommers
Copy link
Member

This still looks good. I went through the Python 3.12 log in detail just now, and everything looks as expected. Capturing some details for future reference:

cibuildwheel version 2.16.2

Project version: 2.0.0.dev0+git20231126.c6bcaaa
C compiler for the host machine: arm64-apple-darwin20.0.0-clang (clang 16.0.6 "clang version 16.0.6")
C linker for the host machine: arm64-apple-darwin20.0.0-clang ld64 609
C++ compiler for the host machine: arm64-apple-darwin20.0.0-clang++ (clang 16.0.6 "clang version 16.0.6")
C++ linker for the host machine: arm64-apple-darwin20.0.0-clang++ ld64 609
Cython compiler for the host machine: cython (cython 3.0.7)
Host machine cpu family: aarch64
Host machine cpu: aarch64
Program python found: YES (/private/var/folders/c2/b2kmn__12wq57m82ltz0hl1w0000gn/T/build-env-sqihr22c/bin/python)
Found pkg-config: /opt/homebrew/bin/pkg-config (0.29.2)
Run-time dependency python found: YES 3.12
Has header "Python.h" with dependency python-3.12: YES 
Compiler for C supports arguments -fno-strict-aliasing: YES 
Compiler for C supports arguments -ftrapping-math: YES 
Compiler for C supports link arguments -Wl,-ld_classic: NO 
Message: During parsing cpu-dispatch: The following CPU features were ignored due to platform incompatibility or lack of support:
"XOP FMA4"
Test features "NEON NEON_FP16 NEON_VFPV4 ASIMD" : Supported 
Test features "ASIMDHP" : Supported 
Test features "ASIMDFHM" : Supported 
Test features "SVE" : Unsupported due to Compiler fails against the test code of "SVE"

Run-time dependency scipy-openblas found: NO (tried pkgconfig)
Run-time dependency mkl found: NO (tried pkgconfig and system)
Run-time dependency accelerate found: YES
../numpy/meson.build:111: WARNING: Project targets '>=1.2.99' but uses feature introduced in '1.3.0': dep 'accelerate' custom lookup.
Message: BLAS symbol suffix: $NEWLAPACK$ILP64

1 wheels produced in 4 minutes:
  numpy-2.0.0.dev0-cp312-cp312-macosx_14_0_arm64.whl   4,725 kB

@rgommers
Copy link
Member

Compared to the 11_0 wheel with OpenBLAS:

1 wheels produced in 4 minutes:
  numpy-2.0.0.dev0-cp312-cp312-macosx_11_0_arm64.whl   13,398 kB

That's an almost 3x size reduction.

@rgommers
Copy link
Member

Ah, there's one macOS job (with OpenBLAS / 11.0) that fails with a single test crash:

Fatal Python error: Segmentation fault
...
________________________ _core/tests/test_multiarray.py ________________________
[gw2] darwin -- Python 3.10.11 /private/var/folders/c2/b2kmn__12wq57m82ltz0hl1w0000gn/T/cibw-run-vpgc5boa/cp310-macosx_arm64/venv-test/bin/python
worker 'gw2' crashed while running '_core/tests/test_multiarray.py::test_partition_int[uint16-N128]'

That is a test using np.sort, so probably also related to the new Highway code & SVE. There are build warnings about an uninitialized variable which may or may not be related:

[58/310] Compiling C object numpy/_core/libargfunc.dispatch.h_baseline.a.p/meson-generated_argfunc.dispatch.c.o
In file included from ../numpy/_core/src/multiarray/argfunc.dispatch.c.src:12:
In file included from ../numpy/_core/src/common/simd/simd.h:85:
In file included from ../numpy/_core/src/common/simd/neon/neon.h:76:
../numpy/_core/src/common/simd/neon/memory.h:56:56: warning: variable 'a' is uninitialized when used here [-Wuninitialized]
    a = vld1q_lane_s32((const int32_t*)ptr,            a, 0);
                                                       ^
/Users/admin/micromamba/envs/numpydev/lib/clang/16/include/arm_neon.h:9481:20: note: expanded from macro 'vld1q_lane_s32'
  int32x4_t __s1 = __p1; \
                   ^~~~
../numpy/_core/src/common/simd/neon/memory.h:55:5: note: variable 'a' is declared here
    int32x4_t a;
    ^

For this PR we're good I think.

Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM, and everything seems happy. The rerun of the last wheel build didn't crash, so almost certainly related to sort and gh-25445. I'll aim to merge this once the aarch64 wheel builds are fixed (there won't be uploads from the new jobs before then).

@rgommers
Copy link
Member

Actually, the memory: 4G fix in here is needed for the aarch64 builds, so let me merge it now. Thanks again @andyfaff!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants