Skip to content

BUG: remove NPY_ALIGNMENT_REQUIRED #29094

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 8, 2025
Merged

BUG: remove NPY_ALIGNMENT_REQUIRED #29094

merged 2 commits into from
Jun 8, 2025

Conversation

SoapGentoo
Copy link
Contributor

  • This machinery requires strict-aliasing UB and isn't needed anymore with any GCC from the last 15 years.

Fixes: #28991

@SoapGentoo SoapGentoo force-pushed the fix-28991 branch 2 times, most recently from c222128 to 6a1b276 Compare May 30, 2025 18:22
@seberg
Copy link
Member

seberg commented May 30, 2025

Hmmm, I had expected this to be used mostly in the casts, but it's the one place where it was always just turned off...

Somehow, for all of these, I am wondering if the copy is OK (or the removal of the isAligned is a good idea)? The compiler optimizes the memcpy sufficiently (i.e. it doesn't do it all), but doesn't auto-vectorize the alternative loops right?

I suppose this is good and we should likely just do it, but glancing at the changes, I do wonder if they are actually performance relevant...

@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label May 30, 2025
@charris
Copy link
Member

charris commented May 30, 2025

I wonder if numpy/_core/include/numpy/npy_cpu.h is considered to be in the public ABI.

@seberg
Copy link
Member

seberg commented May 30, 2025

It probably is somehwat public, but maybe that just means we may want to add a release note if we go this route? If it bites us, it probably also bites anyone who uses it...

EDIT: ah, wait... that means we should define it to an error though!

@SoapGentoo
Copy link
Contributor Author

@seberg you mean define it to an error if someone defines NPY_ALIGNMENT_REQUIRED?

@charris
Copy link
Member

charris commented May 30, 2025

define it to an error though!

That's hard to do in C. There is -Wundef that will raise a warning for use of undefined macros in gcc and clang, but -Wall doesn't enable it. Or we could just set it to 1, but that wouldn't trigger code cleanup.

@charris
Copy link
Member

charris commented May 30, 2025

I think we can regard it as effectively private. Scipy, for instance, doesn't use it.

@seberg
Copy link
Member

seberg commented May 31, 2025

Yeah, for a moment I thought you could define it to a #error, but you can only do that by putting some invalid syntax, which isn't worth it probably...

@SoapGentoo
Copy link
Contributor Author

so just remove it entirely?

@ngoldbaum
Copy link
Member

so just remove it entirely?

Sounds like that's the best thing to do - can you make that change? Hopefully on the second run the docs won't fail due to issues with debian package servers...

@charris
Copy link
Member

charris commented Jun 3, 2025

Hmm, this doesn't look right to me. The compile problem was fixed by setting NPY_ALIGNMENT_REQUIRED 1, so I would think the proper thing to do is look at all the uses, and see what would happen if that were the case. In particular, !NPY_ALIGNMENT_REQUIRED is false, so can effectively be replaced by 0.

@charris
Copy link
Member

charris commented Jun 3, 2025

Note that CI isn't testing with gcc 15, so we need to test elsewhere.

@charris
Copy link
Member

charris commented Jun 3, 2025

If we do leave NPY_ALIGNMENT_REQUIRED 1 in there, which is fine by me, it should also have a comment of explanation.

@SoapGentoo
Copy link
Contributor Author

@ngoldbaum done

@@ -259,11 +251,12 @@ npy_memchr(char * haystack, char needle,
}
else {
/* usually find elements to skip path */
if (!NPY_ALIGNMENT_REQUIRED && needle == 0 && stride == 1) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

!NPY_ALIGNMENT_REQUIRED is false, so this entire bit can be removed.

@@ -1602,7 +1602,6 @@ pack_inner(const char *inptr,
npy_intp vn_out = n_out - (remain ? 1 : 0);
const int vstep = npyv_nlanes_u64;
const int vstepx4 = vstep * 4;
const int isAligned = npy_is_aligned(outptr, sizeof(npy_uint64));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are you removing this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compiler warns on -Wunused-but-set-variable in Clang

@ngoldbaum
Copy link
Member

So maybe I'm doing something wrong, but I can't seem to figure out how to trigger the issue this PR is addressing.

I'm using the gcc-15.1 bookwork docker image and building NumPy from source using Python 3.13.3 from uv.

I'm building NumPy with CFLAGS='-march=native' spin build

When I do spin test numpy/_core/test/test_half.py - the tests in that file all pass.

Am I doing something wrong to trigger the issue? I'd like to verify that this PR is indeed fixing the crash.

@thesamesam
Copy link
Contributor

To check the obvious, what is -march=native resolving to on your system? I've only seen it on znver4 but x86-64-v4 may be enough. When I played on godbolt, it definitely needed AVX512 to vectorise it.

Try adding -O3 as well.

@ngoldbaum
Copy link
Member

Ah it's because my CPU isn't new enough - this skylake CPU doesn't have avx512.

@thesamesam
Copy link
Contributor

You might be able to force it with -fno-vect-cost-model but IIRC I tried and it wasn't good enough on znver2. However, you could try UBSAN and see if you get an alignment complaint before this PR (remember to comment out the suppression) and if it goes away after.

@charris
Copy link
Member

charris commented Jun 4, 2025

The logic can be twisty, there should be another review.

@charris
Copy link
Member

charris commented Jun 4, 2025

Note that many arrays are aligned and allow a fast path. Whether or not to rely on compilers to deal with that is a different question. Compilers also need to know the array type, which they cannot determine from the raw pointers.

@SoapGentoo
Copy link
Contributor Author

part of the problem is, even if the data is known to be aligned npy_uint64* ptr = (npy_uint64*)some_other_ptr runs afoul of strict-aliasing rules. The memcpy() calls act as a "strict-aliasing firewall", which is why you see it here. On x86, memcpy loads with known sizes get optimized out by GCC and Clang.

@charris
Copy link
Member

charris commented Jun 4, 2025

Maybe add a comment explaining the memcpy use?

@ngoldbaum
Copy link
Member

npy_is_aligned is used in a few other places:

± rg npy_is_aligned
numpy/_core/src/umath/fast_loop_macros.h
322:     (npy_is_aligned(args[0], esize) && npy_is_aligned(args[1], esize)) && \
338:     npy_is_aligned(args[1], (esize)) && \
339:     npy_is_aligned(args[0], (esize)))
343:     npy_is_aligned(args[2], (esize)) && npy_is_aligned(args[1], (esize)) && \
344:     npy_is_aligned(args[0], (esize)) && \
352:     npy_is_aligned(args[2], (esize)) && npy_is_aligned(args[1], (esize)) && \
359:     npy_is_aligned(args[2], (esize)) && npy_is_aligned(args[0], (esize)) && \

numpy/_core/src/multiarray/compiled_base.c
1605:        const int isAligned = npy_is_aligned(outptr, sizeof(npy_uint64));

numpy/_core/src/multiarray/item_selection.c
2528:        if (!NPY_ALIGNMENT_REQUIRED || npy_is_aligned(data, sizeof(npy_uint64))) {

numpy/_core/src/multiarray/ctors.c
389:        if (npy_is_aligned((void*)((npy_intp)p | stride), sizeof(npy_uint32))) {
402:        if (npy_is_aligned((void*)((npy_intp)p | stride), sizeof(npy_uint64))) {
415:        if (npy_is_aligned((void*)((npy_intp)p | stride), sizeof(npy_uint16))) {

numpy/_core/src/multiarray/array_assign_scalar.c
55:              npy_is_aligned(src_data, npy_uint_alignment(src_dtype->elsize) &&
56:              npy_is_aligned(src_data, src_dtype->alignment));
145:              npy_is_aligned(src_data, npy_uint_alignment(src_dtype->elsize) &&
146:              npy_is_aligned(src_data, src_dtype->alignment));
259:            !(npy_is_aligned(src_data, npy_uint_alignment(src_dtype->elsize)) &&
260:              npy_is_aligned(src_data, src_dtype->alignment))) &&

numpy/_core/src/multiarray/common.h
182:npy_is_aligned(const void * p, const npy_uintp alignment)

numpy/_core/src/multiarray/einsum_sumprod.c.src
27:    #define EINSUM_IS_ALIGNED(x) npy_is_aligned(x, NPY_SIMD_WIDTH)

numpy/_core/src/multiarray/arraytypes.c.src
382:        assert(npy_is_aligned(ov, NPY_ALIGNOF(@type@)));
3055:            if (swap || !npy_is_aligned(nip1, new->alignment)) {
3068:            if (swap || !npy_is_aligned(nip2, new->alignment)) {

numpy/_core/src/multiarray/lowlevel_strided_loops.c.src
137:    assert(N == 0 || npy_is_aligned(dst, NPY_ALIGNOF_UINT(@type@)));
138:    assert(N == 0 || npy_is_aligned(src, NPY_ALIGNOF_UINT(@type@)));
223:    assert(N == 0 || npy_is_aligned(dst, NPY_ALIGNOF_UINT(@type@)));
224:    assert(N == 0 || npy_is_aligned(src, NPY_ALIGNOF_UINT(@type@)));
901:    assert(N == 0 || npy_is_aligned(src, NPY_ALIGNOF(_TYPE1)));
902:    assert(N == 0 || npy_is_aligned(dst, NPY_ALIGNOF(_TYPE2)));
1563:            assert(npy_is_aligned(ind_ptr, NPY_ALIGNOF_UINT(npy_intp)));
1577:            assert(npy_is_aligned(result_ptr, NPY_ALIGNOF_UINT(@copytype@)));
1578:            assert(npy_is_aligned(self_ptr, NPY_ALIGNOF_UINT(@copytype@)));
1592:            assert(npy_is_aligned(result_ptr, NPY_ALIGNOF_UINT(@copytype@)));
1593:            assert(npy_is_aligned(self_ptr, NPY_ALIGNOF_UINT(@copytype@)));
1719:                            assert(npy_is_aligned(outer_ptrs[i],
1740:                        assert(npy_is_aligned(outer_ptrs[i],
1742:                        assert(npy_is_aligned(self_ptr,
1756:                        assert(npy_is_aligned(outer_ptrs[i],
1758:                        assert(npy_is_aligned(self_ptr,

numpy/_core/src/common/array_assign.c
124:        return npy_is_aligned((void *)align_check, alignment);

Sometimes it's for asserts but other times it's in load-bearing spots in core logic. Do we need to audit those too?

@charris
Copy link
Member

charris commented Jun 4, 2025

Hmm, I note the following in meson.build

# Add default compile flags for any compiler that supports them.
# Note that MSVC does not support strict aliasing at all, and neither do the
# Intel compilers on Windows, so the `-fno` flavor of the flag should be fine.
add_project_arguments(
  cc.get_supported_arguments( '-fno-strict-aliasing'), language : 'c'
)

@thesamesam
Copy link
Contributor

thesamesam commented Jun 4, 2025

See #25004. The hope is to get rid of it. The thing is, patterns which violate strict aliasing which are "okay" with -fno-strict-aliasing will still run afoul of alignment rules.

@charris
Copy link
Member

charris commented Jun 4, 2025

@thesamesam We discussed this today and decided we would be more comfortable if this PR confined itself to fixing the original problem. We would rather leave the aliasing fixes for another PR.

@charris charris removed the 09 - Backport-Candidate PRs tagged should be backported label Jun 5, 2025
@ngoldbaum
Copy link
Member

ngoldbaum commented Jun 5, 2025

@h-vetinari (who wanted me to emphasize that he's not a subject matter expert) shared this article with me, that seems relevant: https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8

We should probably open a followup issue to look closer at all the code in NumPy that checks alignment before casting instead of memcpy to do type punning.

@thesamesam
Copy link
Contributor

thesamesam commented Jun 5, 2025

I think it's a shame that another PR got merged without any authorship/credit, or waiting for OP to respond - especially given that there didn't seem to be a hurry either or much attention on it until I triaged the bug (which is fine, but it means there wasn't a need to rush...). Anyway, yes, another issue for that sounds like a good idea.

@ngoldbaum
Copy link
Member

I'm not totally sure why @charris did that, but I think it's because he's trying to get a 2.3.0 release shipped ASAP.

@h-vetinari
Copy link
Contributor

@h-vetinari (who wanted me to emphasize that he's not a subject matter expert) shared this article with me, that seems relevant: https://gist.github.com/shafik/848ae25ee209f698763cffee272a58f8

Thanks for the ping and caveat @ngoldbaum. :)

In contrast, Shafik Yaghmour (the author of that article) is a subject matter expert, as one of the core clang developers and serial hole-poker in C/C++ (look up his language quizzes).

Overall I agree (e.g. with what @thesamesam said in the issue) that this PR is going in the right direction. The casts that reinterpret bits of memory as another type are simply accidents waiting to happen (e.g. as these types may end up having different alignment). Where such type punning is unavoidable, it should be done using memcpy.

@charris
Copy link
Member

charris commented Jun 6, 2025

@thesamesam Note that this PR is still active if you want to finish it up. I made the fix in the maintenance branch for the release, but left this so you will get credit if/when it is merged.

@SoapGentoo
Copy link
Contributor Author

what do you want me to do? I thought it's clear by now that we want to get rid of the type-punning through pointers?

@charris
Copy link
Member

charris commented Jun 8, 2025

what do you want me to do?

Deal with NPY_ALIGNMENT_REQUIRED in this PR, i.e., the simplest and most limited fix. Then open another PR to start on the npy_is_aligned fixups if you wish. There is more than one use of that, so it is a minor project. That way we can review the two modifications separately, deal with the immediate problem, and then work on removal of no-strict-aliasing flag.

@SoapGentoo
Copy link
Contributor Author

@charris updated

@charris
Copy link
Member

charris commented Jun 8, 2025

Looks good. Needs a release note, feel free to copy the contents of the one in #29134 without attribution.

SoapGentoo and others added 2 commits June 8, 2025 19:24
* This machinery requires strict-aliasing UB and isn't needed
  anymore with any GCC from the last 15 years. This might also
  fix numpy#25004.

Fixes: numpy#28991
@SoapGentoo
Copy link
Contributor Author

@charris added

@charris
Copy link
Member

charris commented Jun 8, 2025

29134.compatibility.rst -> 29094.compatibility.rst

The number is the PR number, it is used by towncrier to generate the release note with a link.

@SoapGentoo
Copy link
Contributor Author

done

@charris charris merged commit 93340d3 into numpy:main Jun 8, 2025
73 of 74 checks passed
@charris
Copy link
Member

charris commented Jun 8, 2025

Thanks @SoapGentoo . The test failure is unrelated.

@SoapGentoo SoapGentoo deleted the fix-28991 branch June 8, 2025 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

GCC 15.1.1 compiles NumPy, but the tests segfault.
6 participants