Skip to content

BUG: SIMD Neon undefined behavior - pointer overflow #29549

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

m-clare
Copy link
Contributor

@m-clare m-clare commented Aug 12, 2025

NPY_FINLINE npyv_u32 npyv_loadn_u32(const npy_uint32 *ptr, npy_intp stride)
{
    assert(llabs(stride) <= NPY_SIMD_MAXLOAD_STRIDE32);
    const __m256i steps = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    const __m256i idx = _mm256_mullo_epi32(_mm256_set1_epi32((int)stride), steps);
    return _mm256_i32gather_epi32((const int*)ptr, idx, 4);
}

While looking at other architecture options for the same function npyv_loadn_u32, I noticed that sse, lsx, vec do not have a bounds check on the stride, but were not flagged by UBSAN (they all use array indexing instead of pointer arithmetic). For example for sse/memory.h L79:

NPY_FINLINE npyv_s32 npyv_loadn_s32(const npy_int32 *ptr, npy_intp stride)
{
    __m128i a = _mm_cvtsi32_si128(*ptr);
#ifdef NPY_HAVE_SSE41
    a = _mm_insert_epi32(a, ptr[stride],   1);
    a = _mm_insert_epi32(a, ptr[stride*2], 2);
    a = _mm_insert_epi32(a, ptr[stride*3], 3);
#else
    __m128i a1 = _mm_cvtsi32_si128(ptr[stride]);
    __m128i a2 = _mm_cvtsi32_si128(ptr[stride*2]);
    __m128i a3 = _mm_cvtsi32_si128(ptr[stride*3]);
    a = _mm_unpacklo_epi32(a, a1);
    a = _mm_unpacklo_epi64(a, _mm_unpacklo_epi32(a2, a3));
#endif
    return a;
}

Should these other implementations also include bounds checks?

@m-clare m-clare marked this pull request as draft August 12, 2025 14:25
@m-clare m-clare marked this pull request as ready for review August 12, 2025 20:23
@seberg
Copy link
Member

seberg commented Aug 14, 2025

@seiko2plus would know a bit better. I don't quite understand what UBSANs problem with this is. There isn't an actual pointer overflow here, just some theoretical possibility (i.e. UBSAN wasn't able to infer/figure out that the value is always within range)?

Anyway, let's add the assert ideally with a comment to UBSAN. This code must assume that all data it is asked to load is valid to load!

But I don't think we should define NPY_SIMD_MAXLOAD_STRIDE64, since that would mean there is a limitation of what can be loaded, which isn't the case (and that should even add a runtime check).
(I suppose you could always define it to something that can clearly load everything and use that in the definitions.)

@m-clare
Copy link
Contributor Author

m-clare commented Aug 15, 2025

@seberg Alternatively, if UBSAN is being overzealous here, I can just mark the suppression as one that should remain in the file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants