Skip to content

BUG: SIMD Neon undefined behavior - pointer overflow #29549

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

m-clare
Copy link
Contributor

@m-clare m-clare commented Aug 12, 2025

NPY_FINLINE npyv_u32 npyv_loadn_u32(const npy_uint32 *ptr, npy_intp stride)
{
    assert(llabs(stride) <= NPY_SIMD_MAXLOAD_STRIDE32);
    const __m256i steps = _mm256_setr_epi32(0, 1, 2, 3, 4, 5, 6, 7);
    const __m256i idx = _mm256_mullo_epi32(_mm256_set1_epi32((int)stride), steps);
    return _mm256_i32gather_epi32((const int*)ptr, idx, 4);
}

While looking at other architecture options for the same function npyv_loadn_u32, I noticed that sse, lsx, vec do not have a bounds check on the stride, but were not flagged by UBSAN (they all use array indexing instead of pointer arithmetic). For example for sse/memory.h L79:

NPY_FINLINE npyv_s32 npyv_loadn_s32(const npy_int32 *ptr, npy_intp stride)
{
    __m128i a = _mm_cvtsi32_si128(*ptr);
#ifdef NPY_HAVE_SSE41
    a = _mm_insert_epi32(a, ptr[stride],   1);
    a = _mm_insert_epi32(a, ptr[stride*2], 2);
    a = _mm_insert_epi32(a, ptr[stride*3], 3);
#else
    __m128i a1 = _mm_cvtsi32_si128(ptr[stride]);
    __m128i a2 = _mm_cvtsi32_si128(ptr[stride*2]);
    __m128i a3 = _mm_cvtsi32_si128(ptr[stride*3]);
    a = _mm_unpacklo_epi32(a, a1);
    a = _mm_unpacklo_epi64(a, _mm_unpacklo_epi32(a2, a3));
#endif
    return a;
}

Should these other implementations also include bounds checks?

@m-clare m-clare marked this pull request as draft August 12, 2025 14:25
@m-clare m-clare marked this pull request as ready for review August 12, 2025 20:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant