BUG: Remove check requiring natural alignment of float/double to run AVX code #15615

r-devulap · 2020-02-20T05:17:33Z

I am not entirely sure what is causing the CI failure in #15610. But the test failure seems to be because of a small ULP error (Max relative difference: 1.1176388e-16
which should be < 1ULP). This patch uses assert_array_max_ulp instead of assert_equal which should hopefully fix the test failure.

mattip · 2020-02-20T06:15:04Z

What is this part of the test meant to check? The stated purpose is to check the gather properties of the AVX loops. So I think this PR is avoiding the problem: it should only check strides that will end up using the AVX semantics. If you are hitting the glibc implementation, I think you are not testing what you want to test. Do you agree this strided part of the test is somehow escaping the AVX implementation? We will eventually get a bug report on some obscure platform that needs 2 or 3 ULP, and will spend time searching for why they report tests failing, but we cannot reproduce.

This points out the problem with having architecture-specific code: we need to be very careful with our tests. NumPy supports many obscure platforms and antiquated systems, and the developers on those platforms want to be able to run numpy.test() secure in the knowledge that it should pass. I would prefer to spend the time now to hone the tests rather than later get issues that are very hard to reproduce.

mattip · 2020-02-20T07:16:40Z

The gather part of the code does not depend on the ufunc itself, so you really don't need to check all the ufuncs here. Perhaps a better way to test this part of the code would be to add another test ufunc that uses a _mm256 copy intrinsic, and the non-AVX passthrough (the "glibc version") does nothing. Then set the output to -1, and the input to range(n). If the output remains -1, the AVX loop was not called. Otherwise, you can check that the output is range(n) (the order must be correct). We have _umath_tests.c.src for things like this. Such a test would also provide a way to check that AVX code is being called, so we can check other parts of the ufunc machinery: loop function substitution, compilation issues and more.

Edit: be more specific in the description

r-devulap · 2020-02-20T18:03:15Z

For reciprocal, no matter what the value of stride jj is (negative. positive or zero), it is meant to follow the AVX route via the gather instruction which I can confirm it does on an SkylakeX with AVX-512 and Skylake client with AVX2/FMA. The test is meant to cover all these scenarios with strides = np.array([-4,-3,-2,-1,1,2,3,4])

But I think I might have a clue as to why IS_OUTPUT_BLOCKABLE_UNARY fails for some strides as you say in #15610 (comment). One of the condition for that macro is:

(npy_is_aligned(args[0], esize) && npy_is_aligned(args[1], esize)) which basically requires that doubles should be 8 byte aligned. This could likely to be the culprit. gcc gaurantees 8-byte alignment only on x86_64 and not on 32-bit systems (see -malign-double section of https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html). This also can explain why the failure is only intermittent (sometime its 8 byte aligned and other times its not).

I also realized this condition is completely unnecessary. AVX code uses unaligned loads and does not care what the alignment is. So, perhaps I could get rid of this condition and this could be fixed?

mattip · 2020-02-20T19:50:46Z

Sounds reasonable for a fix, but we still should have some kind of test that the avx path is being followed

r-devulap · 2020-02-20T21:33:47Z

but we still should have some kind of test that the avx path is being followed

Doesn't this #15558 exactly address this problem once Universal Intrinsics patch is merged?

In a x86-32 bit system, doubles need not be naturally aligned to 8 Byte boundary (see: -malign-double section of https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html). Having this check meant it ran different code paths (AVX v/s scalar) depending on the alignment of data which leads to different results and test failing intermittently. AVX code uses un-aligned loads and this check is unnecessary to begin with.

r-devulap · 2020-02-20T22:14:55Z

Phew! This was a fun exercise, but I think I am reasonably certain this should fix the failing test. The fact that kept bugging me the most was that the vector and scalar code gave different answers. Reciprocal is ultimately just a divide (1/arr), so the vector divide instruction vdivpd and scalar divide instruction divsd should provide the exact same answer (0 ULP) and there is no way the answers can mis-match.

But turns out for some odd reason, gcc (GCC) 7.3.1 20180303 (Red Hat 7.3.1-5) in the manylinux2010 docker decides to use the x87 fdivl instruction (extended precision) instead of the divsd (double precision) and this doesn't happen with the gcc on Ubuntu (which uses the divsd instruction like other sane people). So that explains the output mismatch and I can sleep peacefully again :)

mattip · 2020-02-21T07:04:18Z

Thanks @r-devulap

mattip · 2020-02-21T08:50:14Z

MacPython/numpy-wheels#73 is still failing

r-devulap mentioned this pull request Feb 20, 2020

TST: mask DeprecationWarning in xfailed test #15610

Merged

charris added 00 - Bug 05 - Testing component: numpy._core labels Feb 20, 2020

r-devulap force-pushed the reciprocal-avx-CI-fail branch from 7221c45 to 519f7ac Compare February 20, 2020 21:42

r-devulap changed the title ~~TST: Use assert_array_max_ulp instead of assert_equal for strided reciprocal test~~ BUG: Remove check requiring natural alignment of float/double to run AVX code Feb 20, 2020

r-devulap force-pushed the reciprocal-avx-CI-fail branch from 519f7ac to 3253ab7 Compare February 20, 2020 21:48

r-devulap mentioned this pull request Feb 20, 2020

BUG: Fix bug in AVX-512F np.maximum and np.minimum #15612

Merged

mattip merged commit 9b1acd4 into numpy:master Feb 21, 2020

rgommers added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Jul 12, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Remove check requiring natural alignment of float/double to run AVX code #15615

BUG: Remove check requiring natural alignment of float/double to run AVX code #15615

r-devulap commented Feb 20, 2020

mattip commented Feb 20, 2020

mattip commented Feb 20, 2020 •

edited

Loading

r-devulap commented Feb 20, 2020

mattip commented Feb 20, 2020

r-devulap commented Feb 20, 2020

r-devulap commented Feb 20, 2020

mattip commented Feb 21, 2020

mattip commented Feb 21, 2020

BUG: Remove check requiring natural alignment of float/double to run AVX code #15615

BUG: Remove check requiring natural alignment of float/double to run AVX code #15615

Conversation

r-devulap commented Feb 20, 2020

mattip commented Feb 20, 2020

mattip commented Feb 20, 2020 • edited Loading

r-devulap commented Feb 20, 2020

mattip commented Feb 20, 2020

r-devulap commented Feb 20, 2020

r-devulap commented Feb 20, 2020

mattip commented Feb 21, 2020

mattip commented Feb 21, 2020

mattip commented Feb 20, 2020 •

edited

Loading