Skip to content

BUG: fix possible overlap issues with avx enabled #12398

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Nov 15, 2018

Conversation

juliantaylor
Copy link
Contributor

Fix the overlap check for vectors larger than 16 bytes. Only an issue
when avx is enabled at compile time.

The actual fix is in the first few hunks, whether we want to use the macro in the rest of the code is debatable as it introduces a dependency between the macros being used and the code, unlikely be a practical issue though.

Fix the overlap check for vectors larger than 16 bytes. Only an issue
when avx is enabled at compile time.
@charris
Copy link
Member

charris commented Nov 15, 2018

Looks straight forward to me.

@charris charris added this to the 1.16.0 release milestone Nov 15, 2018
@mattip
Copy link
Member

mattip commented Nov 15, 2018

LGTM.
Would be nice at some point to enable codepath choice at runtime via cpu id and some high level control, then we could benchmark when it is advantageous to use the different intrinsics, depending on array size or cpu load.

@charris charris merged commit 3952f5c into numpy:master Nov 15, 2018
@charris
Copy link
Member

charris commented Nov 15, 2018

Let's get this in then. Thanks Julian.

@VictorRodriguez
Copy link
Contributor

+1 please merge , i will test it on my distro

regards

@juliantaylor
Copy link
Contributor Author

@mattip thats what #11113 does

@juliantaylor juliantaylor deleted the fix-large-vector-overlap branch November 15, 2018 22:40
@VictorRodriguez
Copy link
Contributor

@juliantaylor what commit from #11113 does that? i am a bit lost about that point

@mattip
Copy link
Member

mattip commented Nov 15, 2018

It seems #11113 is for compile time.
I was suggesting to compile all the loops and choose at runtime which to use based on CPU and some tunable heuristics (including "don't use any of these").

@VictorRodriguez
Copy link
Contributor

@mattip you mean like :

if (__builtin_cpu_supports("avx512dq")
if (__builtin_cpu_supports("avx2")

like that?

@mattip
Copy link
Member

mattip commented Nov 15, 2018

yes, but more like

if (__builtin_cpu_supports("avx512dq") and choose_avx512(args)):
    # choose avx512 inner loop functions
elif (__builtin_cpu_supports("avx2") and choose_avx2(args)):
    # choose avx2 inner loop functions
    ...

in the loop selector function for the appropriate ufuncs

@juliantaylor
Copy link
Contributor Author

no that PR is runtime not compile time the dispatching happens at ufunc initialization time same as it is already done for integers.

@charris
Copy link
Member

charris commented Dec 7, 2018

@juliantaylor This breaks testing on my machine. The error is during test collection and is completely uninformative, but here it is

NumPy version 1.16.0.dev0+3952f5c
NumPy relaxed strides checking option: True

=============================================================== ERRORS ================================================================
___________________ ERROR collecting build/testenv/lib64/python2.7/site-packages/numpy/linalg/tests/test_linalg.py ____________________
numpy/linalg/tests/test_linalg.py:329: in <module>
    CASES += _make_strided_cases()
numpy/linalg/tests/test_linalg.py:321: in _make_strided_cases
    for a, a_label in _stride_comb_iter(case.a):
numpy/linalg/tests/test_linalg.py:296: in _stride_comb_iter
    assert_(np.all(xi == x))
E   AssertionError
______________ ERROR collecting build/testenv/lib64/python2.7/site-packages/numpy/matrixlib/tests/test_matrix_linalg.py _______________
numpy/matrixlib/tests/test_matrix_linalg.py:8: in <module>
    from numpy.linalg.tests.test_linalg import (
/home/charris/.local/lib/python2.7/site-packages/_pytest/assertion/rewrite.py:295: in load_module
    six.exec_(co, mod.__dict__)
/home/charris/.local/lib/python2.7/site-packages/six.py:709: in exec_
    exec("""exec _code_ in _globs_, _locs_""")
<string>:1: in <module>
    ???
numpy/linalg/tests/test_linalg.py:329: in <module>
    CASES += _make_strided_cases()
numpy/linalg/tests/test_linalg.py:321: in _make_strided_cases
    for a, a_label in _stride_comb_iter(case.a):
numpy/linalg/tests/test_linalg.py:296: in _stride_comb_iter
    assert_(np.all(xi == x))
E   AssertionError
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
126 deselected, 2 error in 8.86 seconds

Reverting this commit fixes the problem. I have no idea as to why this would compile but cause failures.
This is with gcc version 8.2.1 20181105 (Red Hat 8.2.1-5) (GCC). lscpu shows

Architecture:        x86_64
...
Model name:          Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz
Stepping:            3
...
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts flush_l1d

I'm sorry I didn't see this earlier. No one else seems to be having problems, so I assume it is something specific to my machine/environment. Note that if I knock out the failing assert the tests run, but there are a lot of errors. How can I debug this?

@charris
Copy link
Member

charris commented Dec 7, 2018

I'm guessing one of the parameters is off.

@charris
Copy link
Member

charris commented Dec 8, 2018

If I undefine __AVX2__ it works fine.

@charris
Copy link
Member

charris commented Dec 8, 2018

If I remove -march=native from CFLAGS it also works. Hmm... I think I would call that a gcc bug. Or maybe there is a problem with __AVX2__.

@charris charris mentioned this pull request Dec 8, 2018
@VictorRodriguez
Copy link
Contributor

VictorRodriguez commented Dec 11, 2018

@charris , how are you running the tests? I can help to try to replicate the issue
I run this to check sanity :

import numpy
numpy.test("full")

@VictorRodriguez
Copy link
Contributor

VictorRodriguez commented Dec 11, 2018

Confirmed with gcc 8.2.1 20180502
@charris
I don't see how the bug is since i tested this :

#include <stdio.h>

#if defined __AVX512F__
#define VECTOR_SIZE_BYTES 64
#elif defined __AVX2__
#define VECTOR_SIZE_BYTES 32
#else
#define VECTOR_SIZE_BYTES 16
#endif
int main(){
    printf("SIZE_BYTES = %d\n",VECTOR_SIZE_BYTES);
    return 0;
}

#undef VECTOR_SIZE_BYTES

And set the proper value
$ gcc main.c -march=haswell
$ ./a.out
SIZE_BYTES = 32
$ gcc main.c -march=native
$ ./a.out
SIZE_BYTES = 32

Model name:          Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz (avx2)

@VictorRodriguez
Copy link
Contributor

VictorRodriguez commented Dec 11, 2018

@charris
isn't the bug in ?
611 - LOOP_BLOCKED(@type@, 32) {
612 + LOOP_BLOCKED(@type@, 2 * VECTOR_SIZE_BYTES) {
despite the fact that this is inside sse2_@kind@_@TYPE@(@type@ * ip, @type@ * op, const npy_intp n) the value might be hardcoded .. maybe
testing ...

@charris
Copy link
Member

charris commented Dec 13, 2018

@VictorRodriguez Any luck?

@VictorRodriguez
Copy link
Contributor

@charris , almost , I remove this from the patch :
https://hastebin.com/zugexixiba.pl
It does not break immediately, but it fails later on many tests :
368 failed, 6196 passed, 18 skipped, 9 xfailed in 973.77 seconds
I will bisect the patch and find out more .. still debugging

@VictorRodriguez
Copy link
Contributor

@charris i post 2 commits:

one to revert the change
https://github.com/VictorRodriguez/numpy/commit/ed326b0560e83fe1af485f29bcce74ee67080fac

other try to fix the issue
https://github.com/VictorRodriguez/numpy/commit/732492b6c9211338e19d1eb2b03054289a043b79
It fix the AssertionError and let the other tests continue however many fails, let me know if you take a try on this patch. I will keep on working on a better fix

@charris
Copy link
Member

charris commented Dec 15, 2018

Yeah, 2) is not an option for a fix. I'm inclining towards 1), but by undefining the AVX macros up top until this is figured out. I suspect Julian is on vacation, so we will have to figure this out for ourselves.

@juliantaylor
Copy link
Contributor Author

hm I applied the vector size code to code was not actually adapted to larger vector sizes...
This whole avx mess needs to be cleaned up. I'll try to have a look soon, if I do not get to it soon you could revert this change, the overlap bug is less likely to occur for custom compile options than this.

@juliantaylor
Copy link
Contributor Author

While I once again screwed this up badly we should think about how we deal with this type of code in future.
The expertise is for this is hard to find nowadays and I unfortunately cannot monitor numpy close enough to react in a reasonable timeframe to changes to it.
Maybe we should just drop much this code and let the compiler deal with it as much as possible. The current avx code is trivial to vectorize for compilers. (The boolean and reduction code needs to be checked again as compilers used to have trouble with that)

@charris
Copy link
Member

charris commented Dec 16, 2018

@juliantaylor No need to beat yourself up over it, we'll get things fixed up and move on.

Even if we leave AVX to the compiler, we will still need to deal with overlap issues, correct? I am certainly not opposed to the compiler approach, IIRC there has so far been no convincing evidence that explicit code gives an advantage. But there is a time to fix issue. Would simply undefining the AVX macros be a quick fix so I can get a release candidate out for initial testing? Or would that still leave overlap issues to be dealt with?

@juliantaylor
Copy link
Contributor Author

here is the fix
#12555
#12556

@VictorRodriguez
Copy link
Contributor

VictorRodriguez commented Dec 17, 2018

#12555 fix the bug for me , thanks

clrpackages pushed a commit to clearlinux-pkgs/numpy that referenced this pull request Dec 24, 2018
This uptream patch fix error reported in

numpy/numpy#12398 (comment)

    numpy/linalg/tests/test_linalg.py:296: in _stride_comb_iter
        assert_(np.all(xi == x))
       AssertionError

Signed-off-by: Victor Rodriguez <victor.rodriguez.bahena@intel.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants