BUG: fix possible overlap issues with avx enabled #12398

juliantaylor · 2018-11-15T18:29:19Z

Fix the overlap check for vectors larger than 16 bytes. Only an issue
when avx is enabled at compile time.

The actual fix is in the first few hunks, whether we want to use the macro in the rest of the code is debatable as it introduces a dependency between the macros being used and the code, unlikely be a practical issue though.

Fix the overlap check for vectors larger than 16 bytes. Only an issue when avx is enabled at compile time.

charris · 2018-11-15T19:04:52Z

Looks straight forward to me.

mattip · 2018-11-15T20:18:11Z

LGTM.
Would be nice at some point to enable codepath choice at runtime via cpu id and some high level control, then we could benchmark when it is advantageous to use the different intrinsics, depending on array size or cpu load.

charris · 2018-11-15T20:22:42Z

Let's get this in then. Thanks Julian.

VictorRodriguez · 2018-11-15T20:37:41Z

+1 please merge , i will test it on my distro

regards

juliantaylor · 2018-11-15T22:40:14Z

@mattip thats what #11113 does

VictorRodriguez · 2018-11-15T22:46:35Z

@juliantaylor what commit from #11113 does that? i am a bit lost about that point

mattip · 2018-11-15T22:48:44Z

It seems #11113 is for compile time.
I was suggesting to compile all the loops and choose at runtime which to use based on CPU and some tunable heuristics (including "don't use any of these").

VictorRodriguez · 2018-11-15T22:55:14Z

@mattip you mean like :

if (__builtin_cpu_supports("avx512dq")
if (__builtin_cpu_supports("avx2")

like that?

mattip · 2018-11-15T23:06:51Z

yes, but more like

if (__builtin_cpu_supports("avx512dq") and choose_avx512(args)):
    # choose avx512 inner loop functions
elif (__builtin_cpu_supports("avx2") and choose_avx2(args)):
    # choose avx2 inner loop functions
    ...

in the loop selector function for the appropriate ufuncs

juliantaylor · 2018-11-16T17:59:51Z

no that PR is runtime not compile time the dispatching happens at ufunc initialization time same as it is already done for integers.

charris · 2018-12-07T23:11:07Z

@juliantaylor This breaks testing on my machine. The error is during test collection and is completely uninformative, but here it is

NumPy version 1.16.0.dev0+3952f5c
NumPy relaxed strides checking option: True

=============================================================== ERRORS ================================================================
___________________ ERROR collecting build/testenv/lib64/python2.7/site-packages/numpy/linalg/tests/test_linalg.py ____________________
numpy/linalg/tests/test_linalg.py:329: in <module>
    CASES += _make_strided_cases()
numpy/linalg/tests/test_linalg.py:321: in _make_strided_cases
    for a, a_label in _stride_comb_iter(case.a):
numpy/linalg/tests/test_linalg.py:296: in _stride_comb_iter
    assert_(np.all(xi == x))
E   AssertionError
______________ ERROR collecting build/testenv/lib64/python2.7/site-packages/numpy/matrixlib/tests/test_matrix_linalg.py _______________
numpy/matrixlib/tests/test_matrix_linalg.py:8: in <module>
    from numpy.linalg.tests.test_linalg import (
/home/charris/.local/lib/python2.7/site-packages/_pytest/assertion/rewrite.py:295: in load_module
    six.exec_(co, mod.__dict__)
/home/charris/.local/lib/python2.7/site-packages/six.py:709: in exec_
    exec("""exec _code_ in _globs_, _locs_""")
<string>:1: in <module>
    ???
numpy/linalg/tests/test_linalg.py:329: in <module>
    CASES += _make_strided_cases()
numpy/linalg/tests/test_linalg.py:321: in _make_strided_cases
    for a, a_label in _stride_comb_iter(case.a):
numpy/linalg/tests/test_linalg.py:296: in _stride_comb_iter
    assert_(np.all(xi == x))
E   AssertionError
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 2 errors during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
126 deselected, 2 error in 8.86 seconds

Reverting this commit fixes the problem. I have no idea as to why this would compile but cause failures.
This is with gcc version 8.2.1 20181105 (Red Hat 8.2.1-5) (GCC). lscpu shows

Architecture:        x86_64
...
Model name:          Intel(R) Core(TM) i5-4670K CPU @ 3.40GHz
Stepping:            3
...
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt dtherm ida arat pln pts flush_l1d

I'm sorry I didn't see this earlier. No one else seems to be having problems, so I assume it is something specific to my machine/environment. Note that if I knock out the failing assert the tests run, but there are a lot of errors. How can I debug this?

charris · 2018-12-07T23:50:14Z

I'm guessing one of the parameters is off.

charris · 2018-12-08T00:21:53Z

If I undefine __AVX2__ it works fine.

charris · 2018-12-08T00:33:12Z

If I remove -march=native from CFLAGS it also works. Hmm... I think I would call that a gcc bug. Or maybe there is a problem with __AVX2__.

VictorRodriguez · 2018-12-11T19:21:47Z

@charris , how are you running the tests? I can help to try to replicate the issue
I run this to check sanity :

import numpy
numpy.test("full")

VictorRodriguez · 2018-12-11T20:33:42Z

Confirmed with gcc 8.2.1 20180502
@charris
I don't see how the bug is since i tested this :

#include <stdio.h>

#if defined __AVX512F__
#define VECTOR_SIZE_BYTES 64
#elif defined __AVX2__
#define VECTOR_SIZE_BYTES 32
#else
#define VECTOR_SIZE_BYTES 16
#endif
int main(){
    printf("SIZE_BYTES = %d\n",VECTOR_SIZE_BYTES);
    return 0;
}

#undef VECTOR_SIZE_BYTES

And set the proper value
$ gcc main.c -march=haswell
$ ./a.out
SIZE_BYTES = 32
$ gcc main.c -march=native
$ ./a.out
SIZE_BYTES = 32

Model name:          Intel(R) Core(TM) i7-4700MQ CPU @ 2.40GHz (avx2)

VictorRodriguez · 2018-12-11T21:25:23Z

@charris
isn't the bug in ?
611 - LOOP_BLOCKED(@type@, 32) {
612 + LOOP_BLOCKED(@type@, 2 * VECTOR_SIZE_BYTES) {
despite the fact that this is inside sse2_@kind@_@TYPE@(@type@ * ip, @type@ * op, const npy_intp n) the value might be hardcoded .. maybe
testing ...

charris · 2018-12-13T18:57:59Z

@VictorRodriguez Any luck?

VictorRodriguez · 2018-12-13T19:01:34Z

@charris , almost , I remove this from the patch :
https://hastebin.com/zugexixiba.pl
It does not break immediately, but it fails later on many tests :
368 failed, 6196 passed, 18 skipped, 9 xfailed in 973.77 seconds
I will bisect the patch and find out more .. still debugging

VictorRodriguez · 2018-12-15T16:41:11Z

@charris i post 2 commits:

one to revert the change
https://github.com/VictorRodriguez/numpy/commit/ed326b0560e83fe1af485f29bcce74ee67080fac

other try to fix the issue
https://github.com/VictorRodriguez/numpy/commit/732492b6c9211338e19d1eb2b03054289a043b79
It fix the AssertionError and let the other tests continue however many fails, let me know if you take a try on this patch. I will keep on working on a better fix

charris · 2018-12-15T19:04:05Z

Yeah, 2) is not an option for a fix. I'm inclining towards 1), but by undefining the AVX macros up top until this is figured out. I suspect Julian is on vacation, so we will have to figure this out for ourselves.

juliantaylor · 2018-12-16T01:09:56Z

hm I applied the vector size code to code was not actually adapted to larger vector sizes...
This whole avx mess needs to be cleaned up. I'll try to have a look soon, if I do not get to it soon you could revert this change, the overlap bug is less likely to occur for custom compile options than this.

juliantaylor · 2018-12-16T01:32:20Z

While I once again screwed this up badly we should think about how we deal with this type of code in future.
The expertise is for this is hard to find nowadays and I unfortunately cannot monitor numpy close enough to react in a reasonable timeframe to changes to it.
Maybe we should just drop much this code and let the compiler deal with it as much as possible. The current avx code is trivial to vectorize for compilers. (The boolean and reduction code needs to be checked again as compilers used to have trouble with that)

charris · 2018-12-16T02:17:57Z

@juliantaylor No need to beat yourself up over it, we'll get things fixed up and move on.

Even if we leave AVX to the compiler, we will still need to deal with overlap issues, correct? I am certainly not opposed to the compiler approach, IIRC there has so far been no convincing evidence that explicit code gives an advantage. But there is a time to fix issue. Would simply undefining the AVX macros be a quick fix so I can get a release candidate out for initial testing? Or would that still leave overlap issues to be dealt with?

juliantaylor · 2018-12-16T11:57:48Z

here is the fix
#12555
#12556

VictorRodriguez · 2018-12-17T20:32:11Z

#12555 fix the bug for me , thanks

This uptream patch fix error reported in numpy/numpy#12398 (comment) numpy/linalg/tests/test_linalg.py:296: in _stride_comb_iter assert_(np.all(xi == x)) AssertionError Signed-off-by: Victor Rodriguez <victor.rodriguez.bahena@intel.com>

BUG: fix possible overlap issues with avx enabled

85d5ce2

Fix the overlap check for vectors larger than 16 bytes. Only an issue when avx is enabled at compile time.

juliantaylor mentioned this pull request Nov 15, 2018

AVX for floats? #11113

Closed

charris added 00 - Bug component: numpy._core labels Nov 15, 2018

charris added this to the 1.16.0 release milestone Nov 15, 2018

charris merged commit 3952f5c into numpy:master Nov 15, 2018

juliantaylor deleted the fix-large-vector-overlap branch November 15, 2018 22:40

charris mentioned this pull request Dec 8, 2018

AVX2 related failures. #12507

Closed

Uh oh!

BUG: fix possible overlap issues with avx enabled #12398

BUG: fix possible overlap issues with avx enabled #12398

Uh oh!

Conversation

juliantaylor commented Nov 15, 2018

Uh oh!

charris commented Nov 15, 2018

Uh oh!

mattip commented Nov 15, 2018

Uh oh!

charris commented Nov 15, 2018

Uh oh!

VictorRodriguez commented Nov 15, 2018

Uh oh!

juliantaylor commented Nov 15, 2018

Uh oh!

VictorRodriguez commented Nov 15, 2018

Uh oh!

mattip commented Nov 15, 2018

Uh oh!

VictorRodriguez commented Nov 15, 2018

Uh oh!

mattip commented Nov 15, 2018

Uh oh!

juliantaylor commented Nov 16, 2018

Uh oh!

charris commented Dec 7, 2018

Uh oh!

charris commented Dec 7, 2018

Uh oh!

charris commented Dec 8, 2018

Uh oh!

charris commented Dec 8, 2018

Uh oh!

VictorRodriguez commented Dec 11, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

VictorRodriguez commented Dec 11, 2018 • edited by charris Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

VictorRodriguez commented Dec 11, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

charris commented Dec 13, 2018

Uh oh!

VictorRodriguez commented Dec 13, 2018

Uh oh!

VictorRodriguez commented Dec 15, 2018

Uh oh!

charris commented Dec 15, 2018

Uh oh!

juliantaylor commented Dec 16, 2018

Uh oh!

juliantaylor commented Dec 16, 2018

Uh oh!

charris commented Dec 16, 2018

Uh oh!

juliantaylor commented Dec 16, 2018

Uh oh!

VictorRodriguez commented Dec 17, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

VictorRodriguez commented Dec 11, 2018 •

edited

Loading

VictorRodriguez commented Dec 11, 2018 •

edited by charris

Loading

VictorRodriguez commented Dec 11, 2018 •

edited

Loading

VictorRodriguez commented Dec 17, 2018 •

edited

Loading