ENH, SIMD: Ditching the old CPU dispatcher(Arithmetic) #17985
54720f6 to 203a16d
@@ -280,7 +287,7 @@ def english_upper(s):
Ufunc(2, 1, Zero,
docstrings.get('numpy.core.umath.add'),
'PyUFunc_AdditionTypeResolver',
-TD(notimes_or_obj, simd=[('avx512f', cmplxvec),('avx2', ints)]),
+TD(notimes_or_obj, simd=[('avx2', ints)], dispatch=[('loops_arithm_fp', 'fdFD')]),
Just to be clear: this adds `fd` (`cmplxvec` is `FD`). Also: I don't see where 'gG' (long double and long double complex) is taken into account. Am I missing something?
Yes, it dispatches single/double for both real and complex. The definitions of these loops moved from `loops.c.src` and `simd.inc` to the new dispatch-able source `loops_arithm_fp.dispatch.c.src`.
NOTE: the divide operation for complex single and double precision is still in `loops.c.src`; we will need to vectorize this operation via NPYV and move it to this file later.
> I don't see where 'gG' (long double and long double complex) is taken into account. Am I missing something?
`loops_arithm_fp.dispatch.c.src` only contains inner-loop functions with SIMD code; the rest of the operations remain in `loops.c.src`.
I don't think we will support SIMD kernels for 80-bit and 128-bit precision, since there's no hardware support for them, but maybe we can emulate light operations such as addition and subtraction.
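To make the split described above concrete, here is a minimal Python sketch of how a type string can be partitioned between a dispatch-able source and the generic loops. It is not the actual generator code: the helper `split_by_dispatch` and the `TYPE_NAMES` table are hypothetical, and only the floating-point characters of `notimes_or_obj` are assumed here.

```python
# Type characters as used in NumPy ufunc type strings.
TYPE_NAMES = {
    "f": "float (single)",
    "d": "double",
    "g": "long double",
    "F": "complex float",
    "D": "complex double",
    "G": "complex long double",
}


def split_by_dispatch(type_chars, dispatch):
    """Map each type character to the source that provides its inner loop.

    `dispatch` is a list of (source_name, type_chars) pairs, mirroring the
    `dispatch=` argument in the diff. Hypothetical helper: anything not
    listed falls back to the generic loops in loops.c.src.
    """
    owner = {}
    for source, chars in dispatch:
        for ch in chars:
            owner[ch] = source
    return {ch: owner.get(ch, "loops.c.src") for ch in type_chars}


# Assumption: only the single/double/long double part of the type list.
inexact = "fdgFDG"
result = split_by_dispatch(inexact, [("loops_arithm_fp", "fdFD")])
for ch, source in result.items():
    print(f"{ch} ({TYPE_NAMES[ch]}): {source}")
```

Under these assumptions, `f`, `d`, `F` and `D` resolve to `loops_arithm_fp`, while `g` and `G` stay with `loops.c.src`, which matches the answer given in the thread.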
Ahh, my bad, it is in `notimes_or_obj`.
> 128-bit precision
AArch64 and Power9 have support for quad precision floats. The astronomy community would like to have them and I think we will see more support coming in modern architectures.
AArch64 and Power9 have scalar support but not SIMD. Power9 users have to build NumPy with CFLAGS `-mcpu=power9` or `--cpu-baseline=vsx3` in order to get `__float128` to work; we can dispatch float128 operations at runtime for them if necessary.
203a16d to e06ec95
The first patch in a series of pull requests that aims to facilitate the migration to our new SIMD interface (NPYV). The process focuses on getting rid of the main umath SIMD source `simd.inc`, which contains almost all SIMD kernels, by splitting it into several dispatch-able sources without changing the base code. This keeps the review simple and speeds up reaching the nominal target. In this patch, we have moved the real and complex arithmetic operations for single/double precision to the new CPU dispatcher. NOTE: previously, the AVX2 and AVX512F SIMD code for single/double precision wasn't dispatched at runtime.
e06ec95 to 0985a73
LGTM
@@ -218,3 +219,4 @@ numpy/core/src/_simd/_simd_data.inc
numpy/core/src/_simd/_simd_inc.h
# umath module
numpy/core/src/umath/loops_unary_fp.dispatch.c
+numpy/core/src/umath/loops_arithm_fp.dispatch.c
I think it is better in general to use full names like "arithmetic" rather than shortened names. Seven character names are long gone :) This can be changed later though.
I will consider it a policy to follow in my upcoming patches.
@@ -1000,6 +1010,7 @@ def make_arrays(funcdict):
# later
code1list = []
code2list = []
+dispdict = {}
Maybe `dispatch_dict`?
Thanks Sayed.
Ditching the old CPU dispatcher(Arithmetic)
The first patch in a series of pull requests that aims to facilitate the migration
to our new SIMD interface (NPYV).
The process focuses on getting rid of the main umath SIMD source `simd.inc`,
which contains almost all SIMD kernels, by splitting it into several dispatch-able sources without
changing the base code. This keeps the review simple and speeds up reaching the nominal target.
In this patch, we have moved the real and complex arithmetic operations for single/double precision
to the new CPU dispatcher.
NOTE: previously, the AVX2 and AVX512F SIMD code for single/double precision wasn't dispatched at runtime.
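As a rough illustration of what runtime dispatch means here, the following Python sketch picks the highest-priority kernel whose CPU feature is available and otherwise falls back to a baseline. It is a conceptual model only: the kernel functions, the `CANDIDATES` table, and `select_kernel` are hypothetical names, and in NumPy the real kernels are C functions built per CPU target.

```python
# Hypothetical kernels; stand-ins for per-target builds of the same C loop.
def add_baseline(a, b):
    return [x + y for x, y in zip(a, b)]


def add_avx2(a, b):
    return [x + y for x, y in zip(a, b)]


def add_avx512f(a, b):
    return [x + y for x, y in zip(a, b)]


# Highest-priority target first (assumed ordering for this sketch).
CANDIDATES = [
    ("AVX512F", add_avx512f),
    ("AVX2", add_avx2),
]


def select_kernel(cpu_features):
    """Return the first kernel whose required feature the CPU reports."""
    for feature, kernel in CANDIDATES:
        if feature in cpu_features:
            return kernel
    # No optional target is available: use the baseline build.
    return add_baseline


kernel = select_kernel({"SSE2", "AVX2"})
print(kernel.__name__)
print(kernel([1.0, 2.0], [3.0, 4.0]))
```

The selection happens once per lookup from the features reported at run time, so a single binary can carry several builds of the same loop.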
Two other patches combined with this pull request improve the umath generator, as follows:
using ufunc_name. We need this option to get rid of the internal pre-processing mapping
of certain inner loops, in order to meet the requirements of the new CPU dispatcher.