ENH, SIMD: Ditching the old CPU dispatcher(Arithmetic) #17985

seiko2plus · 2020-12-11T20:29:26Z

Ditching the old CPU dispatcher(Arithmetic)

This is the first patch in a series of pull requests that aims to facilitate the migration
to our new SIMD interface (NPYV).

The process focuses on getting rid of the main umath SIMD source simd.inc,
which contains almost all SIMD kernels, by splitting it into several dispatch-able sources without
changing the base code. Leaving the base code unchanged simplifies review and speeds up reaching the final target.

In this patch, we have moved the real and complex arithmetic operations for single/double precision
to the new CPU dispatcher.

NOTE: previously, the AVX2 and AVX512F SIMD code for single/double precision wasn't dispatched at runtime.
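
To illustrate what "dispatched at runtime" means here, the following is a minimal Python sketch of the general idea: several builds of the same kernel exist, and the most capable one supported by the running CPU is selected. This is not NumPy's implementation (the real dispatcher is generated C code); the kernel functions, the `CANDIDATES` table, and `select_kernel` are all hypothetical names used only for illustration.

```python
# Hypothetical kernels. In NumPy these would be C inner loops compiled
# once per CPU target from a single dispatch-able source file.
def add_baseline(a, b):
    return [x + y for x, y in zip(a, b)]


def add_avx2(a, b):
    return [x + y for x, y in zip(a, b)]


def add_avx512f(a, b):
    return [x + y for x, y in zip(a, b)]


# Candidates ordered from most to least capable target (assumed ordering).
CANDIDATES = [("AVX512F", add_avx512f), ("AVX2", add_avx2)]


def select_kernel(cpu_features, candidates=CANDIDATES, baseline=add_baseline):
    """Return the first kernel whose required feature the CPU reports."""
    for feature, kernel in candidates:
        if feature in cpu_features:
            return kernel
    # No optional target is available: fall back to the baseline build.
    return baseline


# The selection happens when the program runs, not when it is compiled.
kernel = select_kernel({"AVX2"})
result = kernel([1.0, 2.0], [3.0, 4.0])
```

Because the choice is made when the program runs, the same binary can use AVX512F on one machine and fall back to the baseline on another.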

Two other patches are combined with this pull request to improve the umath generator:

Reduce the number of preprocessor CPU runtime dispatcher calls
Add a new option 'cfunc_alias', which replaces the suffix of the C function name instead of
using ufunc_name. We need this option to get rid of the internal pre-processing mapping
of certain inner loops, in order to fit the requirements of the new CPU dispatcher.
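
The effect of 'cfunc_alias' can be sketched roughly as below. This assumes that C inner-loop names follow a `<TYPE>_<suffix>` pattern and that the alias simply overrides the suffix; the helper `c_loop_name` is hypothetical and is not the code in generate_umath.py.

```python
def c_loop_name(type_name, ufunc_name, cfunc_alias=None):
    """Build a C inner-loop name of the assumed form '<TYPE>_<suffix>'.

    The suffix is the ufunc name unless an alias is given, in which
    case the alias replaces it.
    """
    suffix = cfunc_alias if cfunc_alias else ufunc_name
    return f"{type_name}_{suffix}"


# Without an alias, the ufunc name is used as the suffix.
plain = c_loop_name("DOUBLE", "add")

# With an alias, a ufunc can point at a loop named after another operation
# (illustrative pairing of names, chosen for this sketch).
aliased = c_loop_name("DOUBLE", "true_divide", cfunc_alias="divide")
```

With the name fixed by the generator, no preprocessor mapping from one loop name to another is needed.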

numpy/core/code_generators/generate_umath.py

mattip · 2020-12-13T05:21:32Z

numpy/core/code_generators/generate_umath.py

@@ -280,7 +287,7 @@ def english_upper(s):
    Ufunc(2, 1, Zero,
          docstrings.get('numpy.core.umath.add'),
          'PyUFunc_AdditionTypeResolver',
-          TD(notimes_or_obj, simd=[('avx512f', cmplxvec),('avx2', ints)]),
+          TD(notimes_or_obj, simd=[('avx2', ints)], dispatch=[('loops_arithm_fp', 'fdFD')]),


Just to be clear: this adds fd (cmplxvec is FD). Also: I don't see where 'gG' (long double and long double complex) is taken into account. Am I missing something?

Yes, it dispatches single/double for both real and complex. The definitions of these loops moved from loops.c.src and simd.inc to the new dispatch-able source loops_arithm_fp.dispatch.c.src.
NOTE: the divide operation for complex single and double precision is still in loops.c.src; we will need to vectorize this operation via NPYV and move it to this file later.
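
The type-character spec discussed above can be read with a small sketch like the following. It assumes the standard NumPy type characters (f, d, g for single, double and long double; F, D, G for their complex counterparts). The helper `dispatched_types` is hypothetical and only models how a `dispatch=[(source, chars)]` entry covers some types and leaves others to the regular loops.

```python
# Standard NumPy type characters relevant to this thread.
TYPE_CHARS = {
    "f": "float (single)",
    "d": "double",
    "g": "long double",
    "F": "complex float",
    "D": "complex double",
    "G": "complex long double",
}


def dispatched_types(dispatch, type_chars):
    """Map each type char to the dispatch source covering it, or None."""
    result = {}
    for ch in type_chars:
        result[ch] = None  # default: not handled by any dispatch source
        for source, chars in dispatch:
            if ch in chars:
                result[ch] = source
                break
    return result


# The spec from the diff: only f, d, F and D go to the new source.
coverage = dispatched_types([("loops_arithm_fp", "fdFD")], "fdgFDG")
for ch, source in coverage.items():
    print(ch, TYPE_CHARS[ch], "->", source or "not dispatched")
```

In this model, 'g' and 'G' map to no dispatch source, which matches the explanation that their loops stay in loops.c.src.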

I don't see where 'gG' (long double and long double complex) is taken into account. Am I missing something?

loops_arithm_fp.dispatch.c.src only contains inner loop functions with SIMD code; the remaining operations
stay in loops.c.src.
I don't think we will support SIMD kernels for 80-bit and 128-bit precision, since there's no hardware support for them, but
maybe we can emulate light operations such as addition and subtraction.

Ahh, my bad, it is in notimes_or_obj

128-bit precision

AArch64 and Power9 have support for quad precision floats. The astronomy community would like to have them and I think we will see more support coming in modern architectures.

AArch64 and Power9 have scalar support but not SIMD. Power9 users have to build NumPy with CFLAGS -mcpu=power9 or --cpu-baseline=vsx3 in order to get __float128 to work; we can dispatch float128 operations at runtime for them if necessary.

numpy/core/src/umath/loops_arithm_fp.dispatch.c.src

numpy/core/src/umath/loops_utils.h.src

mattip

LGTM

charris · 2020-12-18T23:47:06Z

.gitignore

@@ -218,3 +219,4 @@ numpy/core/src/_simd/_simd_data.inc
 numpy/core/src/_simd/_simd_inc.h
 # umath module
 numpy/core/src/umath/loops_unary_fp.dispatch.c
+numpy/core/src/umath/loops_arithm_fp.dispatch.c


I think it is better in general to use full names like "arithmetic" rather than shortened names. Seven character names are long gone :) This can be changed later though.

I will consider it a policy to follow in my upcoming patches.

numpy/core/code_generators/generate_umath.py

charris · 2020-12-18T23:55:50Z

numpy/core/code_generators/generate_umath.py

@@ -1000,6 +1010,7 @@ def make_arrays(funcdict):
    # later
    code1list = []
    code2list = []
+    dispdict  = {}


Maybe dispatch_dict?

charris · 2020-12-19T20:59:16Z

Thanks Sayed.

seiko2plus added 2 commits December 10, 2020 23:53

MAINT, SIMD: reduce the number of preprocessor CPU runtime dispatcher…

7bd6de3

… calls

ENH: add new option 'cfunc_alias' to umath generator

d084917

github-actions bot added the 01 - Enhancement label Dec 11, 2020

seiko2plus force-pushed the ditch_simd_arithmetic branch from 54720f6 to 203a16d Compare December 12, 2020 19:56

mattip reviewed Dec 13, 2020

seiko2plus force-pushed the ditch_simd_arithmetic branch from 203a16d to e06ec95 Compare December 14, 2020 01:59

seiko2plus commented Dec 14, 2020

numpy/core/src/umath/loops_arithm_fp.dispatch.c.src

seiko2plus commented Dec 14, 2020

numpy/core/src/umath/loops_utils.h.src

seiko2plus force-pushed the ditch_simd_arithmetic branch from e06ec95 to 0985a73 Compare December 14, 2020 02:26

mattip approved these changes Dec 14, 2020

mattip mentioned this pull request Dec 14, 2020

Native code order of magnitude slower than translated code on Apple M1 #17989

Closed

charris reviewed Dec 18, 2020

numpy/core/code_generators/generate_umath.py

charris reviewed Dec 18, 2020

MAINT: Small style fixes.

81bb563

charris merged commit f7bc512 into numpy:master Dec 19, 2020

seiko2plus mentioned this pull request Dec 26, 2020

ENH: libdivide for unsigned integers #18055

Closed

seiko2plus deleted the ditch_simd_arithmetic branch January 9, 2021 16:51

rgommers added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Jul 12, 2022
