ENH, SIMD: Ditching the old CPU dispatcher(Arithmetic) #17985

Merged 4 commits on Dec 19, 2020

Member

@seiko2plus seiko2plus commented Dec 11, 2020

Ditching the old CPU dispatcher(Arithmetic)

This is the first patch in a series of pull requests that aim to ease the migration
to our new SIMD interface (NPYV).

The goal is to get rid of the main umath SIMD source, simd.inc, which contains almost all
SIMD kernels, by splitting it into several dispatch-able sources without changing the base code.
Leaving the base code untouched should make review easier and help us reach the end goal sooner.

This patch moves the arithmetic operations for real and complex single/double precision
to the new CPU dispatcher.

NOTE: previously, the AVX2 and AVX512F SIMD code for single/double precision was not dispatched at runtime.
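
As a rough illustration of what runtime dispatch means here, the following Python sketch picks the most capable kernel that the running CPU reports support for. It is a conceptual model only, not NumPy's C implementation: `KERNELS`, `select_kernel`, and the three `add_*` functions are hypothetical names, and the feature set is passed in by hand rather than detected.

```python
# Conceptual model of runtime CPU dispatch. All names are hypothetical;
# NumPy's real dispatcher is implemented in C and works on compiled kernels.

def add_baseline(a, b):
    # Stand-in for the kernel built with only the baseline CPU features.
    return [x + y for x, y in zip(a, b)]

def add_avx2(a, b):
    # Stand-in for a kernel compiled with AVX2 enabled.
    return [x + y for x, y in zip(a, b)]

def add_avx512f(a, b):
    # Stand-in for a kernel compiled with AVX512F enabled.
    return [x + y for x, y in zip(a, b)]

# Ordered from most to least capable; None marks the baseline fallback.
KERNELS = [
    ("AVX512F", add_avx512f),
    ("AVX2", add_avx2),
    (None, add_baseline),
]

def select_kernel(cpu_features):
    """Return the first kernel whose required feature the CPU reports."""
    for feature, kernel in KERNELS:
        if feature is None or feature in cpu_features:
            return kernel
    raise RuntimeError("no kernel available")

# A machine that reports AVX2 but not AVX512F gets the AVX2 kernel.
kernel = select_kernel({"SSE2", "AVX2"})
print(kernel.__name__, kernel([1.0, 2.0], [3.0, 4.0]))
```

The selection happens once per lookup at run time, so a single binary can carry several variants of the same loop and still run on CPUs that lack the newer instruction sets.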

Two other patches are combined with this pull request to improve the umath generator:

  • Reduce the number of preprocessor CPU runtime dispatcher calls.
  • Add a new option, 'cfunc_alias', which replaces the suffix of the C function name instead of
    using ufunc_name. We need this option to get rid of the internal preprocessor mapping
    of certain inner loops, in order to meet the requirements of the new CPU dispatcher.
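
To make the 'cfunc_alias' idea concrete, here is a minimal sketch of how a generator could build a C inner-loop name. It assumes loop names follow a `<TYPE>_<suffix>` pattern with the ufunc name as the default suffix; `c_loop_name` is a hypothetical helper, not a function from the umath generator, and the `true_divide`/`divide` pairing is only an illustrative choice.

```python
# Hypothetical helper: models building a C inner-loop name as
# "<TYPE>_<suffix>", where the suffix defaults to the ufunc name and
# cfunc_alias, when given, replaces it.
def c_loop_name(type_name, ufunc_name, cfunc_alias=None):
    suffix = cfunc_alias if cfunc_alias else ufunc_name
    return f"{type_name}_{suffix}"

# Without an alias the suffix is the ufunc name.
print(c_loop_name("FLOAT", "add"))
# With an alias the generated name points at a differently named C loop.
print(c_loop_name("FLOAT", "true_divide", cfunc_alias="divide"))
```

With an alias available in the generator, the name of the C loop no longer has to be remapped by preprocessor definitions in the C sources.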

@@ -280,7 +287,7 @@ def english_upper(s):
Ufunc(2, 1, Zero,
docstrings.get('numpy.core.umath.add'),
'PyUFunc_AdditionTypeResolver',
-TD(notimes_or_obj, simd=[('avx512f', cmplxvec),('avx2', ints)]),
+TD(notimes_or_obj, simd=[('avx2', ints)], dispatch=[('loops_arithm_fp', 'fdFD')]),
Member

Just to be clear: this adds fd (cmplxvec is FD). Also: I don't see where 'gG' (long double and long double complex) is taken into account. Am I missing something?

Member Author

@seiko2plus seiko2plus Dec 14, 2020

Yes, it dispatches single/double precision for both real and complex. The definitions of these loops moved from loops.c.src and simd.inc to the new dispatch-able source loops_arithm_fp.dispatch.c.src.
NOTE: the divide operation for complex single and double precision is still in loops.c.src; we will need to vectorize it via NPYV and move it to this file later.

> I don't see where 'gG' (long double and long double complex) is taken into account. Am I missing something?

loops_arithm_fp.dispatch.c.src only contains inner loop functions with SIMD code; the remaining operations
stay in loops.c.src.
I don't think we will support SIMD kernels for 80-bit and 128-bit precision, since there is no hardware support for them, but
maybe we can emulate light operations such as addition and subtraction.
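
The split described in this reply can be summarized with a small lookup. This is only a sketch under stated assumptions: `TYPE_CHARS` lists the NumPy type characters mentioned in the thread, `DISPATCHED` is copied from the `dispatch=[('loops_arithm_fp', 'fdFD')]` entry in the diff, and `loop_source` is a hypothetical helper. It deliberately ignores the exception noted above, that complex divide still lives in loops.c.src.

```python
# NumPy type characters mentioned in this thread.
TYPE_CHARS = {
    "f": "float (single precision)",
    "d": "double",
    "g": "long double",
    "F": "complex float",
    "D": "complex double",
    "G": "complex long double",
}

# Taken from dispatch=[('loops_arithm_fp', 'fdFD')] in the diff.
DISPATCHED = "fdFD"

def loop_source(type_char):
    """Hypothetical lookup: the source file said to hold the arithmetic loop.

    Simplification: the complex divide loops are reported to remain in
    loops.c.src, which this lookup does not model.
    """
    if type_char in DISPATCHED:
        return "loops_arithm_fp.dispatch.c.src"
    return "loops.c.src"

for char, name in TYPE_CHARS.items():
    print(f"{char!r:4} {name:26} -> {loop_source(char)}")
```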

Member

Ahh, my bad, it is in notimes_or_obj

Member

> 128-bit precision

AArch64 and Power9 have support for quad precision floats. The astronomy community would like to have them and I think we will see more support coming in modern architectures.

Member Author

AArch64 and Power9 have scalar support but not SIMD. Power9 users have to build NumPy with CFLAGS -mcpu=power9 or --cpu-baseline=vsx3 in order to get __float128 to work. We can dispatch float128 operations at runtime for them if necessary.

@seiko2plus seiko2plus force-pushed the ditch_simd_arithmetic branch from 203a16d to e06ec95 Compare December 14, 2020 01:59
@seiko2plus seiko2plus force-pushed the ditch_simd_arithmetic branch from e06ec95 to 0985a73 Compare December 14, 2020 02:26
Member

@mattip mattip left a comment

LGTM

@@ -218,3 +219,4 @@ numpy/core/src/_simd/_simd_data.inc
numpy/core/src/_simd/_simd_inc.h
# umath module
numpy/core/src/umath/loops_unary_fp.dispatch.c
+numpy/core/src/umath/loops_arithm_fp.dispatch.c
Member

I think it is better in general to use full names like "arithmetic" rather than shortened names. Seven character names are long gone :) This can be changed later though.

Member Author

I will consider it a policy to follow in my upcoming patches.

@@ -1000,6 +1010,7 @@ def make_arrays(funcdict):
# later
code1list = []
code2list = []
+dispdict = {}
Member

Maybe dispatch_dict?

@charris charris merged commit f7bc512 into numpy:master Dec 19, 2020
Member

charris commented Dec 19, 2020

Thanks Sayed.

@seiko2plus seiko2plus deleted the ditch_simd_arithmetic branch January 9, 2021 16:51
@rgommers rgommers added the component: SIMD Issues in SIMD (fast instruction sets) code or machinery label Jul 12, 2022
Labels
01 - Enhancement component: SIMD Issues in SIMD (fast instruction sets) code or machinery