-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH, SIMD: Add new NPYV intrinsics pack(0) #17789
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- add bitfield conversion for boolean vectors - add reverse elements of each 64-bit lane - add testing cases
8f57fc7
to
8ec494e
Compare
806968f
to
dcbfe24
Compare
Looks good to me, I will refactor #17102 and give a new benchmark test by using new intrinsics here. |
{ | ||
return _mm256_shuffle_ps(a, a, _MM_SHUFFLE(2, 3, 0, 1)); | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not add reverse f64 api here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this intrinsic only reverse vector elements on each 64-bit width, same as neon vrev64q_*
#define npyv_cvt_b64_f64 _mm256_castpd_si256 | ||
|
||
// convert boolean vector to integer bitfield | ||
NPY_FINLINE npy_uint64 npyv_tobits_b8(npyv_b8 a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The name is a litter confused, how about signmask or movemask?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this intrinsic isn't replaceable for movemask
, since it accepts only boolean vector where the value must be
(1 << lane_size_inbits) - 1// 0xff****
or 0
and thats why we should get a better perfomance on aarch64.
} | ||
#define npyv_rev64_s16 npyv_rev64_u16 | ||
|
||
NPY_FINLINE npyv_u8 npyv_rev64_u8(npyv_u8 a) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
npyv_rev64_u8
should be in front of npyv_rev64_u16
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, we need to reuse npyv_rev64_u16()
within npyv_rev64_u8()
for emulating when SSSE3
isn't available
I think we should wait till after the 1.20 release is branched off master to put this in. |
@mattip, no problem |
@mattip, Is there any possibility to merge it soon? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks OK from what I understand. Just a minor nit about formatting comments.
*************************************************************************/ | ||
//######################################################################### | ||
//## Defining NPYV intrinsics as module functions | ||
//######################################################################### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change? This is C code and should use C comments
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still C99 comment, I had to add a special mark to separate the parent sections of the main template loops to reduce the dispersion. this source file contains two main loops one for defining c functions of module methods and another one to attach it into a static array of PyMethodDef
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand. Is there something special about the parsing here that is different from the other *.c.src
files?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no there's no something special about it, except that we're dealing with a huge number of functions that need to be organized in a proper format.
//#########################################################################
//## A comment separating the main sections
//#########################################################################
/*********************************************************************************************************
** A comment separating the subsections of SIMD operations, such as memory, reorder ..etc
********************************************************************************************************/
// a comment for each group of intriniscs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it, thanks for the explanation.
*************************************************************************/ | ||
//######################################################################### | ||
//## Defining a separate module for each target | ||
//######################################################################### |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why change the comment?
Thanks @seiko2plus |
Add new NPYV intrinsics pack(0)
TODO: