-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: improve the speed of array conversions using AVX2 if available #21123
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
e79f312
to
872438a
Compare
Marking for triage, but generally interested in input: Some unsafe casts (e.g. out-of-bound float to int) are undefined behaviour in C, but in practice seems mostly well defined.
does not always match:
Rather, the value is out of bound, and the minimum If we are unsure about changing this, an alternative could be to only do this for safe casts. Or we decide that this should be OK, and add a release note. |
} | ||
else { | ||
return &_aligned_cast_@name1@_to_@name2@; | ||
return &_aligned_cast_@name1@_to_@name2@@isa@; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Meant to do a review, but then posted instead:
IIRC this branch is never used (NPY_USE_UNALIGNED_ACCESS
is always 0 here) and I don't think vectorization is OK if it was used. So I would either not do this, or just delete the whole # if
block: It doesn't really add a whole lot.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am unsure of the meaning of NPY_USE_UNALIGNED_ACCESS, but if we can remove it here since it is always set to 0, why not removing it from the whole file and possibly the whole code ? It would make the code a bit more clean/readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the comment of this macro, vectorization should be disabled at a build level. AFAIK, telling the compiler that AVX can be used does not change anything. Using O3 causes GCC to auto-vectorize the code as opposed to O2 so far but the new versions of GCC (starting from GCC 12) should now enable the auto-vectorization even in O2 which is the default optimization level for Numpy so far. Thus, this change should not cause more harm than currently (but I think the code path enabled by NPY_USE_UNALIGNED_ACCESS
is certainly arlready harmful). What do you think about that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to look closer, but I think we should just delete this code (if you agree with that).
We discussed this a bit today in the meeting. And the consensus was that we should probably give this a shot (modulo fixing the tests of course). However, we probably need to create and prioritize an issue to give floating point warnings reliably, because it seems like +this would give a warning (on most systems). That is independent of this, but this PR gives another good reason. It also would be good to add a very brief "compatibility" release note for it, just to be on the safe side. |
Xref gh-21437 which would add the floating point error checking that might be good to have here. |
@zephyr111 the PR to give floating point warnings/errors that occur during casts is now merged, so this is unblocked. |
I am confused about the current state of the runtime checks and what Numpy should do (ie. what we want). Indeed, some bad conversions are now detected which is great but not all of them and I do not understand the logic now how the detection work for some values. For example, the following code detect most issues but not all on my machine with the main branch of Numpy: val = np.array([123_456_789_101_112], dtype=np.int64).astype(np.float32)
print('np.int8'); val.astype(np.int8) # detected overflow
print('np.uint8'); val.astype(np.uint8) # detected overflow
print('np.int16'); val.astype(np.int16) # detected overflow
print('np.uint16'); val.astype(np.uint16) # detected overflow
print('np.int32'); val.astype(np.int32) # detected overflow
print('np.uint32'); val.astype(np.uint32) # overflow not detected ! <----------------
print('np.int64'); val.astype(np.int64) # no overflow
print('np.uint64'); val.astype(np.uint64) # no overflow
val = np.array([np.nan])
print('np.int8'); val.astype(np.int8) # detected issue
print('np.uint8'); val.astype(np.uint8) # detected issue
print('np.int16'); val.astype(np.int16) # detected issue
print('np.uint16'); val.astype(np.uint16) # detected issue
print('np.int32'); val.astype(np.int32) # detected issue
print('np.uint32'); val.astype(np.uint32) # detected issue
print('np.int64'); val.astype(np.int64) # detected issue
print('np.uint64'); val.astype(np.uint64) # detected issue It looks like a similar issue occurs on the test machine, but with 16-bit integers (I cannot reproduce this on my machine). In fact I cannot get the same outcome on my machine on the failing tests (certainly due to the undefined behavior). The thing is the conversion function appears not to have been modified so I wonder how the detection is even possible without adding checks in the conversion loop. I do not understand how #21437 succeed to do that. If we want to detect all cases, AFAIK we need to add a check in the conversion loop but this will certainly be expensive and break the automatic vectorization. Manual vectorization with check is possible but this is a significant work (note that the check is not so straightforward, see this). Finally, the tests complain about the warning |
My guess was that we will see either floating point error or "correct" integer overflow/modulo results and no real way to get better than that (unless we add manual checks). The point being, that if the cast is done as say |
@seiko2plus just if you are looking for a smaller easy project. Wrapping this up (or something similar), could be very beneficial probably. |
Hello,
This PR is meant to improve the speed of array conversions using AVX2 if available based on the discussion in #21001 (the original PR has been split in two parts).