-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
BUG: define "uint-alignment", fixes complex64 alignment #6377
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -1382,7 +1372,7 @@ mapiter_trivial_@name@(PyArrayObject *self, PyArrayObject *ind, | |||
|
|||
npy_intp itersize; | |||
|
|||
int is_aligned = PyArray_ISALIGNED(self) && PyArray_ISALIGNED(result); | |||
int is_aligned = IsUintAligned(self) && IsUintAligned(result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IsUintAligned
needs a declaration.
Looks like |
Right now I'm looking into the python3 assertion failures which seem more serious. |
I'll have to finish later. But this is what fails:
My current guess is that there is a bug in I think this only turned up now because I added |
is there anything different to my branch besides adding raw_* |
@juliantaylor I renamed all the uintaligned for now, but I'd like to keep the change in the final version, since I think it will be confusing that 'aligned' does not actually mean aligned. |
87186c2
to
3284052
Compare
OK, I think I fixed the bug in @juliantaylor It's largely the same, but I reorganizes the The last 4 (unnumbered) commits are additional bugfixes/cleanups. |
3c4661f
to
7869879
Compare
I added a few more changes I missed before (last commit). Nditer code needed "uint" alignment too. Also, I got rid of |
doc/source/dev/alignment.rst
Outdated
determines offsets automatically. In that case, ``align=True`` pads the | ||
structure so that each field is aligned in memory, sets ``dtype.alignment`` | ||
to be the largest of the field alignments, and sets ``dtype.itemsize`` to | ||
the smallest posible multiple of this alignment. This is what C-structs |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(it also sets dtype.isalignedstruct
)
☔ The latest upstream changes (presumably #7134) made this pull request unmergeable. Please resolve the merge conflicts. |
☔ The latest upstream changes (presumably #7481) made this pull request unmergeable. Please resolve the merge conflicts. |
7500aa9
to
475e4ef
Compare
Rebased and simplified. In summary:
I've removed the massive variable renaming I used to have of "aligned" -> "uintaligned" in the copy code paths. We just need to remember that in those files "aligned" refers to uint alignment, not true alignment. |
7e69574
to
48c389c
Compare
Does this affect pickles, npz files, etc.? I'm mostly concerned about data stored with the old alignment becoming invalid. Don't know if that is possible or not, I'm guessing that the stored offsets will take care of that and that if it were a problem cross platform compatibility would already be broken. |
That's an important thought! But I don't think it's a problem: I don't think it affects npz file loading because those files store the dtype using the pep3118 data-syntax ( As a sidenote, |
numpy/core/src/common/array_assign.c
Outdated
@@ -84,14 +84,27 @@ broadcast_error: { | |||
|
|||
/* See array_assign.h for parameter documentation */ | |||
NPY_NO_EXPORT int | |||
raw_array_is_aligned(int ndim, char *data, npy_intp *strides, int alignment) | |||
raw_array_is_aligned(int ndim, npy_intp *shape, | |||
char *data, npy_intp *strides, int alignment) | |||
{ | |||
if (alignment > 1) { | |||
npy_intp align_check = (npy_intp)data; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe npy_uintp
. I'm pretty sure both work here for twos complement integers, but I needed to think about it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that it is later cast to npy_uintp
in npy_is_aligned
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only bitwise operations are involved, so it should be exactly the same.
numpy/core/src/common/array_assign.c
Outdated
/* skip dim == 1 as it is not required to have stride 0 */ | ||
if (shape[i] > 1) { | ||
/* if shape[i] == 1, the stride is never used */ | ||
align_check |= strides[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there is an assumption of multiples of powers of two, together with twos complement arithmetic, built in here. AFAICT, there are no false positive, but perhaps some false negatives. @juliantaylor @seberg Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any arithmetic here, only bitwise OR operations. Whether the integer is considered signed or unsigned doesn't come into play.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well the bitwise or is used together with arrhythmic stuff later since it is used in npy_is_aligned
. Did not think about it long, but I think you are right Chuck. I guess npy_is_aligned
might explain it a bit more?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if twos complement has anything to do with it though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"arrythmic"? :-)
More seriously, I don't know what you are bothering about. First data
is cast to npy_intp
(this is an exact bitcast), some bitwise operations are done on the signed integer (they are agnostic wrt signedness, they just operate on individual bits), then the result is cast back to void*
(this is an exact bitcast), then it is finally cast to npy_uintp
(again an exact bitcast).
There is zero reason to believe that the intermediate cast to npy_intp
is any potential cause for trouble.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, if this errs too much on the safe side (no idea if it does). I guess placing the npy_is_aligned
check into the loop here instead should work and probably not be a lot of overhead. (Plus this is the flags calculation, right? so it doesn't happen very often anyway). Or am I missing something? Plus it might be a bit less mind boggling.
EDIT: Frankly, nvm, I bet for all practical purposes, this just always works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it has been there a long time. I think it basically gives the largest power of two that divides the GCD. For non-powers of two I'm not sure how well it works. But we have lived with it for a long time. Might be worth a note, however.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some googling suggests that the C standard says that alignments must be powers of two. (I wasn't able to find the exact quote though). In that case, the special casing in npy_is_aligned
is unnecessary, and we can just assume it is a power of two.
I'll add a note in any case, I think this is the 3rd time now that I've puzzled out what this code does.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was planning to sleep on it :) I think when all is done, it will be OK for its intended application, but I want to understand it...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right that this assumes a twos-complement representation of signed integers, even though the C standard explicitly allows others like ones-complement. I'm not sure we care, because in practice all modern computers use two's complement, and because all this pointer bit-twiddling is C-implementation-dependent anyway.
Here's my understanding: We are trying to compute whether (data + n*stride)%alignment == 0
for all integers n
. First, assuming alignment is a power of two, and assuming twos-complement representation, simplify the %alignment == 0
to checking if the lowest log2(alignment) bits are 0. Next, use the fact that in twos complement representation, if x
low order bits of data
and stride
are 0, then the x
lower bits of data + n*stride
must also be 0 for all n
because of how binary addition works. Neither simplification is valid for negative values in ones-complement representation.
Casting the intp stride
to uintp` and using uintp would avoid the signed twos-complement assumption, and the algorithm still works.
However, the representation/cast of data
from pointer to uint is implementation defined too, so we are violating the standard anyway anytime we use bitwise ops on pointer values. (only arithmetic is allowed, I think). I don't see any alternative though, as far as I see there is no totally standard-compliant way to test alignment.
@@ -1385,7 +1375,7 @@ mapiter_trivial_@name@(PyArrayObject *self, PyArrayObject *ind, | |||
|
|||
npy_intp itersize; | |||
|
|||
int is_aligned = PyArray_ISALIGNED(self) && PyArray_ISALIGNED(result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do PyArray_ISALIGNED
and IsUintAligned
differ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The former checks "true" alignment, the latter "uint" alignment.
We need uint alignment for the copy operations about 100 lines down.
datatype is implemented as ``struct { float real, imag; }``. This has "true" | ||
alignment of 4 and "uint" alignment of 8 (equal to the true alignment of | ||
``uint64``). | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So there are some structures that have no uint
alignment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's one way to see it. Or, in pratice, those structs get assigned a uint
alignment of 0.
The "uint" alignments are for use in the strided copy code, which only special-cases the 1,2,4,8, and 16 byte sizes. All other cases return a "uint" alignment of 0, which triggers memmove
to be used instead.
I'll correct/update the docs to better describe what happens for non-power-of-two sized types.
doc/source/dev/alignment.rst
Outdated
"Uint" alignment depends on the size of a datatype. It is defined to be the | ||
"True alignment" of the uint of equivalent size to the datatype of interest, | ||
or undefined/unaligned if the size is not a power of two. If the itemsize of an | ||
array is not a power of two the array can never be uint aligned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if it's a power of two greater than sizeof(uintmax_t)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll have to rephrase, that's an inaccurate sentence. It also doesn't describe the fact that 16 bytes types have a uint alignment of 8, because that's what the copy code wants. Maybe:
"Uint" alignment depends on the size of a datatype. It is defined to be the
"True alignment" of the uint used in numpy's copy-code to copy the datatype,
or undefined/unaligned if there is no equivalent uint. Currently numpy uses
uint8, uint16, uint32, uint64 and uint64 to copy data of size 1,2,4,8,16 bytes
respectively, and all other sized datatypes cannot be uint aligned.
doc/source/dev/alignment.rst
Outdated
|
||
* The ``dtype.alignment`` attribute (``descr->alignment`` in C). This is meant | ||
to reflect the "true alignment" of the type. It has arch-dependent default | ||
values for non-flexible types, is equal to 1 for flexible types (including |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Counter-example:
>>> np.dtype('U1').alignment
4
>>> np.issubdtype('U1', np.flexible)
True
doc/source/dev/alignment.rst
Outdated
* The ``align`` keyword of the dtype constructor, which only affects structured | ||
arrays. If the structure's field offsets are not manually provided numpy | ||
determines offsets automatically. In that case, ``align=True`` pads the | ||
structure so that each field is aligned in memory and sets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't you need to specify which meaning of "align" you are using here?
80c5960
to
8ee2c91
Compare
Implements IsAligned, IsUintAligned, npy_uint_alignment
Thanks Allan. |
This PR fixes alignment checks along the lines of #5365 and this comment.
This fixes the bug that
complex64
(andcomplex128
) has incorrect alignment, and should be 4 instead of 8 on most systems. This has caused further bugs such as unnecessary buffer allocation, (previously) bus errors, and more work for people adding new architectures.The problem is that numpy really needs two different types of alignment check:
memcpy(dst, src, N)
by doing*((uintN*)dst) = *((uintN*)src)
.See discussion in #5365, #5316, #3816, #5656, in the docs committed here, plus the bit at the end of the original notes (which the docs are based on).
This PR splits up the two types of alignment checks into two sets of functions, the first for uint alignment, the second for "true" alignment.