Skip to content

BUG: define "uint-alignment", fixes complex64 alignment #6377

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Sep 30, 2018

Conversation

ahaldane
Copy link
Member

@ahaldane ahaldane commented Sep 27, 2015

This PR fixes alignment checks along the lines of #5365 and this comment.

This fixes the bug that complex64 (and complex128) has incorrect alignment, and should be 4 instead of 8 on most systems. This has caused further bugs such as unnecessary buffer allocation, (previously) bus errors, and more work for people adding new architectures.

The problem is that numpy really needs two different types of alignment check:

  • "true alignment" is the alignment required by the architecture for safe/fast access (used by ufuncs/casting/copyswap)
  • "uint alignment" is the alignment of an equal-sized uint, required by the strided-copy functions. which get a speedup relative to memcpy(dst, src, N) by doing *((uintN*)dst) = *((uintN*)src).

See discussion in #5365, #5316, #3816, #5656, in the docs committed here, plus the bit at the end of the original notes (which the docs are based on).

This PR splits up the two types of alignment checks into two sets of functions, the first for uint alignment, the second for "true" alignment.

@@ -1382,7 +1372,7 @@ mapiter_trivial_@name@(PyArrayObject *self, PyArrayObject *ind,

npy_intp itersize;

int is_aligned = PyArray_ISALIGNED(self) && PyArray_ISALIGNED(result);
int is_aligned = IsUintAligned(self) && IsUintAligned(result);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IsUintAligned needs a declaration.

@charris
Copy link
Member

charris commented Sep 27, 2015

Looks like array_assign.h needs to be included in a few files. @juliantaylor Could you take a look?

@ahaldane
Copy link
Member Author

Right now I'm looking into the python3 assertion failures which seem more serious.

@ahaldane
Copy link
Member Author

I'll have to finish later. But this is what fails:

>>> np.copyto(np.array([0]), np.array([1]), where=np.array([False], dtype=bool))

My current guess is that there is a bug in _strided_masked_wrapper_transfer_function: It first looks for the next non-masked value using npy_memchr, but there is none. Then subloopsize is 1, and dst_stride is -1, which causes dst to point to some huge invalid address. Then it attemps to copy values to that address, which trips the assertion in _aligned_strided_to_strided_size8_srcstride0.

I think this only turned up now because I added RELAXED_STRIDE_CHECKING to raw_array_is_aligned.

@juliantaylor
Copy link
Contributor

is there anything different to my branch besides adding raw_*
would be easier to review without all the unnecessary variable renames

@ahaldane
Copy link
Member Author

@juliantaylor I renamed all the uintaligned for now, but I'd like to keep the change in the final version, since I think it will be confusing that 'aligned' does not actually mean aligned.

@ahaldane ahaldane force-pushed the fix_align branch 2 times, most recently from 87186c2 to 3284052 Compare September 28, 2015 00:33
@ahaldane
Copy link
Member Author

OK, I think I fixed the bug in _strided_masked_wrapper_transfer_function (see last commit).

@juliantaylor It's largely the same, but I reorganizes the IsAligned and raw_array_is_aligned functions to avoid duplication, removed the special case for flexible arrays, and implemented the "uint aligned" case differently.

The last 4 (unnumbered) commits are additional bugfixes/cleanups.

@ahaldane
Copy link
Member Author

I added a few more changes I missed before (last commit). Nditer code needed "uint" alignment too.

Also, I got rid of NPY_MAX_COPY_ALIGNMENT, since I'm pretty sure it isn't needed any more, since the "uint" alignment checks make everything safe.

determines offsets automatically. In that case, ``align=True`` pads the
structure so that each field is aligned in memory, sets ``dtype.alignment``
to be the largest of the field alignments, and sets ``dtype.itemsize`` to
the smallest posible multiple of this alignment. This is what C-structs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(it also sets dtype.isalignedstruct)

@homu
Copy link
Contributor

homu commented Jan 28, 2016

☔ The latest upstream changes (presumably #7134) made this pull request unmergeable. Please resolve the merge conflicts.

@homu
Copy link
Contributor

homu commented Mar 29, 2016

☔ The latest upstream changes (presumably #7481) made this pull request unmergeable. Please resolve the merge conflicts.

@ahaldane ahaldane force-pushed the fix_align branch 4 times, most recently from 7500aa9 to 475e4ef Compare June 28, 2017 22:18
@ahaldane
Copy link
Member Author

Rebased and simplified.

In summary:

  • Fixes a few bugs in preparation for the other commits (first 2 commits, see comment)
  • Implements functions to differentiate "uint alignment" and "true alignment" (see doc commit). The "uint alignment" computation is quite different (I think more literal) than in Julian's branch.
  • Switch over all copy/transfer code paths to use the "uint alignment". This involved a lot of grepping/reading to catch all the uses of _IsAligned, PyArray_ISALIGNED, raw_array_is_aligned in many numpy files.

I've removed the massive variable renaming I used to have of "aligned" -> "uintaligned" in the copy code paths. We just need to remember that in those files "aligned" refers to uint alignment, not true alignment.

@ahaldane ahaldane changed the title fix alignment issues with complex64 BUG: define "uint-alignment", fixes complex64 alignment Jun 28, 2017
@ahaldane ahaldane force-pushed the fix_align branch 2 times, most recently from 7e69574 to 48c389c Compare June 29, 2017 02:51
@charris
Copy link
Member

charris commented Sep 26, 2018

Does this affect pickles, npz files, etc.? I'm mostly concerned about data stored with the old alignment becoming invalid. Don't know if that is possible or not, I'm guessing that the stored offsets will take care of that and that if it were a problem cross platform compatibility would already be broken.

@ahaldane
Copy link
Member Author

That's an important thought! But I don't think it's a problem:

I don't think it affects npz file loading because those files store the dtype using the pep3118 data-syntax (dt.descr), which explicitly includes any padding bytes. So for instance, np.dtype('c8,u1', align=True).descr is [('f0', '<c8'), ('f1', '|u1'), ('', '|V7')] in current master, which explicitly includes 7 trailing padding bytes. (After this PR it will become [('f0', '<c8'), ('f1', '|u1'), ('', '|V3')]). When you np.load either of those, before and after this PR, you will get the padding bytes used at np.save time.

As a sidenote, np.load currently horribly screws up any such datatype with padding bytes anyway, see my old unfinished PRs #7798 and those related to #8100. So it is unlikely anyone has stored aligned structs in npz files. That motivates getting this alignment bug fixed before fixing #8100, since people may start storing padded structs after #8100 is fixed making it harder to change alignment bugs.

@@ -84,14 +84,27 @@ broadcast_error: {

/* See array_assign.h for parameter documentation */
NPY_NO_EXPORT int
raw_array_is_aligned(int ndim, char *data, npy_intp *strides, int alignment)
raw_array_is_aligned(int ndim, npy_intp *shape,
char *data, npy_intp *strides, int alignment)
{
if (alignment > 1) {
npy_intp align_check = (npy_intp)data;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe npy_uintp. I'm pretty sure both work here for twos complement integers, but I needed to think about it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that it is later cast to npy_uintp in npy_is_aligned.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only bitwise operations are involved, so it should be exactly the same.

/* skip dim == 1 as it is not required to have stride 0 */
if (shape[i] > 1) {
/* if shape[i] == 1, the stride is never used */
align_check |= strides[i];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is an assumption of multiples of powers of two, together with twos complement arithmetic, built in here. AFAICT, there are no false positive, but perhaps some false negatives. @juliantaylor @seberg Thoughts?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any arithmetic here, only bitwise OR operations. Whether the integer is considered signed or unsigned doesn't come into play.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well the bitwise or is used together with arrhythmic stuff later since it is used in npy_is_aligned. Did not think about it long, but I think you are right Chuck. I guess npy_is_aligned might explain it a bit more?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if twos complement has anything to do with it though.

Copy link
Member

@pitrou pitrou Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"arrythmic"? :-)

More seriously, I don't know what you are bothering about. First data is cast to npy_intp(this is an exact bitcast), some bitwise operations are done on the signed integer (they are agnostic wrt signedness, they just operate on individual bits), then the result is cast back to void* (this is an exact bitcast), then it is finally cast to npy_uintp (again an exact bitcast).

There is zero reason to believe that the intermediate cast to npy_intp is any potential cause for trouble.

Copy link
Member

@seberg seberg Sep 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, if this errs too much on the safe side (no idea if it does). I guess placing the npy_is_aligned check into the loop here instead should work and probably not be a lot of overhead. (Plus this is the flags calculation, right? so it doesn't happen very often anyway). Or am I missing something? Plus it might be a bit less mind boggling.

EDIT: Frankly, nvm, I bet for all practical purposes, this just always works.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it has been there a long time. I think it basically gives the largest power of two that divides the GCD. For non-powers of two I'm not sure how well it works. But we have lived with it for a long time. Might be worth a note, however.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some googling suggests that the C standard says that alignments must be powers of two. (I wasn't able to find the exact quote though). In that case, the special casing in npy_is_aligned is unnecessary, and we can just assume it is a power of two.

I'll add a note in any case, I think this is the 3rd time now that I've puzzled out what this code does.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was planning to sleep on it :) I think when all is done, it will be OK for its intended application, but I want to understand it...

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you're right that this assumes a twos-complement representation of signed integers, even though the C standard explicitly allows others like ones-complement. I'm not sure we care, because in practice all modern computers use two's complement, and because all this pointer bit-twiddling is C-implementation-dependent anyway.

Here's my understanding: We are trying to compute whether (data + n*stride)%alignment == 0 for all integers n. First, assuming alignment is a power of two, and assuming twos-complement representation, simplify the %alignment == 0 to checking if the lowest log2(alignment) bits are 0. Next, use the fact that in twos complement representation, if x low order bits of data and stride are 0, then the x lower bits of data + n*stride must also be 0 for all n because of how binary addition works. Neither simplification is valid for negative values in ones-complement representation.

Casting the intp stride to uintp` and using uintp would avoid the signed twos-complement assumption, and the algorithm still works.

However, the representation/cast of data from pointer to uint is implementation defined too, so we are violating the standard anyway anytime we use bitwise ops on pointer values. (only arithmetic is allowed, I think). I don't see any alternative though, as far as I see there is no totally standard-compliant way to test alignment.

@@ -1385,7 +1375,7 @@ mapiter_trivial_@name@(PyArrayObject *self, PyArrayObject *ind,

npy_intp itersize;

int is_aligned = PyArray_ISALIGNED(self) && PyArray_ISALIGNED(result);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do PyArray_ISALIGNED and IsUintAligned differ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The former checks "true" alignment, the latter "uint" alignment.

We need uint alignment for the copy operations about 100 lines down.

datatype is implemented as ``struct { float real, imag; }``. This has "true"
alignment of 4 and "uint" alignment of 8 (equal to the true alignment of
``uint64``).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So there are some structures that have no uint alignment?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's one way to see it. Or, in pratice, those structs get assigned a uint alignment of 0.

The "uint" alignments are for use in the strided copy code, which only special-cases the 1,2,4,8, and 16 byte sizes. All other cases return a "uint" alignment of 0, which triggers memmove to be used instead.

I'll correct/update the docs to better describe what happens for non-power-of-two sized types.

"Uint" alignment depends on the size of a datatype. It is defined to be the
"True alignment" of the uint of equivalent size to the datatype of interest,
or undefined/unaligned if the size is not a power of two. If the itemsize of an
array is not a power of two the array can never be uint aligned.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if it's a power of two greater than sizeof(uintmax_t)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll have to rephrase, that's an inaccurate sentence. It also doesn't describe the fact that 16 bytes types have a uint alignment of 8, because that's what the copy code wants. Maybe:

"Uint" alignment depends on the size of a datatype. It is defined to be the
"True alignment" of the uint used in numpy's copy-code to copy the datatype,
or undefined/unaligned if there is no equivalent uint. Currently numpy uses 
uint8, uint16, uint32, uint64 and uint64 to copy data of size 1,2,4,8,16 bytes 
respectively, and all other sized datatypes cannot be uint aligned.


* The ``dtype.alignment`` attribute (``descr->alignment`` in C). This is meant
to reflect the "true alignment" of the type. It has arch-dependent default
values for non-flexible types, is equal to 1 for flexible types (including
Copy link
Member

@eric-wieser eric-wieser Sep 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Counter-example:

>>> np.dtype('U1').alignment
4
>>> np.issubdtype('U1', np.flexible)
True

* The ``align`` keyword of the dtype constructor, which only affects structured
arrays. If the structure's field offsets are not manually provided numpy
determines offsets automatically. In that case, ``align=True`` pads the
structure so that each field is aligned in memory and sets
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't you need to specify which meaning of "align" you are using here?

@ahaldane ahaldane force-pushed the fix_align branch 3 times, most recently from 80c5960 to 8ee2c91 Compare September 27, 2018 17:32
@charris charris merged commit 4a7926f into numpy:master Sep 30, 2018
@charris
Copy link
Member

charris commented Sep 30, 2018

Thanks Allan.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants