BUG: define "uint-alignment", fixes complex64 alignment #6377

ahaldane · 2015-09-27T19:53:09Z

This PR fixes alignment checks along the lines of #5365 and this comment.

This fixes the bug that complex64 (and complex128) has an incorrect alignment: on most systems complex64's alignment should be 4, not 8. This has caused further bugs such as unnecessary buffer allocation, bus errors (in the past), and more work for people adding new architectures.

The problem is that numpy really needs two different types of alignment check:

- "true alignment" is the alignment required by the architecture for safe/fast access (used by ufuncs/casting/copyswap).
- "uint alignment" is the alignment of an equal-sized uint, required by the strided-copy functions, which get a speedup relative to memcpy(dst, src, N) by doing *((uintN*)dst) = *((uintN*)src).
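To make the two checks concrete, here is a minimal Python sketch that models addresses as plain integers rather than real pointers. It assumes a typical platform where complex64 (`struct { float real, imag; }`) has a true alignment of 4 and the equal-sized uint64 has an alignment of 8; the helper names are hypothetical and not part of numpy.

```python
# Hypothetical model: addresses are plain integers, not real pointers.
# Assumed values for a typical platform:
COMPLEX64_TRUE_ALIGNMENT = 4  # alignment of struct { float real, imag; }
COMPLEX64_UINT_ALIGNMENT = 8  # alignment of the equal-sized uint64


def is_aligned(address: int, alignment: int) -> bool:
    """True if address is a multiple of alignment."""
    return address % alignment == 0


def check_complex64(address: int):
    """Return (true-aligned, uint-aligned) for a complex64 at address."""
    return (is_aligned(address, COMPLEX64_TRUE_ALIGNMENT),
            is_aligned(address, COMPLEX64_UINT_ALIGNMENT))


for addr in (0x1000, 0x1004, 0x1002):
    print(hex(addr), check_complex64(addr))
```

An element at 0x1004 comes out truly aligned but not uint aligned: it is safe for ufuncs and casting, yet the `*((uint64*)dst) = *((uint64*)src)` shortcut would not be valid for it.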

See discussion in #5365, #5316, #3816, #5656, in the docs committed here, plus the bit at the end of the original notes (which the docs are based on).

This PR splits up the two types of alignment checks into two sets of functions, the first for uint alignment, the second for "true" alignment.

charris · 2015-09-27T20:27:32Z

numpy/core/src/multiarray/lowlevel_strided_loops.c.src

@@ -1382,7 +1372,7 @@ mapiter_trivial_@name@(PyArrayObject *self, PyArrayObject *ind,

    npy_intp itersize;

-    int is_aligned = PyArray_ISALIGNED(self) && PyArray_ISALIGNED(result);
+    int is_aligned = IsUintAligned(self) && IsUintAligned(result);


IsUintAligned needs a declaration.

charris · 2015-09-27T20:30:13Z

Looks like array_assign.h needs to be included in a few files. @juliantaylor Could you take a look?

ahaldane · 2015-09-27T20:33:33Z

Right now I'm looking into the python3 assertion failures which seem more serious.

ahaldane · 2015-09-27T21:52:22Z

I'll have to finish later. But this is what fails:

>>> np.copyto(np.array([0]), np.array([1]), where=np.array([False], dtype=bool))

My current guess is that there is a bug in _strided_masked_wrapper_transfer_function: it first looks for the next non-masked value using npy_memchr, but there is none. Then subloopsize is 1, and dst_stride is -1, which causes dst to point to some huge invalid address. Then it attempts to copy values to that address, which trips the assertion in _aligned_strided_to_strided_size8_srcstride0.
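The failure mode is easier to see in a stripped-down model. The Python sketch below is hypothetical (it is neither numpy's C code nor the actual fix): it copies a masked array run by run, alternating between skipping masked-out elements and copying unmasked ones. The point it illustrates is that when the skip reaches the end of the data, the loop has to stop rather than go on to a copy with a bogus length or offset.

```python
def masked_copy(dst, src, mask):
    """Copy src[i] into dst[i] wherever mask[i] is true, one run at a time.

    Hypothetical model of a run-based masked transfer loop; Python lists
    stand in for strided memory.
    """
    n = len(mask)
    i = 0
    while i < n:
        # Skip a run of masked-out elements (the role npy_memchr plays in C).
        while i < n and not mask[i]:
            i += 1
        if i == n:
            # Nothing unmasked is left: stop without touching dst again.
            break
        # Find the end of the run of unmasked elements and copy it in one go.
        start = i
        while i < n and mask[i]:
            i += 1
        dst[start:i] = src[start:i]
    return dst


# Same situation as the np.copyto example: a single, masked-out element.
print(masked_copy([0], [1], [False]))
```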

I think this only turned up now because I added RELAXED_STRIDE_CHECKING to raw_array_is_aligned.

juliantaylor · 2015-09-27T22:28:15Z

Is there anything different from my branch besides adding raw_*?
It would be easier to review without all the unnecessary variable renames.

ahaldane · 2015-09-27T23:25:21Z

@juliantaylor I renamed all the uintaligned variables back for now, but I'd like to keep the change in the final version, since I think it will be confusing that 'aligned' does not actually mean aligned.

ahaldane · 2015-09-28T00:46:34Z

OK, I think I fixed the bug in _strided_masked_wrapper_transfer_function (see last commit).

@juliantaylor It's largely the same, but I reorganized the IsAligned and raw_array_is_aligned functions to avoid duplication, removed the special case for flexible arrays, and implemented the "uint aligned" case differently.

The last 4 (unnumbered) commits are additional bugfixes/cleanups.

ahaldane · 2015-09-28T23:18:31Z

I added a few more changes I missed before (last commit). Nditer code needed "uint" alignment too.

Also, I got rid of NPY_MAX_COPY_ALIGNMENT, since I'm pretty sure it isn't needed any more: the "uint" alignment checks make everything safe.

pitrou · 2015-09-29T17:36:46Z

doc/source/dev/alignment.rst

+   determines offsets automatically. In that case, ``align=True`` pads the
+   structure so that each field is aligned in memory, sets ``dtype.alignment``
+   to be the largest of the field alignments, and sets ``dtype.itemsize`` to
+   the smallest posible multiple of this alignment. This is what C-structs


(it also sets dtype.isalignedstruct)

homu · 2016-01-28T16:06:51Z

☔ The latest upstream changes (presumably #7134) made this pull request unmergeable. Please resolve the merge conflicts.

homu · 2016-03-29T13:32:38Z

☔ The latest upstream changes (presumably #7481) made this pull request unmergeable. Please resolve the merge conflicts.

ahaldane · 2017-06-28T22:49:01Z

Rebased and simplified.

In summary:

- Fixes a few bugs in preparation for the other commits (first 2 commits, see comment).
- Implements functions to differentiate "uint alignment" and "true alignment" (see doc commit). The "uint alignment" computation is quite different from (I think more literal than) the one in Julian's branch.
- Switches over all copy/transfer code paths to use "uint alignment". This involved a lot of grepping/reading to catch all the uses of _IsAligned, PyArray_ISALIGNED, and raw_array_is_aligned in many numpy files.

I've removed the massive variable renaming I used to have of "aligned" -> "uintaligned" in the copy code paths. We just need to remember that in those files "aligned" refers to uint alignment, not true alignment.

charris · 2018-09-26T14:20:31Z

Does this affect pickles, npz files, etc.? I'm mostly concerned about data stored with the old alignment becoming invalid. Don't know if that is possible or not, I'm guessing that the stored offsets will take care of that and that if it were a problem cross platform compatibility would already be broken.

ahaldane · 2018-09-26T15:07:24Z

That's an important thought! But I don't think it's a problem:

I don't think it affects npz file loading, because those files store the dtype using the dt.descr syntax (the array-interface description), which explicitly includes any padding bytes. So, for instance, np.dtype('c8,u1', align=True).descr is [('f0', '<c8'), ('f1', '|u1'), ('', '|V7')] in current master, which explicitly includes 7 trailing padding bytes. (After this PR it will become [('f0', '<c8'), ('f1', '|u1'), ('', '|V3')].) When you np.load either of those, before and after this PR, you will get the padding bytes used at np.save time.
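The 7 and 3 padding bytes follow from the align=True layout rule described in the alignment docs: pad each field to its own alignment, set the struct alignment to the largest field alignment, and round the itemsize up to a multiple of it. A small Python sketch of that rule, with a hypothetical function name and the field sizes and alignments passed in by hand rather than read from numpy:

```python
def aligned_struct_layout(fields):
    """Lay out (name, size, alignment) fields following the align=True rule.

    Returns (offsets, itemsize, trailing_padding). Hypothetical helper,
    not numpy API.
    """
    offsets = {}
    offset = 0
    struct_align = 1
    for name, size, align in fields:
        offset = -(-offset // align) * align  # round up to a multiple of align
        offsets[name] = offset
        offset += size
        struct_align = max(struct_align, align)
    # The itemsize is the smallest multiple of the struct alignment that fits.
    itemsize = -(-offset // struct_align) * struct_align
    return offsets, itemsize, itemsize - offset


# 'c8,u1' with the old complex64 alignment (8) and the fixed one (4).
print(aligned_struct_layout([("f0", 8, 8), ("f1", 1, 1)]))  # ({'f0': 0, 'f1': 8}, 16, 7)
print(aligned_struct_layout([("f0", 8, 4), ("f1", 1, 1)]))  # ({'f0': 0, 'f1': 8}, 12, 3)
```

Only the trailing padding and itemsize change; the field offsets stay the same, which is why the stored descr (with its explicit padding field) is enough to reload either layout.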

As a side note, np.load currently horribly screws up any such datatype with padding bytes anyway; see my old unfinished PRs #7798 and those related to #8100. So it is unlikely anyone has stored aligned structs in npz files. That motivates getting this alignment bug fixed before fixing #8100, since people may start storing padded structs once #8100 is fixed, which would make alignment bugs harder to change.

charris · 2018-09-26T17:16:25Z

numpy/core/src/common/array_assign.c

@@ -84,14 +84,27 @@ broadcast_error: {

 /* See array_assign.h for parameter documentation */
 NPY_NO_EXPORT int
-raw_array_is_aligned(int ndim, char *data, npy_intp *strides, int alignment)
+raw_array_is_aligned(int ndim, npy_intp *shape,
+                     char *data, npy_intp *strides, int alignment)
 {
    if (alignment > 1) {
        npy_intp align_check = (npy_intp)data;


Maybe npy_uintp. I'm pretty sure both work here for twos complement integers, but I needed to think about it.

Note that it is later cast to npy_uintp in npy_is_aligned.

Only bitwise operations are involved, so it should be exactly the same.

charris · 2018-09-26T17:46:41Z

numpy/core/src/common/array_assign.c

+            /* skip dim == 1 as it is not required to have stride 0 */
+            if (shape[i] > 1) {
+                /* if shape[i] == 1, the stride is never used */
+                align_check |= strides[i];


I think there is an assumption of multiples of powers of two, together with twos complement arithmetic, built in here. AFAICT, there are no false positives, but perhaps some false negatives. @juliantaylor @seberg Thoughts?

I don't see any arithmetic here, only bitwise OR operations. Whether the integer is considered signed or unsigned doesn't come into play.

Well the bitwise or is used together with arrhythmic stuff later since it is used in npy_is_aligned. Did not think about it long, but I think you are right Chuck. I guess npy_is_aligned might explain it a bit more?

Not sure if twos complement has anything to do with it though.

"arrythmic"? :-)

More seriously, I don't know what you are bothering about. First, data is cast to npy_intp (this is an exact bitcast), then some bitwise operations are done on the signed integer (they are agnostic wrt signedness; they just operate on individual bits), then the result is cast back to void* (this is an exact bitcast), and finally it is cast to npy_uintp (again an exact bitcast).

There is zero reason to believe that the intermediate cast to npy_intp is any potential cause for trouble.

Well, if this errs too much on the safe side (no idea if it does), I guess placing the npy_is_aligned check into the loop here instead should work and probably not add much overhead (plus this is the flags calculation, right? So it doesn't happen very often anyway). Or am I missing something? It might also be a bit less mind-boggling.

EDIT: Frankly, nvm, I bet for all practical purposes, this just always works.

Yeah, it has been there a long time. I think it basically gives the largest power of two that divides the GCD. For non-powers of two I'm not sure how well it works. But we have lived with it for a long time. Might be worth a note, however.
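The "largest power of two that divides the GCD" reading can be checked directly: the lowest set bit of an OR is the smallest of the operands' lowest set bits, which is exactly the largest power of two dividing all of them. A hypothetical Python sketch; Python integers behave like infinite-width two's complement, so `x & -x` isolates the lowest set bit for negative strides too.

```python
from functools import reduce
from math import gcd


def lowest_set_bit(x: int) -> int:
    """Largest power of two dividing x (0 when x == 0)."""
    return x & -x


def pow2_via_or(values) -> int:
    """Largest power of two dividing every value, from the bitwise OR."""
    return lowest_set_bit(reduce(lambda a, b: a | b, values, 0))


def pow2_via_gcd(values) -> int:
    """The same quantity computed from the GCD, for comparison."""
    return lowest_set_bit(reduce(gcd, values, 0))


# A data address OR-ed with two strides, one of them negative.
vals = [0x1000, 24, -40]
print(pow2_via_or(vals), pow2_via_gcd(vals))  # 8 8
```

The alignment check then amounts to asking whether the alignment divides this power of two, a question that only has a clean answer when the alignment is itself a power of two.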

Some googling suggests that the C standard says that alignments must be powers of two. (I wasn't able to find the exact quote though). In that case, the special casing in npy_is_aligned is unnecessary, and we can just assume it is a power of two.

I'll add a note in any case, I think this is the 3rd time now that I've puzzled out what this code does.

I was planning to sleep on it :) I think when all is done, it will be OK for its intended application, but I want to understand it...

I think you're right that this assumes a twos-complement representation of signed integers, even though the C standard explicitly allows others like ones-complement. I'm not sure we care, because in practice all modern computers use two's complement, and because all this pointer bit-twiddling is C-implementation-dependent anyway.

Here's my understanding: We are trying to compute whether (data + n*stride)%alignment == 0 for all integers n. First, assuming alignment is a power of two, and assuming twos-complement representation, simplify the %alignment == 0 to checking if the lowest log2(alignment) bits are 0. Next, use the fact that in twos complement representation, if x low order bits of data and stride are 0, then the x lower bits of data + n*stride must also be 0 for all n because of how binary addition works. Neither simplification is valid for negative values in ones-complement representation.

Casting the intp stride to uintp and using uintp would avoid the signed twos-complement assumption, and the algorithm still works.

However, the representation/cast of data from pointer to uint is implementation defined too, so we are violating the standard anyway anytime we use bitwise ops on pointer values. (only arithmetic is allowed, I think). I don't see any alternative though, as far as I see there is no totally standard-compliant way to test alignment.
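The argument above can be checked mechanically. Below is a hypothetical Python model of the OR-based check from the raw_array_is_aligned diff (including the skip of dimensions with shape <= 1), compared against a brute-force test of (data + n*stride) % alignment == 0 for every element. Like the thread, it assumes the alignment is a power of two; integers stand in for pointers.

```python
from itertools import product


def raw_is_aligned(data: int, shape, strides, alignment: int) -> bool:
    """OR-based alignment check, modelled on the raw_array_is_aligned loop.

    Assumes alignment is a power of two. Dimensions of length <= 1 are
    skipped because their stride never moves to a second element.
    """
    if alignment <= 1:
        return True
    check = data
    for n, stride in zip(shape, strides):
        if n > 1:
            check |= stride
    # Aligned iff the lowest log2(alignment) bits are all zero.
    return check & (alignment - 1) == 0


def brute_force_is_aligned(data: int, shape, strides, alignment: int) -> bool:
    """Test (data + sum(i_k * stride_k)) % alignment == 0 for every element."""
    for index in product(*(range(n) for n in shape)):
        address = data + sum(i * s for i, s in zip(index, strides))
        if address % alignment != 0:
            return False
    return True


# The odd stride 3 sits on a length-1 axis, so it is ignored.
print(raw_is_aligned(0x1000, (1, 4), (3, 8), 8))  # True
```

For non-empty arrays the two functions agree: if every element address is aligned, then the data pointer is aligned and so is each stride of a dimension longer than 1, and conversely. Empty arrays (a zero in the shape) are not modelled here.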

charris · 2018-09-26T17:56:33Z

numpy/core/src/multiarray/lowlevel_strided_loops.c.src

@@ -1385,7 +1375,7 @@ mapiter_trivial_@name@(PyArrayObject *self, PyArrayObject *ind,

    npy_intp itersize;

-    int is_aligned = PyArray_ISALIGNED(self) && PyArray_ISALIGNED(result);


How do PyArray_ISALIGNED and IsUintAligned differ?

The former checks "true" alignment, the latter "uint" alignment.

We need uint alignment for the copy operations about 100 lines down.

charris · 2018-09-26T18:00:06Z

doc/source/dev/alignment.rst

+datatype is implemented as ``struct { float real, imag; }``. This has "true"
+alignment of 4 and "uint" alignment of 8 (equal to the true alignment of
+``uint64``).
+


So there are some structures that have no uint alignment?

Yes, that's one way to see it. Or, in practice, those structs get assigned a uint alignment of 0.

The "uint" alignments are for use in the strided copy code, which only special-cases the 1,2,4,8, and 16 byte sizes. All other cases return a "uint" alignment of 0, which triggers memmove to be used instead.

I'll correct/update the docs to better describe what happens for non-power-of-two sized types.

eric-wieser · 2018-09-27T05:22:45Z

doc/source/dev/alignment.rst

+"Uint" alignment depends on the size of a datatype. It is defined to be the
+"True alignment" of the uint of equivalent size to the datatype of interest,
+or undefined/unaligned if the size is not a power of two. If the itemsize of an
+array is not a power of two the array can never be uint aligned.


What if it's a power of two greater than sizeof(uintmax_t)?

I'll have to rephrase; that's an inaccurate sentence. It also doesn't describe the fact that 16-byte types have a uint alignment of 8, because that's what the copy code wants. Maybe:

"Uint" alignment depends on the size of a datatype. It is defined to be the "True alignment" of the uint used in numpy's copy-code to copy the datatype, or undefined/unaligned if there is no equivalent uint. Currently numpy uses uint8, uint16, uint32, uint64 and uint64 to copy data of size 1,2,4,8,16 bytes respectively, and all other sized datatypes cannot be uint aligned.
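The proposed wording boils down to a small lookup plus a bit test. A hypothetical Python sketch of that mapping and of when the uint copy path would apply; it assumes each uintN type is aligned to its own size, as on typical 64-bit platforms, and the helper names are illustrative only.

```python
# Hypothetical model of the mapping described above, assuming each uintN is
# aligned to its own size. 16-byte items are copied with uint64 operations,
# so they only need uint64 alignment.
UINT_ALIGNMENT = {1: 1, 2: 2, 4: 4, 8: 8, 16: 8}


def uint_alignment(itemsize: int) -> int:
    """Uint alignment for an itemsize, or 0 if no uint is used to copy it."""
    return UINT_ALIGNMENT.get(itemsize, 0)


def can_use_uint_copy(data: int, stride: int, itemsize: int) -> bool:
    """True if a 1-d strided copy could use *(uintN*)dst = *(uintN*)src.

    A uint alignment of 0 means there is no such path and the copy code
    uses memmove instead.
    """
    align = uint_alignment(itemsize)
    if align == 0:
        return False
    return (data | stride) & (align - 1) == 0


print(can_use_uint_copy(0x1004, 8, 8))    # complex64 at offset 4: False
print(can_use_uint_copy(0x1008, 16, 16))  # 16-byte items at offset 8: True
print(can_use_uint_copy(0x1000, 12, 12))  # 12-byte items: False (memmove)
```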

eric-wieser · 2018-09-27T05:23:44Z

doc/source/dev/alignment.rst

+
+ * The ``dtype.alignment`` attribute (``descr->alignment`` in C). This is meant
+   to reflect the "true alignment" of the type. It has arch-dependent default
+   values for non-flexible types, is equal to 1 for flexible types (including


Counter-example:

>>> np.dtype('U1').alignment
4
>>> np.issubdtype('U1', np.flexible)
True

eric-wieser · 2018-09-27T05:24:56Z

doc/source/dev/alignment.rst

+ * The ``align`` keyword of the dtype constructor, which only affects structured
+   arrays. If the structure's field offsets are not manually provided numpy
+   determines offsets automatically. In that case, ``align=True`` pads the
+   structure so that each field is aligned in memory and sets


Don't you need to specify which meaning of "align" you are using here?

numpy/core/src/common/array_assign.h

Implements IsAligned, IsUintAligned, npy_uint_alignment

charris · 2018-09-30T17:52:26Z

Thanks Allan.

charris reviewed Sep 27, 2015

ahaldane force-pushed the fix_align branch from ce3dbd3 to dc1add4 on September 27, 2015 23:23

ahaldane force-pushed the fix_align branch 2 times, most recently from 87186c2 to 3284052 on September 28, 2015 00:33

ahaldane force-pushed the fix_align branch 2 times, most recently from 3c4661f to 7869879 on September 28, 2015 16:15

charris added the "component: numpy._core" and "01 - Enhancement" labels Sep 28, 2015

ahaldane force-pushed the fix_align branch from f10ce6d to d4f4293 on September 29, 2015 00:07

pitrou reviewed Sep 29, 2015

ahaldane force-pushed the fix_align branch from 5b33730 to d5aaee1 on February 14, 2016 04:10

ahaldane force-pushed the fix_align branch 4 times, most recently from 7500aa9 to 475e4ef on June 28, 2017 22:18

ahaldane changed the title ~~fix alignment issues with complex64~~ BUG: define "uint-alignment", fixes complex64 alignment Jun 28, 2017

ahaldane force-pushed the fix_align branch 2 times, most recently from 7e69574 to 48c389c on June 29, 2017 02:51

ahaldane mentioned this pull request Jan 20, 2018

BUG: Fix various Big-Endian test failures (ppc64) #10443

Merged

charris mentioned this pull request Feb 9, 2018

BUG: Fix various Big-Endian test failures (ppc64) #10561

Merged

charris reviewed Sep 26, 2018

ahaldane force-pushed the fix_align branch from 58a71ef to bc02552 on September 27, 2018 01:06

eric-wieser reviewed Sep 27, 2018

numpy/core/src/common/array_assign.h (review thread resolved)

ahaldane force-pushed the fix_align branch from bc02552 to eaf1876 on September 27, 2018 16:42

BUG: _strided_masked_wrapper_transfer_function goes out of bounds

351e4b6

ahaldane force-pushed the fix_align branch 3 times, most recently from 80c5960 to 8ee2c91 on September 27, 2018 17:32

ahaldane added 7 commits September 27, 2018 15:43

BUG: raw_array_is_aligned ignores NPY_RELAXED_STRIDES_CHECKING

b76c0df

ENH: Implement methods for uint-alignment

27d4ce9

Implements IsAligned, IsUintAligned, npy_uint_alignment

ENH: Make copy-code-paths check for uint-alignment

1252b80

ENH: Fix complex64 alignment

8097aa3

TST: Test complex64 alignment

33e1a7d

DOC: Document how memory alignment works as of 1.14

38af6dd

MAINT: remove unneeded test in npy_is_aligned

12bd7c3

ahaldane force-pushed the fix_align branch from 8ee2c91 to 12bd7c3 on September 27, 2018 19:44

charris merged commit 4a7926f into numpy:master Sep 30, 2018

QuLogic mentioned this pull request Oct 24, 2018

TestScalarPEP3118.test_scalar_match_array fails on armv7hl #11832

Closed

charris mentioned this pull request Dec 8, 2018

ndarray.fill crashes in master #12503

Closed

ahaldane mentioned this pull request Dec 25, 2018

1.6.0rc1 3.7 debug: test failure: test_multiarray.py::TestAlignment::test_various_alignments Aborted #12607

Closed

ahaldane mentioned this pull request Apr 22, 2019

Bug in function add.at (core dump) #13317

Open
