BUG: test, fix loading structured dtypes with padding #12358

mattip · 2018-11-09T23:28:50Z

Replaces PR #10931 which it draws heavily from. Fixes #2215.

The original PR discusses why this is not the best fix: it assumes that a field ('', |Vn), where n is the void size, is a padding field, when someone could theoretically be using such a field for data. It also forces a possibly aligned dtype to become unaligned, but unfortunately that information is not preserved in the storage format.

I think this is the least evil of the possibilities, although we could consider saving the padding fields with a unique name. It would have to be better than the existing choice (f1, f2, ...), which in the original issue clashed with names commonly used by users.
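For context, the padding fields under discussion can be seen directly in `dtype.descr`. The following is a minimal sketch, assuming a two-field dtype built with `align=True`; the field names `a` and `b` are only illustrative.

```python
import numpy as np

# An aligned struct: 'b' (4 bytes) is placed at offset 4,
# which leaves 2 bytes of padding after 'a'.
aligned = np.dtype([('a', '<u2'), ('b', '<u4')], align=True)

# dtype.descr reports the gap as an unnamed void field.
descr = aligned.descr
print(descr)

# Entries that the heuristic in this PR treats as padding:
# an empty name combined with a void type.
padding = [entry for entry in descr
           if entry[0] == '' and np.dtype(entry[1]).type is np.void]
print(padding)
```

Because the padding entry has no name of its own, a user-defined void field with an empty name would be indistinguishable from it, which is the ambiguity described above.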

mattip · 2018-11-10T00:56:19Z

also closes #8100

numpy/lib/format.py

mattip · 2018-11-10T15:56:52Z

Fields with name == '' now round-trip if the field is not a void type.

numpy/lib/format.py

eric-wieser · 2018-11-10T19:11:07Z

numpy/lib/format.py

+            tup = ('%s%d' % (unique_name, itr),) + tup[1:]
+        if isinstance(tup[1], list):
+            # record of records, recurse
+            if len(tup) > 2:


Can this ever be 4, or is (name, type, shape) the longest possible tuple? What about titles?

mattip · 2018-11-13

Added a test and fixed the handling of titles.

ahaldane · 2018-11-12T22:25:46Z

I looked at doing this a while ago, but I got sidetracked by the issue that this strategy will "lose" the aligned flag:

>>> arr = np.array([(1, 2), (3, 4)], dtype=np.dtype([('a','u2'), ('b','u4')], align=True))
>>> np.save('test', arr)
>>> np.load('test.npy').dtype
dtype({'names':['a','b'], 'formats':['<u2','<u4'], 'offsets':[0,4], 'itemsize':8})
>>> arr.dtype
dtype({'names':['a','b'], 'formats':['<u2','<u4'], 'offsets':[0,4], 'itemsize':8}, align=True)

Then I started working on trying to avoid that, by making the aligned flag be recomputed on the fly when accessed (there is no fundamental reason it needs to be stored as a flag - we can always re-compute if something is aligned). That led to snowballing problems I never finished.
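To illustrate what "re-computing" alignment could mean, here is a rough sketch of a layout check. The helper `looks_aligned` is hypothetical and is not part of NumPy; it assumes that a struct counts as aligned when every field offset is a multiple of that field's alignment and the itemsize is a multiple of the largest field alignment.

```python
import numpy as np


def looks_aligned(dt):
    """Hypothetical check: does the layout of dt satisfy alignment rules?"""
    if dt.names is None:
        # Not a structured dtype, nothing to check.
        return True
    max_align = 1
    for name in dt.names:
        field_dt, offset = dt.fields[name][:2]
        if offset % field_dt.alignment != 0:
            return False
        # Nested structs must themselves be laid out aligned.
        if not looks_aligned(field_dt):
            return False
        max_align = max(max_align, field_dt.alignment)
    return dt.itemsize % max_align == 0


aligned = np.dtype([('a', '<u2'), ('b', '<u4')], align=True)
packed = np.dtype([('a', '<u2'), ('b', '<u4')])
manual = np.dtype({'names': ['a', 'b'], 'formats': ['<u2', '<u4'],
                   'offsets': [0, 4], 'itemsize': 8})

for dt in (aligned, packed, manual):
    print(looks_aligned(dt), dt.isalignedstruct)
```

Note that `manual` passes the layout check even though it was never created with `align=True`, so a computed flag and the stored flag can disagree.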

FWIW my implementation of descr_without_padding (which I pasted in another PR) was very similar to yours. Here it is:

import numpy


def descr_to_dtype(descr):
    if isinstance(descr, str):
        # descr was produced by dtype.str, so this always works
        return numpy.dtype(descr)

    fields = []
    offset = 0
    for field in descr:
        if len(field) == 2:
            name, descr_str = field
            dt = descr_to_dtype(descr_str)
        else:
            name, descr_str, shape = field
            dt = numpy.dtype((descr_to_dtype(descr_str), shape))

        # ignore padding bytes, which will be void bytes with '' as name
        # (once blank fieldnames are deprecated, only "if name == ''" needed)
        is_pad = (name == '' and dt.type is numpy.void and dt.names is None)
        if not is_pad:
            fields.append((name, dt, offset))

        offset += dt.itemsize

    names, formats, offsets = zip(*fields)
    return numpy.dtype({'names': names, 'formats': formats,
                        'offsets': offsets, 'itemsize': offset})
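As a usage sketch, the same idea can be exercised through `numpy.lib.format.descr_to_dtype`, the function of the same name that NumPy exposes in releases containing this fix. This assumes such a release is installed; the descr list below is written by hand to mimic what an aligned dtype produces.

```python
import numpy as np
from numpy.lib import format as npy_format

# A descr as stored in a .npy header for an aligned two-field struct.
descr = [('a', '<u2'), ('', '|V2'), ('b', '<u4')]

# The unnamed void entry is dropped, but its size still advances the offset.
dt = npy_format.descr_to_dtype(descr)
print(dt.names)
print({name: dt.fields[name][1] for name in dt.names})
print(dt.itemsize)
```

The field `b` keeps its offset of 4 because the padding bytes are counted while walking the list, even though no field is created for them.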

I would also recommend that we simultaneously deprecate blank fieldnames in this PR, because the code here will be unable to deal with them. I did it by adding this to _convert_from_dict around line 1169:

        if (PyObject_IsTrue(name) == 0) {
            if (DEPRECATE("Blank field names are deprecated and will result in "
                          "an error in the future.") < 0) {
                Py_DECREF(tup);
                goto fail;
            }
        }

mattip · 2018-11-12T23:38:51Z

@ahaldane your function does seem to be simpler. It does handle empty names as long as they are not a void type, but I have added the deprecation since it is confusing to have blank fields on purpose.

As for align=True, I think we should punt on that for now: we have never been able to round-trip the attribute, so we are not breaking backward compatibility here. Calculating it on the fly would force non-aligned dtypes that happen to specify an aligned layout using offsets to be aligned after round-tripping. I think it must be part of the storage protocol. One solution would be to use str(dtype) rather than dtype.descr, but that should be a separate PR.
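The point about the storage protocol can be sketched as follows, assuming the same illustrative two-field layout used earlier in the thread: two dtypes with identical byte layouts produce the same `descr`, so `descr` alone cannot record which one was created with `align=True`.

```python
import numpy as np

aligned = np.dtype([('a', '<u2'), ('b', '<u4')], align=True)
# Same offsets and itemsize, specified by hand, without align=True.
manual = np.dtype({'names': ['a', 'b'], 'formats': ['<u2', '<u4'],
                   'offsets': [0, 4], 'itemsize': 8})

# The descr lists are identical, so the flag is not recoverable from them.
print(aligned.descr == manual.descr)
print(aligned.isalignedstruct, manual.isalignedstruct)

# str(dtype) mentions the aligned flag only for the first dtype.
print(str(aligned))
print(str(manual))
```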

mattip · 2018-11-13T00:02:53Z

rebased and squashed to two commits since there were merge conflicts, so I guess that also implies restarting review

numpy/lib/tests/test_stride_tricks.py

numpy/core/src/multiarray/descriptor.c

numpy/lib/tests/test_format.py

doc/release/1.16.0-notes.rst

eric-wieser · 2018-11-13T06:26:26Z

I'd rather leave the deprecation to another PR, just so that it can actually be discussed in isolation.

numpy/lib/format.py

ahaldane · 2018-11-14T17:31:48Z

Besides the style comment above, LGTM. This was less painful than I thought. Thanks @mattip!

charris · 2018-11-14T23:58:35Z

Thanks Matti.

aldanor · 2019-01-02T13:00:29Z

I'm wondering why np.load() was special-cased here? Isn't it a more general problem?

I.e., looks like #7797 is still broken? (it would need a very similar fix)

mattip · 2019-01-02T13:56:58Z

np.save does not use PEP3118 nomenclature for the dtype format.
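The two notations can be compared side by side. This is only an illustrative sketch: `dtype_to_descr` is the `numpy.lib.format` helper that produces the descr-style description written to a `.npy` header, while `memoryview(arr).format` is the PEP 3118 string exported through the buffer protocol.

```python
import numpy as np
from numpy.lib import format as npy_format

arr = np.zeros(2, dtype=np.dtype([('a', '<u2'), ('b', '<u4')], align=True))

# descr-style description, as used by the .npy format.
npy_descr = npy_format.dtype_to_descr(arr.dtype)
print(npy_descr)

# PEP 3118 struct format string, as seen by buffer consumers.
pep3118_format = memoryview(arr).format
print(pep3118_format)
```

The fix in this PR only touches the code that parses the first form; parsing the second form goes through a different code path.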

aldanor · 2019-01-02T14:14:33Z

Ah indeed, my bad - apologies.

Fixing #7797 would require fixing/rewriting _dtype_from_pep3118() which I'm not sure anyone wants to touch.

mattip mentioned this pull request Nov 9, 2018

BUG: Fix np.load for aligned dtypes. #10931

Closed

mattip added the 00 - Bug and component: numpy.lib labels Nov 9, 2018

mattip added this to the 1.16.0 release milestone Nov 9, 2018

mattip mentioned this pull request Nov 9, 2018

Dtype descr inconsistency with invisible fields #3176

Closed

mattip force-pushed the roundtrip-record-arrays branch from e2cb037 to deb2396 on November 10, 2018 00:25

eric-wieser reviewed Nov 10, 2018

numpy/lib/format.py (outdated)

eric-wieser reviewed Nov 10, 2018

numpy/lib/format.py (outdated)

eric-wieser reviewed Nov 10, 2018

numpy/lib/format.py (outdated)

eric-wieser reviewed Nov 10, 2018

numpy/lib/format.py (outdated)

eric-wieser reviewed Nov 10, 2018

mattip force-pushed the roundtrip-record-arrays branch from 2f7fcf1 to 893dd2b on November 13, 2018 00:01

ahaldane reviewed Nov 13, 2018

numpy/lib/tests/test_stride_tricks.py (outdated)

eric-wieser reviewed Nov 13, 2018

numpy/core/src/multiarray/descriptor.c (outdated)

eric-wieser reviewed Nov 13, 2018

numpy/lib/tests/test_format.py (outdated)

eric-wieser reviewed Nov 13, 2018

doc/release/1.16.0-notes.rst (outdated)

mattip mentioned this pull request Nov 13, 2018

DEP: deprecate empty field names #12375

Closed

mattip force-pushed the roundtrip-record-arrays branch from 893dd2b to bbe1522 on November 13, 2018 14:37

BUG: test, fix loading structured dtypes with padding

1956ada

mattip force-pushed the roundtrip-record-arrays branch from bbe1522 to 1956ada on November 13, 2018 14:38

eric-wieser reviewed Nov 13, 2018

numpy/lib/format.py (outdated)

eric-wieser reviewed Nov 13, 2018

numpy/lib/format.py

BUG: fix for titles, cleanup, fixes from review

62e47c3

ahaldane reviewed Nov 14, 2018

numpy/lib/format.py (outdated)

MAINT: fix from review

a222755

charris merged commit 13a69b5 into numpy:master Nov 14, 2018

ahaldane mentioned this pull request Nov 18, 2018

BUG: np.save() and np.load() are not idempotent when align=True or fields are discontiguous #8100

Closed

embray mentioned this pull request Nov 23, 2018

Handling of invisible dtype fields in io.fits astropy/astropy#8172

Open

mattip deleted the roundtrip-record-arrays branch January 2, 2019 13:55

jzwinck mentioned this pull request Apr 30, 2019

np.load() "invalid shape in fixed-type tuple" in NumPy 1.16.0 #13431

Closed

eric-wieser mentioned this pull request Jan 23, 2020

numpy 1.16 cannot load empty array with empty descr #15396

Closed
