BUG: Fix `np.load` for aligned dtypes. #10931

ihormelnyk · 2018-04-19T14:49:08Z

import numpy as np

t = np.dtype('i1, i4, i1', align=True)
d = np.zeros(1, t)

np.save("test.npy", d)
data = np.load("test.npy")

Traceback (most recent call last):
  File "D:\Projects\numpy_bug.py", line 8, in <module>
    data = np.load("test.npy")
  File "D:\Projects\vendors\numpy\lib\npyio.py", line 314, in load
    return format.read_array(fid)
  File "D:\Projects\vendors\numpy\lib\format.py", line 440, in read_array
    shape, fortran_order, dtype = read_array_header_1_0(fp)
  File "D:\Projects\vendors\numpy\lib\format.py", line 358, in read_array_header_1_0
    dtype = numpy.dtype(d['descr'])
ValueError: two fields with the same name

Fixes gh-2215

ahaldane · 2018-04-19T20:56:01Z

numpy/lib/format.py

+        old_header = header
+        # replace paddings like ('', '|V4')
+        header = re.sub(r"\s*,*\s*\(\s*''\s*,\s*'\|*V\d'\s*\)", '', header)
+        aligned = header != old_header


Hmm, this test assumes that the pressence of padding-fields implies an aligned dtype. but actually there are many ways to create a dtype which has padding fields yet which is not "aligned".

For instance, try saving and loading the array

>>> a = np.zeros(3, dtype={'names': ['a', 'b'], 'formats': ['i4', 'i4'], 'offsets': [1, 7]}) >>> np.save('test', a) >>> np.load('test.npy')

Also, I think we should avoid regular expression processing if possible, I am worried it is too fragile. It would be better to turn the string into a list of tuples first using save_eval, and then do some computations on the list of tuples to get the right offsets, names, dtypes.

Is it possible to save npy where fields in dtype have empty names?
It will be more complex code to rebuild list of tuples

The code already there rebuilds the list of tuples right now, using safe_eval: You can access it in d['descr'], as is done a few lines below in the file.

What I am suggesting is adding some code like:

def descr_to_dtype(descr): if isisinstance(descr, str): return np.dtype(descr) # Question: Is this guaranteed to work? fields = [] offset = 0 for name, descr_i in d['descr']: dt = process_descr(descr_i) if name == '': # skip padding assert(dt.type is np.void and dt.names is None) else: fields.append((name, dt, offset)) offset += dt.itemsize names, formats, offsets = zip(*fields) return np.dtype('names': names, 'formats': formats, 'offsets': offsets, 'itemsize': offset) dtype = descr_to_dtype(d['descr'])

(completely untested, just to give an idea)

Also, it technically is possible to create a dtype with empty field, but only using the dict specification:

np.dtype({'names': [''], 'formats': 'i4'})

(see this comment)

I think that if we want to fix up the save/load code here, we may want to disallow that possibility, to avoid confusion when such an array can't be saved (after this PR).

ahaldane · 2018-04-19T21:02:46Z

Also, you may not have seen it, but there is an ongoing discussion starting here (link) about exactly how to fix this bug.

In this PR you are going with what I called option 2 in a comment there. I think that is probably the easiest, and possibly best, thing to do. That is, we should continue storing the dtype using the .descr representation of the dtype, which is a list of tuples, but we need to modify how this list is read back in to remove the padding names and compute the right offsets, like I suggested in my inline comment to your code.

@mhvk also suggested that we "pack" the fields before saving to file, but that is an extra detail that we can do in a separate PR, if desired.

mattip · 2018-07-07T16:14:33Z

@ihormelnyk do you think you will continue with this PR? It seems we need to fix this in order to move on with some other cleanups

ihormelnyk · 2018-07-07T16:25:46Z

Hi @mattip ,
What do you mean by "continue"?

mattip · 2018-07-07T16:31:53Z

there were suggestions for fixes to this PR that you have not yet responded to

ihormelnyk · 2018-07-07T18:04:56Z

I think, I'll be able to look into it on Monday

ihormelnyk · 2018-07-09T08:55:53Z

Hi @mattip , @ahaldane
Sorry, but I have no enoght time to be deeply involved in numpy development and discussion.
Here is a patch which we have been using since 2010 till now to load numpy with aligned dtype:

 numpy/lib/format.py | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)

diff --git a/numpy/lib/format.py b/numpy/lib/format.py
index ef5ec57e3..1e4915d11 100644
--- a/numpy/lib/format.py
+++ b/numpy/lib/format.py
@@ -343,6 +343,27 @@ def _write_array_header(fp, d, version=None):
     fp.write(header)
     return version
 
+
+def get_descr_without_paddings(descr):
+    aligned = False
+    if not isinstance(descr, list):
+        return descr, aligned
+    newdescr = []
+    for i in xrange(len(descr)):
+        if descr[i][0]=='' and (descr[i][1].startswith('|V') or descr[i][1].startswith('V')) and isinstance(descr[i][1], str):
+            aligned = True #padding
+        else:
+            if isinstance(descr[i][1], list):
+                d, a = get_descr_without_paddings(descr[i][1])
+                aligned =  aligned or a
+                l = list(descr[i])
+                l[1] = d
+                newdescr.append(tuple(l))
+            else:
+                newdescr.append(descr[i])
+    return newdescr, aligned
+
+
 def write_array_header_1_0(fp, d):
     """ Write the header for an array using the 1.0 format.
 
@@ -524,7 +545,8 @@ def _read_array_header(fp, version):
         msg = "fortran_order is not a valid bool: %r"
         raise ValueError(msg % (d['fortran_order'],))
     try:
-        dtype = numpy.dtype(d['descr'])
+        descr, aligned = get_descr_without_paddings(d['descr'])
+        dtype = numpy.dtype(descr, align=aligned)
     except TypeError as e:
         msg = "descr is not a valid dtype descriptor: %r"
         raise ValueError(msg % (d['descr'],))

Aligned dtype is very important for our computation framework, due to we use it to pass large data structs to python C extension (to avoid time consuming copying) and further calculation on the GPU.
Also there is code generation phase, which perepare python and c++ interfaces for computation models, so numpy structs have to be C compatible.
Looks like this is last fix related to aligned dtype after which we will be able to use numpy for our clients without patching.

P.S. But again we can't upgrade version of numpy for clients higher than 1.13.3 due to "Multiple-field indexing/assignment of structured arrays" changes in 1.14.
#6053 (comment)

mattip · 2018-11-09T23:31:37Z

@ihormelnyk I opened a new PR #12358 to replace this one. It is not quite this fix since it does not assume the dtype should be aligned, that information is simply lost in the storage format. I would like to close this one in favor of continuing the conversation there.

mattip · 2018-11-13T00:14:49Z

Closing in favor of #12358

Fixes numpy#2215 Loading numpy with aligned dtype

36fc78c

ahaldane reviewed Apr 19, 2018

View reviewed changes

charris changed the title ~~Fixes #2215 Loading numpy with aligned dtype~~ BUG: Fix Loading numpy with aligned dtype May 5, 2018

charris changed the title ~~BUG: Fix Loading numpy with aligned dtype~~ BUG: Fix np.load for aligned dtypes. May 5, 2018

charris added 00 - Bug component: numpy.lib labels May 5, 2018

ahaldane mentioned this pull request Jul 7, 2018

.view no longer works on field selections of structured arrays #10409

Closed

ahaldane mentioned this pull request Aug 17, 2018

BUG: np.save() and np.load() are not idempotent when align=True or fields are discontiguous #8100

Closed

mattip mentioned this pull request Nov 9, 2018

BUG: test, fix loading structured dtypes with padding #12358

Merged

mattip closed this Nov 13, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

BUG: Fix `np.load` for aligned dtypes. #10931

BUG: Fix `np.load` for aligned dtypes. #10931

Uh oh!

ihormelnyk commented Apr 19, 2018 •

edited by eric-wieser

Loading

Uh oh!

ahaldane Apr 19, 2018 •

edited

Loading

Uh oh!

ihormelnyk Apr 19, 2018

Uh oh!

ahaldane Apr 19, 2018 •

edited

Loading

Uh oh!

ahaldane Apr 19, 2018

Uh oh!

ahaldane commented Apr 19, 2018

Uh oh!

mattip commented Jul 7, 2018

Uh oh!

ihormelnyk commented Jul 7, 2018

Uh oh!

mattip commented Jul 7, 2018

Uh oh!

ihormelnyk commented Jul 7, 2018

Uh oh!

ihormelnyk commented Jul 9, 2018

Uh oh!

mattip commented Nov 9, 2018

Uh oh!

mattip commented Nov 13, 2018

Uh oh!

Uh oh!

Uh oh!

BUG: Fix np.load for aligned dtypes. #10931

BUG: Fix np.load for aligned dtypes. #10931

Uh oh!

Conversation

ihormelnyk commented Apr 19, 2018 • edited by eric-wieser Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ahaldane Apr 19, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ihormelnyk Apr 19, 2018

Choose a reason for hiding this comment

Uh oh!

ahaldane Apr 19, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ahaldane Apr 19, 2018

Choose a reason for hiding this comment

Uh oh!

ahaldane commented Apr 19, 2018

Uh oh!

mattip commented Jul 7, 2018

Uh oh!

ihormelnyk commented Jul 7, 2018

Uh oh!

mattip commented Jul 7, 2018

Uh oh!

ihormelnyk commented Jul 7, 2018

Uh oh!

ihormelnyk commented Jul 9, 2018

Uh oh!

mattip commented Nov 9, 2018

Uh oh!

mattip commented Nov 13, 2018

Uh oh!

Uh oh!

BUG: Fix `np.load` for aligned dtypes. #10931

BUG: Fix `np.load` for aligned dtypes. #10931

ihormelnyk commented Apr 19, 2018 •

edited by eric-wieser

Loading

ahaldane Apr 19, 2018 •

edited

Loading

ahaldane Apr 19, 2018 •

edited

Loading