-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
BUG: Fix np.load
for aligned dtypes.
#10931
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
old_header = header | ||
# replace paddings like ('', '|V4') | ||
header = re.sub(r"\s*,*\s*\(\s*''\s*,\s*'\|*V\d'\s*\)", '', header) | ||
aligned = header != old_header |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, this test assumes that the pressence of padding-fields implies an aligned dtype. but actually there are many ways to create a dtype which has padding fields yet which is not "aligned".
For instance, try saving and loading the array
>>> a = np.zeros(3, dtype={'names': ['a', 'b'], 'formats': ['i4', 'i4'], 'offsets': [1, 7]})
>>> np.save('test', a)
>>> np.load('test.npy')
Also, I think we should avoid regular expression processing if possible, I am worried it is too fragile. It would be better to turn the string into a list of tuples first using save_eval, and then do some computations on the list of tuples to get the right offsets, names, dtypes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to save npy where fields in dtype have empty names?
It will be more complex code to rebuild list of tuples
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code already there rebuilds the list of tuples right now, using safe_eval
: You can access it in d['descr']
, as is done a few lines below in the file.
What I am suggesting is adding some code like:
def descr_to_dtype(descr):
if isisinstance(descr, str):
return np.dtype(descr) # Question: Is this guaranteed to work?
fields = []
offset = 0
for name, descr_i in d['descr']:
dt = process_descr(descr_i)
if name == '': # skip padding
assert(dt.type is np.void and dt.names is None)
else:
fields.append((name, dt, offset))
offset += dt.itemsize
names, formats, offsets = zip(*fields)
return np.dtype('names': names, 'formats': formats,
'offsets': offsets, 'itemsize': offset)
dtype = descr_to_dtype(d['descr'])
(completely untested, just to give an idea)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, it technically is possible to create a dtype with empty field, but only using the dict specification:
np.dtype({'names': [''], 'formats': 'i4'})
(see this comment)
I think that if we want to fix up the save/load code here, we may want to disallow that possibility, to avoid confusion when such an array can't be saved (after this PR).
Also, you may not have seen it, but there is an ongoing discussion starting here (link) about exactly how to fix this bug. In this PR you are going with what I called option 2 in a comment there. I think that is probably the easiest, and possibly best, thing to do. That is, we should continue storing the dtype using the @mhvk also suggested that we "pack" the fields before saving to file, but that is an extra detail that we can do in a separate PR, if desired. |
np.load
for aligned dtypes.
@ihormelnyk do you think you will continue with this PR? It seems we need to fix this in order to move on with some other cleanups |
Hi @mattip , |
there were suggestions for fixes to this PR that you have not yet responded to |
I think, I'll be able to look into it on Monday |
Hi @mattip , @ahaldane
Aligned dtype is very important for our computation framework, due to we use it to pass large data structs to python C extension (to avoid time consuming copying) and further calculation on the GPU. P.S. But again we can't upgrade version of numpy for clients higher than 1.13.3 due to "Multiple-field indexing/assignment of structured arrays" changes in 1.14. |
@ihormelnyk I opened a new PR #12358 to replace this one. It is not quite this fix since it does not assume the dtype should be aligned, that information is simply lost in the storage format. I would like to close this one in favor of continuing the conversation there. |
Closing in favor of #12358 |
Fixes gh-2215