Failure to load .npy file with structured dtype and metadata #14142

Closed
colingavin opened this issue Jul 28, 2019 · 37 comments · Fixed by #27143

Comments

@colingavin

colingavin commented Jul 28, 2019

The dtype description in an NPY file can be written in a way that prevents the file from being read. The attached array in bad_array.pkl.zip exhibits this problem. In this case the written dtype is [('id', '|V16'), ('has_fit', '|b1'), ('timeax', ('|O', {'vlen': dtype('float64')})), ('', '|V8'), ('profile', ('|O', {'vlen': dtype('float64')})), ('', '|V8'), ('area', '<f8')].

(Note that it was obtained via h5py; if this is an issue with the format of the arrays produced by that library, I'm happy to file a bug there instead.)
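For readers without the attachment, here is a minimal sketch of a dtype shaped like the reported one. It assumes, based on the {'vlen': dtype('float64')} entries in the descr above, that h5py marks variable-length fields by attaching dtype metadata to an object dtype; the field names are a trimmed-down subset of the reported ones.

```python
import numpy as np

# Object dtype carrying metadata, mirroring the ('|O', {'vlen': dtype('float64')})
# entries in the reported descr (assumption: this is how h5py tags vlen fields).
vlen_f8 = np.dtype('O', metadata={'vlen': np.dtype('float64')})

# A cut-down version of the structured dtype from the report.
dt = np.dtype([('has_fit', '?'), ('timeax', vlen_f8), ('area', '<f8')])

# The metadata lives on the nested field dtype.
print(dt['timeax'].metadata)
print(dt.names)
```

The nesting matters: the metadata is attached to a field inside the structured dtype, so it ends up inside the field list that np.save writes into the header.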

Reproducing code example:

import numpy as np
import pickle
import io

with open('bad_array.pkl', 'rb') as f:
    arr = pickle.load(f)

buf = io.BytesIO()
np.save(buf, arr)
buf.seek(0)
np.load(buf)

Error message:

Traceback (most recent call last):
  File "scripts/badnpy.py", line 11, in <module>
    np.load(buf)
  File "/Users/colingavin/Code/908/clarent2/algo/py/.venv/lib/python3.7/site-packages/numpy/lib/npyio.py", line 453, in load
    pickle_kwargs=pickle_kwargs)
  File "/Users/colingavin/Code/908/clarent2/algo/py/.venv/lib/python3.7/site-packages/numpy/lib/format.py", line 712, in read_array
    shape, fortran_order, dtype = _read_array_header(fp, version)
  File "/Users/colingavin/Code/908/clarent2/algo/py/.venv/lib/python3.7/site-packages/numpy/lib/format.py", line 578, in _read_array_header
    d = safe_eval(header)
  File "/Users/colingavin/Code/908/clarent2/algo/py/.venv/lib/python3.7/site-packages/numpy/lib/utils.py", line 1139, in safe_eval
    return ast.literal_eval(source)
  File "/Users/colingavin/.pyenv/versions/3.7.2/Python.framework/Versions/3.7/lib/python3.7/ast.py", line 91, in literal_eval
    return _convert(node_or_string)
  File "/Users/colingavin/.pyenv/versions/3.7.2/Python.framework/Versions/3.7/lib/python3.7/ast.py", line 79, in _convert
    map(_convert, node.values)))
  File "/Users/colingavin/.pyenv/versions/3.7.2/Python.framework/Versions/3.7/lib/python3.7/ast.py", line 74, in _convert
    return list(map(_convert, node.elts))
  File "/Users/colingavin/.pyenv/versions/3.7.2/Python.framework/Versions/3.7/lib/python3.7/ast.py", line 72, in _convert
    return tuple(map(_convert, node.elts))
  File "/Users/colingavin/.pyenv/versions/3.7.2/Python.framework/Versions/3.7/lib/python3.7/ast.py", line 72, in _convert
    return tuple(map(_convert, node.elts))
  File "/Users/colingavin/.pyenv/versions/3.7.2/Python.framework/Versions/3.7/lib/python3.7/ast.py", line 79, in _convert
    map(_convert, node.values)))
  File "/Users/colingavin/.pyenv/versions/3.7.2/Python.framework/Versions/3.7/lib/python3.7/ast.py", line 90, in _convert
    return _convert_signed_num(node)
  File "/Users/colingavin/.pyenv/versions/3.7.2/Python.framework/Versions/3.7/lib/python3.7/ast.py", line 63, in _convert_signed_num
    return _convert_num(node)
  File "/Users/colingavin/.pyenv/versions/3.7.2/Python.framework/Versions/3.7/lib/python3.7/ast.py", line 55, in _convert_num
    raise ValueError('malformed node or string: ' + repr(node))
ValueError: malformed node or string: <_ast.Call object at 0x10b965a58>
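The last frames are the key ones: the header is parsed with ast.literal_eval, which accepts only Python literals, so the dtype('float64') call that ended up inside the descr is rejected. A stdlib-only sketch with a hand-written header fragment (hypothetical, modelled on the descr in the report; presumably the metadata dict was written out via repr(), which is what produces the call expression):

```python
import ast

# Header fragment in the style of an .npy header; the metadata value
# appears as a function call, not as a literal.
header = "{'descr': [('timeax', ('|O', {'vlen': dtype('float64')}))], 'shape': (2,)}"

try:
    ast.literal_eval(header)
    parsed = True
except ValueError as exc:
    # literal_eval refuses Call nodes such as dtype('float64').
    parsed = False
    print(exc)

# Without the metadata, the same header is a plain literal and parses fine.
clean = ast.literal_eval("{'descr': [('timeax', '|O')], 'shape': (2,)}")
print(clean['descr'])
```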

Numpy/Python version information:

1.17.0 3.7.2 (default, Jul 14 2019, 10:22:38)
[Clang 9.0.0 (clang-900.0.39.2)]
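To see exactly which descr np.save wrote, without going through np.load's parser, the raw header can be read by hand. This is a sketch following the NPY format layout (6-byte magic string, one byte each for the major and minor version, then the header length as a little-endian uint16 for version 1.0 or uint32 for 2.0/3.0); read_raw_npy_header is a hypothetical helper, not a NumPy function.

```python
import io
import struct

import numpy as np

MAGIC = b'\x93NUMPY'


def read_raw_npy_header(fp) -> str:
    """Return the header dict text of an .npy stream without evaluating it."""
    if fp.read(6) != MAGIC:
        raise ValueError('not an .npy stream')
    major, _minor = fp.read(2)
    # Version 1.0 stores the header length as a little-endian uint16;
    # versions 2.0 and 3.0 use a little-endian uint32.
    if major == 1:
        (hlen,) = struct.unpack('<H', fp.read(2))
    else:
        (hlen,) = struct.unpack('<I', fp.read(4))
    # 1.0/2.0 headers are latin1, 3.0 headers are utf8.
    encoding = 'utf8' if major >= 3 else 'latin1'
    # The header is padded with spaces and ends in a newline.
    return fp.read(hlen).decode(encoding).strip()


buf = io.BytesIO()
np.save(buf, np.arange(3.0))
buf.seek(0)
header = read_raw_npy_header(buf)
print(header)
```

Applied to the buffer from the reproduction above on an affected version, this would be expected to show the dtype('float64') call inside the descr.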

@eric-wieser
Member

eric-wieser commented Jul 28, 2019

Loads just fine on 1.15.4 + 64-bit Windows; looks like a regression.

Also loads fine on 1.16.0.dev0+d14632c.

@eric-wieser eric-wieser changed the title Failure to load .npy file with complex dtype Failure to load .npy file with structured dtype Jul 28, 2019
@seberg
Copy link
Member

seberg commented Jul 28, 2019

On 1.17.x and 1.16.2, I am seeing a "No error set" error when coercing the dtype.

@eric-wieser
Member

No issues on 1.18.0.dev0+84e8412 + Python 3.5 either. Perhaps only the release has the problem, or perhaps it's a 3.7 issue.

@seberg
Member

seberg commented Jul 28, 2019

Oh, hmm, maybe it is a Python-version-related thing then; interesting. I was testing with 3.7:

np.dtype([('id', '|V16'), ('has_fit', '|b1'), ('timeax', ('|O', {'vlen': np.dtype('float64')})), ('', '|V8'), ('profile', ('|O', {'vlen': np.dtype('float64')})), ('', '|V8'), ('area', '<f8')])

This results in a "no error set" failure.

@eric-wieser
Member

eric-wieser commented Jul 28, 2019

On 3.5 and 84e8412, that gives ValueError: invalid shape in fixed-type tuple. For some reason the pickle does not cause that error.
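The error message hints at the mechanism: in the (base, second) tuple form, np.dtype appears to fall back to reading a second element it cannot otherwise interpret as a shape, so a metadata dict in that position is rejected. Metadata is normally supplied through the metadata= keyword instead. A sketch (the exact exception type and message vary between NumPy and Python versions, as this thread shows, so it catches broadly):

```python
import numpy as np

# Tuple form copied from the descr: the dict sits where np.dtype expects
# a shape (or another dtype spec), so construction fails.
try:
    np.dtype(('|O', {'vlen': np.dtype('float64')}))
    tuple_error = None
except Exception as exc:  # ValueError, TypeError or SystemError depending on version
    tuple_error = exc
    print(type(exc).__name__, exc)

# Passing the mapping via the metadata keyword constructs the intended dtype.
vlen_f8 = np.dtype('O', metadata={'vlen': np.dtype('float64')})
print(vlen_f8, dict(vlen_f8.metadata))
```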

@seberg
Copy link
Member

seberg commented Jul 28, 2019

Oh, interesting: on Python 2.7 I get a simple "invalid datatype" error, while on 3.7 I get "no error set". I had not tested the pickle; I can reproduce the error on Python 3.7 with both 1.16.2 and master.

@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Jul 28, 2019
@eric-wieser
Member

eric-wieser commented Jul 28, 2019

My hunch is that the cause is this line:

PyObject_Print(PyTuple_GET_ITEM(item, 1), stderr, 0);

which calls print while an exception is in flight.

I have no idea why that print is there.

@eric-wieser
Member

Yep, that fixed it.

eric-wieser added a commit to eric-wieser/numpy that referenced this issue Jul 28, 2019
…o an exception being in flight

We shouldn't be reporting errors via print anyway

Related to numpygh-14142
@seberg
Member

seberg commented Jul 29, 2019

Ah, nice, thanks. Guess I can close this then.

@seberg seberg closed this as completed Jul 29, 2019
@eric-wieser
Member

I don't think we can close this: the problem I solved is the one you discovered, not the one this issue reports.

@eric-wieser eric-wieser reopened this Jul 29, 2019
@seberg
Copy link
Member

seberg commented Jul 29, 2019

Oh, sorry. I somehow thought those were the same, heh...

charris pushed a commit to charris/numpy that referenced this issue Jul 29, 2019
charris pushed a commit to charris/numpy that referenced this issue Jul 30, 2019
@charris
Member

charris commented Aug 20, 2019

Pushing off to 1.17.2

@charris
Member

charris commented Nov 16, 2021

Well, heck. Pushed it off again.

@seberg
Member

seberg commented May 4, 2022

Will push this one off (as also discussed in the triage meeting), but I am not sure how important this issue still is, so please ping to bump it if necessary.

@charris
Member

charris commented Aug 1, 2024

@tacaswell Is this still a problem?

@tacaswell
Contributor

Yes, but now with an extra warning:

In [2]: import numpy as np
   ...: import io
   ...:
   ...:
   ...: dt = np.dtype({'names': ['a', 'b'], 'formats':  [float, np.dtype('S3', metadata={'some': 'stuff'})]})
   ...:
   ...: arr = np.array([(1, 'abc'), (2, 'def')], dtype=dt)
   ...: buf = io.BytesIO()
   ...: np.save(buf, arr)
   ...: buf.seek(0)
   ...: np.load(buf)
/home/tcaswell/.virtualenvs/cp313/lib/python3.13/site-packages/numpy/lib/format.py:380: UserWarning: metadata on a dtype is not saved to an npy/npz. Use another format (such as pickle) to store it.
  d['descr'] = dtype_to_descr(array.dtype)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[2], line 11
      9 np.save(buf, arr)
     10 buf.seek(0)
---> 11 np.load(buf)

File ~/.virtualenvs/cp313/lib/python3.13/site-packages/numpy/lib/_npyio_impl.py:488, in load(file, mmap_mode, allow_pickle, fix_imports, encoding, max_header_size)
    485         return format.open_memmap(file, mode=mmap_mode,
    486                                   max_header_size=max_header_size)
    487     else:
--> 488         return format.read_array(fid, allow_pickle=allow_pickle,
    489                                  pickle_kwargs=pickle_kwargs,
    490                                  max_header_size=max_header_size)
    491 else:
    492     # Try a pickle
    493     if not allow_pickle:

File ~/.virtualenvs/cp313/lib/python3.13/site-packages/numpy/lib/format.py:809, in read_array(fp, allow_pickle, pickle_kwargs, max_header_size)
    807 version = read_magic(fp)
    808 _check_version(version)
--> 809 shape, fortran_order, dtype = _read_array_header(
    810         fp, version, max_header_size=max_header_size)
    811 if len(shape) == 0:
    812     count = 1

File ~/.virtualenvs/cp313/lib/python3.13/site-packages/numpy/lib/format.py:678, in _read_array_header(fp, version, max_header_size)
    676     raise ValueError(msg.format(d['fortran_order']))
    677 try:
--> 678     dtype = descr_to_dtype(d['descr'])
    679 except TypeError as e:
    680     msg = "descr is not a valid dtype descriptor: {!r}"

File ~/.virtualenvs/cp313/lib/python3.13/site-packages/numpy/lib/format.py:337, in descr_to_dtype(descr)
    335 if len(field) == 2:
    336     name, descr_str = field
--> 337     dt = descr_to_dtype(descr_str)
    338 else:
    339     name, descr_str, shape = field

File ~/.virtualenvs/cp313/lib/python3.13/site-packages/numpy/lib/format.py:327, in descr_to_dtype(descr)
    324 elif isinstance(descr, tuple):
    325     # subtype, will always have a shape descr[1]
    326     dt = descr_to_dtype(descr[0])
--> 327     return numpy.dtype((dt, descr[1]))
    329 titles = []
    330 names = []

ValueError: invalid shape in fixed-type tuple.

In [3]: np.__version__
Out[3]: '2.1.0.dev0+git20240725.0819378'

@seberg
Member

seberg commented Aug 1, 2024

Hmm, I thought I had once written stripping logic, so that the metadata would be dropped but the file would still load. I guess it is in some half-finished PR somewhere.

@seberg
Member

seberg commented Aug 1, 2024

Yeah, #23371 should cause all metadata to be dropped, so it isn't clear to me why the file cannot be loaded. There must be a hole in the logic somewhere.

@seberg
Member

seberg commented Aug 8, 2024

Haha, the change was fine... The problem is that the code simply failed to use the cleaned-up version...
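On NumPy versions without the fix, a hedged workaround in the same spirit is to drop the metadata yourself before calling np.save. strip_metadata below is a hypothetical helper, not a NumPy API; it handles plain, subarray and structured dtypes, and deliberately does not preserve field titles or the aligned flag. The test data reuses the dtype from the IPython example above.

```python
import io

import numpy as np


def strip_metadata(dt: np.dtype) -> np.dtype:
    """Hypothetical helper: return an equivalent dtype with all metadata removed.

    Field titles and the 'aligned' flag are not preserved in this sketch.
    """
    if dt.subdtype is not None:
        # Subarray dtype such as ('<f8', (3,)): clean the base, keep the shape.
        base, shape = dt.subdtype
        return np.dtype((strip_metadata(base), shape))
    if dt.names is not None:
        # Structured dtype: rebuild it field by field, keeping the exact layout.
        names = list(dt.names)
        return np.dtype({
            'names': names,
            'formats': [strip_metadata(dt.fields[n][0]) for n in names],
            'offsets': [dt.fields[n][1] for n in names],
            'itemsize': dt.itemsize,
        })
    # Plain dtype: rebuilding from the type string drops any metadata.
    return np.dtype(dt.str)


dt = np.dtype([('a', '<f8'), ('b', np.dtype('S3', metadata={'some': 'stuff'}))])
arr = np.array([(1.0, b'abc'), (2.0, b'def')], dtype=dt)

buf = io.BytesIO()
# Cast to the metadata-free dtype so the header contains only plain literals.
np.save(buf, arr.astype(strip_metadata(dt)))
buf.seek(0)
loaded = np.load(buf)
print(loaded)
```

Rebuilding with explicit offsets and itemsize keeps the memory layout identical, so the cast is a field-by-field copy rather than a reinterpretation.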

seberg added a commit to seberg/numpy that referenced this issue Aug 8, 2024
We had logic in place to drop (most) metadata, but the change
had a small bug: During saving, we were still using the one with
metadata...

Maybe doesn't quite close it, but big enough of an improvement
for now, I think, so

Closes numpygh-14142
@charris charris closed this as completed in b9bcca0 Aug 8, 2024
charris pushed a commit to charris/numpy that referenced this issue Aug 8, 2024
charris pushed a commit to charris/numpy that referenced this issue Aug 9, 2024
ArvidJB pushed a commit to ArvidJB/numpy that referenced this issue Nov 1, 2024