Description
Describe the issue:
Passing open_memmap a shape tuple that contains np.int64 values instead of Python ints raises no error and writes the array to disk without complaint. However, np.load then fails to load the file.
Specifically, the .npy file starts with the following bytes (non-ASCII characters removed):
NUMPY v {'descr': '|u1', 'fortran_order': False, 'shape': (np.int64(1), np.int64(2), np.int64(3), np.int64(4)), }
as opposed to:
NUMPY v {'descr': '|u1', 'fortran_order': False, 'shape': (1, 2, 3, 4), }
It seems np.load fails on this because it calls ast.literal_eval on the header, and ast.literal_eval cannot deserialize the np.int64(...) calls.
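The failure can be reproduced with ast.literal_eval alone, without NumPy. This minimal sketch feeds it the two header strings quoted above. The plain-int header parses. The np.int64(...) header raises ValueError because literal_eval only accepts literals, and np.int64(1) is a function call (the ast.Call node that appears in the traceback). The helper name try_parse is illustrative only.

```python
import ast

# Header written when the shape tuple holds np.int64 scalars
# (text copied from the report, non-ASCII magic bytes omitted).
bad_header = ("{'descr': '|u1', 'fortran_order': False, "
              "'shape': (np.int64(1), np.int64(2), np.int64(3), np.int64(4)), }")
# Header written when the shape tuple holds plain Python ints.
good_header = "{'descr': '|u1', 'fortran_order': False, 'shape': (1, 2, 3, 4), }"


def try_parse(header):
    """Return the dict parsed by ast.literal_eval, or the ValueError it raises."""
    try:
        return ast.literal_eval(header)
    except ValueError as exc:
        return exc


print(try_parse(good_header)["shape"])       # (1, 2, 3, 4)
print(type(try_parse(bad_header)).__name__)  # ValueError
```

The exact error message differs between Python versions (3.10+ adds "on line 1"), which is why only the exception type is printed here.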
While the open_memmap docs correctly state that shape should be a tuple of ints, I think this should either be enforced by raising an error when an entry has the wrong type, or the entries should be converted to plain Python ints, which would allow loading. This may be a problem specific to open_memmap, but it might also make sense to let np.load read headers that contain np.integer values. At the moment the write succeeds while producing an unusable .npy file.
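Until open_memmap does this itself, the "convert to plain ints" option can be applied on the caller's side. This is a sketch under assumptions: normalize_shape and open_memmap_checked are hypothetical helpers, not part of NumPy. The sketch relies on operator.index, which accepts Python ints and NumPy integer scalars (they implement __index__) and raises TypeError for floats. That way it validates and converts in one step.

```python
import operator
import os
import tempfile

import numpy as np
from numpy.lib.format import open_memmap


def normalize_shape(shape):
    """Hypothetical helper: return shape as a tuple of plain Python ints.

    Raises TypeError if any entry is not an integer (e.g. 2.0).
    """
    try:
        # A single integer (Python or NumPy) is treated as a 1-D shape.
        return (operator.index(shape),)
    except TypeError:
        # Otherwise iterate; operator.index rejects non-integer entries.
        return tuple(operator.index(dim) for dim in shape)


def open_memmap_checked(filename, mode="r+", dtype=None, shape=None, **kwargs):
    """Hypothetical wrapper: normalize shape before calling open_memmap."""
    if shape is not None:
        shape = normalize_shape(shape)
    return open_memmap(filename, mode=mode, dtype=dtype, shape=shape, **kwargs)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.npy")
    shape = np.array([1, 2, 3, 4])
    data = open_memmap_checked(path, mode="w+", dtype=np.uint8, shape=tuple(shape))
    data[:] = 1
    data.flush()
    del data  # release the memory map before reading the file back
    loaded = np.load(path)
    print(loaded.shape)  # (1, 2, 3, 4)
```

Using operator.index rather than int() is deliberate: int(2.7) would silently truncate a float, while operator.index refuses it. That matches the "raise an error if the type is wrong" option.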
Reproduce the code example:
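import numpy as np
from numpy.lib.format import open_memmap
shape = np.array([1, 2, 3, 4])
# tuple(shape) yields NumPy integer scalars (np.int64 here), not Python ints
data = open_memmap("data.npy", mode="w+", dtype=np.uint8, shape=tuple(shape))
data[:] = np.ones(shape, dtype=np.uint8)
np.load("data.npy")  # fails with ValueError, see the traceback below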
Error message:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[9], line 1
----> 1 np.load("data.npy")
File ~/micromamba/envs/py39/lib/python3.9/site-packages/numpy/lib/_npyio_impl.py:484, in load(file, mmap_mode, allow_pickle, fix_imports, encoding, max_header_size)
481 return format.open_memmap(file, mode=mmap_mode,
482 max_header_size=max_header_size)
483 else:
--> 484 return format.read_array(fid, allow_pickle=allow_pickle,
485 pickle_kwargs=pickle_kwargs,
486 max_header_size=max_header_size)
487 else:
488 # Try a pickle
489 if not allow_pickle:
File ~/micromamba/envs/py39/lib/python3.9/site-packages/numpy/lib/format.py:811, in read_array(fp, allow_pickle, pickle_kwargs, max_header_size)
809 version = read_magic(fp)
810 _check_version(version)
--> 811 shape, fortran_order, dtype = _read_array_header(
812 fp, version, max_header_size=max_header_size)
813 if len(shape) == 0:
814 count = 1
File ~/micromamba/envs/py39/lib/python3.9/site-packages/numpy/lib/format.py:644, in _read_array_header(fp, version, max_header_size)
633 # The header is a pretty-printed string representation of a literal
634 # Python dictionary with trailing newlines padded to a ARRAY_ALIGN byte
635 # boundary. The keys are strings.
(...)
641 #
642 # For performance reasons, we try without _filter_header first though
643 try:
--> 644 d = ast.literal_eval(header)
645 except SyntaxError as e:
646 if version <= (2, 0):
File ~/micromamba/envs/py39/lib/python3.9/ast.py:107, in literal_eval(node_or_string)
105 return left - right
106 return _convert_signed_num(node)
--> 107 return _convert(node_or_string)
File ~/micromamba/envs/py39/lib/python3.9/ast.py:96, in literal_eval.<locals>._convert(node)
94 if len(node.keys) != len(node.values):
95 _raise_malformed_node(node)
---> 96 return dict(zip(map(_convert, node.keys),
97 map(_convert, node.values)))
98 elif isinstance(node, BinOp) and isinstance(node.op, (Add, Sub)):
99 left = _convert_signed_num(node.left)
File ~/micromamba/envs/py39/lib/python3.9/ast.py:85, in literal_eval.<locals>._convert(node)
83 return node.value
84 elif isinstance(node, Tuple):
---> 85 return tuple(map(_convert, node.elts))
86 elif isinstance(node, List):
87 return list(map(_convert, node.elts))
File ~/micromamba/envs/py39/lib/python3.9/ast.py:106, in literal_eval.<locals>._convert(node)
104 else:
105 return left - right
--> 106 return _convert_signed_num(node)
File ~/micromamba/envs/py39/lib/python3.9/ast.py:80, in literal_eval.<locals>._convert_signed_num(node)
78 else:
79 return - operand
---> 80 return _convert_num(node)
File ~/micromamba/envs/py39/lib/python3.9/ast.py:71, in literal_eval.<locals>._convert_num(node)
69 def _convert_num(node):
70 if not isinstance(node, Constant) or type(node.value) not in (int, float, complex):
---> 71 _raise_malformed_node(node)
72 return node.value
File ~/micromamba/envs/py39/lib/python3.9/ast.py:68, in literal_eval.<locals>._raise_malformed_node(node)
67 def _raise_malformed_node(node):
---> 68 raise ValueError(f'malformed node or string: {node!r}')
ValueError: malformed node or string: <ast.Call object at 0x7f06b1d31a60>
Python and NumPy Versions:
Tested with Python 3.9 and NumPy 2.0.2, as well as Python 3.12 and NumPy 2.2.3.
Runtime Environment:
[{'numpy_version': '2.0.2',
'python': '3.9.21 | packaged by conda-forge | (main, Dec 5 2024, '
'13:51:40) \n'
'[GCC 13.3.0]',
'uname': uname_result(system='Linux', node='fedora', release='6.12.11-200.fc41.x86_64', version='#1 SMP PREEMPT_DYNAMIC Fri Jan 24 04:59:58 UTC 2025', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL'],
'not_found': ['AVX512_KNL', 'AVX512_KNM']}},
{'architecture': 'Cooperlake',
'filepath': '/home/sjung/micromamba/envs/py39/lib/python3.9/site-packages/numpy.libs/libscipy_openblas64_-99b71e71.so',
'internal_api': 'openblas',
'num_threads': 24,
'prefix': 'libscipy_openblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.27'}]
Context for the issue:
This silent failure produces unreadable .npy files, which caused me data loss (or required rewriting the header manually).
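The manual header rewrite for files that have already been written can be scripted. The sketch below defines repair_npy_header, which is a hypothetical helper and not a NumPy function. It assumes the layout described in the NumPy .npy format documentation:

- the magic string \x93NUMPY
- two version bytes
- the header length, as a little-endian integer (uint16 for version 1.0, uint32 for 2.0 and 3.0)
- the header text

The helper replaces calls such as np.int64(3) with the bare integer. It then pads the header with spaces back to its original byte length, so the data offset does not move. Because it edits the file in place, trying it on a copy first is advisable.

```python
import re
import struct

MAGIC = b"\x93NUMPY"
# Matches NumPy integer scalar reprs such as np.int64(3) or np.uint32(7).
_NP_INT_CALL = re.compile(r"np\.u?int\d+\((-?\d+)\)")


def repair_npy_header(path):
    """Replace np.<int type>(N) with N in an .npy header, in place.

    Returns True if the header was changed. The new header is padded with
    spaces to its original byte length, so the data offset stays the same
    and the array bytes are not touched.
    """
    with open(path, "r+b") as f:
        if f.read(6) != MAGIC:
            raise ValueError(f"{path} is not an .npy file")
        major, _minor = f.read(2)
        # Version 1.0 stores the header length as a little-endian uint16,
        # versions 2.0 and 3.0 as a little-endian uint32.
        len_fmt = "<H" if major == 1 else "<I"
        (header_len,) = struct.unpack(len_fmt, f.read(struct.calcsize(len_fmt)))
        header_start = f.tell()
        # Version 3.0 headers are UTF-8; earlier versions are latin1/ASCII.
        encoding = "utf-8" if major == 3 else "latin1"
        header = f.read(header_len).decode(encoding)
        fixed = _NP_INT_CALL.sub(r"\1", header)
        if fixed == header:
            return False
        # Drop the old padding, re-pad with spaces, and end with a newline.
        body = fixed.rstrip("\n").rstrip(" ").encode(encoding)
        new_raw = body.ljust(header_len - 1, b" ") + b"\n"
        f.seek(header_start)
        f.write(new_raw)
        return True
```

After running repair_npy_header("data.npy"), np.load("data.npy") should be able to parse the header. The replacement only shortens the text, so the padded header always fits in the original space.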