Using open_memmap with shape tuple can create npys that are not loadable when tuple contains np.ints #28334

@jungerm2

Description

Describe the issue:

Passing open_memmap a shape tuple that contains np.int64 values instead of Python ints raises no error and writes the array to disk, but np.load then fails to load the resulting file.

Specifically, the npy file starts with the following bytes (non-ASCII characters removed):

NUMPY v {'descr': '|u1', 'fortran_order': False, 'shape': (np.int64(1), np.int64(2), np.int64(3), np.int64(4)), }

as opposed to:

NUMPY  v {'descr': '|u1', 'fortran_order': False, 'shape': (1, 2, 3, 4), } 

It seems np.load fails on this because it calls ast.literal_eval on the header, which cannot deserialize the np.int64(...) calls.
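To illustrate the parsing failure in isolation, here is a minimal sketch that feeds both header strings from above directly to ast.literal_eval. It uses only the standard library; the variable names are mine, and the header text is copied from the two dumps shown earlier.

```python
import ast

# Header text as written when the shape holds plain Python ints.
good_header = "{'descr': '|u1', 'fortran_order': False, 'shape': (1, 2, 3, 4), }"

# Header text as written when the shape holds np.int64 scalars.
bad_header = (
    "{'descr': '|u1', 'fortran_order': False, "
    "'shape': (np.int64(1), np.int64(2), np.int64(3), np.int64(4)), }"
)

# Literals only: this parses into an ordinary dict.
parsed = ast.literal_eval(good_header)

# "np.int64(1)" is a function call, not a literal, so literal_eval rejects it.
try:
    ast.literal_eval(bad_header)
    bad_parsed = True
except ValueError as exc:
    bad_parsed = False
    error_message = str(exc)

print(parsed["shape"])
print(bad_parsed, error_message)
```

The second call raises the same kind of ValueError ("malformed node or string") that appears at the bottom of the traceback below.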

While the open_memmap docs correctly state that shape should be a tuple of ints, I think this should either be enforced by raising an error when the type is wrong, or the values should be converted to plain ints, which would allow loading. This might be exclusively an open_memmap problem, but it might also make sense to allow np.load to read headers with np.integer types. At the moment the write succeeds while creating an unusable npy file.
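Until something like that is in place, a caller-side workaround is to convert the shape to built-in ints before calling open_memmap. The sketch below is only an illustration of that idea, not a proposed patch: plain_int_shape and write_and_reload are hypothetical names of mine, and it assumes every shape entry supports `__index__` (as np.int64 does).

```python
import operator


def plain_int_shape(shape):
    """Return `shape` as a tuple of built-in ints (hypothetical helper).

    operator.index accepts any integer-like object, including NumPy
    integer scalars, and raises TypeError for floats.
    """
    return tuple(operator.index(n) for n in shape)


def write_and_reload(path="data.npy"):
    """Write an array through open_memmap with a sanitized shape, then load it."""
    # NumPy is imported here so the helper above stays usable without it.
    import numpy as np
    from numpy.lib.format import open_memmap

    shape = plain_int_shape(np.array([1, 2, 3, 4]))
    data = open_memmap(path, mode="w+", dtype=np.uint8, shape=shape)
    data[:] = 1
    data.flush()
    del data  # drop the memmap before reading the file back
    return np.load(path)
```

For a shape that already lives in an array, `tuple(shape.tolist())` has the same effect, since tolist() returns Python ints.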

Reproduce the code example:

import numpy as np
from numpy.lib.format import open_memmap
shape = np.array([1, 2, 3, 4])
data = open_memmap("data.npy", mode="w+", dtype=np.uint8, shape=tuple(shape))
data[:] = np.ones(shape, dtype=np.uint8)
np.load("data.npy")

Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[9], line 1
----> 1 np.load("data.npy")

File ~/micromamba/envs/py39/lib/python3.9/site-packages/numpy/lib/_npyio_impl.py:484, in load(file, mmap_mode, allow_pickle, fix_imports, encoding, max_header_size)
    481         return format.open_memmap(file, mode=mmap_mode,
    482                                   max_header_size=max_header_size)
    483     else:
--> 484         return format.read_array(fid, allow_pickle=allow_pickle,
    485                                  pickle_kwargs=pickle_kwargs,
    486                                  max_header_size=max_header_size)
    487 else:
    488     # Try a pickle
    489     if not allow_pickle:

File ~/micromamba/envs/py39/lib/python3.9/site-packages/numpy/lib/format.py:811, in read_array(fp, allow_pickle, pickle_kwargs, max_header_size)
    809 version = read_magic(fp)
    810 _check_version(version)
--> 811 shape, fortran_order, dtype = _read_array_header(
    812         fp, version, max_header_size=max_header_size)
    813 if len(shape) == 0:
    814     count = 1

File ~/micromamba/envs/py39/lib/python3.9/site-packages/numpy/lib/format.py:644, in _read_array_header(fp, version, max_header_size)
    633 # The header is a pretty-printed string representation of a literal
    634 # Python dictionary with trailing newlines padded to a ARRAY_ALIGN byte
    635 # boundary. The keys are strings.
   (...)
    641 #
    642 # For performance reasons, we try without _filter_header first though
    643 try:
--> 644     d = ast.literal_eval(header)
    645 except SyntaxError as e:
    646     if version <= (2, 0):

File ~/micromamba/envs/py39/lib/python3.9/ast.py:107, in literal_eval(node_or_string)
    105                 return left - right
    106     return _convert_signed_num(node)
--> 107 return _convert(node_or_string)

File ~/micromamba/envs/py39/lib/python3.9/ast.py:96, in literal_eval.<locals>._convert(node)
     94     if len(node.keys) != len(node.values):
     95         _raise_malformed_node(node)
---> 96     return dict(zip(map(_convert, node.keys),
     97                     map(_convert, node.values)))
     98 elif isinstance(node, BinOp) and isinstance(node.op, (Add, Sub)):
     99     left = _convert_signed_num(node.left)

File ~/micromamba/envs/py39/lib/python3.9/ast.py:85, in literal_eval.<locals>._convert(node)
     83     return node.value
     84 elif isinstance(node, Tuple):
---> 85     return tuple(map(_convert, node.elts))
     86 elif isinstance(node, List):
     87     return list(map(_convert, node.elts))

File ~/micromamba/envs/py39/lib/python3.9/ast.py:106, in literal_eval.<locals>._convert(node)
    104         else:
    105             return left - right
--> 106 return _convert_signed_num(node)

File ~/micromamba/envs/py39/lib/python3.9/ast.py:80, in literal_eval.<locals>._convert_signed_num(node)
     78     else:
     79         return - operand
---> 80 return _convert_num(node)

File ~/micromamba/envs/py39/lib/python3.9/ast.py:71, in literal_eval.<locals>._convert_num(node)
     69 def _convert_num(node):
     70     if not isinstance(node, Constant) or type(node.value) not in (int, float, complex):
---> 71         _raise_malformed_node(node)
     72     return node.value

File ~/micromamba/envs/py39/lib/python3.9/ast.py:68, in literal_eval.<locals>._raise_malformed_node(node)
     67 def _raise_malformed_node(node):
---> 68     raise ValueError(f'malformed node or string: {node!r}')

ValueError: malformed node or string: <ast.Call object at 0x7f06b1d31a60>

Python and NumPy Versions:

Tested with Python 3.9 and NumPy 2.0.2, as well as Python 3.12 and NumPy 2.2.3.

Runtime Environment:

[{'numpy_version': '2.0.2',
'python': '3.9.21 | packaged by conda-forge | (main, Dec 5 2024, '
'13:51:40) \n'
'[GCC 13.3.0]',
'uname': uname_result(system='Linux', node='fedora', release='6.12.11-200.fc41.x86_64', version='#1 SMP PREEMPT_DYNAMIC Fri Jan 24 04:59:58 UTC 2025', machine='x86_64')},
{'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3'],
'found': ['SSSE3',
'SSE41',
'POPCNT',
'SSE42',
'AVX',
'F16C',
'FMA3',
'AVX2',
'AVX512F',
'AVX512CD',
'AVX512_SKX',
'AVX512_CLX',
'AVX512_CNL',
'AVX512_ICL'],
'not_found': ['AVX512_KNL', 'AVX512_KNM']}},
{'architecture': 'Cooperlake',
'filepath': '/home/sjung/micromamba/envs/py39/lib/python3.9/site-packages/numpy.libs/libscipy_openblas64_-99b71e71.so',
'internal_api': 'openblas',
'num_threads': 24,
'prefix': 'libscipy_openblas',
'threading_layer': 'pthreads',
'user_api': 'blas',
'version': '0.3.27'}]

Context for the issue:

The silent failure creates unreadable npy files, which caused me data loss (or required a manual header rewrite).
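For anyone stuck with such a file, here is a rough sketch of recovering the data without rewriting the header on disk. It assumes the standard npy layout (6-byte magic, 2-byte version, then a little-endian header length of 2 bytes for version 1.0 or 4 bytes for 2.0 and 3.0) and a simple non-structured dtype. parse_broken_header and recover_array are hypothetical names of mine, not NumPy functions.

```python
import ast
import re
import struct

MAGIC = b"\x93NUMPY"


def parse_broken_header(raw):
    """Return (header_dict, data_offset) from the leading bytes of an npy file."""
    if raw[:6] != MAGIC:
        raise ValueError("not an npy file")
    major = raw[6]
    if major == 1:
        (hlen,) = struct.unpack("<H", raw[8:10])
        start = 10
    else:
        (hlen,) = struct.unpack("<I", raw[8:12])
        start = 12
    if len(raw) < start + hlen:
        raise ValueError("header is longer than the bytes provided")
    # Version 3.0 headers are UTF-8; earlier versions are latin1.
    text = raw[start:start + hlen].decode("utf-8" if major >= 3 else "latin1")
    # Rewrite scalar reprs such as "np.int64(3)" to the bare literal "3".
    text = re.sub(r"np\.\w+\((\d+)\)", r"\1", text)
    return ast.literal_eval(text), start + hlen


def recover_array(path):
    """Map the array data read-only, using the repaired header for its layout."""
    import numpy as np

    with open(path, "rb") as fp:
        raw = fp.read(1 << 16)  # far more than a header like the one above needs
    header, offset = parse_broken_header(raw)
    order = "F" if header["fortran_order"] else "C"
    return np.memmap(path, dtype=np.dtype(header["descr"]), mode="r",
                     offset=offset, shape=header["shape"], order=order)
```

Calling `np.save("fixed.npy", recover_array("data.npy"))` would then write a loadable copy, assuming the data section of the original file is intact.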
