
BUG: read_array_header_1_0 errors if read_magic has not been called first #29159

Open
emlys opened this issue Jun 9, 2025 · 2 comments


emlys commented Jun 9, 2025

Describe the issue:

Calling read_array_header_1_0 on a freshly opened .npy file raises a ValueError. If the version is read first with read_magic, there is no error. It looks like the file object is not being advanced past the magic string and version bytes before the header is read.

Reproduce the code example:

import numpy as np

arr = np.array([1, 2, 3])
np.save('test.npy', arr)
with open('test.npy', 'rb+') as file:
    # if this line is uncommented, the error is not raised
    # np.lib.format.read_magic(file)
    np.lib.format.read_array_header_1_0(file)

Error message:

Traceback (most recent call last):
  File "<python-input-0>", line 8, in <module>
    np.lib.format.read_array_header_1_0(file)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/Users/emily/miniforge3/envs/main/lib/python3.13/site-packages/numpy/lib/format.py", line 529, in read_array_header_1_0
    return _read_array_header(
            fp, version=(1, 0), max_header_size=max_header_size)
  File "/Users/emily/miniforge3/envs/main/lib/python3.13/site-packages/numpy/lib/format.py", line 619, in _read_array_header
    header = _read_bytes(fp, header_length, "array header")
  File "/Users/emily/miniforge3/envs/main/lib/python3.13/site-packages/numpy/lib/format.py", line 994, in _read_bytes
    raise ValueError(msg % (error_template, size, len(data)))
ValueError: EOF: reading array header, expected 20115 bytes got 150
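
The 20115 in this message is consistent with the header length being read from offset 0 of the file rather than from just after the 8-byte preamble. The following is a minimal sketch of that arithmetic using only the standard library. It assumes the documented version 1.0 layout: a 6-byte magic string, two version bytes, then a little-endian unsigned 16-bit header length.

```python
import struct

# Every .npy file starts with a 6-byte magic string followed by
# one byte each for the major and minor format version.
MAGIC = b"\x93NUMPY"
preamble = MAGIC + bytes([1, 0])  # format version 1.0

# In version 1.0 the header length is stored as a little-endian
# unsigned 16-bit integer directly after this 8-byte preamble.
# If the file position is still 0, the two bytes that get decoded
# are b"\x93N" (the start of the magic string) instead.
misread_length = struct.unpack("<H", preamble[:2])[0]
print(misread_length)  # 20115
```

The "got 150" part would fit the same picture if the file is 152 bytes long (for example a 128-byte header block plus three 8-byte integers, which is an assumption about this particular file): two bytes are consumed as the bogus length, leaving 150.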

Python and NumPy Versions:

2.2.6
3.13.3 | packaged by conda-forge | (main, Apr 14 2025, 20:46:04) [Clang 18.1.8 ]

Runtime Environment:

[{'numpy_version': '2.2.6',
  'python': '3.13.3 | packaged by conda-forge | (main, Apr 14 2025, 20:46:04) '
            '[Clang 18.1.8 ]',
  'uname': uname_result(system='Darwin', node='Emilys-MacBook-Pro-2.local', release='22.3.0', version='Darwin Kernel Version 22.3.0: Mon Jan 30 20:42:11 PST 2023; root:xnu-8792.81.3~2/RELEASE_X86_64', machine='x86_64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3', 'SSSE3'],
                      'found': ['SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2',
                                'AVX512F',
                                'AVX512CD',
                                'AVX512_SKX',
                                'AVX512_CLX',
                                'AVX512_CNL',
                                'AVX512_ICL'],
                      'not_found': ['AVX512_KNL']}},
 {'architecture': 'Haswell',
  'filepath': '/Users/emily/miniforge3/envs/main/lib/libopenblasp-r0.3.29.dylib',
  'internal_api': 'openblas',
  'num_threads': 4,
  'prefix': 'libopenblas',
  'threading_layer': 'openmp',
  'user_api': 'blas',
  'version': '0.3.29'},
 {'filepath': '/Users/emily/miniforge3/envs/main/lib/libomp.dylib',
  'internal_api': 'openmp',
  'num_threads': 8,
  'prefix': 'libomp',
  'user_api': 'openmp',
  'version': None}]

Context for the issue:

We had previously used numpy.lib.format._read_array_header to support both file format versions 1 and 2. This function is unavailable starting in numpy 2.3.0 so we must switch to explicitly calling read_array_header_1_0 or read_array_header_2_0. In my opinion, the _read_array_header API would be useful to make public.

If we can assume that version 1 is being used (because we are not using structured arrays), I think read_magic should not have to be called first. Or if it is actually necessary to call read_magic first, this should be documented.

@emlys emlys added the 00 - Bug label Jun 9, 2025
Member

seberg commented Jun 10, 2025

We could probably revert hiding that if it helps a lot, but it was underscored, and explicitly calling one of the two versions is how things are designed (otherwise, read_magic would have to peek or seek the file; that is probably fine, but I am not 100% sure).

Maybe the fix is rather to document this, if you really need it? If we ever add a version 3, I am not sure we can promise that a read_array_header() function would return exactly the same things, although I suppose you could argue that if it doesn't, you could raise an error and introduce a new function then.

Note that your old code must also have called read_magic() to get the version to pass to _read_array_header; hard-coding a version there would have failed in exactly the same way.
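
The pattern described above can be sketched with the public functions only: call read_magic first, then dispatch on the version it returns. The helper name read_npy_header is hypothetical and not part of NumPy. This is one possible wrapper under the assumption that only format versions 1.0 and 2.0 need to be supported, and it raises for anything else.

```python
import io

import numpy as np
from numpy.lib import format as npy_format


def read_npy_header(fp):
    """Hypothetical helper: read the magic string, then the matching header.

    Returns (shape, fortran_order, dtype) and leaves fp positioned at the
    start of the array data.
    """
    # read_magic consumes the 8-byte preamble and returns (major, minor).
    version = npy_format.read_magic(fp)
    if version == (1, 0):
        return npy_format.read_array_header_1_0(fp)
    if version == (2, 0):
        return npy_format.read_array_header_2_0(fp)
    # Other versions are deliberately not handled in this sketch.
    raise ValueError(f"unsupported .npy format version: {version}")


# Usage: write an array to an in-memory buffer and read its header back.
arr = np.array([1, 2, 3])
buf = io.BytesIO()
np.save(buf, arr)
buf.seek(0)

shape, fortran_order, dtype = read_npy_header(buf)
print(shape, fortran_order, dtype)
```

Raising on unknown versions keeps the wrapper from silently misreading a future format, which is the concern raised above about promising a stable return value.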

Member

rkern commented Jun 10, 2025

It's working as designed and desired. The docstring could be updated to make the constraint clear.
