BUG: read_array_header_1_0 errors if read_magic has not been called first #29159

emlys · 2025-06-09T20:35:12Z

Describe the issue:

Attempting to read the array header of a .npy file raises an error. If the version is read first with read_magic, there is no error. It seems like maybe the file object is not being advanced to the right location before reading.

Reproduce the code example:

import numpy as np

arr = np.array([1, 2, 3])
np.save('test.npy', arr)
with open('test.npy', 'rb+') as file:
    # if this line is uncommented, the error is not raised
    # np.lib.format.read_magic(file)
    np.lib.format.read_array_header_1_0(file)

Error message:

Traceback (most recent call last):
  File "<python-input-0>", line 8, in <module>
    np.lib.format.read_array_header_1_0(file)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/Users/emily/miniforge3/envs/main/lib/python3.13/site-packages/numpy/lib/format.py", line 529, in read_array_header_1_0
    return _read_array_header(
            fp, version=(1, 0), max_header_size=max_header_size)
  File "/Users/emily/miniforge3/envs/main/lib/python3.13/site-packages/numpy/lib/format.py", line 619, in _read_array_header
    header = _read_bytes(fp, header_length, "array header")
  File "/Users/emily/miniforge3/envs/main/lib/python3.13/site-packages/numpy/lib/format.py", line 994, in _read_bytes
    raise ValueError(msg % (error_template, size, len(data)))
ValueError: EOF: reading array header, expected 20115 bytes got 150
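
A note on where the number 20115 in the message likely comes from: a version 1.0 header reader expects the stream to be positioned at the 2-byte little-endian header length, which sits right after the 8 bytes of magic string and version. If the file is still at offset 0, those two bytes are the start of the magic string instead. The sketch below uses only the standard library, and the variable names are illustrative rather than taken from NumPy.

```python
import struct

# Every .npy file starts with these six bytes, followed by two version bytes.
MAGIC_PREFIX = b"\x93NUMPY"

# With the file still at offset 0, the first two bytes read as the
# little-endian uint16 "header length" are b"\x93N".
misread_length = struct.unpack("<H", MAGIC_PREFIX[:2])[0]
print(misread_length)  # 20115
```

The "got 150" part would then be whatever is left in the file after those two bytes are consumed.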

Python and NumPy Versions:

2.2.6
3.13.3 | packaged by conda-forge | (main, Apr 14 2025, 20:46:04) [Clang 18.1.8 ]

Runtime Environment:

[{'numpy_version': '2.2.6',
  'python': '3.13.3 | packaged by conda-forge | (main, Apr 14 2025, 20:46:04) '
            '[Clang 18.1.8 ]',
  'uname': uname_result(system='Darwin', node='Emilys-MacBook-Pro-2.local', release='22.3.0', version='Darwin Kernel Version 22.3.0: Mon Jan 30 20:42:11 PST 2023; root:xnu-8792.81.3~2/RELEASE_X86_64', machine='x86_64')},
 {'simd_extensions': {'baseline': ['SSE', 'SSE2', 'SSE3', 'SSSE3'],
                      'found': ['SSE41',
                                'POPCNT',
                                'SSE42',
                                'AVX',
                                'F16C',
                                'FMA3',
                                'AVX2',
                                'AVX512F',
                                'AVX512CD',
                                'AVX512_SKX',
                                'AVX512_CLX',
                                'AVX512_CNL',
                                'AVX512_ICL'],
                      'not_found': ['AVX512_KNL']}},
 {'architecture': 'Haswell',
  'filepath': '/Users/emily/miniforge3/envs/main/lib/libopenblasp-r0.3.29.dylib',
  'internal_api': 'openblas',
  'num_threads': 4,
  'prefix': 'libopenblas',
  'threading_layer': 'openmp',
  'user_api': 'blas',
  'version': '0.3.29'},
 {'filepath': '/Users/emily/miniforge3/envs/main/lib/libomp.dylib',
  'internal_api': 'openmp',
  'num_threads': 8,
  'prefix': 'libomp',
  'user_api': 'openmp',
  'version': None}]

Context for the issue:

We had previously used numpy.lib.format._read_array_header to support both file format versions 1 and 2. This function is unavailable starting in numpy 2.3.0 so we must switch to explicitly calling read_array_header_1_0 or read_array_header_2_0. In my opinion, the _read_array_header API would be useful to make public.

If we can assume that version 1 is being used (because we are not using structured arrays), I think read_magic should not have to be called first. Or if it is actually necessary to call read_magic first, this should be documented.


seberg · 2025-06-10T09:12:20Z

We could probably revert hiding that function if it helps a lot, but it was underscored, and explicitly calling one of the two versioned functions is how things are designed. Otherwise, read_magic would have to peek or seek the file; that is probably fine, but I am not 100% sure.

Maybe the issue is more about documenting this, if you really need it? If we ever add a version 3, I am not sure that we can promise that a read_array_header() function would return the exact same things, although I suppose you can argue that if it doesn't, you could raise an error and introduce a new version then.

Note that your old code must also have called read_magic() to pass the version to _read_array_header; hard-coding a version there would have failed in exactly the same way.
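
For illustration, a minimal sketch of the call order described above: read the magic string first, then call the header reader that matches the version it reports. The helper name read_header_any_version is hypothetical and not part of NumPy; only read_magic, read_array_header_1_0 and read_array_header_2_0 are taken from the thread. An in-memory buffer stands in for the file on disk.

```python
import io

import numpy as np
from numpy.lib import format as npy_format


def read_header_any_version(fp):
    """Hypothetical helper: read the magic string, then the matching header."""
    # read_magic consumes the magic string and version bytes, leaving the
    # stream positioned at the header length field.
    version = npy_format.read_magic(fp)
    if version == (1, 0):
        return npy_format.read_array_header_1_0(fp)
    if version == (2, 0):
        return npy_format.read_array_header_2_0(fp)
    # Any other version is rejected here rather than guessed at.
    raise ValueError(f"unsupported .npy format version: {version}")


# Write a small array to an in-memory buffer instead of a file on disk.
buf = io.BytesIO()
np.save(buf, np.arange(3, dtype=np.int64))
buf.seek(0)

shape, fortran_order, dtype = read_header_any_version(buf)
print(shape, fortran_order, dtype)
```

Raising on an unrecognized version, instead of falling back to one of the two readers, keeps the sketch from silently misreading a file written in a format it does not know.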

rkern · 2025-06-10T13:46:43Z

It's working as designed and desired. The docstring could be updated to make the constraint clear.

emlys added the 00 - Bug label Jun 9, 2025

rkern added 04 - Documentation and removed 00 - Bug labels Jun 10, 2025
