Attempting to read the array header of a .npy file raises an error. If the version is read first with read_magic, there is no error. It seems like maybe the file object is not being advanced to the right location before reading.
Reproduce the code example:
import numpy as np

arr = np.array([1, 2, 3])
np.save('test.npy', arr)

with open('test.npy', 'rb+') as file:
    # if this line is uncommented, the error is not raised
    # np.lib.format.read_magic(file)
    np.lib.format.read_array_header_1_0(file)
Error message:
Traceback (most recent call last):
  File "<python-input-0>", line 8, in <module>
    np.lib.format.read_array_header_1_0(file)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "/Users/emily/miniforge3/envs/main/lib/python3.13/site-packages/numpy/lib/format.py", line 529, in read_array_header_1_0
    return _read_array_header(
        fp, version=(1, 0), max_header_size=max_header_size)
  File "/Users/emily/miniforge3/envs/main/lib/python3.13/site-packages/numpy/lib/format.py", line 619, in _read_array_header
    header = _read_bytes(fp, header_length, "array header")
  File "/Users/emily/miniforge3/envs/main/lib/python3.13/site-packages/numpy/lib/format.py", line 994, in _read_bytes
    raise ValueError(msg % (error_template, size, len(data)))
ValueError: EOF: reading array header, expected 20115 bytes got 150
We had previously used numpy.lib.format._read_array_header to support both file format versions 1 and 2. This function is unavailable starting in numpy 2.3.0 so we must switch to explicitly calling read_array_header_1_0 or read_array_header_2_0. In my opinion, the _read_array_header API would be useful to make public.
If we can assume that version 1 is being used (because we are not using structured arrays), I think read_magic should not have to be called first. Or if it is actually necessary to call read_magic first, this should be documented.
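For what it's worth, the number in the error message is consistent with the header reader starting at byte 0 of the file. A .npy file begins with the magic string `\x93NUMPY`, and a version 1.0 header length is stored as a little-endian unsigned 16-bit integer. A minimal sketch of that arithmetic, assuming the reader consumed the first two bytes of the magic string as the length field:

```python
import struct

# The first two bytes of the .npy magic string b"\x93NUMPY".
magic_prefix = b"\x93N"

# Format version 1.0 stores the header length as a little-endian
# unsigned short ("<H"). Decoding the magic bytes that way yields
# the bogus length reported in the traceback.
(bogus_length,) = struct.unpack("<H", magic_prefix)
print(bogus_length)  # 20115
```

The "got 150" part would also fit, assuming the saved file is 152 bytes (a 128-byte magic-plus-header block followed by three 8-byte integers): after the two bytes are consumed, 150 remain.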
We could probably revert hiding that if it helps a lot, but it was underscored, and explicitly calling one of the two versions is how things are designed. (Otherwise, read_magic would have to peek or seek the file; that is probably fine, but I am not 100% sure.)
Maybe the issue is more about documenting this, if you really need it? If we ever add a version 3, I am not sure we can promise that a read_array_header() function would return exactly the same things, although I suppose you could argue that if it doesn't, you could raise an error and introduce a new version then.
Note that your old code must also have called read_magic() to pass the version to _read_array_header; hard-coding a version there would have failed in exactly the same way.
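As a rough sketch of the pattern described above, using only the public functions: call read_magic first, then dispatch on the version it returns. The helper name `read_header_any_version` is hypothetical and not part of NumPy, and raising for any other version is a choice made here for illustration.

```python
import io

import numpy as np
from numpy.lib import format as npy_format


def read_header_any_version(fp):
    """Hypothetical helper: read the magic string, then the matching header."""
    # read_magic consumes the magic string and version bytes, leaving the
    # file positioned at the header length field.
    version = npy_format.read_magic(fp)
    if version == (1, 0):
        return npy_format.read_array_header_1_0(fp)
    if version == (2, 0):
        return npy_format.read_array_header_2_0(fp)
    # Other versions are not handled in this sketch.
    raise ValueError(f"unsupported .npy format version: {version}")


# Usage: save a small array to an in-memory buffer and read its header back.
buf = io.BytesIO()
np.save(buf, np.array([1, 2, 3]))
buf.seek(0)
shape, fortran_order, dtype = read_header_any_version(buf)
print(shape, fortran_order, dtype)
```

After the helper returns, the file object is positioned at the start of the array data, so the header is only read correctly when the call starts from the beginning of the file.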
Python and NumPy Versions:
NumPy 2.2.6
Python 3.13.3 | packaged by conda-forge | (main, Apr 14 2025, 20:46:04) [Clang 18.1.8 ]