genfromtxt fails when a non-contiguous dtype is requested #19623

Open
anntzer opened this issue Aug 6, 2021 · 1 comment

anntzer commented Aug 6, 2021

Reproducing code example:

import numpy as np, io
# np.dtype([("a", int), ("b", float), ("c", int)])[["a", "c"]] constructs a non-contiguous dtype
# containing only the a and c fields, kept at their original offsets.
np.loadtxt(io.StringIO("1 3\n5 7"), dtype=np.dtype([("a", int), ("c", int)]))  # OK
np.loadtxt(io.StringIO("1 3\n5 7"), dtype=np.dtype([("a", int), ("b", float), ("c", int)])[["a", "c"]])  # OK
np.genfromtxt(io.StringIO("1 3\n5 7"), dtype=np.dtype([("a", int), ("c", int)]))  # OK
np.genfromtxt(io.StringIO("1 3\n5 7"), dtype=np.dtype([("a", int), ("b", float), ("c", int)])[["a", "c"]])  # fails

Error message:

/usr/lib/python3.9/site-packages/numpy/lib/npyio.py in genfromtxt(fname, dtype, comments, delimiter, skip_header, skip_footer, converters, missing_values, filling_values, usecols, names, excludelist, deletechars, replace_space, autostrip, case_sensitive, defaultfmt, unpack, usemask, loose, invalid_raise, max_rows, encoding, like)
   2220             else:
   2221                 rows = np.array(data, dtype=[('', _) for _ in dtype_flat])
-> 2222                 output = rows.view(dtype)
   2223             # Now, process the rowmasks the same way
   2224             if usemask:

ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

This occurs because genfromtxt handles dtypes by first "flattening" them (lifting nested fields to the top level, but also throwing away alignment info), constructing an array with that flattened dtype, and then .view()ing it with the original dtype. It is that last step that fails: the flattened dtype is packed while the requested dtype is not, so the memory layout (and item size) differs and the view cannot be done.
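To make the mismatch concrete, here is a minimal sketch that mimics the flatten-then-view sequence outside of genfromtxt. It is not the actual genfromtxt code; the names `full`, `sparse`, `packed` and `view_failed` are only for illustration, and the fields are spelled as int64/float64 so the sizes are the same on every platform.

```python
import numpy as np

# Fixed-width types so the item sizes below do not depend on the platform.
full = np.dtype([("a", np.int64), ("b", np.float64), ("c", np.int64)])
sparse = full[["a", "c"]]  # keeps the original offsets, so there is a gap where "b" was

# Packed stand-in for the "flattened" dtype: same field types, no gap.
packed = np.dtype([("", sparse[name]) for name in sparse.names])

print(sparse.itemsize)  # 24
print(packed.itemsize)  # 16

rows = np.array([(1, 3), (5, 7)], dtype=packed)
try:
    rows.view(sparse)  # 2 rows * 16 bytes cannot be reinterpreted as 24-byte items
    view_failed = False
except ValueError:
    view_failed = True
```

Two packed rows occupy 32 bytes, which is not a multiple of the 24-byte item size of the requested dtype, hence the ValueError.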

(OTOH, loadtxt directly constructs an output with the right dtype by using a recursive "row packer" that constructs a nested list/tuple with the right shape.)
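For illustration, a rough sketch of what such a recursive packer could look like. This is not loadtxt's actual implementation; `make_packer` is a hypothetical helper, and subarray dtypes are ignored for brevity.

```python
import numpy as np

def make_packer(dtype):
    """Return (n_values, pack); pack(flat_values) nests the values to match dtype."""
    if dtype.names is None:
        # Scalar leaf: consumes exactly one value (subarray dtypes not handled).
        return 1, lambda values: values[0]
    subpackers = [make_packer(dtype[name]) for name in dtype.names]
    total = sum(n for n, _ in subpackers)

    def pack(values):
        out, start = [], 0
        for n, sub in subpackers:
            out.append(sub(values[start:start + n]))
            start += n
        return tuple(out)

    return total, pack

full = np.dtype([("a", np.int64), ("b", np.float64), ("c", np.int64)])
sparse = full[["a", "c"]]
n_values, pack = make_packer(sparse)

# Building the array from nested tuples lets NumPy place each field at its own
# offset, so the non-contiguous layout is never an issue.
arr = np.array([pack([1, 3]), pack([5, 7])], dtype=sparse)

# Nested fields are packed recursively.
nested = np.dtype([("p", [("x", np.float64), ("y", np.float64)]), ("n", np.int64)])
nested_row = make_packer(nested)[1]([1.0, 2.0, 3])
```

Because the array is constructed directly with the target dtype, no .view() between differently sized dtypes is ever needed.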

I was hoping to implement a similar "flat-dtype" optimization for loadtxt, so that'll likely involve alignment-handling code on both sides.

... or can we just claim that non-contiguous dtypes are not supported by loadtxt/genfromtxt? (I expect the use cases of having to loadtxt() into an array with a specific, non-contiguous alignment to be exceedingly rare; you can always make a copy later if needed, which should have negligible cost compared to the (rather slow) loadtxt.)
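The "copy later" route could look roughly like the sketch below: read into a packed dtype with the same field names, then copy field by field into an array with the non-contiguous dtype. The names `packed`, `loaded` and `result` are just for illustration, and only top-level fields are considered here.

```python
import io
import numpy as np

full = np.dtype([("a", np.int64), ("b", np.float64), ("c", np.int64)])
sparse = full[["a", "c"]]  # the non-contiguous dtype we ultimately want

# Packed dtype with the same field names and types; genfromtxt handles this fine.
packed = np.dtype([(name, sparse[name]) for name in sparse.names])
loaded = np.genfromtxt(io.StringIO("1 3\n5 7"), dtype=packed)

# Copy into the requested layout, one field at a time.
result = np.empty(loaded.shape, dtype=sparse)
for name in sparse.names:
    result[name] = loaded[name]
```

The extra pass over the data is a plain memory copy per field, so it should be cheap next to the text parsing itself.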

NumPy/Python version information:

1.21.1 3.9.6 (default, Jun 30 2021, 10:22:16) 
[GCC 11.1.0]
seberg commented Aug 11, 2021

> ... or can we just claim that non-contiguous dtypes are not supported by loadtxt/genfromtxt?

I think this is fine, it's super obscure. The only thing that might be nice would be non-packed structured dtypes, but I am not sure we should even worry about that too much. While useful, they are also rare in practice (and especially rarely used).

EDIT: Although, maybe for unflattened dtypes as in your example it would be nice (still low priority IMO).
