genfromtxt fails when a non-contiguous dtype is requested #19623

anntzer · 2021-08-06T17:40:03Z

Reproducing code example:

import numpy as np, io
# np.dtype([("a", int), ("b", float), ("c", int)])[["a", "c"]] constructs a
# non-contiguous dtype containing only the a and c fields.
np.loadtxt(io.StringIO("1 3\n5 7"), dtype=np.dtype([("a", int), ("c", int)]))  # OK
np.loadtxt(io.StringIO("1 3\n5 7"), dtype=np.dtype([("a", int), ("b", float), ("c", int)])[["a", "c"]])  # OK
np.genfromtxt(io.StringIO("1 3\n5 7"), dtype=np.dtype([("a", int), ("c", int)]))  # OK
np.genfromtxt(io.StringIO("1 3\n5 7"), dtype=np.dtype([("a", int), ("b", float), ("c", int)])[["a", "c"]])  # fails

Error message:

/usr/lib/python3.9/site-packages/numpy/lib/npyio.py in genfromtxt(fname, dtype, comments, delimiter, skip_header, skip_footer, converters, missing_values, filling_values, usecols, names, excludelist, deletechars, replace_space, autostrip, case_sensitive, defaultfmt, unpack, usemask, loose, invalid_raise, max_rows, encoding, like)
   2220             else:
   2221                 rows = np.array(data, dtype=[('', _) for _ in dtype_flat])
-> 2222                 output = rows.view(dtype)
   2223             # Now, process the rowmasks the same way
   2224             if usemask:

ValueError: When changing to a larger dtype, its size must be a divisor of the total size in bytes of the last axis of the array.

This occurs because genfromtxt handles dtypes by first "flattening" them (lifting nested fields to the top level, but also throwing away alignment info), constructing an array with that flattened dtype, and then .view()ing it with the original dtype. That last step fails: the flattened dtype is packed while the requested dtype is not, so their memory layouts differ and the view cannot be done.
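The failing step can be reproduced without genfromtxt. The following is only a minimal sketch of the mechanism, not the actual npyio.py code: the names `full`, `sub` and `packed` are made up for illustration, and fixed-width int64/float64 fields are assumed so that the item sizes are the same on every platform.

```python
import numpy as np

# Illustrative dtypes (names are assumptions, not taken from npyio.py).
full = np.dtype([("a", np.int64), ("b", np.float64), ("c", np.int64)])
sub = full[["a", "c"]]  # keeps the original offsets, so there is a hole where "b" was
packed = np.dtype([("a", np.int64), ("c", np.int64)])  # same fields, no hole

print(sub.itemsize, packed.itemsize)  # 24 and 16 bytes per record

# Stand-in for the array genfromtxt builds from the flattened dtype.
rows = np.array([(1, 3), (5, 7)], dtype=packed)

view_error = None
try:
    rows.view(sub)  # 2 * 16 bytes cannot be reinterpreted as 24-byte records
except ValueError as exc:
    view_error = exc

print(type(view_error).__name__)
```

Note that the view only raises when the total byte count is not a multiple of the larger item size, as is the case for the two-row input of the report.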

(OTOH, loadtxt directly constructs an output with the right dtype by using a recursive "row packer" that constructs a nested list/tuple with the right shape.)
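As a rough illustration of why building the output directly with the requested dtype sidesteps the problem (this is not loadtxt's actual row packer, just a sketch with made-up names and fixed-width field types), constructing the array from a list of tuples lets NumPy place each value at the right offset:

```python
import numpy as np

# Illustrative non-contiguous dtype, as in the report but with explicit widths.
full = np.dtype([("a", np.int64), ("b", np.float64), ("c", np.int64)])
sub = full[["a", "c"]]

# Rows already converted to Python tuples, one entry per field of `sub`.
rows = [(1, 3), (5, 7)]

# No .view() is involved: each tuple is written field by field.
out = np.array(rows, dtype=sub)
print(out["a"], out["c"], out.dtype.itemsize)
```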

I was hoping to implement a similar "flat-dtype" optimization for loadtxt, so that'll likely involve alignment-handling code on both sides.

... or can we just claim that non-contiguous dtypes are not supported by loadtxt/genfromtxt? (I expect the use cases of having to loadtxt() into an array with a specific, non-contiguous alignment to be exceedingly rare; you can always do a copy later if needed, which should have negligible cost compared to the (rather slow) loadtxt.)
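The "copy later" route could look something like the sketch below. It assumes a flat (non-nested) target dtype, and the names `target`, `packed` and `result` are hypothetical; the idea is to load into a packed dtype with the same fields and then copy field by field into an array of the requested layout.

```python
import io
import numpy as np

# Requested non-contiguous dtype (explicit widths for portability).
full = np.dtype([("a", np.int64), ("b", np.float64), ("c", np.int64)])
target = full[["a", "c"]]

# Packed dtype with the same field names and types, in the same order.
packed = np.dtype([(name, target.fields[name][0]) for name in target.names])

# Loading into the packed dtype is the case reported as working above.
loaded = np.genfromtxt(io.StringIO("1 3\n5 7"), dtype=packed)

# Copy into the requested layout one field at a time.
result = np.empty(loaded.shape, dtype=target)
for name in target.names:
    result[name] = loaded[name]

print(result["a"], result["c"], result.dtype.itemsize)
```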

NumPy/Python version information:

1.21.1 3.9.6 (default, Jun 30 2021, 10:22:16) 
[GCC 11.1.0]


seberg · 2021-08-11T14:12:53Z

> ... or can we just claim that non-contiguous dtypes are not supported by loadtxt/genfromtxt?

I think this is fine, it's super obscure. The only thing that might be nice would be support for non-packed structured dtypes, but I am not sure we should even worry about that too much. While useful, they are also rare in practice (and especially rarely used).

EDIT: Although, maybe for unflattened dtypes as in your example it would be nice (still low priority IMO).

anntzer mentioned this issue Aug 17, 2021

PERF: Rely on C-level str conversions in loadtxt for up to 2x speedup #19687

Closed
