Skip to content

Empty arrays should repr valid code to recreate them #16665

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
asmeurer opened this issue Jun 22, 2020 · 9 comments
Open

Empty arrays should repr valid code to recreate them #16665

asmeurer opened this issue Jun 22, 2020 · 9 comments

Comments

@asmeurer
Copy link
Member

Reproducing code example:

Empty arrays repr like array([], shape=(0, 0), dtype=float64), but this is not actually valid because shape cannot be passed to the array constructor:

>>> from numpy import *
>>> empty((0, 0))
array([], shape=(0, 0), dtype=float64)
>>> array([], shape=(0, 0), dtype=float64)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'shape' is an invalid keyword argument for array()

It would be useful if either array([], shape=(0, 0), dtype=float64) worked or empty arrays printed to another string that actually did work, like empty(shape=(0, 0), dtype=float64). Aside from copy-pastability being useful, it's very confusing that it shows the shape keyword like this when it doesn't actually exist.

Error message:

Numpy/Python version information:

>>> np.__version__
'1.18.1'
@eric-wieser
Copy link
Member

This has been brought up before, I'll look for the old issue tomorrow

@eric-wieser
Copy link
Member

The PR was #10119

@asmeurer
Copy link
Member Author

What is the status of this? I saw the above PR was closed. This is a pretty regular annoyance for me, so I'd be happy to make a PR to fix it if we know what the fix should be. If I understand correctly, the question was how making the repr give empty() would affect array subclasses. Would adding shape as a keyword argument for array be an acceptable option? Or how about only making array.__repr__ give empty for non-subclasses?

@asmeurer
Copy link
Member Author

asmeurer commented Sep 3, 2020

I see a few ways this can be fixed:

  1. Print empty arrays using empty(shape) or `ndarray(shape). This was rejected in ENH: Print funny-shaped empty arrays using "empty" #10119 because it doesn't do the right thing for array subclasses.
  2. Actually allow the array([], shape=(0, 1)) syntax that is currently printed. If this is allowed, I see one of three ways it could work for a shape keyword argument to array:
    i. shape is only allowed for empty arrays (or maybe even just empty arrays that can't be created without it), and raises TypeError if provided otherwise.
    ii. For non-empty data, shape does a check if the data matches the given shape, raises an exception if it doesn't, and does nothing otherwise.
    iii. array(data, shape=shape) is equivalent to array(data).reshape(shape).
  3. Print empty arrays as array([]).reshape(shape).

My preference is 2, because I do think the current form that is printed is the most readable. I also think 2.iii is the best choice of the three, as I think it's what most users would expect a shape keyword argument to do, but don't feel strongly about it (obviously all three would work for the current empty array([], shape=...) cases). One example I can note is that in h5py, create_dataset(data=data, shape=shape) does the reshape (2.iii) semantics (h5py doesn't have its own recreatable repr form, so the rest of this issue is not relevant for it).

@eric-wieser
Copy link
Member

Or how about only making array.__repr__ give empty for non-subclasses?

This solution seems reasonable to me. Perhaps another option is to:

  • deprecate calling array_repr on size-zero arrays
  • Change ndarray.__repr__ to do return array_repr(self) if self.size else some_empty_repr(self)

This naturally guides subclass authors to making an appropriate decision themselves.

@rgommers
Copy link
Member

rgommers commented Sep 3, 2020

I'd be happy with (2.iii), seems like a sensible API even if not considering the need for eval(x.__repr__()) to work. The ndmin keyword also does reshaping, so I don't think there's a fundamental objection to reshaping within array().

@seberg
Copy link
Member

seberg commented Sep 4, 2020

I would be happy with 2.ii (which may actually be useful for ragged array construction, although an ndim would be simpler probably). But suggesting iii. seems like going too far, reshape is a pretty complicated function and np.array already has a lot of complexities itself.

If the input is 1-D, or the reshape is simple it is all great, but the use in being able to reshape an array using:

arr = np.arange(8).reshape(4, 2)
res = np.array(arr, shape=(2, 4))

Do we really want code that doe the above to reshape arr, or that code to work with the list: [[[1, 2], [3, 4]], [[4, 5], [5, 6]]]?

One could also argue that broadcasting semantics could be nice to make it not equivalent to reshaping, but rather to:

def array(obj, shape):
    tmp = np.array(obj)
    res = np.empty_like(tmp, shape=shape)
    res[...] = tmp

But as I said at the beginning, enforcing the shape explicitly, at least has a use-case (ragged array, and maybe just asserting shapes), which is not already covered by other good, obvious solutions, such as: np.reshape, np.broadcast_to, np.atleast_1d (although no nd version exists).

EDIT: In practice, you would have to use np.array([[]], shape=(1, 0)) and not just a single empty list, but rather all levels up to the empty one to avoid the inconsistency of making the empty list special when its an empty array.

@asmeurer
Copy link
Member Author

asmeurer commented Sep 4, 2020

Change ndarray.__repr__ to do return array_repr(self) if self.size else some_empty_repr(self)

Perhaps another reason I like option 2 is that I feel that morally, the array constructor should be able to create empty arrays. Even ignoring the un-copypastable repr, it's still annoying to me that I have to reach for a separate constructor if my array shape has a 0 in it. I feel like lots of places are forced to have 'if' conditions like that for no reason other than that array can't actually create an empty array.

@eric-wieser
Copy link
Member

eric-wieser commented Sep 5, 2020

I feel like lots of places are forced to have 'if' conditions like that for no reason other than that array can't actually create an empty array.

Can you provide an example or two of where this comes up? IMO the array constructor is useful for two things:

  • Coercing lists to arrays
  • Coercing types with more precise protocols to arrays

A shape argument would be fairly useless for the second case, and I'd argue for the first case the caller should pass something more precise than an empty list in the first place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants