Skip to content

ENH: let zeros, empty, and empty_like accept dtype classes #23913

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 19, 2023

Conversation

ngoldbaum
Copy link
Member

Following on from #23404, this converts np.empty, np.empty_like, and np.zeros to accept dtype classes in the dtype argument.

np.array and similar can use PyArray_AdaptDescriptorToArray to detect the appropriate dtype for a given input array by scanning input entries. However, I think functions like np.empty need a dtype instance for parametric dtypes.

For non-legacy dtypes that is trivial to implement, but I wasn't clear how to handle the legacy parametric instances, which PyArray_ExtractDTypeAndDescriptor will discard in favor of the dtype class. That means I can't just reject legacy parametric dtype classes, so I use the singleton instead. I don't know if that's the correct thing to do.

For what it's worth, np.empty(.., dtype='U') is probably pretty useless, and maybe we could deprecate it? For strings, you end up with a U1 dtype, which is likely not what people really want if they are working with other non-empty string arrays.

Suggestions on better ways to handle legacy parametric dtypes are very welcome.

@seberg
Copy link
Member

seberg commented Jun 10, 2023

Very nice, this has been on the not done list for a while :)!

For what it's worth, np.empty(.., dtype='U') is probably pretty useless, and maybe we could deprecate it?

Yes, I think ideally "U" as a dtype should refer to the class where applicable (i.e. array creation and casting, which is probably the only place its really used) and otherwise just error (rather than default to "U1"). And I would be happy about that deprecation!

I have two comments:

  • First, a concrete request: It would be nice to factor out the snippet that does the work into a small helper function.
  • Second, I am not quite sure, but I added NPY_DT_CALL_default_descr()/slots->default_descr() and one reason was that it could be useable in this context. That pushes the decision to the actual dtype, which can raise an error... downside is that that may be a bit more cryptic, although we could even chain errors. The up-side is that the U -> U1 decision is abstracted into the DType.

@ngoldbaum
Copy link
Member Author

First, a concrete request: It would be nice to factor out the snippet that does the work into a small helper function.

Done.

Second, I am not quite sure, but I added NPY_DT_CALL_default_descr()/slots->default_descr() and one reason was that it could be useable in this context. That pushes the decision to the actual dtype, which can raise an error... downside is that that may be a bit more cryptic, although we could even chain errors. The up-side is that the U -> U1 decision is abstracted into the DType.

Ah good call, I didn't think of this API hook but it makes sense to use it here. In practice it looks like all the parametric dtypes in numpy-user-dtypes have a singleton and don't error from calling default_descr, but in principle they could decide to do that now. I think leaving this up to dtype authors makes a lot of sense.

@ngoldbaum ngoldbaum force-pushed the dtype-class-more-array-creation branch from 5991ee2 to bb62384 Compare June 13, 2023 17:43
Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks. Will have a quick look after the weekend, but if you wish to press on, please feel free to merge yourself (I haven't carefully audited the refcounts yet, but they look good).

/* steals the reference to dtype if it's not NULL */
ret = (PyArrayObject *)PyArray_NewLikeArrayWithShape(prototype, order, dtype,
shape.len, shape.ptr, subok);
/* steals the reference to dt_info.descr if it's not NULL */
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do prefer no stealing, but happy to leave it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would also prefer not to steal the references, but PyArray_NewLikeArrayWithShape is called by PyArray_NewLikeArray, which has many consumers and I didn't want to go through all of them to make sure the references are correct.

@seberg
Copy link
Member

seberg commented Jun 14, 2023

but in principle they could decide to do that now.

Yes, I think the "new like" path shows that it should really be an error for our strings. Returning an S1 is weird, but returning the same thing as astype("S") also seems weird, since it has nothing to do with a cast IMO.

@ngoldbaum ngoldbaum force-pushed the dtype-class-more-array-creation branch from 1e2380c to fdcc67b Compare June 14, 2023 15:10
@seberg
Copy link
Member

seberg commented Jun 19, 2023

Thanks Nathan, LGTM.

@seberg seberg merged commit 1e8f440 into numpy:main Jun 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants