Skip to content

ENH: allow using dtype classes in array creation functions #23404

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Mar 23, 2023

Conversation

ngoldbaum
Copy link
Member

@ngoldbaum ngoldbaum commented Mar 16, 2023

This enables writing np.array(some_object, dtype=type(np.dtype('i'))). This is a follow-on from #23154, see that PR for more details.

I had to add a new include to ctors.h to bring in the definition of the npy_dtype_info struct. Since ctors.h is included in many other files inside numpy, I found that I needed to modify fewer includes across numpy if I moved the definition of npy_dtype_info to common.h from descriptor.h. The new includes of common.h are needed to support later includes of ctors.h in those files. If anyone has an alternate place to put npy_dtype_info that would cause less churn of includes I'd love to hear about it.

I spent a bunch of time tweaking the reference counts. I'm reasonably confident this is correct but not 100%, an additional careful pass over the reference count logic from a reviewer would be very appreciated.

I could have made _PyArray_FromAny and _PyArray_CheckFromAny take just a npy_dtype_info struct, but I found it made the reference count logic more complicated, since PyArray_FromAny and PyArray_CheckFromAny steal the reference to the descriptor they are passed and I needed to conserve that behavior. Also both functions support passing in a NULL pointer for the descriptor and I needed to maintain that behavior as well.

The change to ucsnarrow.h fixes a preexisting conflict with the prototype in ucsnarrow.c that triggered a compiler error while I was working on this.

Update 3/20/2023: Edited out some out of date stuff from the PR description.

@ngoldbaum
Copy link
Member Author

After this, PyArray_DescrConverter2 or PyArray_DescrConverter is still being exposed to python by:

  • np.empty
  • np.empty_like
  • np.zeros
  • np.fromstring
  • np.fromfile
  • np.fromiter
  • np.frombuffer
  • np.concatenate
  • np.einsum
  • np.arange
  • np.can_cast
  • np.promote_types
  • np._vec_string (used by np.char functions)
  • np.array.view
  • np.array.getfield
  • np.array.setfield
  • np.array.getarray
  • np.array.cumsum
  • np.array.cumprod
  • np.array.trace

Most of these should probably be updated to accept dtype classes as well, I think?

@seberg
Copy link
Member

seberg commented Mar 16, 2023

Yeah, some of these (e.g. .view()) do explicitly take dtype instances, most don't seem particularly interesting to me, luckily :).

@ngoldbaum ngoldbaum force-pushed the dtype-class-array-creation branch from 9576244 to 171dc85 Compare March 16, 2023 18:53
@ngoldbaum
Copy link
Member Author

Failure is due to a transient issue on anaconda.org

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, thanks. Sorry for the long N.B. I should move that to an issue probably (but will let it sink a bit more).

I would lean towards not passing newtype. Otherwise, I am worried that calling AdaptDTypeToArray is just too much at that place and as I said there, we could just skip it with two possible inclusions:

  • Fast path if the descr matches the type (unless you think that we should guarantee canonical/native result, which I don't think really, the N.B. is a bit tricky, but...).
  • If non-parametric, you could grab the singleton and use it.

Happy if you wish to drill in deeper, but this is just a fast-path in the end...

This enables writing np.array(some_object, dtype=type(np.dtype('i'))). This
is a follow-on from numpy#23154, see that PR
for more details.

I had to add a new include to `ctors.h` to bring in the definition of the
`npy_dtype_info` struct. Since `ctors.h` is included in many other files inside
numpy, I found that I needed to modify fewer includes across numpy if I moved
the definition of `npy_dtype_info` to `common.h` from `descriptor.h`. The new
includes of `common.h` are needed to support later includes of `ctors.h` in
those files. If anyone has an alternate place to put `npy_dtype_info` that would
cause less churn of includes I'd love to hear about it.

I spent a bunch of time tweaking the reference counts. I'm reasonably confident
this is correct but not 100%, an additional careful pass over the reference
count logic from a reviewer would be very appreciated.

I could have made `_PyArray_FromAny` and `_PyArray_CheckFromAny` take just a
`npy_dtype_info` struct, but I found it made the reference count logic more
complicated, since `PyArray_FromAny` and `PyArray_CheckFromAny` steal the
reference to the descriptor they are passed and I needed to conserve that
behavior. Also both functions support passing in a `NULL` pointer for the
descriptor and I needed to maintain that behavior as well.

The change to `ucsnarrow.h` fixes a preexisting conflict with the prototype
in `ucsnarrow.c` that triggered a compiler error while I was working on this.
Also make internal versions accept a dtype and a descriptor.
@ngoldbaum ngoldbaum force-pushed the dtype-class-array-creation branch from 171dc85 to 5ae51e2 Compare March 17, 2023 19:11
@ngoldbaum
Copy link
Member Author

It occurs to me that I can back out all the include changes now that npy_dtype_info is no longer needed in ctors.h. I’ll revert those changes when I’m back near a computer.

Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the update. Had one more thought, did not take a super close look, but I think it looks settled.

@ngoldbaum ngoldbaum force-pushed the dtype-class-array-creation branch from 2e2c2c1 to c45a101 Compare March 20, 2023 21:23
Copy link
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me now, thanks. Planning to put it in in soon unless anyone minds.

I think the behavior is good, the only thing might be the how we implement it exactly, and I don't think we run too deep down a dead-end even if we decide another way is better.

@seberg
Copy link
Member

seberg commented Mar 23, 2023

Time to put it in, I think. Thanks Nathan :). As I said, if anyone dislikes the implementation, happy to follow up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants