Skip to content

ENH: Allow using dtype classes in arr.astype() #23154

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Feb 20, 2023
Merged

Conversation

seberg
Copy link
Member

@seberg seberg commented Feb 3, 2023

This enables writing arr.astype(type(np.dtype("S"))), which may seem a bit ridiculous, but is the current way that user-dtypes would be able to adapt a parametric DType based on object array data mainly and otherwise based on the array's dtype (not data).
(Yes, there may be a point in allowing to adapt to the values even when the array is not an object array initially)

This PR does not yet adept to allow np.array([...], dtype=DType_class) to do the same, but implements the necessary machinery (or a plausible way of making progress there).

There are a few commits which are roughly sorted (a deletion of the extract function is in a following commit).
The main code is:

  1. Moving the extraction logic
  2. Adding new converter functions
  3. Using them.

The final refactor of PyArray_AdaptDescriptorToArray and actual change is then rather small.


Ping @ngoldbaum since it may interest you and also it would be good to have a bit of feedback on whether this seems reasonable. The main alternative is decide on using abstract instanes more broadly (which would be closer to what we used to have but was something I tried to avoid a bit initially).

Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a few minor inline comments.

I also tried this using ASCIIDType and it seems work as I'd expect:

import os

os.environ["NUMPY_EXPERIMENTAL_DTYPE_API"] = "1"

import numpy as np
from asciidtype import ASCIIDType

arr = np.array(["this", "is", "an", "array"]).astype(ACIIDType)
print(repr(arr))

prints on this PR:

array(['this', 'is', 'an', 'array'], dtype=ASCIIDType(5))

Awesome!

@charris
Copy link
Member

charris commented Feb 19, 2023

@ngoldbaum Looks like this is ready.

@ngoldbaum
Copy link
Member

Agreed, sorry for not explicitly approving.

@charris charris changed the title ENH: Allow using dtype classes in arr.astype() ENH: Allow using dtype classes in arr.astype() Feb 20, 2023
@charris charris merged commit f6eaca8 into numpy:main Feb 20, 2023
@charris
Copy link
Member

charris commented Feb 20, 2023

Thanks Sebastian.

@seberg seberg deleted the astype-ref branch February 20, 2023 19:05
ngoldbaum added a commit to ngoldbaum/numpy that referenced this pull request Mar 16, 2023
This enables writing np.array(some_object, dtype=type(np.dtype('i'))). This
is a follow-on from numpy#23154, see that PR
for more details.

I had to add a new include to `ctors.h` to bring in the definition of the
`npy_dtype_info` struct. Since `ctors.h` is included in many other files inside
numpy, I found that I needed to modify fewer includes across numpy if I moved
the definition of `npy_dtype_info` to `common.h` from `descriptor.h`. The new
includes of `common.h` are needed to support later includes of `ctors.h` in
those files. If anyone has an alternate place to put `npy_dtype_info` that would
cause less churn of includes I'd love to hear about it.

I spent a bunch of time tweaking the reference counts. I'm reasonably confident
this is correct but not 100%, an additional careful pass over the reference
count logic from a reviewer would be very appreciated.

I could have made `_PyArray_FromAny` and `_PyArray_CheckFromAny` take just a
`npy_dtype_info` struct, but I found it made the reference count logic more
complicated, since `PyArray_FromAny` and `PyArray_CheckFromAny` steal the
reference to the descriptor they are passed and I needed to conserve that
behavior. Also both functions support passing in a `NULL` pointer for the
descriptor and I needed to maintain that behavior as well.

The change to `ucsnarrow.h` fixes a preexisting conflict with the prototype
in `ucsnarrow.c` that triggered a compiler error while I was working on this.
ngoldbaum added a commit to ngoldbaum/numpy that referenced this pull request Mar 17, 2023
This enables writing np.array(some_object, dtype=type(np.dtype('i'))). This
is a follow-on from numpy#23154, see that PR
for more details.

I had to add a new include to `ctors.h` to bring in the definition of the
`npy_dtype_info` struct. Since `ctors.h` is included in many other files inside
numpy, I found that I needed to modify fewer includes across numpy if I moved
the definition of `npy_dtype_info` to `common.h` from `descriptor.h`. The new
includes of `common.h` are needed to support later includes of `ctors.h` in
those files. If anyone has an alternate place to put `npy_dtype_info` that would
cause less churn of includes I'd love to hear about it.

I spent a bunch of time tweaking the reference counts. I'm reasonably confident
this is correct but not 100%, an additional careful pass over the reference
count logic from a reviewer would be very appreciated.

I could have made `_PyArray_FromAny` and `_PyArray_CheckFromAny` take just a
`npy_dtype_info` struct, but I found it made the reference count logic more
complicated, since `PyArray_FromAny` and `PyArray_CheckFromAny` steal the
reference to the descriptor they are passed and I needed to conserve that
behavior. Also both functions support passing in a `NULL` pointer for the
descriptor and I needed to maintain that behavior as well.

The change to `ucsnarrow.h` fixes a preexisting conflict with the prototype
in `ucsnarrow.c` that triggered a compiler error while I was working on this.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants