-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
API: Add DType classes into new numpy.dtypes
module
#23358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7ecb74b
to
39c1cfb
Compare
Would it make sense for the
|
Just to be clear, are you suggesting that as a replacement or as additional to this? I had that originally even, but we pulled it out for I have not digged in whether making |
I think it could work as a replacement, although I like the nice explicit names you've given the dtype classes here, so I guess weakly as an addition to this. I don't know enough about numpy's typing support or python typing in general to say whether or not the concrete types would break anything, but I think eliminating the use of |
@BvB93 do you know if adapting I don't hate it as a replacement, although the current class |
OK, I think the answer for modifying The only alternative then would be if we want |
@ngoldbaum (and others), whats your take on this? How can we come to a decision whether to just do this or what alternatives to explore? |
Given the reaction you got on typing-sig, I think this PR is probably the way to go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The public API and docs of this new np.types
submodule LGTM, so +1 for those.
I did not review the C changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the later reply, been a bit busy as of late unfortunately. Do have one comment on types.pyi
though.
numpy/types.pyi
Outdated
@@ -0,0 +1,39 @@ | |||
from numpy.types import ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem here is that type checkers only see the relevant .pyi
file as it shadows the underlying .py
file; so your effectively trying to import objects from within the same modular here. Instead, the dtype types would have to be explicitly defined here:
import numpy as np
__all__: list[str]
BoolDType = np.dtype[np.bool_]
UInt64DType = np.dtype[np.uint64]
ByteDType = np.dtype[np.byte]
# etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, thanks. One day we might want additional attributes like FloatDType.finfo
which seem hard to combine with np.dtype[np.float32]
. But I guess that is a bridge to cross when we come to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that one might actually be fairly easy to do by only allowing specific dtype types for self
. There are handful of similar patterns already in the code for structured arrays for example, which are the only arrays that allow for string-based indexing.
_DTypeScalar_co = TypeVar("_DTypeScalar_co", covariant=True, bound=generic)
_NBit = TypeVar("_NBit1", bound=NBitBase)
class dtype(Generic[_DTypeScalar_co]):
@property
def finfo(self: dtype[np.inexact[_NBit]]) -> np.finfo[np.floating[_NBit]]:
05b2174
to
b439175
Compare
Hmm, I had seen this PR vaguely and thought it was all about typing... But now see it is about storing dtypes somewhere. Copying my comment:
I actually like my last suggestion, of having the dtypes defined by the abbreviations. This is similar to what we do for Of course, one could also encourage a standard pattern like Anyway, I think the name is important and |
Reading #23537 (comment), I'm a bit less sure about having |
My gut feeling aligns with Marten's:
Sure, the DTypes may be types, but that is not relevant to how we group the functionality here, and so I would also prefer |
Personally, I don't like (including internal to |
On further thought, I agree that it should not be |
The question is a bit what we want. I don't disagree with The follow-up question then would be: If I can see more dtype utility functions, for example, we are missing a way to spell |
I think these are two separate questions. The one on the submodule name ( |
My thought was: If the purpose here is to house the DType classes for niche use-cases, I am not sure putting them into a |
I suggest to forget about
That is not the sole stated purpose in the docs. From DType classes are a subset of that, so I'm personally fine with the |
The point of the original suggestion was to have a place to put types/classes that users rarely/never access. DTypes are most of those (there could be more, like I don't care about the name, the suggestion was because Python has types as a catch-all for classes that users rarely access directly. The point I wanted to make still stands: If we don't expect it to house any conveniences but only rarely used classes, I am not sure I mind the name. If we want it to house more That doesn't mean I care much either way, I don't. |
Agreed to leave out On the name: I came very much at this from the NEP write-up, so had no idea about the larger idea of But with all the work to define the data types, it seems OK to me for them to have their own space even if it is not used much -- and it may get used more than you think once it becomes easy to define user |
That's fair. At least abstract DTypes should happen pretty soon, and unlike these classes may be useful a bit more often. The first/main "create DType in Python" thing, that I currently think would be nice would be to create a custom structured DTypes in a manner similar to dataclasses. |
That is probably needed for lazy loaders, but also for pyinstaller (although for pyinstaller we could add it as a hidden module).
numpy.types
modulenumpy.dtypes
module
b439175
to
569f7bc
Compare
OK, I updated this to |
Can we move this forward? If we have it, we can use it to fix-up and put in the "drop_meatadata" function. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, a few nits in comments and docstrings. The module appears in the documentation on the routines.other page, and as far as I can tell no other documentation links to it. I guess that is OK for now, we can always add more links if the module is helpful.
Co-authored-by: Matti Picus <matti.picus@gmail.com>
Thanks @seberg |
This PR consists of two (initial) commits, which can be viewed independently.
Reorganize the name creating in C, because I needed the names in C later, and it seemed much easier to do it in one place. I think this is a nice cleanup (I am sure a lot more could be done).
Add
numpy.types
and add new names to it. I have pickedFloat64DType
,StrDType
,LongDoubleDType
, and camel case for the datetimes. The integers are both as C names and as integer names (this matches the scalars). The reason is that otherwise not all are available and platform dependent.The docs are just a table for now, I am sure this can be expanded.
(Probably should have a release note, because it is an addition, albeit not interesting for a wide audience probably.)
I am not set on the names, or even on
numpy.types
, it just seemed like a decent thing and the only alternative wasnumpy.dtypes
but this way we could add e.g. theArrayFunctionDispatcher
to it also.@BvB93 would this need more from a typing perspective?
EDIT: Direct link to the generated docs, which gives a clearer overview maybe https://output.circle-artifacts.com/output/job/6842ef47-ca31-498b-bee0-4f3b94e9c090/artifacts/0/doc/build/html/reference/routines.other.html#module-numpy.types