TYP: dtypes.pyi is missing StringDType #26747
Comments
Related question: what is the "scalar counterpart" of |
Related issues:
|
I tried looking at this in #26528 and unfortunately hit the issue you asked about:
In some sense there isn't one. In another sense it's python's By now we've decided this was probably a bad design and the plan now is to add a scalar that interns a single string entry. I didn't want to do that initially because it would require effectively rewriting I personally think it's a bug that the type stubs assume all dtypes must have a corresponding scalar type. That's not strictly true anymore with the new DType API. |
Since we discussed it, and I think there was a lot of confusion about it, let me recap a bit. DTypes indeed don't need a "scalar type" in principle, but, in practice they have one. However, there is absolutely nothing that requires that scalar type to be defined by NumPy. It is a duality: We could, and I always assumed we will eventually also add a new abstract "NumPy scalar" type, for which But, back to the issue at hand:
So the point is that So in other words: Rather than introducing a new scalar type just for the sake of fixing typing, let's maybe take a step back and see whether the thing that is fundamentally broken is not But instead that EDIT: That is:
All fail to resolve correctly. |
Not that I absolutely hate having a dedicated scalar, but I am not yet convinced that types are a serious argument for it. Not having a scalar means that the scalar cannot look like an array, which some do (although strings do a poor job of it!). I personally don't care for the notion of scalars looking too much like arrays, but I agree it will be a user surprise. For some types it would be cool to not have to deal with a second scalar type just for the sake of the DType! But, I dunno if it is for strings (because of the |
At the moment, the
>>> import numpy as np
>>> np.dtypes.StringDType.type
<class 'str'>
So it's currently impossible to annotate For example, consider the following statement, given some arbitrary
np.dtype(dt.type).type is dt.type
At first glance, this seems reasonable. In fact, from a typing perspective, this can even be proved to always hold, by considering the following facts:
So that's a case of "QED" I suppose. But however sound that logic is,
>>> np.dtype(np.dtypes.StringDType.type)
dtype('<U')
>>> _.type
<class 'numpy.str_'>
>>> np.dtypes.StringDType.type
<class 'str'>
Let's hope that it won't send its friends Anyway, my point with all this, is that the current Instead, I believe that |
Isn't the one critical point in those bullets that This part I challenge a bit: I don't understand why that should be paramount. Internally, NumPy does this via a mapping, and it must do so because we already have broader behavior anyway even with existing types:
All of the above are already not captured by the
Sure, |
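One way to see the kind of mapping mentioned here: `np.dtype()` already accepts Python builtins and resolves them to dtypes whose `.type` is a NumPy scalar class rather than the builtin that was passed in. The snippet is only an illustration and does not try to reproduce the specific cases listed in the comment.

```python
import numpy as np

# Builtin Python types are accepted by np.dtype(), but the resulting
# dtype reports a NumPy scalar class, not the builtin itself.
pairs = [(float, np.float64), (bool, np.bool_), (str, np.str_), (bytes, np.bytes_)]

for builtin, scalar in pairs:
    dt = np.dtype(builtin)
    assert dt.type is scalar
    assert dt.type is not builtin
    print(builtin.__name__, "->", dt, "->", dt.type.__name__)
```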
Not really. The main issue is that
Clearly, option 2 is the path of least resistance. I really don't like exceptions (even though I'm Dutch). But currently,
That won't work: type NDArray[ST: np.generic] = np.ndarray[Any, np.dtype[ST]] So |
For what it's worth, I'm planning on working on a scalar implementation next week during the sprints at the SciPy conference. |
@ngoldbaum Good to hear! Solving this issue will be a piece of cake once there's a string scalar type :) |
@jorenham except that all of the things you are complaining about seem already wrong for You may be able to side-step that for StringDType (at the cost of other incompatibilities possibly, since we are not unicode subclasses anymore at least if you use |
@seberg I'm afraid I don't quite follow.
What am I complaining about?
What exactly is wrong with
What makes you think that? Isn't this very issue a prime example that the community is working hard to close the gap between the numpy type annotations and the runtime?
I don't see how introducing a scalar type for
Fix what exactly? And what reasons are you talking about? |
Their EDIT: OK, fair, it actually is... But |
>>> import numpy as np
>>> np.object_
<class 'numpy.object_'>
>>> type(np.object_)
<class 'type'>
🤷🏻 But this is all rather out-of-scope for this issue. |
Yes, but it clearly is a meaningless abstract class. (It actually is a concrete class in C, just like So there are no And since that is the case, you can't possibly type something like:
or even So, you can side-step this problem with strings, by defining a concrete string type. But I don't see which problem here is unique to strings: they are shared in some form or another either by the |
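A quick runtime check of the point about `np.object_`: the object dtype names `np.object_` as its scalar type, yet indexing an object array hands back the stored Python objects themselves. This is only an illustrative sketch.

```python
import numpy as np

arr = np.array([1, "a"], dtype=object)

# The dtype advertises np.object_ as its scalar type ...
print(arr.dtype.type is np.object_)  # True

# ... but the elements come back as plain Python objects.
print(type(arr[0]), type(arr[1]))
print(isinstance(arr[0], np.object_))  # False
```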
@seberg I didn't know that about I understand that But I'm sure that @ngoldbaum already has a plan in mind for implementing it the |
Solved through #27008 |
Thanks, @jorenham! Am I understanding #27008 (comment) correctly in that with
import numpy as np
import numpy.typing as npt
arr: npt.NDArray[np.dtypes.StringDType] = np.array("abc", dtype=np.dtypes.StringDType)
arr += "def"
print(arr)
Or is my type annotation not correct? I tried |
@bersbersbers |
Describe the issue:
The below code runs fine, but does not type check:
Reproduce the code example:
Error message:
Python and NumPy Versions:
2.0.0
3.12.4 (tags/v3.12.4:8e8a4ba, Jun 6 2024, 19:30:16) [MSC v.1940 64 bit (AMD64)]
Runtime Environment:
(not relevant)
Context for the issue:
No response