
TYP,BUG: Complete type stubs for numpy.dtypes #27008


Merged
merged 2 commits into numpy:main on Aug 5, 2024

Conversation

jorenham
Member

@jorenham jorenham commented Jul 22, 2024

The numpy.dtype subtypes in numpy.dtypes were annotated as type aliases of dtype.
But their constructor signatures are incompatible (the ones in numpy.dtypes accept either 0 or 1 positional-only arguments), so the numpy.dtypes stubs were incorrect.

This PR fixes this by adding specialized dtype subtypes in the numpy.dtypes stubs.

The previously missing type hints for StringDType were also added, fixing #26747 (for the most part, see the note) and superseding #26528.

Note

The StringDType annotations aren't technically valid at the moment, since numpy.dtype[str] is invalid because str is not a subtype of np.generic.
But once its scalar type is ready (I believe @ngoldbaum was working on this), this won't be a problem anymore.


FYI, this concludes my numpy-typing PR streak (for now) of 36 PRs in 18 days - I gotta get back to my day jobs. But since my todo-list has grown more than it shrank, I have a feeling that a couple more will follow, albeit less frequently.

@ngoldbaum
Member

Awesome! Thanks so much for working on this, I definitely wouldn't have been able to figure this out. This makes adding the scalar much less pressing than it was without this fix. Unfortunately that also means I probably don't know enough about Python typing to be able to review this at more than a surface level.

I guess as a followup we can modify the np.strings type stubs to mark that they accept StringDType arrays too?

@jorenham
Member Author

jorenham commented Jul 22, 2024

I guess as a followup we can modify the np.strings type stubs to mark that they accept StringDType arrays too?

Yes exactly. But without a scalar type you won't be able to use np.typing.NDArray like is currently done.

So e.g. the add function in numpy/_core/_strings.pyi could be rewritten as something like this:

StringArray: TypeAlias = np.ndarray[Any, np.dtypes.StringDType]
T_co: TypeAlias = StringArray | U_co

@overload
def add(x1: T_co, x2: T_co) -> StringArray: ...
@overload
def add(x1: U_co, x2: U_co) -> NDArray[np.str_]: ...
@overload
def add(x1: S_co, x2: S_co) -> NDArray[np.bytes_]: ...

@jorenham
Member Author

@seberg

numpy/dtypes.pyi Outdated

@final
class Int64DType(
_TypeCodes[L["i"], L["l"], L[7]],
Member

This isn't right unfortunately (e.g. on Windows this is q). The literal code for Int64 is l or q and the type number 7 or 9 (I guess). Type numbers are fixed for the C names, not the names here.
The mypy plugin is supposed to sort this out I think: https://github.com/numpy/numpy/blob/main/numpy/typing/mypy_plugin.py

Now, maybe that plugin is the wrong approach here, though. In principle, the build-time information we need is:

  • Is int32 an int or a long?
  • Is int64 a long or a longlong?
  • What is the size of longdouble?

Additionally, even if inferable, one of those integers is always equivalent to both. One day, I would like to encode that by making the C-named version a subclass of the bit-sized one. Or by making it an alias only, so that the type number is an instance property only (i.e. the Int64 dtype instances can still be either long or longlong with their old type numbers, but they behave identically, so they only have a single type; without a subclass, the small oddity one would have to accept is that LongLong() might not return the type code we want).


My gut feeling now is that this would be much better than what we have. So if we have some cycles/will to clean it up, I might suggest:

  • Creating a new templated stub file with the crucial information (might be this file itself).
  • Ask @ngoldbaum to help with the meson tooling to fill in the blanks at compile time (need to use compile time constants/sizes to correctly cross-compile).
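The platform dependence described above is easy to observe at runtime; a small sketch:

```python
import numpy as np

# On most Unix platforms int64 is C long (char 'l', type number 7);
# on Windows it is C long long (char 'q', type number 9). The bit-sized
# Python name hides this platform difference.
d64 = np.dtype(np.int64)
print(d64.char, d64.num)
```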

Member Author

This isn't right unfortunately (i.e. on windows this is q). The literal code for Int64 is l or q and the type number 7 or 9 (I guess).
Type numbers are fixed for the C names, not the names here.

Ah ok, that makes sense. The reason I was confused is that e.g. numpy.dtypes.ByteDType is a direct alias for Int8DType, instead of the other way around. But in hindsight, the fact that Int8DType().char is 'b' instead of 'i1' should've been a giveaway.


The mypy plugin is supposed to sort this out I think

I don't consider the mypy plugin a solution myself, since there are more (and imo better) fish in the typing sea, e.g. pyright.
It's not that I'm against a mypy plugin per se, but I think that prioritizing a type-checker-agnostic solution should be our aim.

And I'm also not really sure if platform-dependent types are a good idea in general, actually.
Having your type stubs depend on e.g. the Python version or NumPy version is no problem I think, as these are explicitly known beforehand and are within the control of the user.
But platform-dependent type stubs are a different story, and could lead to those "but it works on my machine" scenarios, which could lead to 3rd-party libraries that are only type-valid on x86/64 win32 machines that compiled numpy with the default compilation options (unless it's the 3rd full moon of a prime-numbered year, obviously).

So that's why I want to at least cover all the possible typecodes and sizes here, so that it's platform-agnostic.
But on the other hand, I don't think it's a good idea to account for all theoretically possible situations. So I'm assuming that the system is either 32 or 64 bits, that the C99 standards apply, and that byte, short, and intc are 8, 16, and 32 bits (I was told that theoretically they can be bigger, but that in practice this almost never is the case; correct me if I'm wrong, I don't have much C experience).

Having platform-agnostic stubs could also help developers find platform-dependent bugs more easily. Type-checkers would then be able to detect incorrect assumptions, such as being able to cast a long to int32 in safe mode.

And while I think that generating stubs at build time is definitely a better approach than the mypy-plugin one, the platform-agnostic stubs would still have my preference.
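The kind of platform-dependent assumption mentioned above can be demonstrated at runtime; a hedged sketch:

```python
import numpy as np

# Whether a C long can be safely cast to int32 is platform-dependent:
# long is 32 bits under LLP64 (e.g. 64-bit Windows) but 64 bits under LP64
# (most 64-bit Unix), so this check gives different answers per platform.
long_dt = np.dtype('l')  # the platform's C long
ok = np.can_cast(long_dt, np.dtype(np.int32), casting='safe')
print(long_dt.itemsize * 8, ok)
```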

What do you think @seberg?

Member

🤷, dunno. You will lose some round-tripping: the ability to distinguish int64 and int32 when it comes to typecodes and chars. And that means in some cases things cannot round-trip (like np.dtype(int_dtype.char), or with struct.pack).

OTOH, yes, maybe it helps someone somewhere to clarify that if you write long you may get a 32-bit or 64-bit integer.
I suppose you can only try and see how many things get annoyed. Avoiding a plugin seems easy, but removing runtime sizes, I don't know.
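The round-tripping in question can be sketched like this (assuming NumPy >= 1.25 for np.dtypes):

```python
import struct
import numpy as np

# Because the char code matches the platform's C name ('l' or 'q' for
# int64 here), it round-trips through np.dtype and is understood by the
# struct module; a purely bit-sized code would not round-trip this way.
d = np.dtype(np.int64)
assert np.dtype(d.char) == d          # round-trip via the char code
packed = struct.pack(d.char, 42)      # struct accepts the same code
print(len(packed))  # 8
```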

@jorenham jorenham force-pushed the typing/numpy_dtypes branch from 814b032 to 48f03da Compare July 26, 2024 03:48
@jorenham jorenham requested a review from seberg July 26, 2024 03:48
@jorenham
Member Author

I've changed the widths and aliases so that they should match the C99 spec, given an LP32, ILP32, LLP64, or LP64 data model (https://en.cppreference.com/w/c/language/arithmetic_types).

So apparently a C int can be 16 bits 🤷🏻
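The widths on the current platform can be inspected via the char codes; a sketch (the comment states the C99 ranges under the data models linked above):

```python
import numpy as np

# Bit widths of the C integer types on this platform. Under the
# LP32/ILP32/LLP64/LP64 models: byte = 8, short = 16, int = 16 or 32,
# long = 32 or 64, long long = 64.
for code, cname in [('b', 'byte'), ('h', 'short'), ('i', 'intc'),
                    ('l', 'long'), ('q', 'longlong')]:
    print(cname, np.dtype(code).itemsize * 8)
```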

Member

@seberg seberg left a comment


Couple of comments. Would it be possible to encode that np.dtype[int8] actually is Int8DType? Although, that seems like it is probably some future steps.

Things like metadata of course can be None or any dict also. And the Void dtype (in theory all of them, but let's ignore that) can have any result for isalignedstruct, fields, etc., of course.

"datetime64[ns]",
"datetime64[ps]",
"datetime64[fs]",
"datetime64[as]",
Member

Can't the unit be part of the class? Just like the byteorder.

Also, for the byte-order shouldn't that refer to the order in the class definition? (But maybe the overloads become too long for it to be useful right now.)

Member Author

Can't the unit be part of the class? Just like the byteorder.

Yea that's probably better. It's not really doing anything useful at the moment, and is mostly there for aesthetics.

Also, for the byte-order shouldn't that refer to the order in the class definition?

Perhaps it should, but the only way I can think of to make that work would be to add a byteorder type parameter, and add 28+ overloads in __new__ (at least 2 for each unit).
But that might be a bit too messy compared to the advantages.
... or isn't that what you mean?

Member

That was what I meant... Anyway, since this is just about the DType classes as exposed here, let's not worry about it too much. All of this might become more relevant when the classes are properly merged...

@jorenham
Member Author

Couple of comments. Would it be possible to encode that np.dtype[int8] actually is Int8DType?

Well in theory, yes; by overloading __new__ of numpy.dtype.
But because of python/mypy#15182, this won't work in mypy, although it'll work with (at least) Pyright.

... and use a more consistent ``TypeVar`` naming scheme.

Co-authored-by: Sebastian Berg <sebastian@sipsolutions.net>
@seberg
Member

seberg commented Aug 5, 2024

I'll just put this in, thanks. If I understand correctly, there is still a disassociation at least in the sense that the instances returned by np.dtype are not DType instances.
It seems not a big deal to wait with it, and that also means that the integer size issue might not matter too much in practice I guess.

@seberg seberg merged commit bf822ea into numpy:main Aug 5, 2024
66 checks passed
@jorenham jorenham deleted the typing/numpy_dtypes branch August 5, 2024 14:02
@jorenham
Member Author

jorenham commented Aug 5, 2024

If I understand correctly, there is still a disassociation at least in the sense that the instances returned by np.dtype are not DType instances.

They are, but they're (strict) subtypes of dtype. So technically the current return type annotations for dtype.__new__ are correct, but (currently) not as specific as they could be.
And like I mentioned before in #27008 (comment), it's not difficult to "fix" that. But if we do, mypy users won't be able to benefit from that fix, at least not until python/mypy#15182 is fixed.

It seems not a big deal to wait with it, and that also means that the integer size issue might not matter too much in practice I guess.

I'm not sure what you mean here; didn't we cover all possible integer sizes here 🤔?

@seberg
Member

seberg commented Aug 5, 2024

Yes we did; I was wondering if the approach we took here is right or not. But I don't think it matters right now, at least in the sense that we won't break anyone.

jorenham added a commit to jorenham/numpy that referenced this pull request Aug 11, 2024
charris added a commit that referenced this pull request Aug 12, 2024