Overview issue: Typing regressions in NumPy 2.2 #28076
Comments
Thanks for this analysis; it's spot on 👌🏻
Boring means predictable, and predictable means preventable. So by definition, regressions and bugs aren't boring 😉.
Maybe this is type-able if we use ...
I believe that all cases of unannotated code that were valid with numpy < 2.2 can still be made to pass with fairly small changes.

There are two distinct shape-typing problems here. The shape-typing mypy issues are all caused by #27211, which changed the shape-type "default" of `np.ndarray`. With numpy < 2.2, mypy accepted the following:

```python
import numpy as np

x = np.arange(2)
x = x + 1
```
Since numpy 2.2.0, the return type of `np.arange(2)` carries a concrete 1-d shape-type, while the result of `x + 1` is shape-erased. And mypy only looks at the first assignment when inferring the type of `x`, so the re-assignment is rejected. So to work around this, we need to help mypy a bit by explicitly annotating `x`:

```python
import numpy as np
import numpy.typing as npt

x: npt.NDArray[np.integer] = np.arange(2)
x = x + 1
```
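If you'd rather not annotate the variable, an equivalent one-off workaround (a sketch, not something proposed in this thread) is to erase the shape at the creation site with `typing.cast`:

```python
from typing import cast

import numpy as np
import numpy.typing as npt

# Same effect as the explicit annotation above: the precise 1-d shape of
# `np.arange(2)` is discarded, so later re-assignments with shape-erased
# results still unify with the inferred type of `x`.
x = cast(npt.NDArray[np.integer], np.arange(2))
x = x + 1
```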
I suppose this is related, and a symptom of the fact that many methods and functions in NumPy 2.2 erase shape information, giving rise to similar problems as mentioned above:

```python
import numpy as np
from numpy import float64

a = np.zeros((3, 3, 3), dtype=float64)
output = [a] * 3
output[0] = a[0:1, :, :]
```

Under mypy this results in a type error for the last line.
The lack of shape-typing support for these functions has always been the case. Since NumPy 2.2 we made several functions, including the array-creation functions in your example, report a precise shape-type.
Even if NumPy had perfect shape-typing support, your example would still be flagged as invalid, by both mypy and pyright.
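For what it's worth, here is a sketch of one way to make the example above pass, assuming the intent is "a list of float64 arrays of whatever shape": annotate the list with a shape-erased element type.

```python
import numpy as np
import numpy.typing as npt

a = np.zeros((3, 3, 3), dtype=np.float64)

# The element type is deliberately shape-erased, so both the original
# (3, 3, 3) arrays and arbitrary slices of them are assignable.
output: list[npt.NDArray[np.float64]] = [a] * 3
output[0] = a[0:1, :, :]
```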
This comment builds a straw man argument. First of all, I don't claim that 2.2 introduced shape erasure; my point is that this information is absent in 2.2, and that this creates conflicts with other recent changes. Second, this was a minimal example; a variable would have the same problem. But in any case, lists are invariant, yet the shape mypy infers for the elements is tuple[int, int, int], not (3, 3, 3). Hence the assignment is correct. If NumPy now chooses to fix the size of the arrays in the type, that would be yet another extremely unfortunate choice, because having inhomogeneous tensor sizes with the same rank is a valid application. If the dimensions must be fixed, that should be declared by the user.
And I didn't say that you did claim that 🤷🏻. It's just that I wanted to minimize the probability that someone else would misinterpret it that way (because exactly that has happened before, and it caused a lot of confusion).
No, it would be a different error code, and would be limited to mypy, whereas your example is also invalid on pyright.
That's indeed the type that mypy infers for the list elements. If you would've assigned the slice to a fresh variable instead, there would've been no error.
Hmm, I don't really understand, I'm afraid 🤔. But at the moment, using ...
Any additional future type information can lead to similar things. Maybe shape-typing vs. rank-typing won't happen (maybe not ever), but when/if it does, there will be code as in the example that starts failing type-checking, because the exact shape now needs to be explicitly erased. If I understand correctly, this is a clear example where shape typing (not restricted to mypy) is inconvenient because the user must explicitly use a less restrictive type. I.e. where it fails Stéfan's rule of "untyped code should always pass". We can decide that the advantages of this are larger than the disadvantages, especially long-term. But the truth is that I doubt old discussions/pushes about shape typing really took these downsides into account. So we need to be very clear about them and understand how much they affect users (compared to the long-term benefits of correct shapes).
If I remember correctly, Stéfan's rule only applied to valid untyped code, so it doesn't apply to the example of @juanjosegarciaripoll. This is what I mean with "type-unsafe":

```python
import numpy as np
from typing import NewType

# library
Size = NewType("Size", int)

type SquareMatrix[N: Size, T: np.number] = np.ndarray[tuple[N, N], np.dtype[T]]

def list_matmul[N: Size, T: np.number](matrices: SquareMatrix[N, T], /) -> SquareMatrix[N, T]:
    out = matrices[0].copy()
    for matrix in matrices[1:]:
        out @= matrix
    return out

...

# user code
pauli = [
    np.array([[0, 1], [1, 0]], np.complex128),
    np.array([[0, -1j], [1j, 0]], np.complex128),
    np.array([[1, 0], [0, -1]], np.complex128),
]

list_matmul(pauli)      # accepted, and returns `np.eye(2) * 1j`
pauli.extend(pauli[1])  # woops, forgot the `:`  <-- type checker error
list_matmul(pauli)      # raises a `ValueError`
```

Maybe it's not the best example, but the outcome is the same if you append a non-square matrix.
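As an aside, `matrices` is indexed like a sequence in the body, so presumably the parameter was meant to be a sequence of square matrices. A variant of the signature under that assumption (reusing the `Size` and `SquareMatrix` definitions from the sketch above) would be:

```python
from collections.abc import Sequence

# Hypothetical variant: the argument is a sequence of square matrices
# rather than a single array.
def list_matmul[N: Size, T: np.number](
    matrices: Sequence[SquareMatrix[N, T]], /
) -> SquareMatrix[N, T]:
    ...
```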
But I think the important thing to accept is that this is only type-unsafe if you assume that typing shapes is the correct/useful level of abstraction for (NumPy) array typing! And that was not the status quo. We can argue that shapes clearly are important, just like in your example. But that doesn't mean it is right, or even useful, for all code.
That's a very good point. And I agree that we should've thought it through better. But I'm not sure if that would've been enough, given that it was one of those "unknown unknowns". But either way, even if we would've done everything right, then that ...
Shape typing is an ongoing issue numpy/numpy#28076
I have a related question, I think, though if it should go somewhere else let me know. I tried to follow the above comments regarding float and np.floating, but I'm a bit lost. What is the "right" way to handle float vs np.floating in user code? I'm running afoul of it when passing the results of NumPy reductions to matplotlib. Like so, if you'll pardon the extremely short non-reproducible example to get the idea:

```python
exts = (np.min(xvals), np.max(xvals), np.min(yvals), np.max(yvals))
ax.imshow(twodeearray, extent=exts)
```

The checker in question is basedpyright running "standard" checks. I'm not quite sure what to do about such a thing without hacking up my code...
In the NumPy 2.2.0 release we made `np.floating` a true supertype of `np.float64`, instead of treating the two as interchangeable. One of the consequences of the previous, incorrect definition was that type-checkers accepted the following:

```python
import numpy as np

x: np.floating = np.float32()
y: np.float64 = x
```

So you're effectively assigning a `float32` to a variable annotated as `float64`. I understand that it can be frustrating to have to change a lot of your annotations. But it's not because of a regression that you have to do that. It's because your annotations were type-unsafe, and NumPy 2.2 made it possible for type-checkers to help you fix it.
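A sketch of how the same assignments can be written type-safely (assuming the value really may be any floating-point scalar):

```python
import numpy as np

x: np.floating = np.float32()

# Either keep the wider type throughout...
y: np.floating = x

# ...or convert explicitly where a float64 is really required.
z: np.float64 = np.float64(x)
```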
The relevant matplotlib stub annotates the parameter as `extent: tuple[float, float, float, float] | None = ...`, so it doesn't accept `np.floating`.
So it seems like NumPy is (and I guess always has been) actually incompatible with the built-in float? Okay... then is asking matplotlib to change their behavior (which, no, I don't think is the right thing) the only option to satisfy the type-checkers? Or using cast? I'm honestly seeking advice here, since the very top of this thread is talking about user documentation.
Before NumPy 2.2, type-checkers rejected ...
I believe that at runtime matplotlib accepts both `float` and `np.floating`.
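One way to satisfy the stubs without touching matplotlib, sketched under the assumption that converting to builtin floats at the call boundary is acceptable:

```python
import numpy as np

xvals = np.array([0.0, 1.0, 2.0])
yvals = np.array([3.0, 4.0, 5.0])

# Convert the NumPy scalars to builtin floats at the boundary, so the
# tuple matches `tuple[float, float, float, float]` exactly.
exts = (
    float(np.min(xvals)),
    float(np.max(xvals)),
    float(np.min(yvals)),
    float(np.max(yvals)),
)
```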
Based on your explanation, I think the thing that is biting me the most is this, from the original post:
In particular, for me, it's that many functions return `floating` rather than `float64` even when they clearly return `float64`. (edited for typo)
Yea, that's understandable. I've had similar issues in a library I maintain that uses NumPy, so I understand how annoying it can be when you're forced to add casts or extra annotations. For what it's worth, we're putting a lot of work into improving the type signatures, e.g. by narrowing the return types in cases like yours. You can follow the progress at https://github.com/numpy/numtype, and you're welcome to help us out if you feel like it, e.g. by raising issues or opening PRs for sub-optimally annotated functions.
Thanks for all the info! I had run across numtype before but not realized that it was basically the future of numpy typing. I'll check it out!
This code uses dicts as pseudo-records a lot and therefore the typing is sloppier than I would ideally like. For clarity, sm_make_map was folded into sm_make_maps and the loop unrolled. So as not to be using one dict for two radically different things, up in make_movies, the alternative threads/no-threads code paths needed to be split apart. Some of the functions in calibrate.py were incorrectly annotated in an earlier commit; this is corrected now that I can see what their callers actually supply. In a few places we use the experimental shape typing from numpy 2.2; this should be removed with prejudice if it causes any problems whatsoever (see numpy/numpy#28076 ) but it does seem to work for the very limited case this code wants, i.e. "this is a 2-d matrix".
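For reference, the "this is a 2-d matrix" level of annotation referred to here can be spelled roughly as follows with the NumPy 2.2 stubs (a minimal sketch; the alias name is made up):

```python
import numpy as np

# Hypothetical alias: "a 2-d float64 array of unspecified size", using the
# shape parameter that np.ndarray gained static support for in NumPy 2.2.
Matrix2D = np.ndarray[tuple[int, int], np.dtype[np.float64]]

m: Matrix2D = np.zeros((3, 4), dtype=np.float64)
```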
NumPy 2.2 had a lot of typing improvements, but that also means some regressions (at least and maybe especially for mypy users).
So maybe this exercise is mainly useful to me to make sense of the mega-issue in gh-27957.
My own take-away is that we need the user documentation (gh-28077), not just for users, but also to better understand who has to change their typing and why. That is, to understand these two points:
- `mypy` users of unannotated code are maybe quite many. `--allow-redefinition` is easy; avoiding `mypy` may be more work (maybe unavoidable).
- `scipy-lectures` is "special", or could hide generic types outside the code users see...

One other thing that I would really like to see is also the "alternatives". Maybe there are none, but I would at least like to spell it out, as in:

Due to ... the only thing with which we might be able to avoid these regressions is to hide it away as

```python
from numpy.typing_future import ndarray
```

and that is impractical/impossible because...

CC @jorenham although it is probably boring to you, also please feel free to amend or expand.
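Purely for illustration, and not something NumPy provides: a user-side approximation of such an opt-in alias is possible today by centralizing a shape-erased alias in one module (the module and alias names below are hypothetical):

```python
# arraytypes.py -- hypothetical user-side module, not part of NumPy
from typing import Any

import numpy as np

# A deliberately shape- and dtype-erased array alias; code that annotates
# with this name instead of np.ndarray is insulated from changes to the
# default type parameters of np.ndarray.
ndarray = np.ndarray[Any, np.dtype[Any]]
```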
Issues that require user action
User issues due to (necessarily) incomplete typing
There are two things that came up where NumPy used to have less precise or wrong typing, but correcting it (making it more precise, while also necessarily incomplete, as it may require a new PEP) means that type checking can fail:
- `floating` is now used as a supertype of `float64` (rather than identity), meaning it (correctly) matches `float32`, `float`, etc.
  - Many functions return `floating` rather than `float64` even when they clearly return `float64`. (`np.dtype(np.floating)` gives float64, but with a warning because it is not a good meaning.)
- Shape-typing: array shapes are now part of many return types. (Users could choose to use this, but probably would need to cast explicitly often.)
There is a mypy-specific angle in gh-27957 to both of these, because `mypy` defaults (but always runs into it) to inferring the type at the first assignment. This assignment is likely (e.g. creation) to include the correct shape and float64 type, but later re-assignments will fail (see the sketch below).

- `mypy` has `--allow-redefinition`, although it doesn't fix it fully, at least for nested scopes in for-loops; `mypy` may improve this.
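A minimal sketch of the failure mode described above (the function name is made up):

```python
import numpy as np

def example() -> None:
    x = np.arange(3)  # mypy pins x to the type of this first assignment,
                      # which includes the precise 1-d shape
    x = x + 1         # the result is shape-erased, so mypy rejects the
                      # re-assignment unless --allow-redefinition is enabled
```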
The user impact is that:

- `mypy` fails even for unannotated code.
- `float64` and shape types fail due to imprecise NumPy type stubs. These previously passed, whether intentional or not.
- `float64` passing previously was arguably a bug, but is still a regression.

(I, @seberg, cannot tell how problematic these are, or what options we have to try to make this easier on downstream, short of reverting or including reverting.)
Simple regressions fixed or fixable in NumPy
- `ndarray.__setitem__` with `object_` dtype in NumPy 2.2 #27964
- The `floating` change has at least one issue that seems very much fixable with follow-ups, see TYP: inconsistent static typing of `float64` addition #28071 (e.g. `numpy.zeros(2, dtype=numpy.float64) + numpy.float64(1.0)` is clearly `float64`).
- `ndarray.item` never typechecks #27977
- `np.ndarray.tolist` return type seems broken in numpy 2.2.0 #27944
- `np.dtype` and `np.ndarray.dtype` in numpy 2.2.0 #27945

Type-checker issues that may impact NumPy