TYP: Stop using `Any` as shape-type default
#27211
Merged
This replaces the default shape-type of `numpy.ndarray` and its subtypes from `Any` to `tuple[int, ...]` (and `tuple[int, int]` for `numpy.matrix`).

This might seem trivial, since the shape-type parameter is already bound to `tuple[int, ...]`. However, `ndarray[Any, _]` is very different from `ndarray[tuple[int, ...], _]`.

This is clearly not what we want: `ndarray.shape` should always be a subtype of `tuple[int, ...]`, but with `ndarray[Any, _]` it is `Any`, which clearly isn't a subtype of `tuple[int, ...]`, but the opposite, i.e. a supertype. That should never be allowed to happen, because the shape `TypeVar` is covariant and bound.

And yet, it did...
... and here's why:
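A minimal sketch of that behavior, using only the standard library. The function names are hypothetical and chosen purely for illustration; the comments describe what mypy and pyright are expected to accept, given that `Any` is compatible with every type in both directions (PEP 484).

```python
from typing import Any


def needs_shape(shape: tuple[int, ...]) -> int:
    """Hypothetical helper: accepts only what is assignable to tuple[int, ...]."""
    return len(shape)


def needs_anything(value: Any) -> str:
    """Hypothetical helper: accepts every value, since everything is assignable to Any."""
    return type(value).__name__


unknown: Any = (2, 3)

# Any -> tuple[int, ...]: accepted, so Any acts like a subtype here.
ndim = needs_shape(unknown)

# tuple[int, int] -> Any: accepted, so Any also acts like a supertype here.
name = needs_anything((2, 3))
```

Neither call is flagged by a type checker, which is exactly the two-way compatibility that lets `Any` slip past a bound.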
So `Any` behaves as the supertype of everything, but at the same time it is also the subtype of everything (like `Never`).

This violates the (very) fundamental axiom that if `T` is both a super- and a subtype of `S`, i.e. `T <: S <: T`, then `T == S` (the analogue of `a == b` iff `a <= b <= a` for integers, which follows from the Peano axioms).

Currently, the `numpy.typing.NDArray` type-alias uses `Any` as the shape-type of `ndarray`, but there are many more examples of this.

This PR tries to avoid such unexpected `Any` behavior for shape-type defaults, by replacing it with `tuple[int, ...]`, so that it matches the upper type bound (as opposed to maximally exceeding it).

It turns out that this also solves a couple of typing-bugs for several functions, including `np.copy`, `np.concatenate`, and `np.(v|h|)stack`. See the changed tests for details.

And this isn't as backwards-incompatible as you might initially think. For those that only use `npt.NDArray` in their codebase (which I'm guessing is the majority of numpy-typing users), this wouldn't even be noticed.

And for those that use e.g. a "manual" `_: np.ndarray[Any, _]`, nothing changes either, because after all, `Any` can be used everywhere, and it matches everything.

It'll only be those few "power-users" who are already using shape-typing in their codebase that might notice a difference (I might actually be the only one; but who knows 🤷🏻). But even so, I'm guessing it won't be a big deal for these power-users to change some annotations (i.e. remove some workarounds that aren't needed anymore).
Edit
I forgot to mention that this is the case in both mypy (1.11.1) and pyright (1.1.375).
But the typing docs are a bit vague about this, so it might not be universal type-checker behavior.
Edit 2
Probably the most important issue with `Any` as shape-type is that it makes it impossible to write overloads for the shape. This is because `ndarray[Any, DT]` will match on an overload for e.g. `tuple[int, ...]`, but also for `tuple[int]`. And there's no (universal) way to "filter out" the `Any` using an overload.

In other words, shape-typing is currently practically impossible (and attempting to do so causes very confusing type-checker behavior).
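The overload problem can be sketched without NumPy. In the example below, `Array` is a hypothetical stand-in for `ndarray` that is generic over its shape type only, and `shape_of` is an illustrative function, not part of any real API. The shape `TypeVar` is covariant and bound to `tuple[int, ...]`, as described above.

```python
from typing import Any, Generic, TypeVar, overload

# Covariant shape parameter, bound to tuple[int, ...].
ShapeT = TypeVar("ShapeT", bound=tuple[int, ...], covariant=True)


class Array(Generic[ShapeT]):
    """Hypothetical stand-in for ndarray, generic over its shape type only."""

    def __init__(self, shape: ShapeT) -> None:
        self._shape = shape

    @property
    def shape(self) -> ShapeT:
        return self._shape


@overload
def shape_of(a: Array[tuple[int]]) -> tuple[int]: ...
@overload
def shape_of(a: Array[tuple[int, ...]]) -> tuple[int, ...]: ...
def shape_of(a: Array[Any]) -> tuple[int, ...]:
    # Runtime implementation shared by both overloads.
    return a.shape


vector: Array[tuple[int]] = Array((3,))
untyped: Array[Any] = Array((2, 3))

# Matches the first overload only: the shape type is exactly tuple[int].
vector_shape = shape_of(vector)

# Array[Any] is compatible with BOTH overloads, so no overload can
# single out "the shape is unknown" as its own case.
untyped_shape = shape_of(untyped)
```

With `tuple[int, ...]` in place of `Any`, the second array would match only the second overload, which is what makes shape-specific overloads workable.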
Currently, we can't even properly type something as trivial as `numpy.shape(ndarray[tuple[int], dtype[bool]])`, and the best we can do with mypy is `tuple[int, ...]`. See #27210 for the implementation.