Skip to content

TYP: Make array _ShapeType bound and covariant #26081

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 7, 2024
Merged

TYP: Make array _ShapeType bound and covariant #26081

merged 2 commits into from
Aug 7, 2024

Conversation

Jacob-Stevens-Haas
Copy link
Contributor

Fixes #25729. Applies to main ndarray class as well as record arrays, masked arrays, masked record arrays, mmaps, and matrices.

I didn't see anything in the dev guide about testing static types for PRs, but this change fixes the MWE in the linked issue. While there hasn't been much discussion on that issue, #16544 has renewed interest, so submitting this in case it's helpful.

@Jacob-Stevens-Haas Jacob-Stevens-Haas marked this pull request as ready for review March 19, 2024 11:59
@charris
Copy link
Member

charris commented Mar 19, 2024

@BvB93 Ping.

BvB93
BvB93 previously requested changes Mar 20, 2024
Copy link
Member

@BvB93 BvB93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a good change I would say, though one thing I feel is still really missing is a convenient public-facing type alias for shaped arrays (see comment below).

@Jacob-Stevens-Haas

This comment was marked as outdated.

@Jacob-Stevens-Haas

This comment was marked as outdated.

@jorenham
Copy link
Member

Thanks @Jacob-Stevens-Haas for implementing my proposal at #16544 (comment) 😄

Copy link
Member

@mtsokol mtsokol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could have a few reveal tests for the new Array type, in one of the numpy/typing/tests/data/reveal/ files (or a new test file for that), something similar to:

if sys.version_info >= (3, 11):
    Length = NewType("Length", int)
    Width = NewType("Width", int)
    arr: npt.Array[Length, Width, np.int8]
    assert_type(arr, npt.Array[int, int, np.int8])

@Jacob-Stevens-Haas
Copy link
Contributor Author

Jacob-Stevens-Haas commented Apr 4, 2024

@mtsokol

I think we could have a few reveal tests for the new Array type

Ok, so I added some tests for ndarray, and npt.Array, but I put them in test_success rather than test_reveal - I was unsure of the difference. Larger problem, the Array ones fail. The problem is that Array unpacks a TypeVarTuple into a bound TypeVar. pyright doesn't see a problem with it, but mypy does. The non-tuple equivalent:

from typing import TypeVar

T = TypeVar("T", bound=str)
S = TypeVar("S")

def foo(a: T) -> T: ...
def bar(b: S, c: S) -> T:
    return foo(b)  # type error since object is not a str

Unfortunately, pyright also accepts things like npt.Array[str, str, np.uint8]

As a result, I've reverted the tests for npt.Array (so that people can play with it if they'd like), but left in npt.Array as written. It seems like, until TypeVarTuple supports binding and variance, there's two options:
a) Change npt.Array to take a tuple shape, not a variadic one.
b) Decide the current API is desired, ship as currently written, refer bug reports to np.ndarray, and use the experience to motivate further work on PEP 646.

@jorenham
Copy link
Member

jorenham commented Apr 4, 2024

@Jacob-Stevens-Haas I agree that it would be best if there'd be upper bounds on the variadic shape type params.
But I don't see why not having them is a hard requirement. Because if you do so anyway, it should be possible for type checkers to detect that it's invalid nonetheless. And just because there are typecheckers that currently don't implement this, doesn't mean that they never will.
I even bet that once people start using numpy.typing.Array[*Shape, DType], type checkers will quickly implement such a feature. So it's a chicken & egg story, but someone has to lay the egg first 😉 .

@Jacob-Stevens-Haas

This comment was marked as outdated.

@jorenham
Copy link
Member

jorenham commented Apr 5, 2024

@Jacob-Stevens-Haas Yes, option B seems to be the most obvious to me.

In several of my projects I've been using effectively the same variadic alias, with sub-aliases for specific common ndim, e.g.

type Vector[N: int, T: np.generic] = Array[N, T]
type Matrix[M: int, N: int, T: np.generic] = Array[M, N, T]

I'm not saying you should add these here, but it's just an example of how you can add upper bound restrictions on shape type parameters, even if they are missing from Array.

@Jacob-Stevens-Haas

This comment was marked as outdated.

@Jacob-Stevens-Haas

This comment was marked as outdated.

Copy link
Member

@BvB93 BvB93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delays, got a few mostly documentation-related comments though.

@BvB93
Copy link
Member

BvB93 commented May 13, 2024

So I'm coming to believe that any form of npt.Array that uses TypeVarTuple may have to be delayed for another PR. I'm chasing around failed tests, and while things are hunky dory for 3.11 and 3.12, something always goes wrong with 3.10

Does putting the relevant code behind a if sys.version_info >= (3, 11) statement make any difference? Usually type checkers are quite good at ignoring code that's outside of such a version bound.

Copy link
Member

@ngoldbaum ngoldbaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the other build failure is unrelated to this PR. I re-triggered the build in case it was a transient issue.

There's another issue in the docs build that will need to be fixed, see below. You should be able to run the docs build locally after installing sphinx by doing "spin docs" in the root of the numpy repo.

@ngoldbaum
Copy link
Member

Ah the mac CI failure is due to an issue with spin: scientific-python/spin#193. Unrelated to this PR, please ignore it.

@Jacob-Stevens-Haas

This comment was marked as outdated.

@ngoldbaum
Copy link
Member

Yes those failures are unrelated and if you rebase or merge they'll go away. I haven't been involved with reviewing this and am not familiar with typing so I'll wait for someone else to take a final look.

@Jacob-Stevens-Haas
Copy link
Contributor Author

Jacob-Stevens-Haas commented May 16, 2024

@BvB93

Does putting the relevant code behind a if sys.version_info >= (3, 11) statement make any difference? Usually type checkers are quite good at ignoring code that's outside of such a version bound.

Eventually it did. There was one time I used * instead of Unpack in one of the tests, and fixing that made things work.

@ngoldbaum
Copy link
Member

There were three different issues that happened to break CI over the past few days and you hit two of them. Should all be good now.

@Jacob-Stevens-Haas
Copy link
Contributor Author

Jacob-Stevens-Haas commented Jul 29, 2024

Thanks Joren - good catches all around. My only discussion point is that NewType and Literal both seem like important use cases - but I agree that the details belong in the numpy.typing docs, rather than the release notes. I'll take another run at this this week.

@Jacob-Stevens-Haas
Copy link
Contributor Author

Alright, I think I've got all of the comments now. If it passes tests, it should be good to re-review. Do you prefer if reviewer or reviewee resolves old review comments? I tend to leave them open with a 👍 once I think we're on the right page in case you want to check them in re-review, but I know people differ on this.

@jorenham jorenham self-requested a review July 31, 2024 01:16
@jorenham
Copy link
Member

Do you prefer if reviewer or reviewee resolves old review comments?

I think it'll make reviewing easier if I mark them as resolved myself

Copy link
Member

@jorenham jorenham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are only a couple of minor suggestions left now.

Once those are out of the way, I think that we can go ahead and merge this :)


edit:

I noticed that the commit history has gotten quite messy, and should probably be cleaned up a bit, as described in:
https://numpy.org/doc/stable/dev/development_workflow.html#rewriting-commit-history

@Jacob-Stevens-Haas
Copy link
Contributor Author

I noticed that the commit history has gotten quite messy, and should probably be cleaned up a bit

Would you be ok with me just squashing all commits together? It's not like there's several pieces put together here - just one change, copied across a few subclasses, the tests for that change, and the release note for that change.

@jorenham
Copy link
Member

jorenham commented Aug 3, 2024

I noticed that the commit history has gotten quite messy, and should probably be cleaned up a bit

Would you be ok with me just squashing all commits together? It's not like there's several pieces put together here - just one change, copied across a few subclasses, the tests for that change, and the release note for that change.

Yea that's fine by me

Fixes #25729

This change allows future changes to the static typing of numpy that modify
or only work with certain numbers of dimensions.  It also applies the change
to subclasses of ndarray and adds tests.  It allows users to statically
type their array shapes with subtypes of tuple (e.g. NamedTuple) and tuples
of int subtypes (e.g. Literal or NewType).

For a discussion of the merits of TypeVarTuple vs a tuple-bound TypeVar, see
the linked PR
Copy link
Member

@jorenham jorenham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left two tiny suggestions, and a bonus one.
Once these two are fixed, we can merge this AFAIK

@jorenham jorenham dismissed BvB93’s stale review August 7, 2024 02:43

it was decided to create a separate PR for the shaped type alias

@jorenham jorenham merged commit c258f83 into numpy:main Aug 7, 2024
66 checks passed
@jorenham
Copy link
Member

jorenham commented Aug 7, 2024

Thanks a lot @Jacob-Stevens-Haas! Everyone can now go ahead and start typing the shapes of arrays 🎉.

I'm looking forward to your PR for the shaped Array type alias -- assuming that this one didn't scare you off, that is 😜.

@Jacob-Stevens-Haas
Copy link
Contributor Author

Jacob-Stevens-Haas commented Aug 7, 2024

Thanks again for your help and enthusiasm @jorenham! I'm curious whether you think it's worth working on the PEP for TypeVarTuple variance/bounds before the alias. The challenge being that we can't come back to the ideal alias in a backwards-compatible manner if we choose a non-ideal alias now. I started to look at the implementation of TypeVarTuple in CPython, and it could be an achievable goal (there's a BoundsVarianceMixin in the typing module that TypeVar uses, maybe that's compatible with TypeVarTuple?)

@jorenham
Copy link
Member

jorenham commented Aug 7, 2024

I'm curious whether you think it's worth working on the PEP for TypeVarTuple variance/bounds ...

Please do! Perhaps you could also allow ParamSpec to be co/contra-variant while you're at it 🤔; it's a very similar case.

... before the alias.

Well, besides the covariance and bounds issues, there are other reasons why I think that a plain tuple TypeVar is a better fit in this case.
A big one is that converting from a ShapeT: tuple[int, ...] to *ShapesT: int is a one-way street.

So this is not a problem:

type ArrayTup[ShapeT: tuple[int, ...], SCT: generic] = ndarray[ShapeT, dtype[SCT]]
type ArrayVar[*Shape: int, SCT: generic] = ArrayTup[tuple[*Shape], SCT]

But the other way around is impossible:

type ArrayVar[*Shape: int, SCT: generic] = ndarray[tuple[*Shape], dtype[SCT]]
type ArrayTup[ShapeT: tuple[int, ...], SCT: generic] = ArrayVar[???, SCT]

I started to look at the implementation of TypeVarTuple in CPython, and it could be an achievable goal (there's a BoundsVarianceMixin in the typing module that TypeVar uses, maybe that's compatible with TypeVarTuple?)

Don't underestimate the amount of bureaucracy and discussion that's involved in attempting to change things like this:

I once made an attempt at a similar PEP that allowed specifying co- and contra- variant type params as +T and -T (like in e.g. Scala), which got rejected (because of PEP 695 and its magic infer_variance, which turns out it horrible for debugging, and not always right). But even so, it already was a lot of work, and that was before the official typing specification existed.

I'm not trying to demotivate you or anything, but just know what you're getting yourself into.
As it currently stands, TypeVarTuple and ParamSpec are useless in most cases for which they were meant.
So it would be of great help if you could manage to revive them.
And even if you won't be able to, it's already worthwhile if you can get people to start talking about this -- it currently isn't considered to be much of a problem at all, as far as I'm aware.

@rgommers rgommers added this to the 2.1.0 release milestone Aug 7, 2024
@rgommers
Copy link
Member

rgommers commented Aug 7, 2024

Great to see this land (and just in time for 2.1.0 too), thanks a lot @Jacob-Stevens-Haas, @jorenham & all reviewers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

ENH: Should _ShapeType be covariant?
8 participants