
Order of generic types for ndarray #16547


Closed
shoyer opened this issue Dec 12, 2017 · 17 comments · Fixed by #17719

Comments

@shoyer
Member

shoyer commented Dec 12, 2017

Per python/typing#516, it seems likely that we will have a choice between two ways to write type hints:

  1. ndarray[Dtype, Shape]
  2. ndarray[Shape, Dtype]

We'll probably have two generic arguments, but will have the freedom to choose in which order they appear. Do we have any reason to prefer one order over the other?

One weak argument in favor of [Shape, Dtype] is that it matches ndarray.__new__ (but that's so rarely used that it probably doesn't matter).

shoyer changed the title from "Syntax" to "Order of generic types for ndarray" Dec 12, 2017
@rmcgibbo
Contributor

rmcgibbo commented Dec 12, 2017

I think ndarray[Shape, Dtype] is the best way to specify the type.

My reason for preferring that the shape specification come before the dtype specification is that I anticipate we'll want type aliases. Putting the Shape first makes something like array1d[Dtype] a natural shorthand for ndarray["OneDimensional", Dtype], in that it looks like a partially applied function. That looks totally sensible. Using the other syntax, you'd have a shorthand like arrayf64[Shape] that "expands" to ndarray[float64, Shape], which seems unnatural.
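A minimal sketch of the alias idea, using only the standard `typing` module. `NDArray`, `OneDimensional`, and `Float64` are hypothetical stand-ins invented for illustration, not NumPy names, and the shape-first parameter order is assumed here only to match the proposal above:

```python
from typing import Generic, TypeVar

Shape = TypeVar("Shape")
DType = TypeVar("DType")


class OneDimensional:
    """Hypothetical marker type standing in for a 1-D shape."""


class Float64:
    """Hypothetical marker type standing in for the float64 dtype."""


class NDArray(Generic[Shape, DType]):
    """Toy stand-in for ndarray, generic over (Shape, DType) in that order."""


# Fixing the leading parameter leaves an alias that is generic in DType only,
# much like partially applying a function to its first argument.
Array1D = NDArray[OneDimensional, DType]

# Fixing the trailing parameter is also expressible; Shape stays free.
ArrayF64 = NDArray[Shape, Float64]

# Both aliases can be subscripted with the remaining parameter.
vector_of_floats = Array1D[Float64]
```

Both aliases are valid to the `typing` machinery, so the preference between them is about which reads more naturally rather than about what is expressible.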

Edit: based on my comment below, I'm reconsidering this position.

@rmcgibbo
Contributor

rmcgibbo commented Dec 12, 2017

Perhaps we should quickly survey other type systems for multidimensional arrays. If there's a clear majority for one formulation or the other, that might better indicate which scheme is most natural.

  • In xtensor, the dtype is the first template parameter and the dimension is the second template parameter.
  • Eigen's tensor class puts the dtype before the rank.

@eric-wieser
Member

Another option - allow product types to be created using &, and specify the traits separately:

x : np.Array[float] & np.Shaped[N,M]

@shoyer
Member Author

shoyer commented Jan 22, 2018

@eric-wieser That looks like the Intersection typing proposal (python/typing#213), which would spell that like Intersection[np.Array[float], np.Shaped[N,M]]. But I doubt the extra sugar of using & for Intersection would really fly from a Python typing perspective.

@eric-wieser
Member

eric-wieser commented Jan 22, 2018

Thanks for the reference. There's a comment about & there, but I suppose it doesn't work well with built-in types. Of course, we're free to implement it anyway for our types if we have a convincing use case and think it's a good idea.

person142 transferred this issue from numpy/numpy-stubs Jun 9, 2020
@wkschwartz

One (admittedly rather weak!) argument for putting dtype before shape is the precedent of Rust, whose array declaration syntax is [T; N] where T is a type and N is an integer.

@eric-wieser
Member

eric-wieser commented Aug 26, 2020

@wkschwartz: C++ sets the same precedent, std::array<T, N>

@seberg
Member

seberg commented Oct 28, 2020

I have a slight tendency towards putting the dtype first as well, but no big argument to back that up.

@BvB93
Member

BvB93 commented Oct 28, 2020

Personally I'm leaning more towards ndarray[Dtype, Shape] as well,
simply because (in most cases at least) I consider the array's dtype to be the more important descriptor.

@0az

0az commented Oct 28, 2020

But I doubt the extra sugar of using & for Intersection would really fly from a Python typing perspective.

Thanks for the reference. There's a comment about & there, but I suppose it doesn't work well with built-in types. Of course, we're free to implement it anyway for our types if we have a convincing use case and think it's a good idea.

With PEP 604, there's now precedent.

Here's another proposal in the same space: why not something like this?

dtype_and_shape: np.Array[np.int_ & (2, 2)]

@BvB93
Member

BvB93 commented Oct 28, 2020

Here's another proposal in the same space: why not something like this?

The problem here is that static type checkers will be unable to understand the x & y syntax,
which I'd say is a deal breaker.

I do believe that work is currently being done on an Intersection type, which I suspect will (eventually?) be
followed up by an introduction of the & operator, just as PEP 604 did with | and Union.

@eric-wieser
Member

The problem here is that static type checkers will be unable to understand the x & y syntax,
which I'd say is a deal breaker.

Do we expect static type checkers to be able to do anything with Shape anyway? My understanding was that in order for Shape to be useful, we need to either modify or write plugins for the type checkers themselves - at which point, the door is open for teaching them about & too.

@BvB93
Member

BvB93 commented Oct 28, 2020

Do we expect static type checkers to be able to do anything with Shape anyway?

Not yet, though this is an area which is actively being worked on (search for "tensor typing" on typing-sig). The closest thing to a concrete shape type is in the Variadic generics PEP draft, which presents a TypeVar-esque object parameterized over multiple objects rather than just one.

Long story short, it would allow for a syntax similar to Callable:

  • ndarray[float64, [int]] # 1D
  • ndarray[float64, [int, int]] # 2D
  • ndarray[float64, [int, int, int]] # 3D

For the time being we'll have to settle for a Shape placeholder though, Any probably being the wisest due to its flexibility.
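As a rough illustration of the two points above, here is a sketch using only the standard `typing` module. `NDArray` and `Float64` are hypothetical stand-ins rather than NumPy names, and the dtype-first order is assumed only to mirror the bullet list:

```python
from typing import Any, Callable, Generic, TypeVar

DType = TypeVar("DType")
Shape = TypeVar("Shape")


class Float64:
    """Hypothetical marker type standing in for the float64 dtype."""


class NDArray(Generic[DType, Shape]):
    """Toy stand-in for ndarray, generic over (DType, Shape) in that order."""


# The Callable precedent: a list of types is accepted as one parameter.
BinaryOp = Callable[[int, int], float]

# Until shapes can be expressed, Any in the shape slot leaves it unconstrained.
UnshapedF64 = NDArray[Float64, Any]
```

A list such as `[int, int]` in the shape slot of a user-defined generic is not something type checkers understand today, which is why the sketch only shows the `Any` placeholder.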

@jni
Contributor

jni commented Oct 28, 2020

This is exciting!

For me the fact that non-typing syntax is always (?) np.ones(shape, dtype) suggests putting shape first. But if Eric's suggestion of x : np.Array[float] & np.Shaped[N,M] becomes feasible, that would be the best option imho — it's good to be able to specify them separately.

@spometun

spometun commented Oct 29, 2020

Personally I'm leaning more towards ndarray[Dtype, Shape] as well,
simply because (in most cases at least) I consider the array's dtype to be the more important descriptor.

I feel the same: dtype first, because it is more descriptive. There is a huge number of possible shapes, while there are not so many dtypes.
Though I don't see any objection to going vice versa.

@hameerabbasi
Contributor

The array construction functions, such as zeros, ones, full (excluding first argument), and so on, usually have the signature (shape, dtype). For this reason I prefer that spelling.

@BvB93
Member

BvB93 commented Nov 5, 2020

Ok, the PR is up: #17719.

The signature is currently set to np.ndarray[~Shape, ~DType] after some further consideration.
The fact that this matches, for example, the ndarray constructor is the (marginally) strongest argument in my opinion.
