
ENH: Make the dtype objects in numpy.array_api more strict #23883

Closed
asmeurer opened this issue Jun 5, 2023 · 4 comments · Fixed by #25370

Comments

@asmeurer
Member

asmeurer commented Jun 5, 2023

Proposed new feature or change:

In NEP 47, we decided to reuse normal numpy dtype objects in numpy.array_api. However, as we've come to use the array API standard more and learn about how it is used, we've learned that numpy.array_api is primarily useful as a strict namespace for libraries to test against to ensure they don't deviate from the standard. We've also learned that it's very difficult to mix numpy.array_api with normal numpy, and usage of actual numpy ndarrays should take a different approach.

Therefore, I think it makes more sense to use separate dtype objects in numpy.array_api, similar to how we use a distinct Array object. This would remove all dtype attributes and behaviors from the NumPy dtype objects that aren't guaranteed by the standard. That is, nothing should be implemented on the objects except for __eq__, and __eq__ itself should only compare directly against the objects themselves, not against things like dtype strings.

Note that this isn't particularly high priority, as currently any library that uses NumPy dtype specific behaviors will also detect this as soon as they test against PyTorch, whose dtype objects don't share anything in common with NumPy. But it is still a good idea in my opinion and fits with the overall goal of the module. With that being said, I don't plan to implement this for the 1.25 release.

If someone else wants to implement this, it should be straightforward:

  • Replace the current dtype objects in numpy.array_api with something like

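    import numpy as np

    class Dtype:
        def __init__(self, name):
            self._name = name
            # Underlying NumPy dtype, used only when calling into NumPy
            self._np_dtype = np.dtype(name)
        def __repr__(self):
            return self._name
        def __eq__(self, other):
            # Compare only against other Dtype objects, never against strings
            return isinstance(other, Dtype) and self._name == other._name

    int8 = Dtype('int8')
    int16 = Dtype('int16')
    ...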
  • Everywhere that a dtype object is passed to an actual NumPy function in the numpy.array_api code replace dtype with dtype._np_dtype.
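As a rough sketch of the second step, a creation function could unwrap the strict dtype just before calling NumPy. The `zeros` wrapper below is hypothetical and only illustrates the pattern; it returns a plain ndarray for brevity, whereas the real module would wrap the result in its own Array object.

```python
import numpy as np


class Dtype:
    """Minimal dtype wrapper, following the sketch in the issue."""

    def __init__(self, name):
        self._name = name
        self._np_dtype = np.dtype(name)

    def __repr__(self):
        return self._name

    def __eq__(self, other):
        return isinstance(other, Dtype) and self._name == other._name


float64 = Dtype("float64")
int8 = Dtype("int8")


def zeros(shape, dtype=float64):
    # Hypothetical wrapper: reject anything that is not a strict Dtype,
    # then pass the underlying NumPy dtype to the real NumPy function.
    if not isinstance(dtype, Dtype):
        raise TypeError(f"expected a Dtype object, got {dtype!r}")
    return np.zeros(shape, dtype=dtype._np_dtype)


a = zeros((2, 3), dtype=int8)
print(a.dtype)                # the underlying NumPy dtype
print(int8 == "int8")         # False: no comparison against dtype strings
print(int8 == Dtype("int8"))  # True
```

With this pattern, code that relies on NumPy-specific dtype behavior, such as passing the string "int8", fails immediately instead of silently working.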

@asmeurer asmeurer changed the title ENH: Be more strict about dtype objects in numpy.array_api ENH: Make the dtype objects in numpy.array_api more strict Jun 5, 2023
@sohamnair

Hello, I would like to work on this but before that would like to understand the problem a bit more in depth.

As of now, we pass dtype as a parameter to the NumPy functions in numpy.array_api. What would an updated function look like after making these changes, i.e., after implementing the Dtype class as described above in the dtypes.py file? Can you show an example of using 'dtype._np_dtype' in any NumPy function?

asmeurer added a commit to asmeurer/numpy that referenced this issue Dec 12, 2023
This way there is no ambiguity about the non-portability of NumPy
dtype behavior, or the fact that NumPy dtypes are not necessarily allowed as
dtypes for non-NumPy array APIs.

Fixes numpy#23883
@jakevdp
Contributor

jakevdp commented Dec 12, 2023

Commenting here after seeing #25370 sent out: does this mean libraries that want to be compatible with the array API should focus on having a separate api-compatible namespace, rather than making the primary namespace more compatible with the API? For example in the long term, should we aim to make jax.numpy conform to the Array API standard, or should we focus our efforts on jax.experimental.array_api?

@asmeurer
Member Author

asmeurer commented Dec 12, 2023

tl;dr: it's up to you, but I would recommend aiming to make your main namespace compatible.

Some background here: our initial plan for numpy.array_api was to do exactly as you describe: use a separate namespace for the array API, because the main NumPy namespace has too many incompatibilities. However, it soon became clear that this approach wouldn't work. The reason is that many of these incompatibilities are based on the array object itself, necessitating a separate Array object. Having a separate Array object makes the numpy.array_api namespace very difficult to use as a "numpy" array API namespace. To do so, target libraries would need to intercept NumPy arrays, convert them to numpy.array_api arrays, then convert the result back to NumPy arrays at the end, because that's what the user input.

So instead, our way of thinking morphed a bit. The numpy.array_api namespace would instead become a minimal implementation of the array API. That means that it not only implements the spec exactly, it errors on anything that isn't explicitly required by the spec. For example, numpy.array_api.cos(numpy.array_api.asarray([1, 2])) raises an exception, because the spec does not require the cos function to work on integer array inputs. The spec doesn't disallow this, but the point is that any library relying on this would not be portable, because it isn't guaranteed by the spec. With this, libraries like scikit-learn can now test against numpy.array_api and if their code works, they can be pretty confident that it is array API compatible (the main caveat here is that NumPy has no non-trivial device support, but there is also cupy.array_api). However, this numpy.array_api namespace is now only useful as a testbed for libraries. It should never be used by end-users, because of its extremely restrictive nature.
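To make the "errors on anything not required by the spec" idea concrete, here is a minimal sketch using plain NumPy. `strict_cos` is a hypothetical helper, not a NumPy function, and for simplicity it accepts only real floating-point input.

```python
import numpy as np


def strict_cos(x):
    # Hypothetical helper: refuse inputs the standard does not require
    # cos to support, instead of silently promoting them.
    if not np.issubdtype(x.dtype, np.floating):
        raise TypeError("Only floating-point dtypes are allowed in cos")
    return np.cos(x)


ints = np.asarray([1, 2])

# Plain NumPy accepts integer input and promotes it to a float dtype.
print(np.cos(ints).dtype)

# The strict version rejects the same input.
try:
    strict_cos(ints)
except TypeError as exc:
    print("rejected:", exc)
```

A library that only ever passes float arrays to `strict_cos` is not depending on the integer promotion, which is the property a strict test namespace is meant to check.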

End users are just going to use normal NumPy arrays (as usual), or pytorch Tensors, or JAX arrays, or whatever, and they want to be able to pass these to array API compatible functions. Since the main NumPy namespace isn't array API compatible, we created the compat library to wrap it in an array API compatible way. Unlike numpy.array_api, array_api_compat.numpy does not try to swap out the array object. This does mean a few things in the compat library are technically incompatible and you have to be careful about them (e.g., using a device() helper function instead of x.device because we can't add .device to numpy.ndarray). Fortunately, these are pretty minimal, thanks in large part to the decision to make the array API a mostly functional API with almost no array attributes/methods outside of magic operator methods. array-api-compat also supports pytorch, cupy, and soon (hopefully) dask and others.

The most recent news here is that NumPy has decided to make its main namespace fully array API compatible for NumPy 2.0. This will involve a few breaking changes. This work is tracked by #25076. You can also see what sorts of changes are necessary at https://numpy.org/devdocs/reference/array_api.html. In principle, once this work is completed, the compat library will no longer be needed for NumPy arrays, although it will still be needed for pytorch and other libraries, and also it has some useful helper functions like array_namespace(), meaning most packages will continue to use it.

Additionally, my understanding is that numpy.array_api is going to be moved out of NumPy at some point and put into a separate package (it's currently marked as "experimental"), although I'm not sure yet whether this is definite.

I would say that what we've learned from all this is that trying to make a separate namespace is a mistake. This is especially true if it would mean a separate array object. But even if it doesn't, you might as well just make your main namespace as compatible as possible, and put any remaining wrappers that can't go there (e.g., because of backwards compatibility concerns) in the array-api-compat library.

@jakevdp
Contributor

jakevdp commented Dec 12, 2023

Thanks for that context – that's very helpful.

3 participants