ENH: Array API standard and Numpy compatibility #21135
We discussed this at the weekly community meeting. There are a few issues here:
As far as I can tell, (1) should Just Work: converting an ndarray to an `xp.Array` should be seamless. I think it is worthwhile asking the array API team for clarifications as well. Could you open an issue on their issue tracker https://github.com/data-apis/array-api asking how they view this?
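For what it's worth, a minimal sketch of that conversion, under the assumption that `numpy.array_api` is importable (it warns about being experimental on import) and that its `asarray` accepts an existing ndarray of a supported dtype:

```python
import numpy as np
import numpy.array_api as xp  # experimental module; emits a UserWarning on import

a = np.arange(5.0)   # a plain ndarray with a spec-supported dtype (float64)
x = xp.asarray(a)    # wrapping it as an array-API Array should be seamless
y = xp.sum(x)        # computation then goes through the strict namespace
```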
Ok, I created data-apis/array-api#400 to keep them in the loop. IMHO, the problem is not really with the API standard, but with NumPy and its decision to conform only to the minimal implementation and to use a separate namespace. I think that the minimal implementation should be moved to a separate package of array API utils (which could maybe also provide the necessary Protocols for static typing), and that NumPy should try to achieve standard compliance in its main namespace. The only problem I see with this approach is with …
Worth noting that some of these things have already been discussed in NEP 47.
@asmeurer would you be able to detail this a bit more? I did not reread the NEP, but it has no example code snippets for this user story. As far as I understood, both of these questions were not quite settled when the NEP was written (I had asked the same questions at the time). To my knowledge the answer to this "user story" for library authors is currently still being explored (probably mainly as a sklearn PR, but I am not sure).
Agreed with this. The NEP has a section "Feedback from downstream library authors", which is still TODO and mentions trying out use cases. From the scikit-learn and SciPy PRs that are in progress we should be learning what the pain points are and which approach is preferred. @vnmabus's report of this friction in supporting both array objects and namespaces is discussed in most detail in scikit-learn issue #22352.
This is starting to look like a more attractive option indeed. And it is correct that object arrays are not forbidden. Rather than "attractive" I should perhaps say "necessary" - I'm still not super enthusiastic about a standalone Python package with a compliant implementation, but there are probably two reasons to do it indeed:
Given that that standalone package would still have a dependency on a very recent NumPy version and may need to use some private NumPy APIs (that's what …), … Also, there's the array API standard test suite, which is now being versioned with a date-based scheme. There were requests for turning that into a package as well; that could be combined.

Related important point: on the scikit-learn issue it was pointed out that the divergence between the main `numpy` namespace and `numpy.array_api` is what causes the friction. If we do get there, a …
I was out last week, so sorry for not responding sooner. I only meant that the NEP answers the questions of why array_api is in NumPy and not a standalone package (https://numpy.org/neps/nep-0047-array-api-standard.html#alternatives), why the implementation is minimal (https://numpy.org/neps/nep-0047-array-api-standard.html#high-level-design), and the technical reasons why it is a separate namespace instead of the main namespace (https://numpy.org/neps/nep-0047-array-api-standard.html#implementation). The NEP doesn't directly decide whether or not the main numpy namespace should conform to the standard. I'm in agreement that NumPy should aim for full spec compatibility in its main namespace.

If you search the code of numpy/array_api for …, you'll see it's actually not that many things. Quite a few things, like dtype checking, are done for the sake of strictness but aren't actually required by the spec. The biggest thing is some function/keyword-argument renames, but those can be added as aliases without breaking compatibility. The list of things that require a compatibility break is small, with the most notable being no value-based casting, which is already being addressed. A strict implementation like …

By the way, there's a somewhat separate issue here, which is how users can handle receiving a numpy.ndarray, converting that into a numpy.array_api.Array, doing computation on it, and then converting it back to numpy.ndarray before returning. This is something that we are discussing how to do better in the standard itself.
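A rough sketch of that round trip, assuming a floating-point input and assuming that `np.asarray` can convert the `Array` back into an ndarray (which is part of what is still being discussed):

```python
import numpy as np
import numpy.array_api as xp  # experimental; warns on import

def standardize(a: np.ndarray) -> np.ndarray:
    x = xp.asarray(a)                 # ndarray -> numpy.array_api.Array
    y = (x - xp.mean(x)) / xp.std(x)  # compute through the strict namespace
    return np.asarray(y)              # Array -> ndarray before returning
```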
Thanks for looking at that: having a single, simple code path for library authors to handle numpy arrays alongside "foreign arrays" would be good progress.
Just to be clear, it's not so much about "foreign arrays". Mixing two different libraries isn't something that the consortium has discussed much. It's about how to handle libraries like NumPy that have a "main" namespace and a separate "array API compatible" namespace. Right now the recommended …
Understood about the mixing. What I meant was that currently you have to do something different for NumPy arrays than for Array-API-compatible arrays; I just want to see that gap reduced so that the same code patterns apply to both.
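For reference, one pattern along these lines (a sketch, not necessarily the exact recommendation cut off above; the helper name `get_namespace` is just a placeholder) detects the standard's `__array_namespace__` method and falls back to the main `numpy` namespace otherwise:

```python
import numpy as np

def get_namespace(*arrays):
    """Return the array-API namespace of the inputs, or plain numpy as a fallback."""
    namespaces = {
        x.__array_namespace__() for x in arrays if hasattr(x, "__array_namespace__")
    }
    if not namespaces:
        return np  # plain ndarrays: fall back to the main namespace
    if len(namespaces) > 1:
        raise TypeError("inputs come from different array namespaces")
    return namespaces.pop()
```

The friction discussed in this thread is that the two branches still behave differently afterwards, because the main namespace does not follow the standard exactly.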
My 2¢: I think it would be great if you could do that. A good way may be to create a single "almost compatible" module (i.e. we ignore any incompatibilities of …). If that is the case, then such a module may actually be good enough to read! And it will also be a perfect start for further summarizing only the more difficult parts (if those even exist, aside from value-based promotion). Why? For two reasons:
About the "promotion" problem: This thread for example mentions it also. I am very sure there was some discussion about it before in an issue w.r.t to NEP 37; but I can't find it ☹ (e.g. having I think the promotion use-case is important and should not be forgotten. But it is not related to this discussion and not urgent. (This could be very important for Dask, since Dask tries to work well with both NumPy arrays and cupy in |
I'm working on this. Ralf suggested adding it in the NumPy documentation for ….

One question I came across is the question of scalars vs. 0-D arrays. The spec only has 0-D arrays, not scalars. But the question is: assuming NumPy fixes type promotion on scalars so that they promote the same way as 0-D arrays, do you know of any other incompatibilities between them that are relevant for the spec? If there aren't any, I think NumPy can be compatible with its current behavior just by pretending that scalars are 0-D arrays. They print differently, and their Python …
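A quick illustration of how far that "pretending" already goes for plain attribute access (values chosen arbitrarily; this probes only a few attributes, not full spec behaviour):

```python
import numpy as np

scalar = np.float64(1.5)  # a NumPy scalar
zero_d = np.array(1.5)    # a 0-D ndarray

# The array-like attributes largely agree...
assert scalar.shape == zero_d.shape == ()
assert scalar.ndim == zero_d.ndim == 0
assert scalar.dtype == zero_d.dtype == np.dtype("float64")

# ...but they are different kinds of objects.
assert not isinstance(scalar, np.ndarray)
assert isinstance(zero_d, np.ndarray)
```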
IMHO, there should be no scalars, only 0-D arrays, period. Otherwise, even if they behave almost identically, they break generic code like

```python
from typing import TypeVar

# ArrayProtocolYetToDetermine and get_namespace are placeholders here.
T = TypeVar("T", bound=ArrayProtocolYetToDetermine)

def my_sum(array: T) -> T:
    xp = get_namespace(array)
    return xp.sum(array)
```

as this wouldn't be true for NumPy arrays (`np.sum` over the whole array returns a NumPy scalar, not an ndarray).
There is a discussion about having a typing protocol for array objects in the spec, data-apis/array-api#229. As far as I understand, a NumPy scalar would pass this protocol, because it has all the same attributes as an array. I agree in principle that just having 0-D arrays is better than having scalars, and the consortium agrees too, which is why the spec only includes 0-D arrays. But removing scalars from NumPy would be a very difficult task, and it would be much simpler if NumPy could be spec compliant without actually having to do that. But again, I might be missing some other incompatibility with them, which is what I'd like to determine.
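As a toy illustration (`ArrayDuck` is a hypothetical stand-in, not whatever data-apis/array-api#229 ends up specifying), an attribute-only runtime check is satisfied by both ndarrays and NumPy scalars:

```python
from typing import Protocol, runtime_checkable

import numpy as np

@runtime_checkable
class ArrayDuck(Protocol):
    """Hypothetical, attribute-only stand-in for the spec's array protocol."""
    @property
    def dtype(self): ...
    @property
    def shape(self): ...
    @property
    def ndim(self): ...

print(isinstance(np.array(1.0), ArrayDuck))    # True: ndarrays have these attributes
print(isinstance(np.float64(1.0), ArrayDuck))  # True: so do NumPy scalars
```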
Another small inconvenience when falling back to `np`: … I don't know an easy way to please both NumPy and the standard array API.
There is now a document in the documentation that enumerates all the differences between `numpy.array_api` and the main `numpy` namespace. The most important things to consider here are the breaking changes, although thinking about how to handle the compatible changes (most of which would just be name aliases) is also useful.
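To give a sense of what "just name aliases" means, a few of the spellings the standard uses could in principle map directly onto existing main-namespace functions (a toy sketch, not how NumPy would actually ship the aliases):

```python
import numpy as np

# A few functions whose spec spelling differs from the main namespace's:
spec_to_numpy = {
    "concat": np.concatenate,
    "permute_dims": np.transpose,
    "pow": np.power,
    "acos": np.arccos,
}

# Exposing the spec names would not change behaviour, only add spellings, e.g.:
assert np.array_equal(spec_to_numpy["concat"]([np.zeros(2), np.ones(2)]),
                      np.concatenate([np.zeros(2), np.ones(2)]))
```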
I opened gh-22021 to address that. We want to be able to use …
Proposed new feature or change:
I was testing how to add support for the array API standard to a small project of mine. However, I also wanted to remain compatible with the Numpy `ndarray`, as it is what everyone uses right now. The differences in the API between the standard and `ndarray`, and the decision to conform to only the minimal implementation of the standard, make it really difficult to support both use cases with the same code.

For example, in order to create a copy of the input array, which should be a simple thing to do, using the array standard I would need to call `x = xp.asarray(x, copy=True)`. However, when I receive a numpy `ndarray` I fall back to `xp = np`, and `np.asarray` does not have the `copy` parameter.

I can't just convert the `ndarray` to `Array`, both because I would be returning a different type than the input type, and because `Array` does not allow the `object` dtype, while I explicitly allow `ndarray`s containing `Fraction` or `Decimal` objects.

The ideal choice would be either to make the basic Numpy `ndarray` functions compatible with the standard, or to expose an advanced version of `array_api` that deals directly with `ndarray` objects and supports all Numpy functionality in a manner compatible with the standard. Otherwise, making existing code compatible with both `ndarray` and the API standard will require a lot of effort and duplicated code to accommodate both.
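For the specific `copy` example above, one stopgap is a small helper that special-cases the plain-NumPy fallback (a sketch; it assumes `xp` is either `numpy` itself or a namespace whose `asarray` takes `copy=True`, and it reflects NumPy as described in this report, where `np.asarray` has no `copy` parameter):

```python
import numpy as np

def copy_array(x, xp):
    """Return a copy of x using whichever namespace xp the caller resolved."""
    if xp is np:
        # Main-namespace fallback: np.asarray has no copy= keyword here,
        # so copy explicitly instead.
        return np.array(x, copy=True)
    # Array-API namespaces spell it as a keyword of asarray.
    return xp.asarray(x, copy=True)
```

The point of the issue, of course, is that library authors would rather not need shims like this at all.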