DOC: Enumerate the differences between numpy and numpy.array_api #21260
Conversation
This is a more organized presentation of the various "Note" comments in numpy/array_api. In particular, each difference is annotated as to whether it is a strictness difference (no change in NumPy needed), a compatible change (NumPy can change in a backwards-compatible way), or a breaking change (NumPy would need to break backwards compatibility to match the spec).
This looks great and is super useful, thanks @asmeurer! I have only a couple of small comments about the document.
Given that this is very useful, I recommend merging it very soon. Any discussion on making code changes to NumPy should be done elsewhere. Let's focus the review here on whether the document is correct, well-written and useful to have in the NumPy html docs.
For some reason the CI link to the artifact is broken. Here is the current rendered version: https://output.circle-artifacts.com/output/job/d6f9451b-164d-4a89-ad56-024550656ddf/artifacts/0/doc/build/html/reference/array_api.html
Do we have any tests that fail before and pass after these changes? Do we test the array_api?
@@ -380,7 +384,7 @@ def vecdot(x1: Array, x2: Array, /, *, axis: int = -1) -> Array:

 # The type for ord should be Optional[Union[int, float, Literal[np.inf,
 # -np.inf]]] but Literal does not support floating-point literals.
-def vector_norm(x: Array, /, *, axis: Optional[Union[int, Tuple[int, int]]] = None, keepdims: bool = False, ord: Optional[Union[int, float]] = 2) -> Array:
+def vector_norm(x: Array, /, *, axis: Optional[Union[int, Tuple[int, ...]]] = None, keepdims: bool = False, ord: Optional[Union[int, float]] = 2) -> Array:
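To illustrate what the `Tuple[int, ...]` annotation enables: the spec's `vector_norm` may reduce over several axes at once, which `np.linalg.norm` does not support for vector norms. A minimal sketch, assuming a hypothetical helper `multi_axis_vector_norm` (this is not the numpy.array_api implementation):

```python
import numpy as np

def multi_axis_vector_norm(x, axes):
    # Move the requested axes to the end, flatten them into a single
    # axis, then take an ordinary single-axis 2-norm over that axis.
    x = np.moveaxis(x, axes, tuple(range(-len(axes), 0)))
    x = x.reshape(x.shape[:-len(axes)] + (-1,))
    return np.linalg.norm(x, ord=2, axis=-1)

a = np.ones((2, 3, 4))
out = multi_axis_vector_norm(a, (1, 2))
print(out.shape)  # (2,)
```

Each result element here is the 2-norm of a flattened 3x4 block, i.e. `sqrt(12)`.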
If a line is 81 characters and the linter complains we can let it slide. But this one is 155 characters ...
Please let's not have that conversation again, we decided this in the original PR and this change does not touch line length. This diff is fine.
Nice! Generally, we should probably just merge this as soon as you think it is ready. It is a great overview, showing that most differences are indeed small. Besides the "big" ones (promotion, scalars), most indeed seem like small changes.
(I am not too interested in the "strictness" changes, but it is probably good to list them together – in a sense we have a dual role of informing about the array_api as well as how the main namespace is incompatible with the "Array API".)
There is quite a large amount of "compatible additions" that we probably have to figure out what to do about if we want them all in the main namespace. Especially if old names need to stay around.
E.g. for the `asin`, etc. ones, I guess we need to keep the old names around for a very long time if we want to rename them. But maybe try to hide them away a bit?
(From a NumPy perspective, I would have almost liked `np.bitwise.left_shift` rather than `np.bitwise_left_shift`, for example.)
doc/source/reference/array_api.rst
Outdated
- **Breaking**
-
* - ``pinv`` has an ``rtol`` keyword argument instead of ``rcond``
- **Compatible**
I bet the new default is better, but isn't this a breaking change, since it changes the default value? The new name itself is a compatible change, of course.
There was some discussion about this (also for `matrix_rank`). See data-apis/array-api#211 and data-apis/array-api#216.
This is breaking indeed, but as breaks go it's as small as they get.
Fine to mark as breaking here I'd say, pointing out that the default changes from `1e-15` to `None`, which means `max(M, N) * eps`.
I marked these as compatible because the name of the keyword argument also changed.
That doesn't matter here, right? A `np.linalg.pinv(x)` call would start behaving differently, so that is backwards incompatible. If you were thinking that we'd keep `rcond` and `rtol`, that wouldn't quite make sense.
It is an incompatible change and it could at the very least break the reproducibility of workflows (results change and I am not sure how to transition smoothly). So it should be marked/discussed as such. In principle it has the potential to be a large change, but I doubt it is in practice.
I would expect that the number of affected users will be rather small and that it should almost never cause wrong results; so it would rather only break reproducibility. And more often than not, it hopefully will "fix" results also.
Plus, reproducibility may well already be a game of luck for those that would be affected.
Ah, you're right of course. The keyword rename can be done compatibly but the default change can't.
The same reasoning also applies to `matrix_rank`, and to the `stable` flag of `sort` and `argsort`.
the behavior with Python scalars).
* - No casting unsigned integer dtypes to floating dtypes (e.g., ``int64 +
uint64 -> float64``).
- **Strictness**
I suppose this is probably even a violation, rather than just "strictness"? Most promotion things in NumPy are "strictness", but this one I am not sure about, since it goes from int to float.
I think it's not breaking. There is no `uint64 + int64` promotion rule, which means it is undefined behavior. Libraries may decide to raise an exception, cast to `float64`, cast to `int64`, or do whatever else they like.
OK. I thought it might disallow such "weird" ones (rather than leaving it fully open). I guess that distinction would mainly matter if there is a chance that you may want to specify contradicting promotions in the future.
(For me it could make sense to specify that `int + uint` may never promote to a non-integral type, but I guess NumPy is just weird here, and hopefully we may eventually deprecate or warn about this one anyway...)
> I guess that distinction would mainly matter if there is a chance that you may want to specify contradicting promotions in the future.
That's indeed the one/main reason to specify particular exceptions or "implementing this is not allowed". We've tried to stay away from specifying exceptions or error-handling type things, because it's impossible to be comprehensive (in addition to it being difficult to align libraries' behavior here).
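The behavior being discussed is easy to demonstrate: no 64-bit integer type can hold both the `int64` and `uint64` ranges, so NumPy promotes the mix to `float64`, a case the spec leaves undefined.

```python
import numpy as np

# Mixed signed/unsigned 64-bit integers promote to float64 in NumPy.
combined = np.zeros(2, dtype=np.int64) + np.zeros(2, dtype=np.uint64)
print(np.result_type(np.int64, np.uint64))  # float64
print(combined.dtype)                       # float64
```

Other libraries are free to raise instead, or to pick `int64`/`uint64`, since the spec simply has no `uint64 + int64` rule.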
arrays.
- **Breaking**
- For example, ``np.array(0., dtype=float32)*np.array(0.,
dtype=float64)`` is ``float32``. Note that this is value-based casting
The example is meant to use `**`, and maybe can just be considered a bug, since it does not use the typical promotion rules. But then, if we change promotion rules, this should just align anyway, so it hardly matters.
I can't tell what's a bug or not when it comes to value-based casting. It definitely is odd that `__pow__` is the only operator that does this.
doc/source/reference/array_api.rst
Outdated
- No cross-kind casting. No value-based casting on scalars.
* - ``stack`` has different default casting rules from ``np.stack``
- **Strictness**
- No cross-kind casting. No value-based casting on scalars.
Suggested change:
- No cross-kind casting. No value-based casting on scalars.
+ No promotion to a common dtype. No value-based casting on scalars.
I am not sure if `stack` will ever use value-based casting currently? The function calls `asarray()` upfront, so I think we would most likely also not use scalar logic in the future?
The reason is that I think if one input is scalar, all inputs must be 0-D. For concatenation it can differ, although only for `axis=None` – which I am not quite sure you actually want to support.
`stack` and `concat` do allow promoting, but cross-kind promoting is unspecified ("strictness"). Cf. https://data-apis.org/array-api/latest/API_specification/generated/signatures.manipulation_functions.stack.html and https://data-apis.org/array-api/latest/API_specification/generated/signatures.manipulation_functions.concat.html
According to my comments in the code, incorrect promotion was an issue for `concat`. But I guess this is not so for `stack` (I misread the code there; `stack` does call `result_type`, but only for the error).
Sounds good, might be nice to mention the fact that it only matters for `axis=None` for concat, but it doesn't matter too much. (Which makes this really a very niche usage.)
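Both behaviors discussed here are quick to check in NumPy (a sketch of current behavior, not a claim about what the spec requires):

```python
import numpy as np

a = np.zeros(2, dtype=np.int8)
b = np.zeros(2, dtype=np.float64)

# Cross-kind promotion in np.concatenate: int8 with float64 gives float64.
mixed = np.concatenate([a, b])
print(mixed.dtype)  # float64

# axis=None flattens every input first, so this is the one case where
# input shapes need not match.
flat = np.concatenate([np.ones((2, 2)), np.ones(3)], axis=None)
print(flat.shape)   # (7,)
```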
doc/source/reference/array_api.rst
Outdated
- The spec does not have array scalars, only 0-D arrays. It is not clear
if NumPy's scalars deviate from the spec 0-D array behavior in any ways
other than the ones outlined in
:ref:`array_api-type-promotion-differences`.
They mostly do not deviate in promotion :) (although I bet there are some subtleties in power or so). Because the weird promotion rules also apply to 0-D arrays...
They are mainly immutable and hashable (if you are constrained to numeric dtypes anyway). But of course what to do with scalars may well be the next big annoying thing after the promotion issues...
Right, I wasn't clear if there were any other odd differences. Immutability shouldn't be a problem because strictly speaking the spec allows it (some libraries like Jax are completely immutable, c.f. https://data-apis.org/array-api/latest/design_topics/copies_views_and_mutation.html).
I changed this to "strictness" for now. I believe that other than type promotion (which is discussed elsewhere), scalars can duck type as 0-D arrays sufficiently for the spec. If this is ever disproved, we can update this. It's hard to know for sure without plugging them into the test suite somehow, unless you are aware of any other differences that could affect things.
- ???
-
* - ``diagonal`` operates on the last two axes.
- **Breaking**
But you are moving it to `linalg`, which may well be a way to do the transition in NumPy. I guess in a sense it is breaking, because having `np.diagonal` and `np.linalg.diagonal` do different things may be too confusing to keep both.
I hadn't considered that. I'll leave it as "breaking" but add a note about this.
This is definitely one of the worst "breaking" cases here for NumPy. NumPy's `diagonal` is completely spec-noncompliant, and not in a small way either, since the semantics completely differ when you have rank > 2. I don't know if the current behavior is deprecatable either.
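The rank > 2 difference is easy to see from the result shapes: NumPy's default takes the diagonal over the first two axes, while the spec semantics (reachable today via explicit `axis1`/`axis2`) use the last two.

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

# NumPy's default: the diagonal is taken over the *first* two axes
# (axis1=0, axis2=1), with the diagonal axis appended at the end.
print(np.diagonal(a).shape)                      # (4, 2)

# The spec's semantics: the diagonal over the *last* two axes.
print(np.diagonal(a, axis1=-2, axis2=-1).shape)  # (2, 3)
```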
doc/source/reference/array_api.rst
Outdated
- The ``norm`` function has been omitted from the array API and split
into ``matrix_norm`` for matrix norms and ``vector_norm`` for vector
norms. Note that ``vector_norm`` supports any number of axes, whereas
``np.norm`` only supports a single axis for vector norms.
Suggested change:
- ``np.norm`` only supports a single axis for vector norms.
+ ``np.linalg.norm`` only supports a single axis for vector norms.
two axes. See `the spec definition
<https://data-apis.org/array-api/latest/API_specification/generated/signatures.linear_algebra_functions.matrix_transpose.html#signatures.linear_algebra_functions.matrix_transpose>`__
* - ``outer`` only supports 1-dimensional arrays.
- **Breaking**
Is this a requirement in the standard, or do you allow higher dimensions?
@kgryte and I were discussing this internally. It was his view that it is incompatible, but maybe he can comment here.
Current NumPy behavior flattens arrays with more than 1 dimension. We should consider this incompatible, especially as the consortium is likely to move forward with support for batching: data-apis/array-api#242. If 242 moves forward, NumPy's behavior would be non-compliant. And as such, I think it is appropriate here to indicate that the spec entails a breaking change. This can be revisited should the consortium move in a different direction.
Thanks, I didn't realize `np.outer` used raveling. The `np.ufunc.outer` result is `a.shape + b.shape`, I think. Adding an `axis` argument and changing its default might work, but then breaking is definitely right for now anyway.
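The raveling vs. batching contrast mentioned above, shown side by side:

```python
import numpy as np

a = np.ones((2, 3))
b = np.ones(4)

# np.outer ravels >1-D inputs down to 1-D before computing the product.
print(np.outer(a, b).shape)           # (6, 4)

# np.multiply.outer keeps the input structure: result shape a.shape + b.shape.
print(np.multiply.outer(a, b).shape)  # (2, 3, 4)
```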
- Notes
* - ``sum`` and ``prod`` always upcast ``float32`` to ``float64`` when
``dtype=None``.
- **Breaking**
I assume no matter what `axis=` is? This one seems like one of the larger changes to me.
Yes, there's no restriction on axis. It's not clear to me that the axis should matter? It wasn't ever mentioned in the discussion as far as I can tell.
I faintly remember seeing the discussion and being surprised by it before, but not diving into it and thinking seriously about it. I feel this change may be something to run by a wider audience. I guess it may not be massive from a back-compat point of view since it only affects float16, float32, and complex64.
Also: I think this is at least the default float/integer precision when the input is float/integer? Or do integer inputs return floats?
The default output dtype is based on the input dtype: spec.
If the input is an integer dtype, the default output dtype is the default integer dtype, provided the default integer dtype has sufficient precision. Similarly, if the input is a float dtype, the default output dtype is the default floating-point dtype, provided the default floating-point dtype has sufficient precision.
In short, when `dtype=None`, the output array should never have a dtype with less precision than the input dtype.
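For contrast, NumPy's current behavior (a sketch of the status quo being marked breaking above): float inputs keep their precision, while small integer inputs already upcast to the default integer dtype.

```python
import numpy as np

x = np.ones(4, dtype=np.float32)
total = np.sum(x)
print(total.dtype)      # float32 -- NumPy keeps the input float precision

i = np.ones(4, dtype=np.int8)
print(np.sum(i).dtype)  # the default integer dtype (platform-dependent width)
```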
* - ``cross`` does not broadcast its arguments.
- ???
-
* - ``cross`` does not allow size 2 vectors (only size 3).
- ???
-
I've opened an issue concerning `linalg.cross` behavior: data-apis/array-api#415. The outcome of that discussion should have bearing on potential backward compatibility concerns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed in data-apis/array-api#415, allowing size 2 vectors is likely to never be supported in the specification given difficulties such support presents in, e.g., JIT compilation (e.g., #13718). Accordingly, we should mark this NumPy behavior as backward incompatible with the spec.
Re: broadcasting. My sense is that we may be moving toward alignment on NumPy broadcasting behavior, but this is still TBD. Atm, I don't think we need to mark NumPy's current broadcasting behavior as backward incompatible.
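The size-2 behavior in question, for reference (current NumPy behavior; the spec restricts `linalg.cross` to size-3 vectors):

```python
import numpy as np

# Spec-compliant usage: 3-element vectors.
z = np.cross([1, 0, 0], [0, 1, 0])
print(z)  # [0 0 1]

# NumPy additionally accepts 2-element vectors, returning only the scalar
# z-component -- exactly the case the spec is unlikely to ever allow.
print(np.cross([1, 0], [0, 1]))
```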
This applies even if the name changed, because it affects the case where no keyword is passed.
Discussion has quieted down. Ready to merge?
I'm OK with merging. There were a few open questions on some of the points (
OK, let's get this in. Thanks @asmeurer,
These items were not clear in the original PR numpy#21260 but have since been clarified.
This is a more organized presentation of the various "Note" comments in numpy/array_api.
In particular, each difference is annotated as to whether it is a strictness difference (no change in NumPy needed), a compatible change (NumPy can change in a backwards-compatible way), or a breaking change (NumPy would need to break backwards compatibility to match the spec).
This is related to the discussion in #21135. I'm hopeful that this can help the NumPy developers better understand exactly how NumPy would need to change in order for the top-level `numpy` namespace to become array API compliant. I've also included a few minor fixes to numpy.array_api.
Note that, as outlined in the original pull request (#18585), this does not include any documentation for the array API itself. It would be a good idea to do this, but it is a harder problem, because each function would need a docstring. For now, the spec serves as the best reference for `numpy.array_api`.