ENH: Upgrade Array API version to 2024.12 #28615
Conversation
Force-pushed from 04c491f to 244a08d.
This might be out of scope, but the 2024.12 API added a dtype kwarg to fft.[r]fftfreq (data-apis/array-api#885), which we currently don't have.
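A sketch of what that addition means in practice: the spec's `fftfreq(n, d=1.0, dtype=...)` can be emulated today with an `astype()` cast. The wrapper name below is hypothetical, not a NumPy API:

```python
import numpy as np

# Hypothetical wrapper sketching the 2024.12 behavior (data-apis/array-api#885):
# fftfreq/rfftfreq gain a `dtype` keyword. Until NumPy exposes it, an
# astype() cast on the default float64 result is the equivalent spelling.
def fftfreq_2024(n, d=1.0, dtype=None):
    freqs = np.fft.fftfreq(n, d=d)
    return freqs if dtype is None else freqs.astype(dtype)

freqs = fftfreq_2024(8, d=0.5, dtype=np.float32)
```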
The __array_namespace_info__.capabilities() function should also return "max dimensions" (https://data-apis.org/array-api/latest/API_specification/generated/array_api.info.capabilities.html).
Right, I can xfail new features, and we can consider them separately.

22 tests failed, including special test cases; I guess this is due to some changes in the Array API test suite itself.
Force-pushed from 244a08d to 7f715fb.
Now should be good to review!
if (descr == NULL) {
    return NULL;
}
return PyArray_Scalar(&count, descr, NULL);
This is a more significant change than it may look, since it has serious impact on promotion for count_nonzero without an axis. (I.e. code like arr.sum() / count_nonzero(arr) can behave differently.)
Maybe we can do it, but we should discuss it briefly/add a release note for visibility.
But I am tempted to fix it in the array api tests to say that it is completely fine to return an integer for count_nonzero(arr, axis=None).
(I think an integer return is just better for NumPy users; the argument against it is only that we can also return arrays, for which there is obviously no equivalent behavior.)
Sure! I can just skip this test or we can keep this change; I'm OK with either. I added it to today's triage meeting for broader discussion. I can't attend myself today, so just ping me if anything is decided.
Thanks, I just checked and NumPy doesn't really run into this (I suppose we don't really have code paths that never pass an axis, so we have to provision for it anyway).
Still, I think we should at least mention it in a release note as a subtle change.
Sure, I added a release note.
Right now I'm also in favor of this change: returning a NumPy scalar when axis=None makes it coherent with the case where axis is passed and the result is 0-d. Right now we have:
In [1]: np.count_nonzero(np.array([1,0,3,1]))
Out[1]: 3
In [2]: np.count_nonzero(np.array([1,0,3,1]), axis=0)
Out[2]: np.int64(3)
In [3]: np.count_nonzero(np.array([[1,0,3,1]]), axis=(0,1))
Out[3]: np.int64(3)
After this change it's also np.int64(3) for axis=None.
Yeah, I understand that consistency is better with the change. But in contexts where axis is always None, the integer return is more useful, and changing it can change results, because:
arr = np.linspace(0, 100, 10000, dtype=np.float32)
res = arr / np.count_nonzero(arr)
will change from being a float32 result to a float64 one.
Not that I suspect this will be seen often. skimage has a function that will return a float64 rather than a Python float with this change, for example; I am sure that usually doesn't matter.
@jorenham Right, thanks! Updated.
Force-pushed from 3e44f97 to 87b6389.
Stubs look good now 👌🏻
The mypy_primer diff shows the effect that this count_nonzero change would have on the mypy output of 22 downstream libraries, specifically:
- https://github.com/pandas-dev/pandas
- https://github.com/optuna/optuna
- https://github.com/hydpy-dev/hydpy
- https://github.com/Toufool/AutoSplit
- https://github.com/colour-science/colour
- https://github.com/apache/spark
- https://github.com/enthought/comtypes
- https://github.com/JohannesBuchner/imagehash
- https://github.com/pandas-dev/pandas-stubs
- https://github.com/scikit-learn/scikit-learn
- https://github.com/dedupeio/dedupe
- https://github.com/google/jax
- https://github.com/yurijmikhalevich/rclip
- https://github.com/bokeh/bokeh
- https://github.com/scipy/scipy
- https://github.com/static-frame/static-frame
- https://github.com/astropenguin/xarray-dataclasses
- https://github.com/pydata/xarray
- https://github.com/scipy/scipy-stubs
- https://github.com/arviz-devs/arviz
- https://github.com/pyodide/pyodide
- https://github.com/artigraph/artigraph
This change will apparently affect 3 of them, although I'm not sure what conclusion to draw from that.
So just to be clear: my ✅ only applies to the typing side of things, as I'm not sure how to judge this int vs intp dance-off.
One other small deviation in the 2024.12 array api spec:
Don't know if numpy wants to follow the spec? Either way, data-apis/array-api-compat#317 adds a workaround.

In this case, it seems OK to just allow it. In general, I would love to clarify whether others consider it OK if array-api-compat diverges indefinitely from NumPy or not.
FWIW:
In [60]: a = np.arange(16).reshape((4,4))
In [61]: np.take_along_axis(a, np.array([[1, 2]]), -1)
Out[61]:
array([[ 1,  2],
       [ 5,  6],
       [ 9, 10],
       [13, 14]])
Force-pushed from 87b6389 to 1ceb3dc.
@@ -0,0 +1,2 @@
* NumPy's ``__array_api_version__`` was upgraded from ``2023.12`` to ``2024.12``.
* `numpy.count_nonzero` for ``axis=None`` now returns a scalar instead of a Python integer.
Suggested change:
- * `numpy.count_nonzero` for ``axis=None`` now returns a scalar instead of a Python integer.
+ * `numpy.count_nonzero` for ``axis=None`` (default) now returns a NumPy scalar instead of a Python integer.
Anyway, it's probably OK, but larger than it looks. So I would like someone else to sign off on this subtle change before merging it. Maybe @mhvk?
(I don't want to argue that the old behavior is better, just that this is a very subtle change that I can see being visible in practice, unfortunately.)
(I guess the typing diff shows places similar to skimage, where the result might be subtly slightly worse.)
EDIT: See also data-apis/array-api#932
I added the release note tweak.
Re:
I think on balance it is fine to stick to the array API and return a NumPy scalar for count_nonzero. I certainly can see how it is nice that the resulting dtype is not going to depend on what axis is.
I think we might as well adjust take_along_axis to get a default axis=-1 (with a versionchanged label).
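A minimal sketch of the proposed behavior, assuming the 2024.12 default of axis=-1; the wrapper name is hypothetical, since NumPy's own take_along_axis still requires an explicit axis at the time of this discussion:

```python
import numpy as np

# Hypothetical wrapper showing what a default axis=-1 on take_along_axis
# would mean: indices broadcast against the leading dimensions, and
# selection happens along the last axis when no axis is given.
def take_along_last_axis(arr, indices, axis=-1):
    return np.take_along_axis(arr, indices, axis=axis)

a = np.arange(16).reshape(4, 4)
out = take_along_last_axis(a, np.array([[1, 2]]))  # axis defaults to -1
```

This reproduces the Out[61] transcript above: columns 1 and 2 of each row.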
I agree, but I can imagine that it could break a lot of code if we'd change that without announcing it beforehand. On the other hand, deprecating calls without an explicit axis is an option.
Let's address/discuss that separately. So I assume we've reached an agreement to accept the default?
@jorenham it just introduces a default, turning an error (nothing passed) into a success, which is normally fine to just do. (I don't care; if anyone prefers no-default, I doubt anyone will care to just not ask for a default in the Array API either -- it's fine to skip the default, more so there than in NumPy.) It's good (and OK) to split out these discussions either way, of course!
Great; that makes things a lot easier.
As noted by @seberg, there is no real problem with adding a default for axis.
This is because it is untested in array-api-tests ATM.

That's right, I added it in the latest commit!
Force-pushed from 9055f15 to 1364e6d.
I resolved conflicts and rebased the branch. Once CI is green I'm going to merge this PR. Any objections?

TBH, I would prefer you don't self-merge but rather ask someone else to merge. Or at least give a few days' notice on anything non-trivial.
Force-pushed from 1364e6d to 0470c15.
Force-pushed from 0470c15 to f0bc532.
Diff from mypy_primer, showing the effect of this PR on type check results on a corpus of open source code:

imagehash (https://github.com/JohannesBuchner/imagehash)
+ imagehash/__init__.py:112: error: Incompatible return value type (got "signedinteger[_32Bit | _64Bit]", expected "int") [return-value]
optuna (https://github.com/optuna/optuna)
+ optuna/study/_multi_objective.py:104: error: Incompatible types in assignment (expression has type "signedinteger[_32Bit | _64Bit]", variable has type "int | None") [assignment]
+ optuna/study/_multi_objective.py:111: error: Incompatible types in assignment (expression has type "signedinteger[_32Bit | _64Bit]", variable has type "int | None") [assignment]
+ optuna/_gp/optim_mixed.py:307: error: No overload variant of "min" matches argument types "int", "signedinteger[_32Bit | _64Bit]" [call-overload]
+ optuna/_gp/optim_mixed.py:307: note: Possible overload variants:
+ optuna/_gp/optim_mixed.py:307: note: def [SupportsRichComparisonT: SupportsDunderLT[Any] | SupportsDunderGT[Any]] min(SupportsRichComparisonT, SupportsRichComparisonT, /, *_args: SupportsRichComparisonT, key: None = ...) -> SupportsRichComparisonT
+ optuna/_gp/optim_mixed.py:307: note: def [_T] min(_T, _T, /, *_args: _T, key: Callable[[_T], SupportsDunderLT[Any] | SupportsDunderGT[Any]]) -> _T
+ optuna/_gp/optim_mixed.py:307: note: def [SupportsRichComparisonT: SupportsDunderLT[Any] | SupportsDunderGT[Any]] min(Iterable[SupportsRichComparisonT], /, *, key: None = ...) -> SupportsRichComparisonT
+ optuna/_gp/optim_mixed.py:307: note: def [_T] min(Iterable[_T], /, *, key: Callable[[_T], SupportsDunderLT[Any] | SupportsDunderGT[Any]]) -> _T
+ optuna/_gp/optim_mixed.py:307: note: def [SupportsRichComparisonT: SupportsDunderLT[Any] | SupportsDunderGT[Any], _T] min(Iterable[SupportsRichComparisonT], /, *, key: None = ..., default: _T) -> SupportsRichComparisonT | _T
+ optuna/_gp/optim_mixed.py:307: note: def [_T1, _T2] min(Iterable[_T1], /, *, key: Callable[[_T1], SupportsDunderLT[Any] | SupportsDunderGT[Any]], default: _T2) -> _T1 | _T2
spark (https://github.com/apache/spark)
+ python/pyspark/ml/linalg/__init__.py:344: error: Incompatible return value type (got "signedinteger[_32Bit | _64Bit]", expected "int") [return-value]
+ python/pyspark/ml/linalg/__init__.py:643: error: Incompatible return value type (got "signedinteger[_32Bit | _64Bit]", expected "int") [return-value]
+ python/pyspark/mllib/linalg/__init__.py:397: error: Incompatible return value type (got "signedinteger[_32Bit | _64Bit]", expected "int") [return-value]
+ python/pyspark/mllib/linalg/__init__.py:692: error: Incompatible return value type (got "signedinteger[_32Bit | _64Bit]", expected "int") [return-value]
Upgrades the Array API Standard version to 2024.12 (and the Array API test suite accordingly).
Let's see how many failures we get when moving to 2024.12.