Skip to content

ENH: Upgrade Array API version to 2024.12 #28615

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

mtsokol
Copy link
Member

@mtsokol mtsokol commented Apr 1, 2025

Upgrades Array API Standard version to 2024.12 (and Array API test suite).

Let's see how many failures we get when moving to 2024.12.

@mtsokol mtsokol self-assigned this Apr 1, 2025
@jorenham jorenham changed the title ENH: Updgrade Array API version to 2024.12 ENH: Upgrade Array API version to 2024.12 Apr 1, 2025
@mtsokol mtsokol force-pushed the upgrade-array-api branch from 04c491f to 244a08d Compare April 1, 2025 13:24
Copy link
Member

@jorenham jorenham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be out of scope, but the the 2024.12 api added a dtype kwarg to fft.[r]fftfreq (data-apis/array-api#885), which we currently don't have.
The __array_namespace_info__.capabilities() function should also additionally return "max dimensions" (https://data-apis.org/array-api/latest/API_specification/generated/array_api.info.capabilities.html)

@jorenham jorenham added the 40 - array API standard PRs and issues related to support for the array API standard label Apr 1, 2025
@mtsokol
Copy link
Member Author

mtsokol commented Apr 1, 2025

Right, I can xfail new features, and we can consider them separately.

@mtsokol
Copy link
Member Author

mtsokol commented Apr 1, 2025

22 failed, including special test cases, I guess this is due to some changes in Array API test suite itself.

@mtsokol mtsokol force-pushed the upgrade-array-api branch from 244a08d to 7f715fb Compare April 1, 2025 22:05
@mtsokol mtsokol requested a review from jorenham April 1, 2025 22:39
@mtsokol
Copy link
Member Author

mtsokol commented Apr 1, 2025

Now should be good to review!

if (descr == NULL) {
return NULL;
}
return PyArray_Scalar(&count, descr, NULL);
Copy link
Member

@seberg seberg Apr 2, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a more significant change than it may look, since it has serious impact on promotion for count_nonzero without an axis.
(I.e. code like arr.sum() / count_nonzero(arr) can behave differently.)

Maybe we can do it, but we should discuss it briefly/add a release note for visibility.
But I am tempted to fix it in the array api tests to say that it is completely fine to return an integer for count_nonzero(arr, axis=None).

(I think an integer return is just better for NumPy users, the argument against it is only that we can also return arrays of course, for which there is no equivalent behavior obviously.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure! I can just skip this test or we can keep this change, I'm Ok with both. I added it to today's triage meeting for broader discussion. I can't attend myself today so just ping me if anything was decided.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I just checked and NumPy doesn't really run into this (I suppose we don't really have code paths that never pass an axis, so have to provision anyway).
Still think we should at least mention it in a release note as a subtle change, though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I added a release note.

Right now I'm also in favor of this change - returning a NumPy scalar when axis=None makes it coherent when axis is passed and the result is 0-d. Right now we have:

In [1]: np.count_nonzero(np.array([1,0,3,1]))
Out[1]: 3

In [2]: np.count_nonzero(np.array([1,0,3,1]), axis=0)
Out[2]: np.int64(3)

In [3]: np.count_nonzero(np.array([[1,0,3,1]]), axis=(0,1))
Out[3]: np.int64(3)

After this change it's also np.int64(3) for axis=None.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I understand that consistency is better with the change. But in contexts where axis is always None, the integer return is more useful and changing it can change results, because:

arr = np.linspace(0, 100, 10000, dtype=np.float32)
res = arr / np.count_nonzero(arr)

will change from being a float32 result to a float64 one.

Not that I suspect this to be seen often. skimage has a function that will return a float64 rather than a Python float with this change for example, I am sure that usually doesn't matter.

jorenham

This comment was marked as resolved.

@mtsokol
Copy link
Member Author

mtsokol commented Apr 2, 2025

@jorenham Right, thanks! Updated.

@mtsokol mtsokol force-pushed the upgrade-array-api branch from 3e44f97 to 87b6389 Compare April 2, 2025 22:01

This comment has been minimized.

Copy link
Member

@jorenham jorenham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ev-br
Copy link
Contributor

ev-br commented Apr 21, 2025

One other small deviation in the 2024.12 array api spec:

Don't know if numpy wants to follow the spec? Either way, data-apis/array-api-compat#317 adds a workaround to array-api-compat.

@seberg
Copy link
Member

seberg commented Apr 22, 2025

Don't know if numpy wants to follow the spec? Either way, data-apis/array-api-compat#317 adds a workaround to array-api-compat.

In this case, it seems OK to just allow the -1 (with a ..versionchanged directive), since it matches argsort/argpartition.

In general, I would love to clarify if others consider it now OK if array-api-compat diverges indefinitely from NumPy or not.
(While I suggested creating it as a stop-gap, I think keeping it forever is completely fine. Heck, NumPy could import and return it for __array_namespace__.)

@mtsokol
Copy link
Member Author

mtsokol commented Apr 22, 2025

In this case, it seems OK to just allow the -1 (with a ..versionchanged directive), since it matches argsort/argpartition.

FWIW take_along_axis already allows -1, so if we don't want to change the default then no action is required here:

In [60]: a = np.arange(16).reshape((4,4))

In [61]: np.take_along_axis(a, np.array([[1, 2]]), -1)
Out[61]:
array([[ 1,  2],
       [ 5,  6],
       [ 9, 10],
       [13, 14]])

@mtsokol mtsokol force-pushed the upgrade-array-api branch from 87b6389 to 1ceb3dc Compare April 22, 2025 10:48

This comment has been minimized.

@@ -0,0 +1,2 @@
* NumPy's ``__array_api_version__`` was upgraded from ``2023.12`` to ``2024.12``.
* `numpy.count_nonzero` for ``axis=None`` now returns a scalar instead of a Python integer.
Copy link
Member

@seberg seberg Apr 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* `numpy.count_nonzero` for ``axis=None`` now returns a scalar instead of a Python integer.
* `numpy.count_nonzero` for ``axis=None`` (default) now returns a NumPy scalar instead of a Python integer.

Anyway it's probably OK, but larger than it looks. So I would like someone else to sign off on this subtle change before merging it. Maybe @mhvk?
(I don't want to argue that the old thing is better, just the very subtle change that I can see being useful in practice, unfortunately.)

(I guess the typing diff shows similar places to skimage, where the result might be subtly slightly worse.)

EDIT: See also data-apis/array-api#932

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the release note tweak.

@ev-br
Copy link
Contributor

ev-br commented Apr 22, 2025

Re: count_nonzero, array-api-compat carries a workaround currently, https://github.com/data-apis/array-api-compat/blob/main/array_api_compat/numpy/_aliases.py#L128.
Would be nice to drop it if numpy does make the change, so I guess it ties to #28615 (comment)

This comment has been minimized.

Copy link
Contributor

@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think on balance it is fine to stick to the array API and return a numpy scalar for count_nonzero. I certainly can see how it is nice that the resulting dtype is not going to depend on what axis is.

I think we might as well adjust take_along_axis to get a default axis=-1 (with a version changed label).

@jorenham
Copy link
Member

I think we might as well adjust take_along_axis to get a default axis=-1 (with a version changed label).

I agree, but I can imagine that it could break a lot of code if we'd change that without announcing it beforehand. On the other hand, deprecating calls without an explicit axis would also be pretty annoying.

@mtsokol
Copy link
Member Author

mtsokol commented Apr 23, 2025

Let's address/discuss take_along_axis default value separately then. For 2024.12 we have Array API test job passing in this PR (and we don't skip take_along_axis tests).

So I assume we've reached an agreement to accept the count_nonzero change. Then we can have 2024.12 support as it is here - let's merge it. Any objections?

@seberg
Copy link
Member

seberg commented Apr 23, 2025

I agree, but I can imagine that it could break a lot of code if we'd change that without announcing it beforehand

@jorenham it just introduces a default, turning an error (nothing passed) into a success that is normally fine to just do.
Not sure why it wasn't done before. Possibly due to argmax defaulting to axis=None (unlike argsort) so the take_along_axis default wouldn't pair with the argmax one, which is a small annoyance.

(I don't care, if anyone prefers no-default, I doubt anyone will care to just not ask for a default in Array API either -- it's fine to skip the default, more so there than in NumPy.)

It's good (and OK) to split out these discussion either way of course!

@jorenham
Copy link
Member

@jorenham it just introduces a default, turning an error (nothing passed) into a success that is normally fine to just do.
Not sure why it wasn't done before.

Great; that makes things a lot easier

@mhvk
Copy link
Contributor

mhvk commented Apr 23, 2025

As noted by @seberg, there is no real problem with adding a default for take_along_axis - there is none currently. So I think we might as well do it...

@ev-br
Copy link
Contributor

ev-br commented Apr 23, 2025

For 2024.12 we have Array API test job passing in this PR (and we don't skip take_along_axis tests).

This is because it is untested in array-api -tests ATM:
data-apis/array-api-tests#367 is open.

@mtsokol mtsokol added this to the 2.3.0 release milestone Apr 23, 2025
@mtsokol
Copy link
Member Author

mtsokol commented Apr 23, 2025

As noted by @seberg, there is no real problem with adding a default for take_along_axis - there is none currently. So I think we might as well do it...

That's right, I added it in the latest commit!

This comment has been minimized.

@mtsokol mtsokol force-pushed the upgrade-array-api branch from 9055f15 to 1364e6d Compare April 25, 2025 10:46
@mtsokol
Copy link
Member Author

mtsokol commented Apr 25, 2025

I resolved conflicts and rebased the branch. Once CI is green I'm going to merge this PR. Any objections?

@seberg
Copy link
Member

seberg commented Apr 25, 2025

TBH, I would prefer you don't self merge but rather ask someone else to merge. Or at least give a few days notice on anything non-trivial.
Yes, I realize that there is pretty much consensus from multiple people here, but TBH, that should make it also easy to find someone who presses the button (or approves explicitly).

This comment has been minimized.

@mtsokol mtsokol force-pushed the upgrade-array-api branch from 1364e6d to 0470c15 Compare April 25, 2025 20:25

This comment has been minimized.

@mtsokol mtsokol force-pushed the upgrade-array-api branch from 0470c15 to f0bc532 Compare April 30, 2025 09:28
Copy link

Diff from mypy_primer, showing the effect of this PR on type check results on a corpus of open source code:

imagehash (https://github.com/JohannesBuchner/imagehash)
+ imagehash/__init__.py:112: error: Incompatible return value type (got "signedinteger[_32Bit | _64Bit]", expected "int")  [return-value]

optuna (https://github.com/optuna/optuna)
+ optuna/study/_multi_objective.py:104: error: Incompatible types in assignment (expression has type "signedinteger[_32Bit | _64Bit]", variable has type "int | None")  [assignment]
+ optuna/study/_multi_objective.py:111: error: Incompatible types in assignment (expression has type "signedinteger[_32Bit | _64Bit]", variable has type "int | None")  [assignment]
+ optuna/_gp/optim_mixed.py:307: error: No overload variant of "min" matches argument types "int", "signedinteger[_32Bit | _64Bit]"  [call-overload]
+ optuna/_gp/optim_mixed.py:307: note: Possible overload variants:
+ optuna/_gp/optim_mixed.py:307: note:     def [SupportsRichComparisonT: SupportsDunderLT[Any] | SupportsDunderGT[Any]] min(SupportsRichComparisonT, SupportsRichComparisonT, /, *_args: SupportsRichComparisonT, key: None = ...) -> SupportsRichComparisonT
+ optuna/_gp/optim_mixed.py:307: note:     def [_T] min(_T, _T, /, *_args: _T, key: Callable[[_T], SupportsDunderLT[Any] | SupportsDunderGT[Any]]) -> _T
+ optuna/_gp/optim_mixed.py:307: note:     def [SupportsRichComparisonT: SupportsDunderLT[Any] | SupportsDunderGT[Any]] min(Iterable[SupportsRichComparisonT], /, *, key: None = ...) -> SupportsRichComparisonT
+ optuna/_gp/optim_mixed.py:307: note:     def [_T] min(Iterable[_T], /, *, key: Callable[[_T], SupportsDunderLT[Any] | SupportsDunderGT[Any]]) -> _T
+ optuna/_gp/optim_mixed.py:307: note:     def [SupportsRichComparisonT: SupportsDunderLT[Any] | SupportsDunderGT[Any], _T] min(Iterable[SupportsRichComparisonT], /, *, key: None = ..., default: _T) -> SupportsRichComparisonT | _T
+ optuna/_gp/optim_mixed.py:307: note:     def [_T1, _T2] min(Iterable[_T1], /, *, key: Callable[[_T1], SupportsDunderLT[Any] | SupportsDunderGT[Any]], default: _T2) -> _T1 | _T2

spark (https://github.com/apache/spark)
+ python/pyspark/ml/linalg/__init__.py:344: error: Incompatible return value type (got "signedinteger[_32Bit | _64Bit]", expected "int")  [return-value]
+ python/pyspark/ml/linalg/__init__.py:643: error: Incompatible return value type (got "signedinteger[_32Bit | _64Bit]", expected "int")  [return-value]
+ python/pyspark/mllib/linalg/__init__.py:397: error: Incompatible return value type (got "signedinteger[_32Bit | _64Bit]", expected "int")  [return-value]
+ python/pyspark/mllib/linalg/__init__.py:692: error: Incompatible return value type (got "signedinteger[_32Bit | _64Bit]", expected "int")  [return-value]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
01 - Enhancement 40 - array API standard PRs and issues related to support for the array API standard
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants