ENH: Upgrade Array API version to 2024.12 #28615
Conversation
Force-pushed from 04c491f to 244a08d.
This might be out of scope, but the 2024.12 API added a dtype kwarg to fft.[r]fftfreq (data-apis/array-api#885), which we currently don't have.
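A sketch of what that addition means in practice: the spec's `fftfreq(n, d=1.0, dtype=...)` can be emulated today with an `astype()` cast. The wrapper name below is hypothetical, not a NumPy API:

```python
import numpy as np

# Hypothetical wrapper sketching the 2024.12 behavior (data-apis/array-api#885):
# fftfreq/rfftfreq gain a `dtype` keyword. Until NumPy exposes it, an
# astype() cast on the default float64 result is the equivalent spelling.
def fftfreq_2024(n, d=1.0, dtype=None):
    freqs = np.fft.fftfreq(n, d=d)
    return freqs if dtype is None else freqs.astype(dtype)

freqs = fftfreq_2024(8, d=0.5, dtype=np.float32)
```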
The __array_namespace_info__.capabilities() function should also return "max dimensions" (https://data-apis.org/array-api/latest/API_specification/generated/array_api.info.capabilities.html).
Right, I can xfail new features, and we can consider them separately.

22 tests failed, including special test cases; I guess this is due to some changes in the Array API test suite itself.
Force-pushed from 244a08d to 7f715fb.
Now should be good to review!
if (descr == NULL) {
    return NULL;
}
return PyArray_Scalar(&count, descr, NULL);
This is a more significant change than it may look, since it has serious impact on promotion for count_nonzero without an axis. (I.e. code like arr.sum() / count_nonzero(arr) can behave differently.)
Maybe we can do it, but we should discuss it briefly/add a release note for visibility.
But I am tempted to fix it in the array api tests to say that it is completely fine to return an integer for count_nonzero(arr, axis=None).
(I think an integer return is just better for NumPy users; the argument against it is only that we can also return arrays, for which there is obviously no equivalent behavior.)
Sure! I can just skip this test or we can keep this change; I'm OK with either. I added it to today's triage meeting for broader discussion. I can't attend myself today, so just ping me if anything is decided.
Thanks, I just checked and NumPy doesn't really run into this (I suppose we don't really have code paths that never pass an axis, so we have to provision for it anyway).
Still, I think we should at least mention it in a release note as a subtle change.
Sure, I added a release note.
Right now I'm also in favor of this change: returning a NumPy scalar when axis=None makes it coherent with the case where axis is passed and the result is 0-d. Right now we have:
In [1]: np.count_nonzero(np.array([1,0,3,1]))
Out[1]: 3
In [2]: np.count_nonzero(np.array([1,0,3,1]), axis=0)
Out[2]: np.int64(3)
In [3]: np.count_nonzero(np.array([[1,0,3,1]]), axis=(0,1))
Out[3]: np.int64(3)
After this change it's also np.int64(3) for axis=None.
Yeah, I understand that consistency is better with the change. But in contexts where axis is always None, the integer return is more useful, and changing it can change results, because:
arr = np.linspace(0, 100, 10000, dtype=np.float32)
res = arr / np.count_nonzero(arr)
will change from being a float32 result to a float64 one.
Not that I suspect this will be seen often. skimage has a function that will return a float64 rather than a Python float with this change, for example; I am sure that usually doesn't matter.
@jorenham Right, thanks! Updated.
Force-pushed from 3e44f97 to 87b6389.
Stubs look good now 👌🏻
The mypy_primer diff shows the effect that this count_nonzero change would have on the mypy output of 22 downstream libraries, specifically:
- https://github.com/pandas-dev/pandas
- https://github.com/optuna/optuna
- https://github.com/hydpy-dev/hydpy
- https://github.com/Toufool/AutoSplit
- https://github.com/colour-science/colour
- https://github.com/apache/spark
- https://github.com/enthought/comtypes
- https://github.com/JohannesBuchner/imagehash
- https://github.com/pandas-dev/pandas-stubs
- https://github.com/scikit-learn/scikit-learn
- https://github.com/dedupeio/dedupe
- https://github.com/google/jax
- https://github.com/yurijmikhalevich/rclip
- https://github.com/bokeh/bokeh
- https://github.com/scipy/scipy
- https://github.com/static-frame/static-frame
- https://github.com/astropenguin/xarray-dataclasses
- https://github.com/pydata/xarray
- https://github.com/scipy/scipy-stubs
- https://github.com/arviz-devs/arviz
- https://github.com/pyodide/pyodide
- https://github.com/artigraph/artigraph
This change will apparently affect 3 of them, although I'm not sure what conclusion to draw from that.
So just to be clear: my ✅ only applies to the typing side of things, as I'm not sure how to judge this int vs intp dance-off.
One other small deviation in the 2024.12 array api spec:
Don't know if numpy wants to follow the spec? Either way, data-apis/array-api-compat#317 adds a workaround.

In this case, it seems OK to just allow it. In general, I would love to clarify whether others consider it OK if array-api-compat diverges indefinitely from NumPy or not.
FWIW:
In [60]: a = np.arange(16).reshape((4,4))
In [61]: np.take_along_axis(a, np.array([[1, 2]]), -1)
Out[61]:
array([[ 1,  2],
       [ 5,  6],
       [ 9, 10],
       [13, 14]])
Force-pushed from 87b6389 to 1ceb3dc.
@@ -0,0 +1,2 @@
* NumPy's ``__array_api_version__`` was upgraded from ``2023.12`` to ``2024.12``.
* `numpy.count_nonzero` for ``axis=None`` now returns a scalar instead of a Python integer.
Suggested change:
- * `numpy.count_nonzero` for ``axis=None`` now returns a scalar instead of a Python integer.
+ * `numpy.count_nonzero` for ``axis=None`` (default) now returns a NumPy scalar instead of a Python integer.
Anyway, it's probably OK, but larger than it looks. So I would like someone else to sign off on this subtle change before merging it. Maybe @mhvk?
(I don't want to argue that the old behavior is better, just that this is a very subtle change that I can see being visible in practice, unfortunately.)
(I guess the typing diff shows places similar to skimage, where the result might be subtly slightly worse.)
EDIT: See also data-apis/array-api#932
I added the release note tweak.
Re:
I think on balance it is fine to stick to the array API and return a NumPy scalar for count_nonzero. I certainly can see how it is nice that the resulting dtype is not going to depend on what axis is.
I think we might as well adjust take_along_axis to get a default axis=-1 (with a versionchanged label).
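A minimal sketch of the proposed behavior, assuming the 2024.12 default of axis=-1; the wrapper name is hypothetical, since NumPy's own take_along_axis still requires an explicit axis at the time of this discussion:

```python
import numpy as np

# Hypothetical wrapper showing what a default axis=-1 on take_along_axis
# would mean: indices broadcast against the leading dimensions, and
# selection happens along the last axis when no axis is given.
def take_along_last_axis(arr, indices, axis=-1):
    return np.take_along_axis(arr, indices, axis=axis)

a = np.arange(16).reshape(4, 4)
out = take_along_last_axis(a, np.array([[1, 2]]))  # axis defaults to -1
```

This reproduces the Out[61] transcript above: columns 1 and 2 of each row.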
I agree, but I can imagine that it could break a lot of code if we'd change that without announcing it beforehand. On the other hand, deprecating calls without an explicit axis is an option.
Let's address/discuss that separately. So I assume we've reached an agreement to accept the default?
@jorenham it just introduces a default, turning an error (nothing passed) into a success, which is normally fine to just do. (I don't care; if anyone prefers no-default, I doubt anyone will care to just not ask for a default in the Array API either -- it's fine to skip the default, more so there than in NumPy.) It's good (and OK) to split out these discussions either way, of course!
Great; that makes things a lot easier.
As noted by @seberg, there is no real problem with adding a default for axis.
This is because it is untested in array-api-tests ATM.

That's right, I added it in the latest commit!
Force-pushed from 9055f15 to 1364e6d.
I resolved conflicts and rebased the branch. Once CI is green I'm going to merge this PR. Any objections?

TBH, I would prefer you don't self-merge but rather ask someone else to merge. Or at least give a few days' notice on anything non-trivial.
Force-pushed from 1364e6d to 0470c15.
Force-pushed from 0470c15 to f0bc532.
Diff from mypy_primer, showing the effect of this PR on type check results on a corpus of open source code:

imagehash (https://github.com/JohannesBuchner/imagehash)
+ imagehash/__init__.py:112: error: Incompatible return value type (got "signedinteger[_32Bit | _64Bit]", expected "int") [return-value]
optuna (https://github.com/optuna/optuna)
+ optuna/study/_multi_objective.py:104: error: Incompatible types in assignment (expression has type "signedinteger[_32Bit | _64Bit]", variable has type "int | None") [assignment]
+ optuna/study/_multi_objective.py:111: error: Incompatible types in assignment (expression has type "signedinteger[_32Bit | _64Bit]", variable has type "int | None") [assignment]
+ optuna/_gp/optim_mixed.py:307: error: No overload variant of "min" matches argument types "int", "signedinteger[_32Bit | _64Bit]" [call-overload]
+ optuna/_gp/optim_mixed.py:307: note: Possible overload variants:
+ optuna/_gp/optim_mixed.py:307: note: def [SupportsRichComparisonT: SupportsDunderLT[Any] | SupportsDunderGT[Any]] min(SupportsRichComparisonT, SupportsRichComparisonT, /, *_args: SupportsRichComparisonT, key: None = ...) -> SupportsRichComparisonT
+ optuna/_gp/optim_mixed.py:307: note: def [_T] min(_T, _T, /, *_args: _T, key: Callable[[_T], SupportsDunderLT[Any] | SupportsDunderGT[Any]]) -> _T
+ optuna/_gp/optim_mixed.py:307: note: def [SupportsRichComparisonT: SupportsDunderLT[Any] | SupportsDunderGT[Any]] min(Iterable[SupportsRichComparisonT], /, *, key: None = ...) -> SupportsRichComparisonT
+ optuna/_gp/optim_mixed.py:307: note: def [_T] min(Iterable[_T], /, *, key: Callable[[_T], SupportsDunderLT[Any] | SupportsDunderGT[Any]]) -> _T
+ optuna/_gp/optim_mixed.py:307: note: def [SupportsRichComparisonT: SupportsDunderLT[Any] | SupportsDunderGT[Any], _T] min(Iterable[SupportsRichComparisonT], /, *, key: None = ..., default: _T) -> SupportsRichComparisonT | _T
+ optuna/_gp/optim_mixed.py:307: note: def [_T1, _T2] min(Iterable[_T1], /, *, key: Callable[[_T1], SupportsDunderLT[Any] | SupportsDunderGT[Any]], default: _T2) -> _T1 | _T2
spark (https://github.com/apache/spark)
+ python/pyspark/ml/linalg/__init__.py:344: error: Incompatible return value type (got "signedinteger[_32Bit | _64Bit]", expected "int") [return-value]
+ python/pyspark/ml/linalg/__init__.py:643: error: Incompatible return value type (got "signedinteger[_32Bit | _64Bit]", expected "int") [return-value]
+ python/pyspark/mllib/linalg/__init__.py:397: error: Incompatible return value type (got "signedinteger[_32Bit | _64Bit]", expected "int") [return-value]
+ python/pyspark/mllib/linalg/__init__.py:692: error: Incompatible return value type (got "signedinteger[_32Bit | _64Bit]", expected "int") [return-value]
Upgrades the Array API Standard version to 2024.12 (and the Array API test suite accordingly).
Let's see how many failures we get when moving to 2024.12.