-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
MAINT: convert numpy.array_api
to array-api-strict
#28555
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
The `numpy.array_api` module has been converted into a standalone package (`array-api-strict`). This new package is stable and has had a 1.0 release. The `numpy.array_api` module was marked experimental, and will be removed for NumPy 2.0. Since `array-api-strict` works with both NumPy 1.2x and NumPy 2.0, using that for testing compliance to the syntax and semantics of the array API standard should always be preferred over still trying to use `numpy.array_api`.
Re linting failures: I've installed the exact versions of |
We use pre-commit hooks, that should take care of things without a need to install any of the linters in your environment. |
I ran |
I'm not using pre-commit, mainly for security reasons, but also because it borked my |
I'd be curious about those security reasons. I've used |
I never got to the bottom of that unfortunately. It was inside a conda env, so it may have had to do with how it was installed (I don't like installing dev tools globally with something like
xref scipy/scipy#18895 for a good write-up (not from me). But tl;dr: I create my dev envs myself and like to be in control of them and understand what I'm using. I also actually browse the "to be installed" list that Anyway, let's keep this PR focused on the maintenance issue rather than pre-commit:) Thanks for addressing the linting issue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rgommers! I like the PR in general but there are concurrent PR that might impact it. I am not sure which one should be finalized and merged first. See the comments below for details.
sklearn/utils/_array_api.py
Outdated
@@ -110,7 +110,7 @@ def size(x): | |||
|
|||
def _is_numpy_namespace(xp): | |||
"""Return True if xp is backed by NumPy.""" | |||
return xp.__name__ in {"numpy", "array_api_compat.numpy", "numpy.array_api"} | |||
return xp.__name__ in {"numpy", "array_api_compat.numpy"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this change: _is_numpy_namespace
now really a marker for the NumPy case and array-api-strict
is now just considered like any other non-NumPy array api implementation only to be used for testing purposes and therefore not being subject to any special treatment motivated by backward-compat considerations.
I tried to sync with |
Co-authored-by: Tim Head <betatim@gmail.com>
For the record, I added the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this PR is good to go on my end. I implemented all the pending things we discussed earlier + updated the docstring slightly to better reflect the current state of affairs.
+1 for merge once #28407 is merged and resulting conflicts with this PR are resolved.
@@ -10,7 +10,7 @@ | |||
from .._config import get_config | |||
from .fixes import parse_version | |||
|
|||
_NUMPY_NAMESPACE_NAMES = {"numpy", "array_api_compat.numpy", "numpy.array_api"} | |||
_NUMPY_NAMESPACE_NAMES = {"numpy", "array_api_compat.numpy"} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I support this change, but not including array_api_strict
here is a change from how we have been treating numpy.array_api
. It might make some uses of is_numpy_namespace()
obsolete. (A good thing IMHO)
Related: do we need to keep numpy.array_api
here for people who have an older numpy version and keep using it and expect that support for something experimental continues to exist in scikit-learn? I think we shouldn't keep it in and just assume that there are ~0 people in the world who use numpy.array_api
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might make some uses of
is_numpy_namespace()
obsolete.
Indeed for some cases (that we should simplify progressively) but it's still useful from time to time to detect cases where:
sklearn.get_config(array_api_dispatch=True)
and the input is a (wrapped) numpy array.
vs
sklearn.get_config(array_api_dispatch=True)
and the input is not a (wrapped) numpy array.
This is important to know when we can safely convert back and forth to numpy without overhead, for instance to use the out=
kwarg for memory efficiency reasons.
do we need to keep numpy.array_api here for people who have an older numpy version and keep using it and expect that support for something experimental continues to exist in scikit-learn?
I don't see any value in this. numpy.array_api
was a temporary experiment and it's going away. There is no point in using it in the future: it just adds complexity for no benefit.
numpy.array_api
's value was mostly for testing array api compliance, and this is now better served by array-api-strict
and it works even with older numpy versions as demonstrated by our CI.
Let me resolve the conflicts quickly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
If we feel like it, updating the doc string as I suggested would be nice. But otherwise: let's merge this thing.
Co-authored-by: Tim Head <betatim@gmail.com>
I pushed your docstring suggestion and enabled auto-merge on the PR. Thanks for the PR @rgommers and others for the reviews. |
The
numpy.array_api
module has been converted into a standalone package (array-api-strict
). This new package is stable and has had a 1.0 release. Thenumpy.array_api
module was marked experimental, and will be removed for NumPy 2.0. Sincearray-api-strict
works with both NumPy 1.2x and NumPy 2.0, using that for testing compliance to the syntax and semantics of the array API standard should always be preferred over still trying to usenumpy.array_api
.Reference Issues/PRs
No separate scikit-learn issue for this; NEP 56 documents the decision to remove
numpy.array_api
before the NumPy 2.0 releaseEDIT: numpy/numpy#25911 is the relevant NumPy PR.
What does this implement/fix? Explain your changes.
This is mostly a 1:1 replacement of
numpy.array_api
witharray_api_strict
. The improvements in the latter package did turn up a couple of places where the code wasn't compliant with the standard though, due to:device='cpu'
string which was NumPy/PyTorch-specific,numpy.ndarray
(this should be using the same array types and use thetake
function instead of fancy indexing),dtype.kind
, which isn't guaranteed to exist or return one letter numpy-style type codes,int
as a dtypeAny other comments?
I have tested with:
numpy
1.26.4array-api-strict
1.0pytorch
2.1.2 (CPU and CUDA 12.3)cupy
13.0.0 (CUDA 12.3)The only package with failures was CuPy. I'm seeing 35 failures on
main
, and 24 with the changes in this PR. Remaining CuPy failures are fromtest_array_api.py::test_nan_reductions
(20x) with these errors:model_selection/tests/test_split.py::test_array_api_train_test_split
(4x) with these errors:Failures visible on
main
that are resolved with this PR are due to:in the following tests:
utils/tests/test_array_api.py::test_weighted_sum
tests/test_common.py::test_estimators[LinearDiscriminantAnalysis()-check_array_api_input
metrics/tests/test_common.py::test_array_api_compliance