
MAINT: convert numpy.array_api to array-api-strict #28555


Merged
merged 15 commits into scikit-learn:main on Mar 18, 2024

Conversation

rgommers
Contributor

@rgommers rgommers commented Feb 29, 2024

The numpy.array_api module has been converted into a standalone package (array-api-strict). This new package is stable and has had a 1.0 release. The numpy.array_api module was marked experimental and will be removed in NumPy 2.0. Since array-api-strict works with both NumPy 1.2x and NumPy 2.0, it should always be preferred over numpy.array_api for testing compliance with the syntax and semantics of the array API standard.
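
For context, a minimal sketch of the import-level swap in test code (illustrative only, not taken from the PR diff):

```python
# Before (NumPy 1.2x only; emits an experimental-module warning on import and
# is removed in NumPy 2.0):
#     from numpy import array_api as xp
# After (works with both NumPy 1.2x and NumPy 2.0):
import array_api_strict as xp

a = xp.asarray([1.0, 2.0, 3.0])
print(xp.mean(a))  # only standard-specified functions and behaviours are exposed
```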

Reference Issues/PRs

No separate scikit-learn issue for this; NEP 56 documents the decision to remove numpy.array_api before the NumPy 2.0 release.

EDIT: numpy/numpy#25911 is the relevant NumPy PR.

What does this implement/fix? Explain your changes.

This is mostly a 1:1 replacement of numpy.array_api with array_api_strict. The improvements in the latter package did turn up a couple of places where the code wasn't compliant with the standard, due to the following (a short sketch of the fixes follows the list):

  • using a device='cpu' string, which is NumPy/PyTorch-specific,
  • indexing non-NumPy arrays with an integer numpy.ndarray (this should use indices of the same array type and the take function instead of fancy indexing),
  • use of dtype.kind, which isn't guaranteed to exist or to return one-letter NumPy-style type codes,
  • use of the builtin int as a dtype.
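
A hedged sketch of standard-compliant versions of those patterns (variable names are illustrative and not taken from the PR diff):

```python
import array_api_strict as xp

x = xp.asarray([[3.0, 1.0], [2.0, 4.0]])

# 1. Don't hard-code device="cpu"; propagate the device of an existing array.
y = xp.zeros(x.shape, dtype=x.dtype, device=x.device)

# 2. Build indices in the same namespace as x and use take() rather than
#    fancy indexing with an integer numpy.ndarray.
indices = xp.asarray([1, 0])
rows = xp.take(x, indices, axis=0)

# 3. Use isdtype() instead of the NumPy-specific dtype.kind letter codes.
assert xp.isdtype(x.dtype, "real floating")

# 4. Pass the namespace's dtype objects rather than the builtin int.
counts = xp.zeros((3,), dtype=xp.int64)
```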

Any other comments?

I have tested with:

  • numpy 1.26.4
  • array-api-strict 1.0
  • pytorch 2.1.2 (CPU and CUDA 12.3)
  • cupy 13.0.0 (CUDA 12.3)

The only package with failures was CuPy. I'm seeing 35 failures on main, and 24 with the changes in this PR. The remaining CuPy failures come from:

  1. test_array_api.py::test_nan_reductions (20x), with these errors (see the sketch after this list):
TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly

TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
  2. model_selection/tests/test_split.py::test_array_api_train_test_split (4x) with these errors:
FAILED sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-None-cupy.array_api-None-None] - IndexError: Single-axes index [9 1 6 7 3 0 5] is a non-zero-dimensional integer array, but advanced integer indexing is not s...
FAILED sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-stratify1-cupy-None-None] - ValueError: kind can only be None or 'stable'
FAILED sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[True-stratify1-cupy.array_api-None-None] - ValueError: The least populated class in y has only 1 member, which is too few. The minimum number of groups for any class ca...
FAILED sklearn/model_selection/tests/test_split.py::test_array_api_train_test_split[False-None-cupy.array_api-None-None] - IndexError: Single-axes index [0 1 2 3 4 5 6] is a non-zero-dimensional integer array, but advanced integer indexing is not s...
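
For context, the first CuPy error above is CuPy refusing an implicit device-to-host transfer. A minimal illustration (assumes a machine with CUDA and cupy installed; not part of this PR's diff):

```python
import cupy as cp
import numpy as np

x_gpu = cp.asarray([1.0, float("nan"), 3.0])

# np.asarray(x_gpu) raises:
#   TypeError: Implicit conversion to a NumPy array is not allowed.
#   Please use `.get()` to construct a NumPy array explicitly
x_cpu = x_gpu.get()  # explicit device-to-host copy, returns numpy.ndarray
print(np.nanmin(x_cpu))
```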

Failures visible on main that are resolved with this PR are due to:

ValueError: Unsupported device 'cpu'

in the following tests:

  • utils/tests/test_array_api.py::test_weighted_sum
  • tests/test_common.py::test_estimators[LinearDiscriminantAnalysis()-check_array_api_input
  • metrics/tests/test_common.py::test_array_api_compliance


github-actions bot commented Feb 29, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: b4befe9.

@rgommers
Contributor Author

rgommers commented Mar 1, 2024

Re linting failures: I've installed the exact versions of black and ruff used in CI, and I cannot reproduce the linter complaint, either by running the tools directly or by running ./build_tools/linting.sh. It seems unrelated to my changes, just a different way of ordering imports, but since the current ordering is also reasonable (import xxx first, then from xxx import yyy) I'll wait for a maintainer to comment before touching that.

@adrinjalali
Member

We use pre-commit hooks; they should take care of things without needing to install any of the linters in your environment.

@adrinjalali
Member

I ran pre-commit run --files path/to/those/files and then committed all the changes. The linter's happy now.

@rgommers
Contributor Author

rgommers commented Mar 1, 2024

We use pre-commit hooks; they should take care of things without needing to install any of the linters in your environment.

I'm not using pre-commit, mainly for security reasons, but also because it borked my scipy repo once when I evaluated it. So I have to rely on the bot comment instructions posted here.

@adrinjalali
Member

I'd be curious about those security reasons. I've used pre-commit here and in other projects and I've never had an issue, so I'm curious how it broke scipy.

@adrinjalali
Member

I'll let @betatim and @ogrisel review the array API related issues.

@rgommers
Contributor Author

rgommers commented Mar 1, 2024

so curious about how it broke scipy.

I never got to the bottom of that unfortunately. It was inside a conda env, so it may have had to do with how it was installed (I don't like installing dev tools globally with something like pipx).

I'd be curious about those security reasons.

xref scipy/scipy#18895 for a good write-up (not from me). But tl;dr: I create my dev envs myself and like to be in control of them and understand what I'm using. I also actually browse the "to be installed" list that mamba shows me before installing things. A hook on a repo that silently installs things, often from git repos of individuals, without me having control over that doesn't appeal to me. So I choose to do the bit of extra effort to install dev tools myself.

Anyway, let's keep this PR focused on the maintenance issue rather than pre-commit :) Thanks for addressing the linting issue.

Member

@ogrisel ogrisel left a comment


Thanks @rgommers! I like the PR in general, but there are concurrent PRs that might impact it. I am not sure which one should be finalized and merged first. See the comments below for details.

@@ -110,7 +110,7 @@ def size(x):

def _is_numpy_namespace(xp):
"""Return True if xp is backed by NumPy."""
return xp.__name__ in {"numpy", "array_api_compat.numpy", "numpy.array_api"}
return xp.__name__ in {"numpy", "array_api_compat.numpy"}
Member


I like this change: _is_numpy_namespace is now really a marker for the NumPy case, and array-api-strict is just considered like any other non-NumPy array API implementation that is only used for testing purposes, and is therefore not subject to any special treatment motivated by backward-compat considerations.

@ogrisel
Member

ogrisel commented Mar 12, 2024

I tried to sync with main after the merge of #27904 via the GitHub API. I hope I got it right.

@ogrisel
Member

ogrisel commented Mar 13, 2024

For the record, I added the array-api-strict dependency to the pylatest_conda_forge_mkl_linux-64 CI config so that the changes in this PR are properly tested. This is the CI config where array-api-compat and pytorch are already installed.

Member

@ogrisel ogrisel left a comment


I think this PR is good to go on my end. I implemented all the pending things we discussed earlier and updated the docstring slightly to better reflect the current state of affairs.

+1 for merge once #28407 is merged and resulting conflicts with this PR are resolved.

@@ -10,7 +10,7 @@
from .._config import get_config
from .fixes import parse_version

_NUMPY_NAMESPACE_NAMES = {"numpy", "array_api_compat.numpy", "numpy.array_api"}
_NUMPY_NAMESPACE_NAMES = {"numpy", "array_api_compat.numpy"}
Member

@betatim betatim Mar 14, 2024


I support this change, but not including array_api_strict here is a change from how we have been treating numpy.array_api. It might make some uses of is_numpy_namespace() obsolete. (A good thing IMHO)

Related: do we need to keep numpy.array_api here for people who have an older NumPy version, keep using it, and expect that support for something experimental continues to exist in scikit-learn? I think we shouldn't keep it, and should just assume that there are ~0 people in the world who use numpy.array_api.

Member

@ogrisel ogrisel Mar 14, 2024


It might make some uses of is_numpy_namespace() obsolete.

Indeed, for some cases (that we should simplify progressively), but it's still useful from time to time to detect cases where:

  • array_api_dispatch is enabled (via sklearn.set_config(array_api_dispatch=True)) and the input is a (wrapped) numpy array,

vs

  • array_api_dispatch is enabled and the input is not a (wrapped) numpy array.

This is important to know when we can safely convert back and forth to numpy without overhead, for instance to use the out= kwarg for memory efficiency reasons.
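
A hedged illustration of that pattern; scale_inplace_if_possible is a hypothetical helper, not scikit-learn code, while the namespace check matches the diff above:

```python
import numpy as np


def _is_numpy_namespace(xp):
    return xp.__name__ in {"numpy", "array_api_compat.numpy"}


def scale_inplace_if_possible(x, xp, factor=2.0):
    # Hypothetical helper: when the data is already NumPy-backed we can use
    # out= to avoid allocating a temporary; the standard namespace has no out=.
    if _is_numpy_namespace(xp):
        np.multiply(x, factor, out=x)
        return x
    return x * factor
```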

do we need to keep numpy.array_api here for people who have an older numpy version and keep using it and expect that support for something experimental continues to exist in scikit-learn?

I don't see any value in this. numpy.array_api was a temporary experiment and it's going away. There is no point in using it in the future: it just adds complexity for no benefit.

numpy.array_api's value was mostly for testing array API compliance, and that is now better served by array-api-strict, which works even with older NumPy versions, as demonstrated by our CI.

@ogrisel
Member

ogrisel commented Mar 18, 2024

Let me resolve the conflicts quickly.

Member

@betatim betatim left a comment


LGTM.

If we feel like it, updating the docstring as I suggested would be nice. But otherwise: let's merge this thing.

@ogrisel ogrisel enabled auto-merge (squash) March 18, 2024 16:22
@ogrisel
Member

ogrisel commented Mar 18, 2024

I pushed your docstring suggestion and enabled auto-merge on the PR. Thanks for the PR @rgommers and others for the reviews.

@ogrisel ogrisel merged commit 5efc667 into scikit-learn:main Mar 18, 2024