Skip to content

ENH: Make numpy.array_api more portable #25370

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 12 commits into from
Jan 21, 2024

Conversation

asmeurer
Copy link
Member

These will make it harder to write non-portable code that successfully works with numpy.array_api.

This way, it does not appear that "cpu" is a portable device object across
different array API compatible libraries. See
data-apis/array-api#626.
This way there is no ambiguity about the fact the non-portability of NumPy
dtype behavior, or the fact that NumPy dtypes are not necessarily allowed as
dtypes for non-NumPy array APIs.

Fixes numpy#23883
This is required by the standard, even though np.nonzero does not yet disallow
it.
def __repr__(self):
return f"np.array_api.{self._np_dtype.name}"

def __eq__(self, other):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking it might be a good idea to make == explicitly raise an exception when compared against a numpy dtype object. Right now numpy.float32 == numpy.array_api.float32 returns False, but I feel like this could mask bugs. Is there a straightforward way to detect if other is a NumPy dtype object?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a very fly-by comment, but in python generally == is supposed never to fail.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I'm not completely sure about it. I did set this to error when I was creating this PR and it definitely helped me to find all the places in the code that needed to be updated.

My worry if is someone is incorrectly doing:

if x.dtype == np.float32: # x is a numpy.array_api array
    ...

or

if x.dtype == 'f4':
    ...

The code will just subtly start being wrong.

But then again, I don't know if that kind of error actually exists in the wild. It is already wrong for other libraries like PyTorch.

OTOH, like you say, == failing can be very annoying as it would make things like in not work properly. I think I'm leaning towards not making this change, but I'm interested to hear what others think.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notwithstanding that this should move eventually. For this namespace it seems right to raise errors liberally, since I think by now it is accepted as a testing implementation: If it works with this, it should work with anything.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about a warning? For testing purposes, those can be turned into errors.

Copy link
Member

@seberg seberg Dec 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have an opinion, TBH. Besides testing, I never saw much purpose to the namespace as is and in that context I think liberally raising is OK too.

OTOH, if you formalize the "this is for testing" part of the namespace, then it does seem quite logical to have a custom UndefinedBehavior warning. That would make it easier to ignore it in tests if there is a reason to do so (not sure there is, but maybe it doesn't matter).

(or maybe UnspecifiedBehavior since UB in C is usually a bit scarier.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A warning sounds like a nice middle ground to me. But an exception is also defensible, and better than silently passing for unsupported objects to compare against.

@charris charris changed the title Make numpy.array_api more portable ENH: Make numpy.array_api more portable Dec 12, 2023
@charris
Copy link
Member

charris commented Dec 12, 2023

This could use a release note.

Copy link
Member

@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @asmeurer. This looks quite close, I found one issue when testing this on SciPy. Also the == warning/exception for dtype comparison needs to be finalized.

@@ -0,0 +1,14 @@
Make ``numpy.array_api`` more portable
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's keep this release note; it can be cleaned up if and when we split off the whole numpy.array_api module into a standalone package.

def __repr__(self):
return f"np.array_api.{self._np_dtype.name}"

def __eq__(self, other):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A warning sounds like a nice middle ground to me. But an exception is also defensible, and better than silently passing for unsupported objects to compare against.


def __hash__(self):
# Note: this is not strictly required
# (https://github.com/data-apis/array-api/issues/582), but makes the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does seem like a good idea to add. And we should revisit the topic in the standard at some point.

@rgommers
Copy link
Member

Resolving the "2 blank lines" part of the linter complaints would also be nice to take along.

rgommers added a commit to rgommers/scipy that referenced this pull request Jan 16, 2024
These surfaced with the more strict reference implementation in
numpy/numpy#25370 and with the addition
of a `__array_namespace__` method to `numpy.ndarray` for NumPy 2.0
(which caused the `is_numpy` utility function to be wrong).
Even if the `is_numpy` issue is also handled by a change in
array-api-compat, this still seems like a logical change to make;
if `x.__name__` is `'numpy'` then `is_numpy(x)` should return True.

[skip cirrus] [skip circle]
…py dtypes

This is to prevent user error, since something like numpy.array_api.float32 ==
numpy.float32 gives False.
@rgommers
Copy link
Member

The test_warning_calls failures are real.

And now that gh-25317 is merged, it'd be good to update this PR for device handling of fftfreq/rfftfreq.

@rgommers
Copy link
Member

This LGTM now, but the added strictness is hard to deal with without the fix suggested in data-apis/array-api-compat#77 (comment). So ideally that would be fixed first before merging this.

rgommers added a commit to rgommers/scipy that referenced this pull request Jan 17, 2024
These surfaced with the more strict reference implementation in
numpy/numpy#25370 and with the addition
of a `__array_namespace__` method to `numpy.ndarray` for NumPy 2.0
(which caused the `is_numpy` utility function to be wrong).
Even if the `is_numpy` issue is also handled by a change in
array-api-compat, this still seems like a logical change to make;
if `x.__name__` is `'numpy'` then `is_numpy(x)` should return True.

Regarding the tightening up of bool dtype handling in `_lib.lazywhere`:
plain `bool` is technically not guaranteed to work, and the stricter
checks in the reference implementation will start rejecting it.

[skip cirrus] [skip circle]
@asmeurer
Copy link
Member Author

@rgommers I'm not following how that array-api-compat issue relates to this.

@rgommers
Copy link
Member

@rgommers I'm not following how that array-api-compat issue relates to this.

I ran into that issue when testing this PR with SciPy. All my attempts at fixing up SciPy to be robust against changes in this PR are blocked until that issue is fixed. Basically, the addition of numpy.ndarray.__array_namespace__ changed the behavior of array_api_compat - but that was still okay when this module had some flexibility. The new dtype objects added in this PR are very restrictive, so it's impossible to get away with mixing numpy and numpy.array_api, or using dtype=bool (the Python builtin bool) after this gets merged.

tl;dr I don't really want to leave SciPy (and scikit-learn) in a state where tons of tests fail with NumPy main.

@asmeurer
Copy link
Member Author

I guess there must be some complexity in the scipy implementation for this that to be related to this. I'm surprised you have array-api compatible code that's mixing numpy dtypes, since that already doesn't work for pytorch or cupy.

@rgommers
Copy link
Member

Now that scipy/scipy#19885 is merged, let's give this a go too. Thanks @asmeurer & reviewers!

@rgommers rgommers merged commit ad36032 into numpy:main Jan 21, 2024
@rgommers
Copy link
Member

@asmeurer it looks like we're ready now for you to create the standalone version of numpy.array_api, so that we can make an announcement and switch over the known consumers and remove numpy.array_api before the 2.0.x branch point.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Make the dtype objects in numpy.array_api more strict
6 participants