-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: Make numpy.array_api more portable #25370
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This way, it does not appear that "cpu" is a portable device object across different array API compatible libraries. See data-apis/array-api#626.
This way there is no ambiguity about the fact the non-portability of NumPy dtype behavior, or the fact that NumPy dtypes are not necessarily allowed as dtypes for non-NumPy array APIs. Fixes numpy#23883
This is required by the standard, even though np.nonzero does not yet disallow it.
def __repr__(self): | ||
return f"np.array_api.{self._np_dtype.name}" | ||
|
||
def __eq__(self, other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking it might be a good idea to make ==
explicitly raise an exception when compared against a numpy
dtype object. Right now numpy.float32 == numpy.array_api.float32
returns False
, but I feel like this could mask bugs. Is there a straightforward way to detect if other
is a NumPy dtype object?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a very fly-by comment, but in python generally ==
is supposed never to fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I'm not completely sure about it. I did set this to error when I was creating this PR and it definitely helped me to find all the places in the code that needed to be updated.
My worry if is someone is incorrectly doing:
if x.dtype == np.float32: # x is a numpy.array_api array
...
or
if x.dtype == 'f4':
...
The code will just subtly start being wrong.
But then again, I don't know if that kind of error actually exists in the wild. It is already wrong for other libraries like PyTorch.
OTOH, like you say, ==
failing can be very annoying as it would make things like in
not work properly. I think I'm leaning towards not making this change, but I'm interested to hear what others think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notwithstanding that this should move eventually. For this namespace it seems right to raise errors liberally, since I think by now it is accepted as a testing implementation: If it works with this, it should work with anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about a warning? For testing purposes, those can be turned into errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have an opinion, TBH. Besides testing, I never saw much purpose to the namespace as is and in that context I think liberally raising is OK too.
OTOH, if you formalize the "this is for testing" part of the namespace, then it does seem quite logical to have a custom UndefinedBehavior
warning. That would make it easier to ignore it in tests if there is a reason to do so (not sure there is, but maybe it doesn't matter).
(or maybe UnspecifiedBehavior
since UB in C is usually a bit scarier.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A warning sounds like a nice middle ground to me. But an exception is also defensible, and better than silently passing for unsupported objects to compare against.
This could use a release note. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @asmeurer. This looks quite close, I found one issue when testing this on SciPy. Also the ==
warning/exception for dtype comparison needs to be finalized.
@@ -0,0 +1,14 @@ | |||
Make ``numpy.array_api`` more portable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's keep this release note; it can be cleaned up if and when we split off the whole numpy.array_api
module into a standalone package.
def __repr__(self): | ||
return f"np.array_api.{self._np_dtype.name}" | ||
|
||
def __eq__(self, other): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A warning sounds like a nice middle ground to me. But an exception is also defensible, and better than silently passing for unsupported objects to compare against.
|
||
def __hash__(self): | ||
# Note: this is not strictly required | ||
# (https://github.com/data-apis/array-api/issues/582), but makes the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does seem like a good idea to add. And we should revisit the topic in the standard at some point.
Resolving the "2 blank lines" part of the linter complaints would also be nice to take along. |
These surfaced with the more strict reference implementation in numpy/numpy#25370 and with the addition of a `__array_namespace__` method to `numpy.ndarray` for NumPy 2.0 (which caused the `is_numpy` utility function to be wrong). Even if the `is_numpy` issue is also handled by a change in array-api-compat, this still seems like a logical change to make; if `x.__name__` is `'numpy'` then `is_numpy(x)` should return True. [skip cirrus] [skip circle]
…py dtypes This is to prevent user error, since something like numpy.array_api.float32 == numpy.float32 gives False.
The And now that gh-25317 is merged, it'd be good to update this PR for device handling of |
This LGTM now, but the added strictness is hard to deal with without the fix suggested in data-apis/array-api-compat#77 (comment). So ideally that would be fixed first before merging this. |
These surfaced with the more strict reference implementation in numpy/numpy#25370 and with the addition of a `__array_namespace__` method to `numpy.ndarray` for NumPy 2.0 (which caused the `is_numpy` utility function to be wrong). Even if the `is_numpy` issue is also handled by a change in array-api-compat, this still seems like a logical change to make; if `x.__name__` is `'numpy'` then `is_numpy(x)` should return True. Regarding the tightening up of bool dtype handling in `_lib.lazywhere`: plain `bool` is technically not guaranteed to work, and the stricter checks in the reference implementation will start rejecting it. [skip cirrus] [skip circle]
@rgommers I'm not following how that array-api-compat issue relates to this. |
I ran into that issue when testing this PR with SciPy. All my attempts at fixing up SciPy to be robust against changes in this PR are blocked until that issue is fixed. Basically, the addition of tl;dr I don't really want to leave SciPy (and scikit-learn) in a state where tons of tests fail with NumPy |
I guess there must be some complexity in the scipy implementation for this that to be related to this. I'm surprised you have array-api compatible code that's mixing numpy dtypes, since that already doesn't work for pytorch or cupy. |
Now that scipy/scipy#19885 is merged, let's give this a go too. Thanks @asmeurer & reviewers! |
@asmeurer it looks like we're ready now for you to create the standalone version of |
.device
return a specialCPU_DEVICE
object (not present in the namespace) instead of"cpu"
. See to_device() -- any way to force back to host "portably?" data-apis/array-api#626.nonzero
to disallow 0-D inputs (this is not related to portability, but it was a small change so I figured I would just include it here).These will make it harder to write non-portable code that successfully works with
numpy.array_api
.