DOC: caution against relying upon NumPy's implementation in subclasses #13633
Conversation
xref #12028

I think this is an important warning to include for subclass authors. Otherwise, we will be expanding our exposure of internal APIs as part of `__array_function__`. All things being equal, it's great when things "just work" for subclasses, but I don't want to guarantee it. In particular, I would be very displeased if `__array_function__` leads to NumPy adding more subclass-specific hacks like always calling `mean()` inside `median()` (GH-3846). @mhvk: please take a look.
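As a hedged illustration of the kind of reliance this warning targets (the `UnitArray` name is made up, and whether `median()` calls `mean()` internally is precisely the kind of detail NumPy does not promise):

```python
import numpy as np

# Hypothetical subclass that leans on NumPy's internal implementation
# details: it overrides mean() and hopes that functions such as
# median() route through it (as in the GH-3846 workaround).
class UnitArray(np.ndarray):
    def mean(self, *args, **kwargs):
        print("UnitArray.mean called")
        return super().mean(*args, **kwargs)

x = np.arange(4).view(UnitArray)
np.median(x)  # whether this prints is an implementation detail that
              # NumPy is free to change -- exactly the point of the warning
```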
This is a partial reprise of the optimizations from GH-13585.

The trade-offs here are about readability, performance, and whether these functions automatically work on ndarray subclasses. You'll have to judge the readability costs for yourself, but I think this is pretty reasonable. Here are the performance numbers for three relevant functions with the following IPython script:

```python
import numpy as np
x = np.array([1])
xs = [x, x, x]
for func in [np.stack, np.vstack, np.block]:
    %timeit func(xs)
```

| Function | Master | This PR |
| --- | --- | --- |
| `stack` | 6.36 µs ± 175 ns | 6 µs ± 174 ns |
| `vstack` | 7.18 µs ± 186 ns | 5.43 µs ± 125 ns |
| `block` | 15.1 µs ± 141 ns | 11.3 µs ± 104 ns |

The performance benefit for `stack` is somewhat marginal (perhaps it should be dropped), but it's much more meaningful inside `vstack`/`hstack` and `block`, because these functions call other dispatched functions within a loop.

For automatically working on ndarray subclasses, the main concern would be that by skipping dispatch with `concatenate`, subclasses that define `concatenate` won't automatically get implementations for the `*stack` functions. (But I don't think we should consider ourselves obligated to guarantee these implementation details, as I write in GH-13633.) `block` also will not get an automatic implementation, but given that `block` uses two different code paths depending on argument values, this is probably a good thing, because there's no way the code path not involving `concatenate` could automatically work (it uses `empty()`).
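To make the subclass trade-off concrete, here is a sketch (the `ConcatWatcher` name and the print are illustrative only; it assumes `ndarray`'s default `__array_function__` behavior from NEP 18) of a subclass that watches for `concatenate` and only sees it if the `*stack` implementations call the public, dispatched `np.concatenate`:

```python
import numpy as np

# Illustrative only: a subclass that observes np.concatenate dispatch
# and otherwise defers to ndarray's default __array_function__.
class ConcatWatcher(np.ndarray):
    def __array_function__(self, func, types, args, kwargs):
        if func is np.concatenate:
            print("dispatched concatenate was called")
        # Defer to the base class, which runs NumPy's implementation.
        return super().__array_function__(func, types, args, kwargs)

a = np.array([1]).view(ConcatWatcher)
# Whether this prints depends on whether vstack's implementation calls
# the public (dispatched) np.concatenate or a non-dispatched internal
# one -- exactly the implementation detail this PR declines to guarantee.
np.vstack([a, a])
```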
If we actually follow Python protocol rules:

```python
>>> class A:
...     def __add__(self, other):
...         return (self, other)
...
>>> class B(A):
...     def __add__(self, other):
...         return NotImplemented
...
>>> B() + B()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'B' and 'B'
```

We should only allow this when there IS an `__array_function__` implementation:

```python
if func not in IMPLEMENTATIONS:
    return super().__array_function__(func, types, args, kwargs)
return IMPLEMENTATIONS[func](*args, **kwargs)
```
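A runnable sketch of the opt-in pattern above (`MyArray`, `my_stack`, and the `IMPLEMENTATIONS` dict are illustrative names from the comment, not NumPy API):

```python
import numpy as np

# Handle a whitelist of functions; defer everything else to the base
# class rather than returning NotImplemented.
class MyArray(np.ndarray):
    IMPLEMENTATIONS = {}

    def __array_function__(self, func, types, args, kwargs):
        if func not in self.IMPLEMENTATIONS:
            return super().__array_function__(func, types, args, kwargs)
        return self.IMPLEMENTATIONS[func](*args, **kwargs)

def my_stack(arrays, axis=0):
    print("using my_stack")
    # Strip the subclass before re-entering NumPy to avoid re-dispatch.
    return np.stack([np.asarray(a) for a in arrays], axis=axis).view(MyArray)

MyArray.IMPLEMENTATIONS[np.stack] = my_stack

a = np.array([1]).view(MyArray)
np.stack([a, a])  # prints "using my_stack"
```

Deferring to `super().__array_function__` (instead of returning `NotImplemented`) keeps the default ndarray behavior for functions outside the whitelist.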
@hameerabbasi I'm not sure I understand your example. I think `__array_function__` already follows these rules:

```python
class SubArray(np.ndarray):
    def __array_function__(self, func, types, args, kwargs):
        return NotImplemented

array = np.array([1]).view(SubArray)
np.stack([array, array])  # TypeError: no implementation found for 'numpy.stack'
```

You only get the base-class behavior if at least one argument inherits from `ndarray` without overriding `__array_function__`.
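For contrast, a minimal sketch (the `PlainSub` name is hypothetical) of a subclass that does not override `__array_function__` and therefore keeps the base-class behavior:

```python
import numpy as np

# No __array_function__ override: ndarray's default applies, so
# np.stack succeeds instead of raising TypeError as above.
class PlainSub(np.ndarray):
    pass

arr = np.array([1]).view(PlainSub)
result = np.stack([arr, arr])  # no TypeError
print(type(result))
```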
In this case, I think the warning should be extended to everything that uses `__array_function__`.
How else would you use `__array_function__`? OK, I suppose you could indeed call it directly, but calling special methods directly is generally not recommended in Python. The methods exist for the sake of a protocol, not arbitrary usage. In any case, if you call …
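To make the point about special methods concrete, here is a plain-Python sketch (the `Wrapper` class is illustrative, not a NumPy API): the `+` operator runs the full protocol, including the reflected method, while calling `__add__` directly does not:

```python
class Wrapper:
    def __init__(self, value):
        self.value = value

    def __radd__(self, other):
        return Wrapper(other + self.value)

# The operator runs the full protocol: int.__add__ returns
# NotImplemented, so Python falls back to Wrapper.__radd__.
print((1 + Wrapper(2)).value)   # 3

# Calling the special method directly skips the fallback entirely
# and just hands you the NotImplemented sentinel.
print((1).__add__(Wrapper(2)))  # NotImplemented
```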
p.s. Hopefully, the bugs smoked out by `__array_function__` …
Thanks @shoyer.
* MAINT: avoid nested dispatch in numpy.core.shape_base
* MAINT: only remove internal use in np.block
* MAINT: fixup comment on np.block optimization