MAINT: avoid nested dispatch in numpy.core.shape_base #13634
Conversation
This is a partial reprise of the optimizations from GH-13585. The trade-offs here are about readability, performance, and whether these functions automatically work on ndarray subclasses. You'll have to judge the readability costs for yourself, but I think this is pretty reasonable.

Here are the performance numbers for three relevant functions, measured with the following IPython script:

```python
import numpy as np

x = np.array([1])
xs = [x, x, x]
for func in [np.stack, np.vstack, np.block]:
    %timeit func(xs)
```

| Function | Master | This PR |
| --- | --- | --- |
| `stack` | 6.36 µs ± 175 ns | 6 µs ± 174 ns |
| `vstack` | 7.18 µs ± 186 ns | 5.43 µs ± 125 ns |
| `block` | 15.1 µs ± 141 ns | 11.3 µs ± 104 ns |

The performance benefit for `stack` is somewhat marginal (perhaps it should be dropped), but it's much more meaningful inside `vstack`/`hstack` and `block`, because these functions call other dispatched functions within a loop.

For automatically working on ndarray subclasses, the main concern would be that by skipping dispatch with `concatenate`, subclasses that define `concatenate` won't automatically get implementations for `*stack` functions. (But I don't think we should consider ourselves obligated to guarantee these implementation details, as I write in GH-13633.) `block` also will not get an automatic implementation, but given that `block` uses two different code paths depending on argument values, this is probably a good thing, because there's no way the code path not involving `concatenate` could automatically work (it uses `empty()`).
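To make the pattern concrete, here is a rough sketch of what "skipping nested dispatch" means in practice. This is illustrative, not the PR's actual diff: the helper name `_vstack_sketch` is hypothetical, and the `__wrapped__` attribute is simply what `functools.wraps` (used by the overrides machinery) leaves on a dispatched wrapper, so the `getattr` fallback keeps the snippet working even if `__array_function__` dispatch is disabled or implemented differently.

```python
import numpy as np

# Private, non-dispatched alias: unwrap the decorated public function if a
# wrapped implementation is exposed, otherwise fall back to the function itself.
_concatenate = getattr(np.concatenate, '__wrapped__', np.concatenate)


def _vstack_sketch(tup):
    # Same result as np.vstack for plain ndarrays, but the inner concatenate
    # call does not re-enter the __array_function__ dispatch machinery.
    return _concatenate([np.atleast_2d(a) for a in tup], 0)
```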
Not sure about this special-casing of some functions: for `stack`, the benefit is tiny, and for `hstack` and `vstack` it will likely be similarly small once the inefficient repeated call noted in-line has been changed. For `block`, the case is stronger, but given that we have #13186 with a much faster C implementation pending, it is not clear it is worth it (though among the overrides, the ones I mind least are the `ndim` and `size` overrides - those are some of the functions where it is not obvious to me that they should even be captured under `__array_function__`; `getattr(a, 'ndim', 0)` would seem totally OK).
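For illustration, the `getattr` spelling mentioned above behaves like this (a tiny example with a hypothetical helper name, not code from the PR):

```python
import numpy as np

def _duck_ndim(a):
    # No __array_function__ dispatch: ndarrays, subclasses and duck arrays
    # report their own .ndim, and anything without .ndim falls back to 0.
    return getattr(a, 'ndim', 0)

_duck_ndim(np.ones((2, 3)))   # 2
_duck_ndim([[1, 2], [3, 4]])  # 0 -- lists have no .ndim, unlike np.ndim
_duck_ndim(1.5)               # 0
```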
numpy/core/shape_base.py (outdated)

```diff
@@ -276,7 +290,7 @@ def vstack(tup):
     if not overrides.ARRAY_FUNCTION_ENABLED:
         # raise warning if necessary
         _arrays_for_stack_dispatcher(tup, stacklevel=2)
-    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
+    return _concatenate([_atleast_2d(_m) for _m in tup], 0)
```
This code is strange: why loop over the arrays when `atleast_2d` does that as well? This means one can reduce the overhead in your example by a factor of 3 with zero effort or impact!
This is of course separate from the very simple change in `atleast_2d` to use `np.array(ary, copy=False, subok=True, ndmin=2)`...
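For reference, that suggested `atleast_2d` simplification would look roughly like this (a sketch with a hypothetical name, not the actual patch; note that `copy=False` here has the pre-NumPy-2.0 meaning of "copy only if needed"):

```python
import numpy as np

def atleast_2d_sketch(ary):
    # Let np.array do the dimension padding instead of reshaping by hand;
    # subok=True preserves ndarray subclasses, ndmin=2 prepends axes as needed.
    return np.array(ary, copy=False, subok=True, ndmin=2)

atleast_2d_sketch(3.0).shape        # (1, 1)
atleast_2d_sketch([1, 2, 3]).shape  # (1, 3)
```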
p.s. Though in my more ideal world, `atleast_2d` would call `np.array(...)` only when `.shape` and `.reshape` were not present, and otherwise just rely on those. But that's a separate issue!
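A rough sketch of that more duck-typed idea, purely to illustrate it (a hypothetical function, not proposed code):

```python
import numpy as np

def atleast_2d_duck(ary):
    # Only coerce when the object doesn't already quack like an array;
    # otherwise rely on its own .shape / .reshape.
    if not (hasattr(ary, 'shape') and hasattr(ary, 'reshape')):
        ary = np.asanyarray(ary)
    if len(ary.shape) == 0:
        return ary.reshape(1, 1)
    if len(ary.shape) == 1:
        return ary.reshape(1, ary.shape[0])
    return ary
```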
If I recall correctly, the `atleast_*d` functions have different semantics than `np.array(..., ndmin=*)`.
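A concrete case where the semantics do diverge is the 3-D variant (the 2-D versions happen to agree):

```python
import numpy as np

a = np.array([1, 2, 3])

np.atleast_3d(a).shape      # (1, 3, 1): a new axis before *and* after
np.array(a, ndmin=3).shape  # (1, 1, 3): new axes are only prepended
np.atleast_2d(a).shape      # (1, 3)
np.array(a, ndmin=2).shape  # (1, 3)   -- same result in the 2-D case
```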
numpy/core/shape_base.py (outdated)

```diff
@@ -331,12 +345,12 @@ def hstack(tup):
         # raise warning if necessary
         _arrays_for_stack_dispatcher(tup, stacklevel=2)

-    arrs = [atleast_1d(_m) for _m in tup]
+    arrs = [_atleast_1d(_m) for _m in tup]
```
As above
I'm still -0.5 on this whole thing. 20% performance gains for very small arrays don't seem like a strong argument to me for making things harder for subclasses to hook into. The readability cost is OK here, but it still leaves the reader asking the question "why are we skipping this dispatching?"
I don't see any reason not to guarantee that… There are probably cases like the…
I suppose this is all predicated on having a…
Force-pushed from fca9898 to c0aae6b.
OK, the latest version only removes the internal uses inside…
numpy/core/shape_base.py (outdated)

```diff
@@ -416,6 +416,13 @@ def stack(arrays, axis=0, out=None):
     return _nx.concatenate(expanded_arrays, axis=axis, out=out)


+# Internal functions to elimainate the overhead of repeated dispatch inside
```
Spelling of "eliminate". Also maybe write "repeated dispatch in one of the two possible paths inside `np.block`".
@shoyer - for the p.s., I thought I should not let my suggested optimization of…