MAINT: avoid nested dispatch in numpy.core.shape_base #13634


Merged (3 commits) on Jun 12, 2019

Conversation

@shoyer (Member) commented May 26, 2019

This is a partial reprise of the optimizations from GH-13585.

The trade-offs here are about readability, performance and whether these
functions automatically work on ndarray subclasses.

You'll have to judge the readability costs for yourself, but I think this is
pretty reasonable.

Here are the performance numbers for three relevant functions with the
following IPython script:

    import numpy as np
    x = np.array([1])
    xs = [x, x, x]
    for func in [np.stack, np.vstack, np.block]:
        %timeit func(xs)

| Function | Master | This PR |
| --- | --- | --- |
| `stack` | 6.36 µs ± 175 ns | 6 µs ± 174 ns |
| `vstack` | 7.18 µs ± 186 ns | 5.43 µs ± 125 ns |
| `block` | 15.1 µs ± 141 ns | 11.3 µs ± 104 ns |

The performance benefit for `stack` is somewhat marginal (perhaps it should be
dropped), but it's much more meaningful inside `vstack`/`hstack` and `block`,
because these functions call other dispatched functions within a loop.
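
For concreteness, here is roughly what the master implementation of `vstack` does for this example, spelled out; every one of these calls goes through the `__array_function__` dispatch machinery, which is the repeated overhead this PR avoids:

    import numpy as np

    x = np.array([1])
    tup = [x, x, x]

    # One dispatched call per input array, plus the final dispatched
    # concatenate: len(tup) + 1 trips through the dispatch machinery
    # for a single vstack call.
    arrs = [np.atleast_2d(m) for m in tup]
    out = np.concatenate(arrs, 0)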

As for automatically working on ndarray subclasses, the main concern is that by
skipping dispatch for `concatenate`, subclasses that define `concatenate` (via
`__array_function__`) won't automatically get implementations of the `*stack`
functions. (But I don't think we should consider ourselves obligated to
guarantee these implementation details, as I write in GH-13633.)
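
As a concrete illustration, here is a hypothetical subclass (not code from this PR; it assumes `__array_function__` is enabled and relies on the `_implementation` attribute that NumPy's dispatch decorator attaches to public functions):

    import numpy as np

    class Logged(np.ndarray):
        # Overrides np.concatenate and falls back to NumPy's default
        # implementation for every other dispatched function.
        def __array_function__(self, func, types, args, kwargs):
            if func is np.concatenate:
                print("concatenate override called")
            return func._implementation(*args, **kwargs)

    x = np.array([1, 2]).view(Logged)

    # If vstack's internals go through the dispatched np.concatenate, the
    # message above prints; if they call the undispatched implementation
    # instead, the override is never consulted for that inner call.
    np.vstack([x, x])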

`block` also will not get an automatic implementation, but given that `block`
uses two different code paths depending on argument values, this is probably a
good thing, because there's no way the code path not involving `concatenate`
could automatically work (it uses `empty()`).
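
A minimal sketch of why the `empty()`-based path cannot automatically produce a subclass (illustrative only, not the actual `block` implementation):

    import numpy as np

    class MyArray(np.ndarray):
        pass

    x = np.arange(4).view(MyArray)

    # A path that allocates the result with empty() and assigns into slices
    # starts from a plain ndarray; nothing about it is subclass-aware.
    out = np.empty(8, dtype=x.dtype)
    out[:4] = x
    out[4:] = x
    print(type(out))   # <class 'numpy.ndarray'>, not MyArray
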
@mhvk (Contributor) left a comment:

I'm not sure about special-casing some functions like this: for `stack`, the benefit is tiny, and for `hstack` and `vstack` it will likely be similarly small once the inefficient repeated call noted inline has been changed.

For `block`, the case is stronger, but given that we have #13186 pending with a much faster C implementation, it is not clear it is worth it. Among the overrides, the ones I mind least are the `ndim` and `size` overrides: those are functions where it is not obvious to me that they should even be captured under `__array_function__`; `getattr(a, 'ndim', 0)` would seem totally OK.
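
For reference, the duck-typed spelling mentioned above, which involves no dispatch machinery at all:

    import numpy as np

    a = np.ones((2, 3))

    print(getattr(a, 'ndim', 0))    # 2 -- read straight off the array
    print(getattr(3.0, 'ndim', 0))  # 0 -- plain Python scalars fall back to 0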

    @@ -276,7 +290,7 @@ def vstack(tup):
         if not overrides.ARRAY_FUNCTION_ENABLED:
             # raise warning if necessary
             _arrays_for_stack_dispatcher(tup, stacklevel=2)
    -    return _nx.concatenate([atleast_2d(_m) for _m in tup], 0)
    +    return _concatenate([_atleast_2d(_m) for _m in tup], 0)

Contributor:

This code is strange: why loop over the arrays when `atleast_2d` does that as well? This means one can reduce the overhead in your example by a factor of 3 with zero effort or impact!

This is of course separate from the very simple change in `atleast_2d` to use `np.array(ary, copy=False, subok=True, ndmin=2)`...
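
A sketch of that zero-effort version (essentially the change later proposed in #13697): `atleast_2d` already maps over multiple arguments and returns a list, so a single dispatched call can replace the per-element list comprehension:

    import numpy as np

    x = np.array([1])
    tup = (x, x, x)

    arrs = np.atleast_2d(*tup)   # one dispatched call instead of len(tup) calls
    out = np.concatenate(arrs, 0)
    # (With a single input, atleast_2d returns an array rather than a list,
    # so real code needs to handle that case separately.)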

Contributor:

P.S. Though in my more ideal world, `atleast_2d` would call `np.array(...)` only when `.shape` and `.reshape` were not present, and otherwise just rely on those. But that's a separate issue!
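
A rough sketch of that more duck-typed `atleast_2d` (hypothetical, not NumPy's implementation):

    import numpy as np

    def atleast_2d_ducktyped(ary):
        shape = getattr(ary, 'shape', None)
        if shape is None or not hasattr(ary, 'reshape'):
            # No array-like interface: coerce, preserving subclasses.
            return np.array(ary, copy=False, subok=True, ndmin=2)
        if len(shape) >= 2:
            return ary
        # Prepend length-1 axes via the object's own reshape.
        return ary.reshape((1,) * (2 - len(shape)) + tuple(shape))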

Member Author:

If I recall correctly, the `atleast_*d` functions have different semantics than `np.array(..., ndmin=*)`.
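
For example, for 1-D input the two differ in where the new axes go (`ndmin` pads on the left, `atleast_3d` pads on both sides):

    import numpy as np

    x = np.array([1, 2, 3])
    print(np.array(x, ndmin=3).shape)   # (1, 1, 3)
    print(np.atleast_3d(x).shape)       # (1, 3, 1)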

    @@ -331,12 +345,12 @@ def hstack(tup):
             # raise warning if necessary
             _arrays_for_stack_dispatcher(tup, stacklevel=2)

    -    arrs = [atleast_1d(_m) for _m in tup]
    +    arrs = [_atleast_1d(_m) for _m in tup]

Contributor:

As above

@eric-wieser (Member) commented Jun 1, 2019

I'm still -0.5 on this whole thing. 20% performance gains for very small arrays don't seem like a strong argument to me for making it harder for subclasses to hook in. The readability cost is OK here, but it still leaves the reader asking "why are we skipping this dispatching?"

> But I don't think we should consider ourselves obligated to guarantee these implementation details

I don't see any reason not to guarantee that vstack and hstack will be implemented in terms of `concatenate` and `atleast_2d`, since that's exactly their semantics.

There are probably cases like the `mean` / `median` thing where our internals could change. Imagine a `foo(arr)` function that currently delegates to `bar(arr)`, and we change it to internally call `baz(arr)` instead. One of two things happens to downstream subclasses:

- Their code continues to work, as they already implement `baz(arr)` with sensible semantics.
- Their code does not work. They release an update to their library and say "if you want to use numpy 1.17 or newer, make sure to update to our new version". This would happen if:
  - They do not implement `baz(arr)` at all. By making this change, we just caused a downstream library to improve and now support `baz(arr)` too.
  - They implement `baz(arr)` with incompatible semantics. They should think carefully about whether this was a good idea. Either they fix `baz(arr)` (good), or they decide they need to overload `foo(arr)` specially after all.

I suppose this is all predicated on having a `__numpy_implementation__` attribute available to use inside `__array_function__`.
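
Sketch of the pattern that attribute would enable inside `__array_function__` (entirely hypothetical: `__numpy_implementation__` is the name floated here, not an attribute NumPy ships):

    import numpy as np

    HANDLED = {}  # hypothetical registry of functions this subclass reimplements

    class MySubclass(np.ndarray):
        def __array_function__(self, func, types, args, kwargs):
            if func in HANDLED:
                return HANDLED[func](*args, **kwargs)
            # Fall back to NumPy's own implementation for everything else.
            impl = getattr(func, '__numpy_implementation__', None)
            if impl is None:
                return NotImplemented
            return impl(*args, **kwargs)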

@shoyer force-pushed the shape_base_optimize branch from fca9898 to c0aae6b on June 1, 2019 at 22:30
@shoyer (Member Author) commented Jun 1, 2019

OK, the latest version only removes the internal dispatch inside `np.block`. I think there is a pretty clear case for merging this to avoid a performance regression, given that #13186 may not make it into 1.17. Also, it's just a bad idea to partially support subclasses in a shape-dependent fashion: we should either support subclasses in a function always or never. Value-dependent semantics are unlikely to be tested properly, which vastly increases the likelihood of bugs.

    @@ -416,6 +416,13 @@ def stack(arrays, axis=0, out=None):
         return _nx.concatenate(expanded_arrays, axis=axis, out=out)


    +# Internal functions to elimainate the overhead of repeated dispatch inside

Contributor:

Spelling of "eliminate". Also, maybe write "repeated dispatch in one of the two possible paths inside `np.block`".

@mhvk (Contributor) commented Jun 2, 2019

@shoyer - for np.block, I agree the case is better given that the path taken depends on the shape. Also, forcing a specific override means people will be ready for the C implementation.

P.S. I thought I should not let my suggested optimization of `hstack` and `vstack` slip by: see #13697. It's about as good as what you had by bypassing dispatch.
