MAINT: Simplify block implementation #9667


Merged: 23 commits merged into numpy:master from j-towns:simplify-block-implementation on Nov 12, 2017

Conversation

@j-towns (Contributor) commented on Sep 7, 2017

I've simplified the implementation of numpy.block, without changing its API. I thought the implementation was pretty bloated, particularly with the _Recurser class, which IMO was obscuring the logic of the function. This is a subjective judgement of course.
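
For a sense of the shape of the new code, here is a minimal sketch of the recursive approach (illustrative only: the names are invented, and depth validation and error handling are elided):

import numpy as np

def block_sketch(arrays, max_depth, result_ndim, depth=0):
    # Recurse through the nested lists, concatenating the sub-blocks
    # along an axis determined by how deep in the nesting we are: the
    # innermost lists join along the last axis, the outermost along
    # the first of the trailing `max_depth` axes.
    if depth < max_depth:
        arrs = [block_sketch(a, max_depth, result_ndim, depth + 1)
                for a in arrays]
        return np.concatenate(arrs, axis=-(max_depth - depth))
    else:
        # Leaf: promote to the final dimensionality by prepending
        # length-1 axes (no copy is made if none is needed).
        return np.array(arrays, ndmin=result_ndim, copy=False, subok=True)

# A 2x2 block matrix: nesting depth 2, 2-d result.
block_sketch([[np.ones((2, 2)), np.zeros((2, 2))],
              [np.zeros((2, 2)), np.eye(2)]],
             max_depth=2, result_ndim=2)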

A happy side effect of this change is that performance of block has improved. I've implemented some benchmarks (based on the test cases), and running

python runtests.py --bench-compare master bench_shape_base

on my laptop I get

    before     after       ratio
  master     this branch
  [6810e1e9] [dbfdf227]
-  136.63μs   101.20μs      0.74  bench_shape_base.time_block_complicated
-  153.14μs   107.66μs      0.70  bench_shape_base.time_3d
-   71.38μs    43.62μs      0.61  bench_shape_base.time_block_with_1d_arrays_multiple_rows
-  172.43μs   105.02μs      0.61  bench_shape_base.time_nested
-   57.96μs    32.39μs      0.56  bench_shape_base.time_block_mixed_1d_and_2d
-   61.10μs    32.99μs      0.54  bench_shape_base.time_block_simple_column_wise
-   32.18μs    16.99μs      0.53  bench_shape_base.time_block_simple_row_wise
-   64.96μs    33.23μs      0.51  bench_shape_base.time_block_with_1d_arrays_column_wise
-   31.49μs    16.01μs      0.51  bench_shape_base.time_block_with_1d_arrays_row_wise
-   27.26μs     8.22μs      0.30  bench_shape_base.time_no_lists

This is my first PR submitted to numpy; apologies in advance if there's anything I need to do that I've missed!

Edit: bf616bf further improved performance by not copying input arrays. Now:

-  133.93μs    71.64μs      0.53  bench_shape_base.time_block_complicated
-  165.07μs    90.00μs      0.55  bench_shape_base.time_3d
-   72.27μs    36.27μs      0.50  bench_shape_base.time_block_with_1d_arrays_multiple_rows
-  178.65μs    91.65μs      0.51  bench_shape_base.time_nested
-   58.82μs    29.15μs      0.50  bench_shape_base.time_block_mixed_1d_and_2d
-   58.27μs    28.19μs      0.48  bench_shape_base.time_block_simple_column_wise
-   33.58μs    14.26μs      0.42  bench_shape_base.time_block_simple_row_wise
-   59.37μs    29.86μs      0.50  bench_shape_base.time_block_with_1d_arrays_column_wise
-   32.34μs    13.81μs      0.43  bench_shape_base.time_block_with_1d_arrays_row_wise
-   27.94μs     7.95μs      0.28  bench_shape_base.time_no_lists

Edit 2: Updated benchmarks running on bd6729d vs. master:

    before     after       ratio
  [e64699dc] [bd6729d0]
+     4.43s      4.83s      1.09  bench_shape_base.Block.time_3d(100)
-  596.23μs   529.67μs      0.89  bench_shape_base.Block.time_block_complicated(100)
-  151.80μs   116.40μs      0.77  bench_shape_base.Block.time_block_simple_column_wise(100)
-   89.47μs    67.53μs      0.75  bench_shape_base.Block.time_block_simple_row_wise(100)
-   60.31μs    44.77μs      0.74  bench_shape_base.Block.time_no_lists(100)
-  330.40μs   226.72μs      0.69  bench_shape_base.Block.time_nested(100)
-   26.58μs    13.42μs      0.50  bench_shape_base.Block.time_no_lists(10)
-   24.79μs    12.36μs      0.50  bench_shape_base.Block.time_no_lists(1)
-  135.44μs    66.13μs      0.49  bench_shape_base.Block.time_block_complicated(10)
-  935.52μs   400.46μs      0.43  bench_shape_base.Block.time_3d(10)
-   56.96μs    24.34μs      0.43  bench_shape_base.Block.time_block_simple_column_wise(10)
-  126.27μs    52.76μs      0.42  bench_shape_base.Block.time_block_complicated(1)
-   55.89μs    23.25μs      0.42  bench_shape_base.Block.time_block_simple_column_wise(1)
-  172.84μs    70.91μs      0.41  bench_shape_base.Block.time_nested(10)
-   31.93μs    12.83μs      0.40  bench_shape_base.Block.time_block_simple_row_wise(10)
-  153.82μs    61.13μs      0.40  bench_shape_base.Block.time_3d(1)
-   30.98μs    12.06μs      0.39  bench_shape_base.Block.time_block_simple_row_wise(1)
-  182.28μs    68.20μs      0.37  bench_shape_base.Block.time_nested(1)

@charris (Member) commented on Sep 7, 2017

Note the changed title; see doc/source/dev/gitwash/development_workflow.rst for the prefixes.

@eric-wieser Ping.

@j-towns j-towns changed the title Simplify block implementation MAINT: Simplify block implementation Sep 7, 2017
@j-towns (Contributor, author) commented on Sep 7, 2017

Looks like Travis CI is having some issues currently: https://www.traviscistatus.com/incidents/r0f4m54qp0tr

@eric-wieser (Member):

I'm not convinced that this is conceptually simpler, but it is definitely more performant.

However, if we want performance, it might be more sensible to avoid the repeated concatenations, and build the result all at once.

I don't know enough about ASV to know if those benchmarks will run correctly, but they don't look to be in the same format as any of our other benchmarks. Can you fix up the benchmarks to use the class-based style, and then maybe submit a separate PR of just the benchmark?
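
(For reference, ASV's class-based style looks roughly like the following sketch; the shapes and parameter values here are invented, only the structure matters.)

import numpy as np

class Block:
    # ASV's class-based style: each time_* method is timed once per
    # value in `params`, with `setup` run beforehand.
    params = [1, 10, 100]
    param_names = ['n']

    def setup(self, n):
        self.a = np.ones((2 * n, 2 * n))
        self.b = np.zeros((2 * n, 2 * n))

    def time_block_simple_row_wise(self, n):
        np.block([self.a, self.b])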

)
return first_index
elif isinstance(arrays, list) and len(arrays) == 0:
return index + [None]
Member:

Can you explain what's going on here?

Contributor Author:

If an empty list is encountered, the recursion needs to back up, and an error message may need to be generated if the depths don't match.

I used the length of the index list for the depth info in the error message. On this line I'm making sure that the length of index indeed reflects the depth of nested lists, but using None to flag that this actual index shouldn't be included in the error message. This is parsed at the end of this line. I could have used some other value (such as -1) to flag an empty list.

@j-towns (Contributor, author) on Sep 10, 2017:

As an example (from the tests), if you do

np.block([1, []])

the error that you get should be

ValueError: List depths are mismatched. First element was at depth 1, but there is an element at depth 2 (arrays[1])

At the end of the message only one index is printed even though the depth is two — that's the kind of situation I'm preparing for in the above line.

@j-towns (Contributor, author) commented on Sep 10, 2017

Hey thanks a lot for reviewing.

Can you fix up the benchmarks to use the class-based style, and then maybe submit a separate PR of just the benchmark?

I can certainly do both of those things.

However, if we want performance, it might be more sensible to avoid the repeated concatenations, and build the result all at once.

How exactly would we do that? Assuming we could do it without too much coding effort, I think the performance benefit would be significant only for large-ish arrays. For small arrays I don't think we'd get much speedup (I just profiled time_nested to get some idea, and in that benchmark only 27% of the time is spent in np.concatenate). Edit: I just realised that was running on the old implementation of block; on my implementation it seems to be around 20%. I have no idea why it's less, since the same number of calls were made, presumably on the same arrays, but anyway we're looking at roughly 20-30%.



def _block(arrays, depth=0):
if isinstance(arrays, list):
Member:

This is not the same as type(arrays) is list
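
(For illustration, the difference matters for list subclasses:)

class MyList(list):
    pass

m = MyList()
isinstance(m, list)  # True  - also accepts subclasses of list
type(m) is list      # False - only exact list instances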

@j-towns (Contributor, author) on Sep 11, 2017:

Good point, will fix that.

list_ndim = list_ndims[0]
arr_ndim = max(arr.ndim for arr in arrs)
ndim = max(list_ndim, arr_ndim)
arrs = [array(a, ndmin=ndim, copy=False, subok=True) for a in arrs]
Member:

How big a performance improvement does just using this as the implementation of atleast_nd give?

Contributor Author:

Probably little or none, it's just less code. I agree it's nice and clear to have an atleast_nd function though, so maybe doing that with

def atleast_nd(a, ndim):
    # Ensures `a` has at least `ndim` dimensions by prepending
    # ones to `a.shape` as necessary
    return array(a, ndmin=ndim, copy=False, subok=True)

would be a good compromise?
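
(For illustration, with a definition along those lines:)

import numpy as np

def atleast_nd(a, ndim):
    return np.array(a, ndmin=ndim, copy=False, subok=True)

a = np.ones((2, 3))
atleast_nd(a, 4).shape  # (1, 1, 2, 3): length-1 axes are prepended
atleast_nd(a, 1).shape  # (2, 3): already at least 1-d, unchanged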

# - more than one way to do things - no point treating tuples like
# lists
# - horribly confusing behaviour that results when tuples are
# treated like ndarray
Member:

Any reason for removing this comment?

Contributor Author:

No, I actually hadn't meant to delete that, will re-include it.

@eric-wieser (Member):

How exactly would we do that? Assuming we could do that without too much coding effort, I think the performance benefit would be significant only for large-ish arrays.

By precalculating the output size, allocating it once, and finishing off #9209. But you're right, this would only be helpful for large arrays

@j-towns (Contributor, author) commented on Sep 11, 2017

By precalculating the output size, allocating it once, and finishing off #9209.

OK nice, I'd be happy to add that to this PR once #9209 is merged (which looks like it might happen soonish?). Just to check I'm not confused: you'd still have to call concatenate the same number of times, right? And the better performance would come from only allocating a buffer once.
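
(For context, the out argument proposed in #9209 lets concatenate write into a preallocated buffer, as in this sketch with invented shapes:)

import numpy as np

a, b = np.ones((2, 3)), np.zeros((2, 3))
out = np.empty((2, 6))
# The result is written into the preallocated buffer; no new
# array is allocated for the output.
np.concatenate([a, b], axis=1, out=out)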

@eric-wieser (Member):

If you put the benchmarks in their own PR, I'll merge them right away - that way, the tool running them will have a datapoint from before this PR.

@charris (Member) commented on Sep 12, 2017

I fixed the merge conflict. You may want to pull this down from origin before making any more commits.

@j-towns force-pushed the simplify-block-implementation branch from 29ed7b8 to 997ac2c on September 18, 2017
@j-towns (Contributor, author) commented on Sep 18, 2017

Have rebased onto master so that I can use the changes in #9209

@j-towns (Contributor, author) commented on Sep 18, 2017

Hey @eric-wieser, I spent a bit of time thinking today and I don't think setting up the approach you suggested is going to be super straightforward (i.e. probably at least a few hours of work for my slow brain).

Would you be up for merging this PR as is? Then, if I have time in the future, I can have a look at further performance improvements along the lines of what you suggested. Otherwise I will close this for now.

@eric-wieser (Member):

@j-towns, you don't need to add the out argument handling - I might attempt that myself, and leave this open to compare against.

I'll put a label on this for 1.14 though, so that if I can't produce a faster implementation, your patch makes it in.

@eric-wieser eric-wieser added this to the 1.14.0 release milestone Sep 18, 2017
# yield from ...
for v in self.walk(xi, index + (i,)):
yield v
def _block_check_depths_match(arrays, index=[]):
Member:

This could do with a comment explaining what it returns, especially since it's recursive.

Member:

To be clear, what I'm looking for is a docstring explaining what the index argument and return values are.
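
(For context, the requested docstring might look something like this sketch; the parameter and return names follow the surrounding discussion and are not necessarily what was merged:)

def _block_check_depths_match(arrays, parent_index=[]):
    """
    Recursively check that the depths of nested lists in `arrays`
    all match, raising ValueError if they do not.

    Parameters
    ----------
    parent_index : list of int
        The full index of `arrays` within the outermost nested list,
        used to point at the offending element in error messages.

    Returns
    -------
    first_index : list of int
        The full index of the first leaf (or empty list) found; its
        length gives the nesting depth.
    max_arr_ndim : int
        The largest ndim among the arrays nested inside `arrays`.
    """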

Member:

Perhaps this should be called parent_index?

Contributor Author:

Thanks again for helpful review comments. Do you reckon the docstring I've written is now sufficient?

for i, arr in enumerate(arrays)]

first_index = indexes[0]
for i, index in enumerate(indexes):
Member:

Having this overwrite the index parameter is confusing

@eric-wieser eric-wieser changed the base branch from maintenance/1.13.x to master November 1, 2017 16:33
@eric-wieser (Member):

Something very bad seems to have happened with git - did you rebase or merge there? I changed the base branch twice to clean it up a little, but it still claims you have 41 commits

@eric-wieser (Member) commented on Nov 1, 2017

Ok, checking out locally, it looks like you rebased, and then merged with the version you had before the rebase (presumably because your push was rejected - you should have force-pushed, instead of pulling), which means there's now two copies of every commit. This commit is a big red flag, which tells us what happened:

Merge branch 'simplify-block-implementation' of https://github.com/j-towns/numpy into simplify-block-implementation

You might want to use git reflog to work out which simplify-block-implementation is which, and then roll back to that.

@j-towns force-pushed the simplify-block-implementation branch from 2e94872 to a5cbc93 on November 3, 2017
@j-towns (Contributor, author) commented on Nov 3, 2017

OK @eric-wieser that should be fixed now, thanks for the help. Very sorry about that 😳.

@charris (Member) commented on Nov 8, 2017

Haven't been following this in detail. What is the current status? @jakirkham Is the missing _Recurser a problem that we should worry about?

@eric-wieser (Member) commented on Nov 8, 2017

Waiting for a followup on this comment about avoiding the generator comprehension, but overall it looks pretty good.

@j-towns (Contributor, author) commented on Nov 8, 2017

@eric-wieser in the last few commits I cut the generator expression out. This only had a minor effect on performance. Let me know if there's anything else I should do.

Edit: sorry, to clarify, I cut out the zip(*...) expression. I think I was confusing your comment with an earlier one. I can have a go at cutting the generator expression entirely, though I'll be surprised if that has a drastic impact on performance.

@eric-wieser (Member):

Up to you then. I'd expect removing the generator to make up for the performance loss in trivial cases from removing zip, but what you have right now is pretty readable

for i, arr in enumerate(arrays))

first_index, max_arr_ndim = next(idxs_ndims)
for i, (index, ndim) in enumerate(idxs_ndims, 1):
@eric-wieser (Member) on Nov 9, 2017:

This i and the enumerate are not needed here.
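
(That is, the quoted loop could presumably drop the counter, roughly as follows; the data here is a hypothetical stand-in, for illustration only:)

# Stand-in data: (index, ndim) pairs like those produced by the
# depth check (invented values).
idxs_ndims = iter([([0], 2), ([1], 2), ([2], 3)])

first_index, max_arr_ndim = next(idxs_ndims)
for index, ndim in idxs_ndims:  # the enumerate counter was unused
    max_arr_ndim = max(max_arr_ndim, ndim)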

@eric-wieser (Member):

Get rid of that enumerate that I don't think is being used, and I'm happy to put this in as is.

@eric-wieser (Member) commented on Nov 9, 2017

@jakirkham: What would atleast_nd look like in dask? There have already been proposals for __array_concatenate__, which would fix concatenate eventually.

@j-towns (Contributor, author) commented on Nov 9, 2017

@eric-wieser done.

@charris (Member) commented on Nov 11, 2017

Everyone happy with this now?

@jakirkham (Contributor):

@jakirkham: What would atleast_nd look like in dask? There have already been proposals for __array_concatenate__, which would fix concatenate eventually.

We normally do things like a[None] to add singleton dimensions, which should also work for NumPy arrays. Here are the implementations for some atleast_*d functions in dask. Would have to look at the spec of atleast_nd to know how to go about this for sure.
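
(For illustration:)

import numpy as np

a = np.ones((2, 3))
a[None].shape        # (1, 2, 3): None (np.newaxis) prepends a length-1 axis
a[None, None].shape  # (1, 1, 2, 3): same effect as atleast_nd(a, 4) above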

@eric-wieser (Member) left a review:

All looks good to me.

@jakirkham, it looks like _block_check_depths_match should work just fine on dask arrays, in which case it doesn't seem too hard for you to implement a dask version of block. Obviously it's not public API, but nor was _Recurser.

@charris (Member) commented on Nov 12, 2017

Going to put this in. @jakirkham If there are later modifications that would be useful for dask, please make a PR for that so that it can be discussed as a separate topic. If we are going to supply a function, it should be public.

@charris charris merged commit 7a3efef into numpy:master Nov 12, 2017
@charris (Member) commented on Nov 12, 2017

Thanks @j-towns .

@eric-wieser (Member):

Probably should have squashed that when we merged, since it's 23 commits without our commit prefixes. Oh well.

eric-wieser added a commit to eric-wieser/numpy that referenced this pull request Nov 13, 2017
This restores the changes in numpygh-9667 that were overwritten.