MAINT: Simplify block implementation #9667


Merged: 23 commits merged into numpy:master from j-towns:simplify-block-implementation on Nov 12, 2017

Conversation

@j-towns (Contributor) commented on Sep 7, 2017

I've simplified the implementation of numpy.block, without changing its API. I thought the implementation was pretty bloated, particularly with the _Recurser class, which IMO was obscuring the logic of the function. This is a subjective judgement of course.
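
For a sense of the shape of the new code, here is a minimal sketch of the recursive approach (illustrative only: the names are invented, and depth validation and error handling are elided):

import numpy as np

def block_sketch(arrays, max_depth, result_ndim, depth=0):
    # Recurse through the nested lists, concatenating the sub-blocks
    # along an axis determined by how deep in the nesting we are: the
    # innermost lists join along the last axis, the outermost along
    # the first of the trailing `max_depth` axes.
    if depth < max_depth:
        arrs = [block_sketch(a, max_depth, result_ndim, depth + 1)
                for a in arrays]
        return np.concatenate(arrs, axis=-(max_depth - depth))
    else:
        # Leaf: promote to the final dimensionality by prepending
        # length-1 axes (no copy is made if none is needed).
        return np.array(arrays, ndmin=result_ndim, copy=False, subok=True)

# A 2x2 block matrix: nesting depth 2, 2-d result.
block_sketch([[np.ones((2, 2)), np.zeros((2, 2))],
              [np.zeros((2, 2)), np.eye(2)]],
             max_depth=2, result_ndim=2)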

A happy side effect of this change is that performance of block has improved. I've implemented some benchmarks (based on the test cases), and running

python runtests.py --bench-compare master bench_shape_base

on my laptop I get

    before     after       ratio
  master     this branch
  [6810e1e9] [dbfdf227]
-  136.63μs   101.20μs      0.74  bench_shape_base.time_block_complicated
-  153.14μs   107.66μs      0.70  bench_shape_base.time_3d
-   71.38μs    43.62μs      0.61  bench_shape_base.time_block_with_1d_arrays_multiple_rows
-  172.43μs   105.02μs      0.61  bench_shape_base.time_nested
-   57.96μs    32.39μs      0.56  bench_shape_base.time_block_mixed_1d_and_2d
-   61.10μs    32.99μs      0.54  bench_shape_base.time_block_simple_column_wise
-   32.18μs    16.99μs      0.53  bench_shape_base.time_block_simple_row_wise
-   64.96μs    33.23μs      0.51  bench_shape_base.time_block_with_1d_arrays_column_wise
-   31.49μs    16.01μs      0.51  bench_shape_base.time_block_with_1d_arrays_row_wise
-   27.26μs     8.22μs      0.30  bench_shape_base.time_no_lists

This is my first PR submitted to numpy; apologies in advance if there's anything I need to do that I've missed!

Edit: bf616bf further improved performance by not copying input arrays. Now:

-  133.93μs    71.64μs      0.53  bench_shape_base.time_block_complicated
-  165.07μs    90.00μs      0.55  bench_shape_base.time_3d
-   72.27μs    36.27μs      0.50  bench_shape_base.time_block_with_1d_arrays_multiple_rows
-  178.65μs    91.65μs      0.51  bench_shape_base.time_nested
-   58.82μs    29.15μs      0.50  bench_shape_base.time_block_mixed_1d_and_2d
-   58.27μs    28.19μs      0.48  bench_shape_base.time_block_simple_column_wise
-   33.58μs    14.26μs      0.42  bench_shape_base.time_block_simple_row_wise
-   59.37μs    29.86μs      0.50  bench_shape_base.time_block_with_1d_arrays_column_wise
-   32.34μs    13.81μs      0.43  bench_shape_base.time_block_with_1d_arrays_row_wise
-   27.94μs     7.95μs      0.28  bench_shape_base.time_no_lists

Edit 2: Updated benchmarks running on bd6729d vs. master:

    before     after       ratio
  [e64699dc] [bd6729d0]
+     4.43s      4.83s      1.09  bench_shape_base.Block.time_3d(100)
-  596.23μs   529.67μs      0.89  bench_shape_base.Block.time_block_complicated(100)
-  151.80μs   116.40μs      0.77  bench_shape_base.Block.time_block_simple_column_wise(100)
-   89.47μs    67.53μs      0.75  bench_shape_base.Block.time_block_simple_row_wise(100)
-   60.31μs    44.77μs      0.74  bench_shape_base.Block.time_no_lists(100)
-  330.40μs   226.72μs      0.69  bench_shape_base.Block.time_nested(100)
-   26.58μs    13.42μs      0.50  bench_shape_base.Block.time_no_lists(10)
-   24.79μs    12.36μs      0.50  bench_shape_base.Block.time_no_lists(1)
-  135.44μs    66.13μs      0.49  bench_shape_base.Block.time_block_complicated(10)
-  935.52μs   400.46μs      0.43  bench_shape_base.Block.time_3d(10)
-   56.96μs    24.34μs      0.43  bench_shape_base.Block.time_block_simple_column_wise(10)
-  126.27μs    52.76μs      0.42  bench_shape_base.Block.time_block_complicated(1)
-   55.89μs    23.25μs      0.42  bench_shape_base.Block.time_block_simple_column_wise(1)
-  172.84μs    70.91μs      0.41  bench_shape_base.Block.time_nested(10)
-   31.93μs    12.83μs      0.40  bench_shape_base.Block.time_block_simple_row_wise(10)
-  153.82μs    61.13μs      0.40  bench_shape_base.Block.time_3d(1)
-   30.98μs    12.06μs      0.39  bench_shape_base.Block.time_block_simple_row_wise(1)
-  182.28μs    68.20μs      0.37  bench_shape_base.Block.time_nested(1)

@charris (Member) commented on Sep 7, 2017

Note the changed title; see doc/source/dev/gitwash/development_workflow.rst for the prefixes.

@eric-wieser Ping.

@j-towns j-towns changed the title Simplify block implementation MAINT: Simplify block implementation Sep 7, 2017
@j-towns (Contributor, author) commented on Sep 7, 2017

Looks like Travis CI is having some issues currently: https://www.traviscistatus.com/incidents/r0f4m54qp0tr

@eric-wieser (Member):

I'm not convinced that this is conceptually simpler, but it is definitely more performant.

However, if we want performance, it might be more sensible to avoid the repeated concatenations, and build the result all at once.

I don't know enough about ASV to know if those benchmarks will run correctly, but they don't look to be in the same format as any of our other benchmarks. Can you fix up the benchmarks to use the class-based style, and then maybe submit a separate PR of just the benchmark?
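
(For reference, ASV's class-based style looks roughly like the following sketch; the shapes and parameter values here are invented, only the structure matters.)

import numpy as np

class Block:
    # ASV's class-based style: each time_* method is timed once per
    # value in `params`, with `setup` run beforehand.
    params = [1, 10, 100]
    param_names = ['n']

    def setup(self, n):
        self.a = np.ones((2 * n, 2 * n))
        self.b = np.zeros((2 * n, 2 * n))

    def time_block_simple_row_wise(self, n):
        np.block([self.a, self.b])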

)
return first_index
elif isinstance(arrays, list) and len(arrays) == 0:
return index + [None]
Member:

Can you explain what's going on here?

Contributor Author:

If an empty list is encountered, the recursion needs to back up, and an error message may need to be generated if the depths don't match.

I used the length of the index list for the depth info in the error message. On this line I'm making sure that the length of index indeed reflects the depth of nested lists, but using None to flag that this actual index shouldn't be included in the error message. This is parsed at the end of this line. I could have used some other value (such as -1) to flag an empty list.

@j-towns (Contributor, author) on Sep 10, 2017:

As an example (from the tests), if you do

np.block([1, []])

the error that you get should be

ValueError: List depths are mismatched. First element was at depth 1, but there is an element at depth 2 (arrays[1])

At the end of the message only one index is printed even though the depth is two — that's the kind of situation I'm preparing for in the above line.

@j-towns (Contributor, author) commented on Sep 10, 2017

Hey thanks a lot for reviewing.

Can you fix up the benchmarks to use the class-based style, and then maybe submit a separate PR of just the benchmark?

I can certainly do both of those things.

However, if we want performance, it might be more sensible to avoid the repeated concatenations, and build the result all at once.

How exactly would we do that? Assuming we could do it without too much coding effort, I think the performance benefit would be significant only for large-ish arrays. For small arrays I don't think we'd get much speedup (I just profiled time_nested to get some idea, and in that benchmark only 27% of the time is spent in np.concatenate). Edit: I just realised that was running on the old implementation of block; on my implementation it seems to be around 20%. I have no idea why it's less, since the same number of calls were made, presumably on the same arrays, but anyway we're looking at roughly 20-30%.



def _block(arrays, depth=0):
if isinstance(arrays, list):
Member:

This is not the same as type(arrays) is list
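
(For illustration, the difference matters for list subclasses:)

class MyList(list):
    pass

m = MyList()
isinstance(m, list)  # True  - also accepts subclasses of list
type(m) is list      # False - only exact list instances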

@j-towns (Contributor, author) on Sep 11, 2017:

Good point, will fix that.

list_ndim = list_ndims[0]
arr_ndim = max(arr.ndim for arr in arrs)
ndim = max(list_ndim, arr_ndim)
arrs = [array(a, ndmin=ndim, copy=False, subok=True) for a in arrs]
Member:

How big a performance improvement does just using this as the implementation of atleast_nd give?

Contributor Author:

Probably little or none, it's just less code. I agree it's nice and clear to have an atleast_nd function though, so maybe doing that with

def atleast_nd(a, ndim):
    # Ensures `a` has at least `ndim` dimensions by prepending
    # ones to `a.shape` as necessary
    return array(a, ndmin=ndim, copy=False, subok=True)

would be a good compromise?
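
(For illustration, with a definition along those lines:)

import numpy as np

def atleast_nd(a, ndim):
    return np.array(a, ndmin=ndim, copy=False, subok=True)

a = np.ones((2, 3))
atleast_nd(a, 4).shape  # (1, 1, 2, 3): length-1 axes are prepended
atleast_nd(a, 1).shape  # (2, 3): already at least 1-d, unchanged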

# - more than one way to do things - no point treating tuples like
# lists
# - horribly confusing behaviour that results when tuples are
# treated like ndarray
Member:

Any reason for removing this comment?

Contributor Author:

No, I actually hadn't meant to delete that, will re-include it.

@eric-wieser (Member):

How exactly would we do that? Assuming we could do that without too much coding effort, I think the performance benefit would be significant only for large-ish arrays.

By precalculating the output size, allocating it once, and finishing off #9209. But you're right, this would only be helpful for large arrays

@j-towns (Contributor, author) commented on Sep 11, 2017

By precalculating the output size, allocating it once, and finishing off #9209.

OK nice, I'd be happy to add that to this PR once #9209 is merged (which looks like it might happen soonish?). Just to check I'm not confused: you'd still have to call concatenate the same number of times, right? And the better performance would come from only allocating a buffer once.
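
(For context, the out argument proposed in #9209 lets concatenate write into a preallocated buffer, as in this sketch with invented shapes:)

import numpy as np

a, b = np.ones((2, 3)), np.zeros((2, 3))
out = np.empty((2, 6))
# The result is written into the preallocated buffer; no new
# array is allocated for the output.
np.concatenate([a, b], axis=1, out=out)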

@eric-wieser (Member):

If you put the benchmarks in their own PR, I'll merge them right away - that way, the tool running them will have a datapoint from before this PR.

@charris (Member) commented on Sep 12, 2017

I fixed the merge conflict. You may want to pull this down from origin before making any more commits.

@j-towns force-pushed the simplify-block-implementation branch from 29ed7b8 to 997ac2c on September 18, 2017
@j-towns (Contributor, author) commented on Sep 18, 2017

Have rebased onto master so that I can use the changes in #9209

@j-towns (Contributor, author) commented on Sep 18, 2017

Hey @eric-wieser, I spent a bit of time thinking today and I don't think setting up the approach you suggested is going to be super straightforward (i.e. probably at least a few hours of work for my slow brain).

Would you be up for merging this PR as is? Then, if I have time in the future, I can have a look at further performance improvements along the lines of what you suggested. Otherwise I will close this for now.

@eric-wieser (Member):

@j-towns, you don't need to add the out argument handling - I might attempt that myself, and leave this open to compare against.

I'll put a label on this for 1.14 though, so that if I can't produce a faster implementation, your patch makes it in.

@eric-wieser eric-wieser added this to the 1.14.0 release milestone Sep 18, 2017
# yield from ...
for v in self.walk(xi, index + (i,)):
yield v
def _block_check_depths_match(arrays, index=[]):
Member:

This could do with a comment explaining what it returns, especially since it's recursive.

Member:

To be clear, what I'm looking for is a docstring explaining what the index argument and return values are.
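
(For context, the requested docstring might look something like this sketch; the parameter and return names follow the surrounding discussion and are not necessarily what was merged:)

def _block_check_depths_match(arrays, parent_index=[]):
    """
    Recursively check that the depths of nested lists in `arrays`
    all match, raising ValueError if they do not.

    Parameters
    ----------
    parent_index : list of int
        The full index of `arrays` within the outermost nested list,
        used to point at the offending element in error messages.

    Returns
    -------
    first_index : list of int
        The full index of the first leaf (or empty list) found; its
        length gives the nesting depth.
    max_arr_ndim : int
        The largest ndim among the arrays nested inside `arrays`.
    """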

Member:

Perhaps this should be called parent_index?

Contributor Author:

Thanks again for helpful review comments. Do you reckon the docstring I've written is now sufficient?

for i, arr in enumerate(arrays)]

first_index = indexes[0]
for i, index in enumerate(indexes):
Member:

Having this overwrite the index parameter is confusing

@eric-wieser eric-wieser changed the base branch from maintenance/1.13.x to master November 1, 2017 16:33
@eric-wieser (Member):

Something very bad seems to have happened with git - did you rebase or merge there? I changed the base branch twice to clean it up a little, but it still claims you have 41 commits

@eric-wieser (Member) commented on Nov 1, 2017

Ok, checking out locally, it looks like you rebased, and then merged with the version you had before the rebase (presumably because your push was rejected - you should have force-pushed, instead of pulling), which means there's now two copies of every commit. This commit is a big red flag, which tells us what happened:

Merge branch 'simplify-block-implementation' of https://github.com/j-towns/numpy into simplify-block-implementation

You might want to use git reflog to work out which simplify-block-implementation is which, and then roll back to that.

@j-towns force-pushed the simplify-block-implementation branch from 2e94872 to a5cbc93 on November 3, 2017
@j-towns (Contributor, author) commented on Nov 3, 2017

OK @eric-wieser that should be fixed now, thanks for the help. Very sorry about that 😳.

@charris (Member) commented on Nov 8, 2017

Haven't been following this in detail. What is the current status? @jakirkham Is the missing _Recurser a problem that we should worry about?

@eric-wieser (Member) commented on Nov 8, 2017

Waiting for a followup on this comment about avoiding the generator comprehension, but overall it looks pretty good.

@j-towns (Contributor, author) commented on Nov 8, 2017

@eric-wieser in the last few commits I cut the generator expression out. This only had a minor effect on performance. Let me know if there's anything else I should do.

Edit: sorry, to clarify, I cut out the zip(*...) expression. I think I was confusing your comment with an earlier one. I can have a go at cutting the generator expression entirely, though I'll be surprised if that has a drastic impact on performance.

@eric-wieser (Member):

Up to you then. I'd expect removing the generator to make up for the performance loss in trivial cases from removing zip, but what you have right now is pretty readable

for i, arr in enumerate(arrays))

first_index, max_arr_ndim = next(idxs_ndims)
for i, (index, ndim) in enumerate(idxs_ndims, 1):
@eric-wieser (Member) on Nov 9, 2017:

This i and the enumerate are not needed here.
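
(That is, the quoted loop could presumably drop the counter, roughly as follows; the data here is a hypothetical stand-in, for illustration only:)

# Stand-in data: (index, ndim) pairs like those produced by the
# depth check (invented values).
idxs_ndims = iter([([0], 2), ([1], 2), ([2], 3)])

first_index, max_arr_ndim = next(idxs_ndims)
for index, ndim in idxs_ndims:  # the enumerate counter was unused
    max_arr_ndim = max(max_arr_ndim, ndim)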

@eric-wieser (Member):

Get rid of that enumerate that I don't think is being used, and I'm happy to put this in as is.

@eric-wieser (Member) commented on Nov 9, 2017

@jakirkham: What would atleast_nd look like in dask? There have already been proposals for __array_concatenate__, which would fix concatenate eventually.

@j-towns (Contributor, author) commented on Nov 9, 2017

@eric-wieser done.

@charris (Member) commented on Nov 11, 2017

Everyone happy with this now?

@jakirkham (Contributor):

@jakirkham: What would atleast_nd look like in dask? There have already been proposals for __array_concatenate__, which would fix concatenate eventually.

We normally do things like a[None] to add singleton dimensions, which should also work for NumPy arrays. Here are the implementations for some atleast_*d functions in dask. Would have to look at the spec of atleast_nd to know how to go about this for sure.
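
(For illustration:)

import numpy as np

a = np.ones((2, 3))
a[None].shape        # (1, 2, 3): None (np.newaxis) prepends a length-1 axis
a[None, None].shape  # (1, 1, 2, 3): same effect as atleast_nd(a, 4) above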

@eric-wieser (Member) left a review:

All looks good to me.

@jakirkham, it looks like _block_check_depths_match should work just fine on dask arrays, in which case it doesn't seem too hard for you to implement a dask version of block. Obviously it's not public API, but nor was _Recurser.

@charris (Member) commented on Nov 12, 2017

Going to put this in. @jakirkham If there are later modifications that would be useful for dask, please make a PR for that so that it can be discussed as a separate topic. If we are going to supply a function, it should be public.

@charris charris merged commit 7a3efef into numpy:master Nov 12, 2017
@charris (Member) commented on Nov 12, 2017

Thanks @j-towns .

@eric-wieser (Member):

Probably should have squashed that when we merged, since it's 23 commits without our commit prefixes. Oh well.

eric-wieser added a commit to eric-wieser/numpy that referenced this pull request Nov 13, 2017
This restores the changes in numpygh-9667 that were overwritten.