MAINT: Simplify block implementation #9667
Conversation
Note changed title. @eric-wieser Ping.
Looks like travis-ci are having some issues currently: https://www.traviscistatus.com/incidents/r0f4m54qp0tr
I'm not convinced that this is conceptually simpler, but it is definitely more performant. However, if we want performance, it might be more sensible to avoid the repeated concatenations and build the result all at once. I don't know enough about ASV to know if those benchmarks will run correctly, but they don't look to be in the same format as any of our other benchmarks. Can you fix up the benchmarks to use the class-based style, and then maybe submit a separate PR of just the benchmarks?
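(For reference, a class-based ASV benchmark for `np.block` might look roughly like the sketch below; the class name, parameters, and shapes here are illustrative, not the benchmark that was eventually merged.)

import numpy as np


class Block:
    # illustrative sub-block edge lengths to parametrise over
    params = [1, 10, 100]
    param_names = ['size']

    def setup(self, size):
        # two distinguishable sub-blocks for a 2x2 layout
        self.a = np.ones((size, size))
        self.b = np.zeros((size, size))

    def time_block_2d(self, size):
        np.block([[self.a, self.b],
                  [self.b, self.a]])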
numpy/core/shape_base.py (outdated diff):

            )
        return first_index
    elif isinstance(arrays, list) and len(arrays) == 0:
        return index + [None]
Can you explain what's going on here?
If an empty list is encountered, the recursion needs to back up, and an error message may need to be generated if the depths don't match. I used the length of the `index` list for the depth info in the error message. On this line I'm making sure that the length of `index` indeed reflects the depth of nested lists, while using `None` to flag that this particular index shouldn't be included in the error message. The `None` entries are filtered out later, when the error message is built. I could have used some other value (such as -1) to flag an empty list.
As an example (from the tests), if you do `np.block([1, []])`, the error you get should be:

ValueError: List depths are mismatched. First element was at depth 1, but there is an element at depth 2 (arrays[1])

At the end of the message only one index is printed even though the depth is two; that's the kind of situation I'm preparing for in the line above.
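(To make that mechanism concrete, here is a simplified, self-contained sketch of the idea; the function and variable names are hypothetical and the actual patch differs in detail.)

def check_depths(arrays, index=()):
    # Return the path (tuple of list positions) to the first leaf.
    # An empty list contributes a None entry, so the path *length*
    # still records its depth while the placeholder can be skipped
    # when the index is printed.
    if isinstance(arrays, list):
        if len(arrays) == 0:
            return index + (None,)
        first = check_depths(arrays[0], index + (0,))
        for i, arr in enumerate(arrays[1:], 1):
            other = check_depths(arr, index + (i,))
            if len(first) != len(other):
                path = ''.join('[{}]'.format(j)
                               for j in other if j is not None)
                raise ValueError(
                    'List depths are mismatched. First element was at '
                    'depth {}, but there is an element at depth {} '
                    '(arrays{})'.format(len(first), len(other), path))
        return first
    else:
        return index  # scalar or ndarray leaf

check_depths([1, []])  # raises the mismatch error quoted above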
Hey thanks a lot for reviewing.
I can certainly do both of those things.
How exactly would we do that? Assuming we could do it without too much coding effort, I think the performance benefit would be significant only for large-ish arrays. For small arrays I don't think we'd get much speedup (I just profiled time_nested to get some idea, and in that benchmark only 27% of the time is spent in np.concatenate). Edit: just realised that was running on the old implementation of block; on my implementation it seems to be around 20%. I have no idea why it's less, since the same number of calls were made, presumably on the same arrays, but anyway we're looking at roughly 20-30%.
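(The figures above come from the author's own profiling run, which isn't shown; a generic way to estimate the fraction of time spent in `np.concatenate` is standard `cProfile` usage along these lines.)

import cProfile
import pstats

import numpy as np

a = np.ones((10, 10))
# profile a nested block call, then inspect how much of the total
# time is attributed to np.concatenate
cProfile.run('np.block([[a, a], [a, a]])', 'block.prof')
stats = pstats.Stats('block.prof')
stats.sort_stats('cumulative').print_stats('concatenate')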
numpy/core/shape_base.py (outdated diff):

def _block(arrays, depth=0):
    if isinstance(arrays, list):
This is not the same as `type(arrays) is list`.
Good point, will fix that.
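(The difference shows up for list subclasses, which `isinstance` accepts but an exact type check rejects:)

class MyList(list):
    pass

x = MyList([1, 2])
print(isinstance(x, list))  # True: subclasses are accepted
print(type(x) is list)      # False: only exact list instances pass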
numpy/core/shape_base.py (outdated diff):

    list_ndim = list_ndims[0]
    arr_ndim = max(arr.ndim for arr in arrs)
    ndim = max(list_ndim, arr_ndim)
    arrs = [array(a, ndmin=ndim, copy=False, subok=True) for a in arrs]
How big a performance improvement does just using this as the implementation of `atleast_nd` give?
Probably little or none; it's just less code. I agree it's nice and clear to have an `atleast_nd` function though, so maybe doing that with

def atleast_nd(a, ndim):
    # Ensures `a` has at least `ndim` dimensions by prepending
    # ones to `a.shape` as necessary
    return array(a, ndmin=ndim, copy=False, subok=True)

would be a good compromise?
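(With that definition, `ndmin` prepends the new axes, e.g.:)

import numpy as np

def atleast_nd(a, ndim):
    return np.array(a, ndmin=ndim, copy=False, subok=True)

a = np.ones((2, 3))
print(atleast_nd(a, 4).shape)  # (1, 1, 2, 3): new axes are prepended
print(atleast_nd(a, 1).shape)  # (2, 3): already has enough dimensions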
    # - more than one way to do things - no point treating tuples like
    #   lists
    # - horribly confusing behaviour that results when tuples are
    #   treated like ndarray
Any reason for removing this comment?
No, I actually hadn't meant to delete that, will re-include it.
By precalculating the output size, allocating it once, and finishing off #9209. But you're right, this would only be helpful for large arrays.
OK nice, I'd be happy to add that to this PR once #9209 is merged (which looks like it might happen soonish?). Just to check I'm not confused: you'd still have to call concatenate the same number of times, right? And the better performance would come from only allocating a buffer once.
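(For concreteness, the kind of thing being discussed is sketched below, assuming #9209 gives `concatenate` an `out` argument; this illustrates the idea, not the eventual implementation.)

import numpy as np

a = np.ones((2, 2))
b = np.zeros((2, 2))

# allocate the final 4x4 result exactly once...
out = np.empty((4, 4))
# ...then concatenate each row of blocks directly into a view of it,
# so no intermediate row arrays are allocated
np.concatenate([a, b], axis=1, out=out[:2])
np.concatenate([b, a], axis=1, out=out[2:])
# `out` now equals np.block([[a, b], [b, a]])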
If you put the benchmarks in their own PR, I'll merge them right away - that way, the tool running them will have a datapoint from before this PR.
I fixed the merge conflict. You may want to pull this down from …
Commits "Remove _Recurser class." and "array kwargs set to copy=False, subok=True" force-pushed from 29ed7b8 to 997ac2c.
Have rebased onto master so that I can use the changes in #9209.
Hey @eric-wieser, I spent a bit of time thinking today, and I don't think setting up the approach you suggested is going to be super straightforward (i.e. probably at least a few hours of work for my slow brain). Would you be up for merging this PR as is? Then if I have time in the future I might have a look at further performance improvements along the lines of what you suggested. Otherwise I will close this for now.
@j-towns, you don't need to add the … I'll put a label on this for 1.14 though, so that if I can't produce a faster implementation, your patch makes it in.
numpy/core/shape_base.py (outdated diff):

        # yield from ...
        for v in self.walk(xi, index + (i,)):
            yield v

def _block_check_depths_match(arrays, index=[]):
This could do with a comment explaining what it returns, especially since it's recursive.
To be clear, what I'm looking for is a docstring explaining what the `index` argument and return values are.
Perhaps this should be called `parent_index`?
Thanks again for helpful review comments. Do you reckon the docstring I've written is now sufficient?
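(The docstring being asked for might read roughly as follows; the wording here is illustrative, not necessarily what landed in the PR.)

def _block_check_depths_match(arrays, parent_index=[]):
    """
    Recursively check that the depths of nested lists in `arrays`
    all match; raise a ValueError on a mismatch.

    Parameters
    ----------
    arrays : nested list of arrays (or a bare array/scalar)
        The candidate input to `block`.
    parent_index : list of int
        The full index of `arrays` within the outermost nested list,
        used to build the error message.

    Returns
    -------
    first_index : list of int
        The full index of an element from the bottom of the nesting.
        An empty list at the bottom is flagged with a `None` entry.
    """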
numpy/core/shape_base.py
Outdated
for i, arr in enumerate(arrays)] | ||
|
||
first_index = indexes[0] | ||
for i, index in enumerate(indexes): |
Having this overwrite the `index` parameter is confusing.
Something very bad seems to have happened with git - did you rebase or merge there? I changed the base branch twice to clean it up a little, but it still claims you have 41 commits.
Ok, checking out locally, it looks like you rebased, and then merged with the version you had before the rebase (presumably because your push was rejected - you should have force-pushed instead of pulling), which means there are now two copies of every commit. This commit is a big red flag, which tells us what happened: …
You might want to use …
Force-pushed from 2e94872 to a5cbc93.
OK @eric-wieser, that should be fixed now, thanks for the help. Very sorry about that 😳.
Haven't been following this in detail. What is the current status? @jakirkham Is the missing …
Waiting for a followup on this comment about avoiding the generator comprehension, but overall it looks pretty good.
@eric-wieser in the last few commits I cut the generator expression out. This only had a minor effect on performance. Let me know if there's anything else I should do. Edit: sorry, to clarify, I cut out the `zip(*...)` expression; I think I was confusing your comment with an earlier one. I can have a go at cutting the generator expression entirely, though I'll be surprised if that has a drastic impact on performance.
Up to you, then. I'd expect removing the generator to make up for the performance loss in trivial cases from removing …
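(As a rough illustration of the overhead in question, here is a generic micro-benchmark, not a measurement of this PR; it compares building a list through a generator expression with building it directly.)

import timeit

# the generator pays per-item frame-switch overhead that the
# comprehension avoids
print(timeit.timeit('list(x for x in range(100))'))
print(timeit.timeit('[x for x in range(100)]'))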
numpy/core/shape_base.py (outdated diff):

                  for i, arr in enumerate(arrays))

    first_index, max_arr_ndim = next(idxs_ndims)
    for i, (index, ndim) in enumerate(idxs_ndims, 1):
This `i` and `enumerate` are not needed here.
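(Since the counter is unused, the loop can be written without it; a self-contained sketch, with `idxs_ndims` stubbed out by hypothetical data.)

# hypothetical stand-in for the (index, ndim) pairs that the
# recursion over sub-arrays produces
idxs_ndims = iter([([0], 2), ([1], 3), ([2], 1)])

first_index, max_arr_ndim = next(idxs_ndims)
for index, ndim in idxs_ndims:  # no counter needed
    if ndim > max_arr_ndim:
        max_arr_ndim = ndim
    # depth checks against first_index would go here

print(max_arr_ndim)  # 3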
Get rid of that.
@jakirkham: What would …
@eric-wieser done.
Everyone happy with this now?
We normally do things like …
All looks good to me.
@jakirkham, it looks like `_block_check_depths_match` should work just fine on dask arrays, in which case it doesn't seem too hard for you to implement a dask version of `block`. Obviously it's not public API, but nor was `_Recurser`.
Going to put this in. @jakirkham If there are later modifications that would be useful for dask, please make a PR for that so that it can be discussed as a separate topic. If we are going to supply a function, it should be public.
Thanks @j-towns.
Probably should have squashed that when we merged, since it's 23 commits without our commit prefixes. Oh well.
This restores the changes in gh-9667 that were overwritten.
I've simplified the implementation of `numpy.block` without changing its API. I thought the implementation was pretty bloated, particularly with the `_Recurser` class, which IMO was obscuring the logic of the function. This is a subjective judgement, of course.

A happy side effect of this change is that the performance of `block` has improved. I've implemented some benchmarks (based on the test cases); running them on my laptop I get: …

This is my first PR submitted to numpy; apologies in advance if there's anything I need to do that I've missed!

Edit: bf616bf further improved performance by not copying input arrays. Now: …

Edit 2: Updated benchmarks running on bd6729d vs. master: …