MAINT: Ediff1d performance #8183
Conversation
Eliminate a copy operation when `to_begin` or `to_end` is given. Also use `ravel` instead of a flat iterator, which is much faster.
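For context, a minimal sketch of the change being described (not the exact patch; names and details are simplified): the old implementation sliced a flat iterator and then `hstack`ed `to_begin`, the difference, and `to_end` together, which copies the freshly computed difference a second time, while the new approach preallocates the result once and writes each piece into a slice of it.

```python
import numpy as np

def ediff1d_old_style(ary, to_begin=None, to_end=None):
    # Roughly the pre-PR approach: slice a flat iterator (each slice is an
    # element-by-element copy) and hstack the pieces, which copies the
    # freshly computed difference yet again.
    flat = np.asanyarray(ary).flat
    diff = flat[1:] - flat[:-1]
    pieces = [diff]
    if to_begin is not None:
        pieces.insert(0, np.asanyarray(to_begin).ravel())
    if to_end is not None:
        pieces.append(np.asanyarray(to_end).ravel())
    return np.hstack(pieces) if len(pieces) > 1 else diff

def ediff1d_new_style(ary, to_begin=None, to_end=None):
    # Roughly the approach in this PR: ravel (a view, not a copy, for
    # contiguous input), allocate the output once, and write to_begin, the
    # difference, and to_end directly into slices of it.
    ary = np.asanyarray(ary).ravel()
    to_begin = np.array([]) if to_begin is None else np.asanyarray(to_begin).ravel()
    to_end = np.array([]) if to_end is None else np.asanyarray(to_end).ravel()
    n = len(ary) - 1
    result = np.empty(len(to_begin) + n + len(to_end), dtype=ary.dtype)
    result[:len(to_begin)] = to_begin
    result[len(to_begin) + n:] = to_end
    np.subtract(ary[1:], ary[:-1], result[len(to_begin):len(to_begin) + n])
    return result
```

For example, `ediff1d_new_style(np.arange(1e7), to_end=0)` avoids the extra full-size copy that the `hstack` in the old style makes.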
I reviewed this over in mattharrigan#1. Looks good to me!
@mattharrigan Can you rename the second commit to mention …
I haven't ever done that before. Do I `commit --amend` and then `push --force` like this?
That should work.
Force-pushed from 1bb4329 to 4848271.
Thanks!
Yes, that sounds right.
This causes a regression in astropy, since with the creation of an empty array, subclasses are no longer propagated correctly. See https://travis-ci.org/astropy/astropy/jobs/170586124#L2564.
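A minimal sketch of the problem being described, using a trivial `ndarray` subclass as a stand-in for astropy's `Quantity` (names are made up): arithmetic on a subclass instance propagates the subclass, but a result built from `np.empty` is a plain `ndarray` unless it is explicitly wrapped afterwards.

```python
import numpy as np

class MyArray(np.ndarray):      # stand-in for a subclass such as Quantity
    pass

a = np.arange(5.0).view(MyArray)

# Plain arithmetic propagates the subclass:
print(type(a[1:] - a[:-1]))      # MyArray

# np.empty knows nothing about the input, so filling a preallocated result
# in place leaves it a base ndarray:
out = np.empty(a.size - 1, dtype=a.dtype)
np.subtract(a[1:], a[:-1], out)
print(type(out))                 # numpy.ndarray
```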
@mhvk Remind me, do we have a standard way of allocating results with the proper subclass?
I'll defer to the experts, but this looks promising: …
```python
result = np.empty(l + len(to_begin) + len(to_end), dtype=ary.dtype)
result[:len(to_begin)] = to_begin
result[len(to_begin) + l:] = to_end
np.subtract(ary[1:], ary[:-1], result[len(to_begin):len(to_begin) + l])
```
Surely this is wrong? It always writes only one element of the output!
@shoyer look at functions like …
Looking at the new code, I don't think it is right for input arrays longer than 2 elements... Revert this for now?!

As for …

To get this right is not totally trivial, as, at least for …

Alternatively, after calculating the difference, one could still create an empty array, and either call, as @mattharrigan suggested, …

Note, though, that at least for …
One thing about wrapping: I have no idea how far `__numpy_ufunc__` is and how it plays into it.
@mhvk if you think you will have the fix in quickly, I don't know if a revert is vital (if this is wrong), but in that case mark a follow-up PR as a release blocker (or create a release-blocking issue).
@mhvk I don't understand the concern about not being right for arrays longer than 2. Are you confusing the lower-case `l` with the number `1`? If so, I apologize for not using better variable names. I didn't know much at all about ndarray subclasses or how much complexity might creep into this PR due to it. Most of the inefficiency of the previous implementation was due to the flat iterator instead of just calling `ravel` and doing the subtraction with an array. Perhaps that change alone should be made. But you point out that Quantities are not handled well in either case.
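To illustrate that point, a small sketch (array size and timings are arbitrary): slicing `x.flat` materialises a copy element by element, whereas `ravel` on a contiguous array returns a view whose slices are essentially free, so the subtraction itself dominates.

```python
import numpy as np
from timeit import timeit

x = np.arange(1e6)

# Old approach: slicing the flat iterator copies element by element.
t_flat = timeit(lambda: x.flat[1:] - x.flat[:-1], number=20)

# New approach: ravel is a view for contiguous input, and its slices are views too.
t_ravel = timeit(lambda: x.ravel()[1:] - x.ravel()[:-1], number=20)

print(f"flatiter: {t_flat:.3f}s   ravel: {t_ravel:.3f}s")
```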
Ah, yes, that was an `l` that I read as a `1`.
I think it would be good at least to restore the previous behaviour (and thus remove the regression): without …
In either case the output array must be created; `a[1:] - a[:-1]` just does that behind the scenes. This code does the subtraction with the output array given, so the output array is not created twice.
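In code, the point is simply that a ufunc given an explicit output buffer writes into it and returns that same buffer, so no temporary difference array is allocated and then copied; a tiny sketch with made-up names:

```python
import numpy as np

a = np.arange(5.0)
buf = np.empty(len(a) - 1)

# np.subtract writes the differences straight into buf and returns it,
# rather than allocating a fresh array as `a[1:] - a[:-1]` would.
ret = np.subtract(a[1:], a[:-1], out=buf)
assert ret is buf
```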
It is true that an output array has to be created in either case, but the speed is not the same: …

It gets worse for smaller inputs: …
Good point. There should be something to the effect of: …
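The snippet itself was lost here; judging from the surrounding discussion (keep the old speed and behaviour when there is nothing to prepend or append), it was presumably a short-circuit along these lines. This is a guess, not the original code:

```python
import numpy as np

def ediff1d_sketch(ary, to_begin=None, to_end=None):
    ary = np.asanyarray(ary).ravel()
    # Hypothetical short-circuit: nothing to prepend or append, so return the
    # plain difference -- as fast as before, and it preserves subclasses the
    # way the old implementation did.
    if to_begin is None and to_end is None:
        return ary[1:] - ary[:-1]
    raise NotImplementedError("preallocated-result path for to_begin/to_end")
```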
My intuition about performance was wrong (big surprise), but I would like to understand why. Here are three timeit calls: …

The second is the fastest, as I expected. However, creating that empty array takes forever, relatively speaking, so steps 2 + 3 take far longer than step 1, even though step 1 internally has to create an array as well. Admittedly these are very small arrays. Is the difference just due to overhead? Is there hidden work somewhere? Thank you.
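The three calls were lost above; a hypothetical reconstruction of the kind of comparison being described (array size and repetition count are assumptions): step 1 is the plain subtraction, step 2 allocates the empty result, and step 3 subtracts into the preallocated result.

```python
import numpy as np
from timeit import timeit

x = np.arange(100.0)                 # a very small array
out = np.empty(len(x) - 1)

t1 = timeit(lambda: x[1:] - x[:-1], number=100_000)                        # step 1
t2 = timeit(lambda: np.empty(len(x) - 1), number=100_000)                  # step 2
t3 = timeit(lambda: np.subtract(x[1:], x[:-1], out=out), number=100_000)   # step 3

# For arrays this small, per-call Python/ufunc overhead dominates, so
# t2 + t3 can exceed t1 even though step 1 also allocates an array internally.
print(t1, t2, t3)
```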
There is plenty of hidden work... For the things you are timing, simply the argument (and keyword argument) parsing is probably a significant portion of the time.
The speed difference here (~2 us) is pretty much just unavoidable Python overhead. Anything less than 5-10 us is not really worth optimizing -- anyone who cares about performance at this level should not be using Python in their inner loop.

We can certainly switch to using simple subtraction for preserving subclasses in this case, but it is very unfortunate how all these implementation details have leaked to become public API for handling subclasses. This is not the only case where NumPy has been locked inadvertently into strange/inefficient implementations for the sake of subclasses.

In any case, I'm glad you are testing the behavior you are relying on! Our internal test coverage for subclasses is pretty sparse, and it's difficult to imagine covering all cases without expanding the test suite in a major way (and also without adding appropriate APIs like …).
I think there are several options; not sure which is preferred: …
I'd like to branch 1.12 soonish, so it would be good to get this fixed up.
@shoyer - it is not just subclasses outside of numpy -- …

Unfortunately, for a general subclass, for wrapping to work one does have to calculate the difference, since it may have properties that are required for the wrapping and cannot simply be inferred from the input array (the same would hold for a hypothetical …).
@mattharrigan I am happy with either (1) or (2). You are also welcome to give (3) a shot, but that would be significantly more involved. We do need to unbreak astropy, but we don't need to make this work in cases that didn't work before.

@mhvk I recognize that these issues are true for …

I do find this feature creep around subclasses quite frustrating. It puts us, as NumPy developers, in an untenable position: we are locked into supporting features we neither intentionally added nor even know we have! If something happens to work properly for subclasses through fortuitous implementation details, then we are stuck supporting it forever. This is particularly problematic for subclasses outside the NumPy code base, because they aren't tied in to our CI systems. Maybe this is a good time to revise the documentation to officially discourage subclassing by third parties.
I too am frustrated. The original point of this PR was performance, and option 1 still provides most of the performance benefit, is clean, and shares the same subclass peculiarities as the previous implementation. Does this PR get reopened, or should I start another PR?
@shoyer - I guess the annoyances are real, but it may be good to keep in mind the benefits as well: it would seem to me that numpy in general, and …

On the PR proper, I do think that given that there was a general hope expressed to have …
I could make a strong argument that the proposed implementation plus calling wrap is better specifically for astropy units. Using the previous ediff1d, these two calls produce identical output: …

The first should error out and the second should keep the unit. Both end up being wrong. `hstack`, or `concatenate` in general, doesn't check for compatible units or keep the unit. However, this errors out as it should: …

which is what the proposed implementation plus wrap would do. That's also what `diff` will do, eventually I hope. The proposed version would also work for this, while the previous ediff1d chokes: …
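The snippets in this comment were lost; as a purely illustrative guess at the kind of calls being contrasted (this assumes astropy is installed, and describes the 2016-era behaviour discussed here, which newer astropy/NumPy versions may handle differently):

```python
import numpy as np
import astropy.units as u            # assumption: astropy is available

q = np.arange(5.0) * u.m

# The two ediff1d calls were presumably along these lines; with the
# hstack-based ediff1d of the time, both came back as plain, unit-less
# ndarrays, since concatenate neither checks nor keeps units:
#
#   np.ediff1d(q, to_end=1)          # should error (bare number vs. metres)
#   np.ediff1d(q, to_end=1 * u.m)    # should keep the unit
#
# Direct subtraction, by contrast, errors out as it should:
try:
    q - 1
except u.UnitConversionError as exc:
    print("as expected:", exc)
```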
@mattharrigan - indeed, if I were to write a version of concatenate that would work for quantity, it would effectively have to do just what you describe! However, your current code will also give a wrong result, since your …

Anyway, just to be clear, I think anything that makes things faster for regular arrays and does not cause a regression is a clear benefit, i.e., either do your 1 or 2 (and perhaps add a test with …).
Please open a new PR.
I haven't used masked array or astropy units previously, but I have used np.matrix, so that's what I looked at first. I planned to copy examples from np.linalg, which return a matrix or an array depending on the input, using wrap. It works nicely with the matrix subclass, see below. I would do that immediately after creating the empty array: …

However, it doesn't seem to work with astropy.Quantity. Are the various methods implemented correctly? I am not familiar with that code at all, but I suspect it might be an issue.
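The snippet referred to above was lost; a minimal sketch of the wrap-immediately-after-`np.empty` idea being described, using `np.matrix` (variable names are made up, and, as noted in the thread, `Quantity` does not come out right this way):

```python
import numpy as np

a = np.matrix([[1.0, 2.0, 4.0, 8.0]])

flat = np.asarray(a).ravel()
result = np.empty(flat.size - 1, dtype=flat.dtype)   # plain ndarray
np.subtract(flat[1:], flat[:-1], out=result)

# Wrap the freshly created array back into the input's class, in the spirit
# of how np.linalg returns matrix for matrix input and ndarray otherwise:
wrapped = a.__array_wrap__(result)
print(type(wrapped))                                  # numpy.matrix
```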
@mattharrigan - what I meant was that …
I should add that, now that I look at this again, this form of wrapping will at least make everything work for regular …
p.s. Sorry, I'm replying as if your quote was to a comment I gave on your …
Closes #8175
Benchmark:

```
python -m timeit --setup="import numpy as np;x=np.arange(1e7)" "np.ediff1d(x, 0)"
```

The new version is about 5x faster on my machine.