-
-
Notifications
You must be signed in to change notification settings - Fork 10.8k
ENH: use __skip_array_function__ internally in NumPy #13585
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This should speed up some functions consideraly. Note: I'm also intentionally using this even (especially!) in functions that do not directly call np.asarray() on their inputs. This is so that overrides of __array_function__ cannot effect the internals of NumPy functions, which would be leaking our implementation as our API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, went through all of them. I am wondering a bit if helpers such as flatnonzero
actually should be dispatched, since they are super shallow wrappers around dispatched functions. But I guess the promise is to dispatch everything? And I did not follow this all that closely anyway until now.
(mostly 1-2 possible squirms, but since it is easy to miss, marking as "request changes").
@@ -194,7 +194,7 @@ def _broadcast_shape(*args): | |||
# ironically, np.broadcast does not properly handle np.broadcast | |||
# objects (it treats them as scalars) | |||
# use broadcasting to avoid allocating the full array | |||
b = broadcast_to(0, b.shape) | |||
b = broadcast_to.__skip_array_function__(0, b.shape) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All good, except this one (maybe!). This gets called from inside the np.vectorize
class, which is not hidden away behind __array_function__
. So I think this may be wrong?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
vectorize
always coerces with asarray()
or asanyarray()
, so I think this is fine, as a (minor) optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these optimizations really worth making?
@@ -241,7 +241,7 @@ def transpose(a): | |||
------- | |||
aT : (...,N,M) ndarray | |||
""" | |||
return swapaxes(a, -1, -2) | |||
return swapaxes.__skip_array_function__(a, -1, -2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The transpose here is not actually decorated as a skipping function, because it is linalg only. So I am really unsure that this is correct. (very likely my ignorance though).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is right, because this is always used inside other linalg functions, which are all wrapped with __array_function__
and also cast their inputs into NumPy arrays (see the _make_array
utility function). And even if we had some functions that didn't do this, we don't want to expose the fact that they happen to use swapaxes
internally. That would be leaking an implementation detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, this is whole thing is new anyway. The function is just publicly exposed, so that means if someone actually uses it, it will fail on some array likes (and it would not before).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is transpose
really publicly exposed? I guess as np.linalg.linalg.transpose
, but it isn't documented anywhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But to be clear, if somebody used it on array-likes before, this function would indeed really work the same. There hasn't been a release of NumPy that enables __array_function__
yet.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
linalg.linalg
isn't public
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, sorry, I had missed the second linalg there somehow...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't want to expose the fact that they happen to use swapaxes internally. That would be leaking an implementation detail.
Calling it an inplementation seems like a stretch to me. The 2d transpose is essentially defined as swapping the last two axes. If linalg.transpose were public, then I'd argue that it should remain unchanged.
As written here, it seems like a questionable but harmless micro-optimization.
This all looks correct to me, thanks Stephan. Taking a step back, this PR demonstrates an issue with the way the protocol works. This makes all NumPy code that calls another function wrapped with the It's probably also easy to forget to add |
It does look like the beginnings of a slippery slope. The danger of the protocol is that it might encourage excessive tinkering down the line. What seems to be the immediate problem is keeping NumPy at the top of the user heap rather then in the scrum with everyone else. |
I agree, I don't love it. But on the plus side, it's still way more readable than defining functions in C :)
To be clear: this change is already probably more pervasive than strictly necessary. As long as we cast inputs with Perhaps I try to roll back the extent of the changes here? I'm guessing that many of these don't actually make any difference, beyond slightly obfuscating code. As for automatic testing -- yes, this would be a good idea, though static analysis to find "unsafe" uses of unwrapped functions seems non-trivial. Perhaps a tool like mypy could be leveraged? The tricky part is recognizing cases where a function like |
It should be easy to benchmark the effect of these changes |
Unfortunately ASV gives pretty noisy results, at least when I run it on my laptop. Here are all the benchmarks that change by at least 10%:
The performance difference for |
Sorry to be late to this (I was on holidays), but I must admit I don't like this much at all! It makes the functions much less readable, and I feel it goes in the wrong direction from a duck-typing perspective: this seems analogous to rewriting every function that does As an admittedly selfish example, I've just started implementing With this PR, such partial overrides will no longer work, thus forcing users to copy & paste implementations like those of My sense remains that we should aim to get to a state where functions can assume that their arguments behave like |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Explicitly marking this as requesting changes, so we don't just merge based on the one approval while discussion is ongoing.
Specific request is to only do this (if at all) for functions where arguments are coerced with as*array
.
This is something we discussed, but it wasn't a goal of NEP-18: We can definitely try to figure out how to do this in the future, but it should be done intentionally and with care, because it exposes NumPy's internal implementations as public APIs. This would be a major increase in our API surface, and would limit our ability to make future changes. Consider the internal implementation of numpy/numpy/core/shape_base.py Lines 827 to 830 in f9c1502
However, note that the This PR maintains a simple policy: if you want to overload a NumPy function, you have to override it directly. This isn't the most convenient API for implementers, but it keeps our options open as NumPy maintainers. |
I guess I must have lost track during the long discussion, but I thought the main purpose of It doesn't seem to me that the relatively miminal speed-ups outweight either the cost of above or the cost in readability (let alone both). p.s. The above does not mean I disagree that we should be free to change the implementation for |
Perhaps that's a fair assessment.
I'd more put it like: you are supplying your own implementation, hence changes in NumPy's internals are irrelevant (except if they're backwards-incompatible changes in API, to which you adapt).
On second thought, I'm not sure this is right. Or I need a more detailed explanation of what you mean by this. Taking a random example:
If |
I should have been more explicit: if you use
Better answered by @shoyer, of course, but the reason I would like this not to happen is that if I correctly implement
rather than use |
This sounds a bit like @njsmith's ducktype proposal, a minimal list of functions that need to be implemented. |
Yes, I liked that proposal very much and still hope that |
This version is fine. The version where I'm concerned about leaking implementation as API looks like this:
Consider a library that implements |
This is a reasonable thing to want, but it was not what I was aiming for with Let me also surface one of my previous mailing list posts on this topic:
(I added "[existing]" for clarification.) So sure, let's make
I am very sympathetic here, but I think this exposes a typical issue: a duck array class could implement We could say that we make no backwards compatibility guarantees on what goes on inside |
You can also see this as an argument for not doing the whole Wouldn't we be able to implement this inside the dispatch mechanism instead? Something like
I suspect that that's doable with not too much code (and hopefully similar performance gain as this PR), and you get the skip method you want without needing the changes in this PR. |
@shoyer - it may be best to go back to a specific example rather than to try to generalize, as it may clarify where we differ. Let's pick Currently, the I think we should encourage people who use p.s. We will be stuck with our present implementations for a very long time anyway, just for classes that do not define |
I don't really agree that this is a reasonable thing to want. You never want to agree to be sensitive to internal implementation details, what good does that ever do you? It's just a concession with zero upsides. What you signal is "ignore my |
p.s. Sorry, |
Now that you mention it, this does seem like it may be possible. I'm going to look into this...
Yes, this is true with the current implementation of
I agree, so let me restate this: I do think there are cases where it convenient to be able to use a canonical implementation of a NumPy function terms of a fixed set of core operations. This may or may not correspond to our current implementations in NumPy. |
Examples would indeed be helpful here. At present, I don't think there any many NumPy functions that expose their internal implementation details for duck arrays. Some (inconsistently) do so for ndarray subclasses, but almost everything using |
I agree with this too. I simply fear that by making it impossible to use even current implementations that are close to that ideal, we make it unnecessarily harder to reach it. I still see |
I don't think that's an accurate analogy. This PR is simply a (admittedly not pretty) way to preserve the current default 1.16 behaviour. It doesn't make duck typing easier, but it also should not make it harder. This whole feature is basically unrelated to duck-typing imho.
Yes, that would be nice. It's not something that we guarantee at all today though (AFAIK - if we do somewhere, can you point to it?). It requires, even for implementations that now happen to be just about right, tests and documentation. |
Just going down the list, the first and really obvious one is The code in But |
In what way does it preserve 1.16 behaviour? If one doesn't define
It does, it that it removes control: instead of "treat me as if I were |
Maybe I should say "give the option to keep 1.16 behaviour, even in the presence of
|
I agree we can change it any time. My main point is that there is no reason to change it now, when really, unlike so much of numpy, we have a nice pythonic implementation. This means that any class that defines
For |
No it's not. At least I would not state that with such certainty. Because now you have just agreed to making our internal implementation details public. That's a major decision that negatively impacts the maintainability of NumPy itself. It may be convenient for some other library, but there's a big tradeoff here. There's probably way more work in us writing tests and docs if we do make that decision than there is in copying some code around. |
@rgommers - If we want to preserve 1.16 behaviour like that, I think we would be much better off with a context manager that does the opposite of the current environment variable. Indeed, that would have the advantage of keeping the actual code readable. Indeed, I would not mind if such a "context manager" were used inside |
No need. The preserving is not a goal. I just tried to explain that that's what it does in practice, so it should not affect duck arrays one way or the other. Situation for those does not get better, does not get worse. I think the fundamental issue here is that you want Anyway, maybe we should give this a rest till @shoyer had a chance to try the |
me too:) |
In what way is this making it more public than it is already? I guess we may have to agree to disagree here on what constitutes a larger cost, but I think you may be underestimating the cost of copy & paste coding. |
I'm not assuming or underestimating anything. I'm just stating that it's a tradeoff, and one with big implications. I don't see anyone lining up to write all those tests and docs for us. Or changing implementations for those functions that aren't ideal in implementation right now. This needs a NEP at the very least, and it's not trivial.
It's not public today. You just agreed we can change it if we'd want. So if we can't change it anymore later, then we just made it more public. |
I only meant that I don't see how the extent we can change it depends on what we do for Similarly, it should be obvious that I don't think we have to write tests for other packages. Though I do think we'd have quite a good set if we were to use |
p.s. And, yes, let's give this a rest to see if the context manager thingy works - code readability alone makes this PR very non-ideal. |
p.p.s. I do wonder whether we shouldn't revert |
That thought has also occurred to me. All these things need implementation testing and feedback as we feel our way into what could easily become a mess. @rgommers I believe you also have some recent experience with using |
Perhaps we should have a synchronous discussion, if not during the community call then a specially scheduled one to work this out. Trying to follow the discussion on github is painful. |
Having thought about this a bit more, I think a context manager would be very challenging:
This basically means we would need to use something like contextvars, but that's only available in Python 3.7. We could potentially do a back-port, but even then the performance hit might be unacceptable -- the contextvars will add overhead at least as large as a dictionary lookup. What I do think would make sense is a enabling a run-time check for wrapped invocations of |
Accessing thread locals is fast, especially if you use modern compiler features like My concern about a context manager would be spooky action at a distance. If I do |
@njsmith the idea was a private context manager, inside the |
Yes, this would only be for cases where we're using NumPy's implementation of functions. As long as these functions don't have some API that allows for running arbitrary code, we would be in the clear. That said, I agree with @njsmith about disliking action at a distance, even if we think it's safe. For that reason alone I would prefer explicitly calls to For what it's worth, I think this is a pretty minor loss in readability. It's not pretty, but a 10% speed-up for the price of adding one magic attribute look-up seems like a pretty good deal to me. We already do lots of stuff like this, e.g., using |
I guess people differ in the extent to which they adhere to "readability counts" - though if I start quoting the zen of python, I should be fair and agree that it does suggest that the context manager is not that great an idea! But if speed is of the essence, then perhaps an alternative ifor 1.17 s to keep the environment variable that turns |
This doesn't make much sense to me. This would be much less hard to understand than the rest of the dispatcher implementation. It has nothing to do with action at a distance. It's about keeping something that's a property of the whole I also really don't care whether it's actually a context manager or something else. If we could do in a thread-safe manner |
Opinions on that vary I guess. Another thing to keep in mind is that you were in favor of using |
One call about the design may not be enough, but for talking about the proposal @mhvk just made on the mailing list, it could be useful. Scheduling this may not be easy though ..... |
Keep in mind that none of this is a speedup compared to a baseline of "no What we gain with I'm just running the SciPy test suite with the env var on and off to get a sense of the impact. It would be useful to have a bigger comparison for other libraries, and some real-world code bases (test suites may slow down more than real-world code). |
To follow up on this: I made an attempt to detect nested use of Unfortunately, the result is not very satisfying. The best I can do is check whether we call another overrided function within a NumPy implementation, but the end result of this would be using If we want to detect this programmatically, we would need either:
As is, I'm not really comfortable with any solution here. Going forward with |
Thanks for trying @shoyer. That's too bad. |
Closing this based on the consensus from the mailing list. See #13624 fo a PR revising the NEP. |
This is a partial reprise of the optimizations from numpyGH-13585. The trade-offs here are about readability, performance and whether these functions automatically work on ndarray subclasses. You'll have to judge the readability costs for yourself, but I think this is pretty reasonable. Here are the performance numbers for three relevant functions with the following IPython script: import numpy as np x = np.array([1]) xs = [x, x, x] for func in [np.stack, np.vstack, np.block]: %timeit func(xs) | Function | Master | This PR | | --- | --- | --- | | `stack` | 6.36 µs ± 175 ns | 6 µs ± 174 ns | | `vstack` | 7.18 µs ± 186 ns | 5.43 µs ± 125 ns | | `block` | 15.1 µs ± 141 ns | 11.3 µs ± 104 ns | The performance benefit for `stack` is somewhat marginal (perhaps it should be dropped), but it's much more meaningful inside `vstack`/`hstack` and `block`, because these functions call other dispatched functions within a loop. For automatically working on ndarray subclasses, the main concern would be that by skipping dispatch with `concatenate`, subclasses that define `concatenate` won't automatically get implementations for `*stack` functions. (But I don't think we should consider ourselves obligated to guarantee these implementation details, as I write in numpyGH-13633.) `block` also will not get an automatic implementation, but given that `block` uses two different code paths depending on argument values, this is probably a good thing, because there's no way the code path not involving `concatenate` could automatically work (it uses `empty()`).
* MAINT: avoid nested dispatch in numpy.core.shape_base This is a partial reprise of the optimizations from GH-13585. The trade-offs here are about readability, performance and whether these functions automatically work on ndarray subclasses. You'll have to judge the readability costs for yourself, but I think this is pretty reasonable. Here are the performance numbers for three relevant functions with the following IPython script: import numpy as np x = np.array([1]) xs = [x, x, x] for func in [np.stack, np.vstack, np.block]: %timeit func(xs) | Function | Master | This PR | | --- | --- | --- | | `stack` | 6.36 µs ± 175 ns | 6 µs ± 174 ns | | `vstack` | 7.18 µs ± 186 ns | 5.43 µs ± 125 ns | | `block` | 15.1 µs ± 141 ns | 11.3 µs ± 104 ns | The performance benefit for `stack` is somewhat marginal (perhaps it should be dropped), but it's much more meaningful inside `vstack`/`hstack` and `block`, because these functions call other dispatched functions within a loop. For automatically working on ndarray subclasses, the main concern would be that by skipping dispatch with `concatenate`, subclasses that define `concatenate` won't automatically get implementations for `*stack` functions. (But I don't think we should consider ourselves obligated to guarantee these implementation details, as I write in GH-13633.) `block` also will not get an automatic implementation, but given that `block` uses two different code paths depending on argument values, this is probably a good thing, because there's no way the code path not involving `concatenate` could automatically work (it uses `empty()`). * MAINT: only remove internal use in np.block * MAINT: fixup comment on np.block optimization
This should speed up some functions considerably.
Note: I'm also intentionally using this even (especially!) in functions that do not directly call np.asarray() on their inputs. This is so that overrides of array_function cannot effect the internals of NumPy functions, which would be leaking our implementation as our API.