-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Add a fast path for NumPy arrays to Collection.set_verts #16689
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a fast path for NumPy arrays to Collection.set_verts #16689
Conversation
This will conflict with https://github.com/matplotlib/matplotlib/pull/16617/files#diff-506c6bd01694bddbd8999f2c6e920705. Feel free to pick up the relevant patch in my PR, or I can rebase mine on top of yours, whatever works for you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a test that both code paths (ndarray and non-ndarray) result in the same self._paths
?
@anntzer mine is a very local change, so I'm ok with waiting until yours lands. |
Either way is fine, but I think you should check whether manually constructing the codes array is significantly faster than using |
@anntzer I've changed the slow path to use |
I guess that could actually be a useful general optimization in path.py? e.g. preallocate the codes for closed paths of up to, say, 16 vertices (I picked 16 because that's the same threshold for caching as in unit_regular_polygon/unit_regular_star). (In which case perhaps implement that optimization first and see whether you can exploit it here.) |
@anntzer Good point. I'll experiment with that once this lands. |
It would be great if this could get another stamp and land. I've done some more work and it turns out that path creation can be optimized even further! After this PR it takes ~1.4s to create all the paths, but a ~20 LOC change allows to bring this down to ~0.19s. A more hacky version gets this even lower (to ~0.1s), but that is a bit too much for my taste (it requires inlining some function bodies — calls are expensive in Python). With the PRs I've submitted so far (including this one), the rendering for my example (400x400 surface plot on a 2012 MBP) is ~2s on average. With the changes I have stacked on top of this it drops to ~0.7s. The breakdown of this 0.7s is:
I've also done a bit more digging and I think it would be feasible to refactor |
cfcc6a0
to
9fa83c3
Compare
Anyone can merge after CI passes (ignoring circleci, which shouldn't be affected by this and is getting fixed (hopefully) in #16784.) |
Not sure what causes the |
that just looks like flakiness. do you want to squash the last commit? either way is fine. |
This reduces the run time for larger 3D plots by over a second on some toy examples.
9fa83c3
to
95977d6
Compare
All commits are now squashed. |
thanks :) |
@anntzer Thanks for all the reviews and help with getting this code in! |
You're welcome; your PRs are great 😁 |
PR Summary
This reduces the run time for larger 3D plots by over a second on a
toy examples linked in #16688. In total #16675, #16688 and this PR lower the run time for that example from ~16s to ~3.3s (4.8x speedup).
PR Checklist