Optimise `Axes.add_patch` to avoid python iteration over bezier segments #28657
Conversation
Thank you for opening your first PR into Matplotlib!
If you have not heard from us in a week or so, please leave a new comment below and that should bring it to our attention. Most of our reviewers are volunteers and sometimes things fall through the cracks.
You can also join us on gitter for real-time discussion.
For details on testing, writing docs, and our review process, please see the developer guide
We strive to be a welcoming and open project. Please follow our Code of Conduct.
Unfortunately this doesn't pass many tests, so I suspect your optimization is returning very different results from the usual code. It may help if you make a test example that is very slow for you, so we can understand clearly what you are trying to achieve; then folks may have suggestions.
Indeed, I saw that! I'd like to be able to see what the differences are locally, but it seems that I can't even run one test by passing a specific path to `pytest`. My suspicion is that the difference might be to do with a simple consideration of control points vs extremal values on the curve, but it's very hard to say without being able to see the output.
Here is a silly example, but it's minimal whilst demonstrating the issue:

```python
import timeit

import numpy
from matplotlib import pyplot
from matplotlib.axes import Axes
from matplotlib.collections import PatchCollection
from matplotlib.patches import Polygon

points = numpy.random.rand(10000, 2)
patch = Polygon(points)

_, axes1 = pyplot.subplots()
assert isinstance(axes1, Axes)
t1 = timeit.timeit(lambda: axes1.add_patch(patch), number=1)

_, axes2 = pyplot.subplots()
assert isinstance(axes2, Axes)
t2 = timeit.timeit(
    lambda: axes2.add_collection(PatchCollection([patch], match_original=True)),
    number=1,
)

print(t1)
print(t2)
```
The difference in this case is nearly a factor of 200x on my machine. In a more realistic case where figure construction is also included, it's less significant, but still noticeable. Running the first one through the profiler shows the time being spent in the Python iteration over the path's bezier segments.
In our real use we're somewhat performance-sensitive when drawing fairly complex patches, so removing this bottleneck by pushing the iteration down to C++ would be helpful. My hope was that adding a …
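For reference, a minimal sketch (reusing the setup from the timing example above, with an illustrative polygon size) of how the slow call can be profiled with `cProfile` to see where the time goes:

```python
import cProfile
import pstats

import numpy
from matplotlib import pyplot
from matplotlib.patches import Polygon

# Profile a single add_patch call with a large polygon, then print the
# functions with the highest cumulative time.
patch = Polygon(numpy.random.rand(10000, 2))
_, axes = pyplot.subplots()

profiler = cProfile.Profile()
profiler.enable()
axes.add_patch(patch)
profiler.disable()

pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)
```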
The actions have an "Upload Artifact" step with a link to the artifacts. No idea why your pytest is not working. I would make sure it is running from the same Python as your env.
Finally got a Python stack trace after interrupting `pytest` …
You can use blame to see that this code came in relatively recently: #19124 and #24177. At the least, if you remove this code I'd expect … Of course you also seem to have removed extent finding altogether, as some of the changes in the rest of the test suite are not trivial.
My underlying assumption here was that the limit adjustment for a collection of one patch would be the same as for that one patch alone. From these failures (and a few inferences about the code) I think this assumption is false, although without the ability to iterate on the tests locally I'm not 100% sure. Anyway, I'll close this PR. I'm not advocating for changing behaviour; it just appeared at a glance that this might be behaviour-preserving, but never mind!
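For what it's worth, a small sketch of how that assumption could be checked locally, comparing the data limits produced by the two code paths (the polygon size here is illustrative):

```python
import numpy
from matplotlib import pyplot
from matplotlib.collections import PatchCollection
from matplotlib.patches import Polygon

# Compare the data limits recorded after add_patch vs. add_collection
# for the same single polygon.
points = numpy.random.rand(100, 2)

_, ax1 = pyplot.subplots()
ax1.add_patch(Polygon(points))

_, ax2 = pyplot.subplots()
ax2.add_collection(PatchCollection([Polygon(points)], match_original=True))

print(ax1.dataLim.extents)
print(ax2.dataLim.extents)
```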
If you have very large paths with lots of bezier curves, it might be worth turning off autoscaling and setting the limits yourself.
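Roughly, a sketch of that suggestion (the limit values and polygon are placeholders), turning off autoscaling and setting the view limits explicitly:

```python
import numpy
from matplotlib import pyplot
from matplotlib.patches import Polygon

_, ax = pyplot.subplots()

# Turn off autoscaling so the view limits are not recomputed from artists,
# then set them explicitly for the known data range.
ax.autoscale(enable=False)
ax.add_patch(Polygon(numpy.random.rand(10000, 2)))
ax.set_xlim(0, 1)
ax.set_ylim(0, 1)
```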
Thanks for the suggestion. Unfortunately calling …
Hmm, looking at the code it looks like we always compute the data limits and only sometimes update the view limits. I can see a case for making "update the data limits" lazy and computing it on demand, but I do not think that would be a small project to do.
Sorry for pointing you down a dead-end.
PR summary
This aims to optimise calls to `_AxesBase.add_patch`, where the patch might have a path with many bezier segments. This is done by avoiding a long loop in Python, and instead delegating to `Bbox.update_from_path`, which is used when updating limits from other artists. This provides a ~5x speedup for a call to `_AxesBase.add_patch` in some of our internal code, with complicated patches with hundreds of bezier segments.
Implementation notes
The implementation is effectively the same as that in `_AxesBase._update_line_limits`, also in this class. It performs the iteration in C++ (in `_path.h::update_path_extents`). This is the same approach as used for collections of patches; i.e. the rescaling behaviour is now as performant as adding the patch via a `PatchCollection`.
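As an illustration (not the PR diff itself), a small sketch of the kind of call being delegated to; the path here is synthetic:

```python
import numpy
from matplotlib.path import Path
from matplotlib.transforms import Bbox

# Build a path with many vertices and let the compiled extent computation
# behind update_from_path find its bounding box, rather than looping over
# segments in Python.
vertices = numpy.random.rand(10000, 2)
path = Path(vertices)

bbox = Bbox.null()
bbox.update_from_path(path, ignore=True)
print(bbox.extents)
```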
PR checklist
WIP: so far I have failed to get the test suite running locally. I followed the steps to create the necessary venv and build, but running `pytest` just hangs indefinitely without consuming any resources. I can, however, import the local build of matplotlib and perform basic tasks with it by hand.