-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Make Collection.set_paths even faster #16793
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
lib/matplotlib/path.py
Outdated
@@ -74,8 +74,12 @@ class Path: | |||
made up front in the constructor that will not change when the | |||
data changes. | |||
""" | |||
__slots__ = ('_vertices', '_codes', '_readonly', | |||
'_should_simplify', '_simplify_threshold', | |||
'_interpolation_steps') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding slots significantly speeds up the time it takes to create an instance!
lib/matplotlib/path.py
Outdated
""" | ||
pth = cls.__new__(cls) | ||
pth._vertices = _to_unmasked_float_array(verts) | ||
if unmask_verts: | ||
pth._vertices = _to_unmasked_float_array(verts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apparently this function is not as fast due to an extra call overhead, and because _to_unmasked_float_array
calls np.asarray
, which is pretty expensive even when you pass in an array!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See numpy/numpy#13580 :) how much do we get if we just add in _to_unmasked_float_array
a check to return things as if the arg is already an array?
Also, I think the warrants a comment explaining that this is for perf. reasons.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure that the extra call adds a fair amount of overhead itself, and a robust check would not be cheap either. I can try and circle back with the results if you want.
example_path = mpath.Path(verts_pad[0], codes) | ||
# Looking up the function once before the iteration gives a nice speedup | ||
_make_path = mpath.Path._fast_from_codes_and_verts | ||
self._paths = [_make_path(xy, codes, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out that the Python function call overheads become quite significant here, and so inlining the body of _fast_from_codes_and_verts
into this function actually results in an end-to-end 10% improvement for my example plot (this function along becomes almost 2x faster)! On the other hand, it's a method that touches all the Path internals, so I would feel a bit weird inlining it here...
This appears to have broken some things. |
@QuLogic Whoops, I ran way too small of a test subset before I've pushed this. Should be good now! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What are you using as a benchmark?
lib/matplotlib/path.py
Outdated
@@ -160,7 +164,8 @@ def __init__(self, vertices, codes=None, _interpolation_steps=1, | |||
self._readonly = False | |||
|
|||
@classmethod | |||
def _fast_from_codes_and_verts(cls, verts, codes, internals_from=None): | |||
def _fast_from_codes_and_verts(cls, verts, codes, internals_from=None, | |||
unmask_verts=True): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a private function; do we even need the flag? Maybe a check for masked arrays is needed, but I'm pretty sure all internal callers already provide arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@QuLogic The same snippet as in all the other PRs:
import io
import time
import numpy as np
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
from contextlib import contextmanager
from collections import defaultdict
results = defaultdict(lambda: np.zeros((2,)))
@contextmanager
def measure(name):
s = time.perf_counter()
yield
d = time.perf_counter() - s
results[name] += [1, d]
def summarize():
for name, (count, total) in sorted(results.items()):
print(name, total / count)
x = y = np.linspace(-1, 1, 400)
x, y = np.meshgrid(x, y)
z = x ** 2 + y ** 2
for i in range(4):
fig = plt.figure()
ax = fig.gca(projection='3d')
with measure('plot'):
ax.plot_surface(x, y, z, rstride=1, cstride=1)
with measure('draw'):
buf = io.BytesIO()
fig.savefig(buf, format='raw')
plt.close()
summarize()
@QuLogic This should have addressed all your comments. |
codes = example_path.codes | ||
self._paths = [_make_path(xy, codes, | ||
internals_from=example_path, | ||
unmask_verts=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unmask_verts
is no longer there.
@@ -175,7 +179,7 @@ def _fast_from_codes_and_verts(cls, verts, codes, internals_from=None): | |||
never copied, and always set to ``False`` by this constructor. | |||
""" | |||
pth = cls.__new__(cls) | |||
pth._vertices = _to_unmasked_float_array(verts) | |||
pth._vertices = verts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a note to the docstring that verts
should be unmasked, and not just any NumPy array?
You have ~5 days to rebase before 3.3 release candidate comes out. |
Well, apparently not 5 days, but you still have a chance here. |
Moved to 3.4 |
ping @apaszke still interested in this optimization? |
Coming back to this, I am 👍🏻 on the |
PR Summary
As in the title, this PR optimizes
Collection.set_paths
even further. With all PRs I've submitted before this one applied, the rendering for my example (400x400 surface plot on a 2012 MBP) takes ~2s on average. With this patch, the time drops to ~0.7s. The breakdown of this 0.7s is:Poly3DCollection.do_3d_projection
Collection.set_paths
Collection.draw
(mostly spend in the C code for the Agg backend in this case)PR Checklist