Skip to content

Make Collection.set_paths even faster #16793

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

apaszke
Copy link
Contributor

@apaszke apaszke commented Mar 16, 2020

PR Summary

As in the title, this PR optimizes Collection.set_paths even further. With all PRs I've submitted before this one applied, the rendering for my example (400x400 surface plot on a 2012 MBP) takes ~2s on average. With this patch, the time drops to ~0.7s. The breakdown of this 0.7s is:

  • ~0.4s in Poly3DCollection.do_3d_projection
    • Including ~0.2s in Collection.set_paths
  • ~0.3s in Collection.draw (mostly spend in the C code for the Agg backend in this case)

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@@ -74,8 +74,12 @@ class Path:
made up front in the constructor that will not change when the
data changes.
"""
__slots__ = ('_vertices', '_codes', '_readonly',
'_should_simplify', '_simplify_threshold',
'_interpolation_steps')
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding slots significantly speeds up the time it takes to create an instance!

"""
pth = cls.__new__(cls)
pth._vertices = _to_unmasked_float_array(verts)
if unmask_verts:
pth._vertices = _to_unmasked_float_array(verts)
Copy link
Contributor Author

@apaszke apaszke Mar 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently this function is not as fast due to an extra call overhead, and because _to_unmasked_float_array calls np.asarray, which is pretty expensive even when you pass in an array!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See numpy/numpy#13580 :) how much do we get if we just add in _to_unmasked_float_array a check to return things as if the arg is already an array?
Also, I think the warrants a comment explaining that this is for perf. reasons.

Copy link
Contributor Author

@apaszke apaszke Mar 17, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure that the extra call adds a fair amount of overhead itself, and a robust check would not be cheap either. I can try and circle back with the results if you want.

example_path = mpath.Path(verts_pad[0], codes)
# Looking up the function once before the iteration gives a nice speedup
_make_path = mpath.Path._fast_from_codes_and_verts
self._paths = [_make_path(xy, codes,
Copy link
Contributor Author

@apaszke apaszke Mar 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out that the Python function call overheads become quite significant here, and so inlining the body of _fast_from_codes_and_verts into this function actually results in an end-to-end 10% improvement for my example plot (this function along becomes almost 2x faster)! On the other hand, it's a method that touches all the Path internals, so I would feel a bit weird inlining it here...

@QuLogic
Copy link
Member

QuLogic commented Mar 16, 2020

This appears to have broken some things.

@apaszke
Copy link
Contributor Author

apaszke commented Mar 20, 2020

@QuLogic Whoops, I ran way too small of a test subset before I've pushed this. Should be good now!

Copy link
Member

@QuLogic QuLogic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are you using as a benchmark?

@@ -160,7 +164,8 @@ def __init__(self, vertices, codes=None, _interpolation_steps=1,
self._readonly = False

@classmethod
def _fast_from_codes_and_verts(cls, verts, codes, internals_from=None):
def _fast_from_codes_and_verts(cls, verts, codes, internals_from=None,
unmask_verts=True):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a private function; do we even need the flag? Maybe a check for masked arrays is needed, but I'm pretty sure all internal callers already provide arrays.

Copy link
Contributor Author

@apaszke apaszke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@QuLogic The same snippet as in all the other PRs:

import io                                                             
import time                                                           
import numpy as np                                                    
from mpl_toolkits.mplot3d import Axes3D                               
import matplotlib.pyplot as plt                                             
from contextlib import contextmanager                                 
from collections import defaultdict                                   
                                                                      
results = defaultdict(lambda: np.zeros((2,)))                         
@contextmanager                                                       
def measure(name):                                                    
    s = time.perf_counter()                                           
    yield                                                             
    d = time.perf_counter() - s                                       
    results[name] += [1, d]                                           
                                                                      
def summarize():                                                      
    for name, (count, total) in sorted(results.items()):              
        print(name, total / count)                                    
                                                                      
x = y = np.linspace(-1, 1, 400)                                       
x, y = np.meshgrid(x, y)                                              
z = x ** 2 + y ** 2                                                   
                                                                      
                                                                      
for i in range(4):                                                    
    fig = plt.figure()                                                
    ax = fig.gca(projection='3d')                                     
    with measure('plot'):                                             
        ax.plot_surface(x, y, z, rstride=1, cstride=1)
    with measure('draw'):                                             
        buf = io.BytesIO()                                            
        fig.savefig(buf, format='raw')                                
    plt.close()                                                       
                                                                      
summarize()                                                           

@apaszke
Copy link
Contributor Author

apaszke commented Mar 25, 2020

@QuLogic This should have addressed all your comments.

codes = example_path.codes
self._paths = [_make_path(xy, codes,
internals_from=example_path,
unmask_verts=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unmask_verts is no longer there.

@@ -175,7 +179,7 @@ def _fast_from_codes_and_verts(cls, verts, codes, internals_from=None):
never copied, and always set to ``False`` by this constructor.
"""
pth = cls.__new__(cls)
pth._vertices = _to_unmasked_float_array(verts)
pth._vertices = verts
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a note to the docstring that verts should be unmasked, and not just any NumPy array?

@QuLogic
Copy link
Member

QuLogic commented Apr 28, 2020

You have ~5 days to rebase before 3.3 release candidate comes out.

@QuLogic
Copy link
Member

QuLogic commented May 27, 2020

Well, apparently not 5 days, but you still have a chance here.

@tacaswell tacaswell modified the milestones: v3.3.0, v3.4.0 Jun 9, 2020
@tacaswell
Copy link
Member

Moved to 3.4

@jklymak
Copy link
Member

jklymak commented Sep 10, 2020

ping @apaszke still interested in this optimization?

@jklymak jklymak marked this pull request as draft September 10, 2020 15:42
@QuLogic QuLogic modified the milestones: v3.4.0, v3.5.0 Jan 21, 2021
@QuLogic QuLogic modified the milestones: v3.5.0, v3.6.0 Aug 18, 2021
@timhoffm timhoffm modified the milestones: v3.6.0, unassigned Apr 30, 2022
@story645 story645 modified the milestones: unassigned, needs sorting Oct 6, 2022
@tacaswell
Copy link
Member

Coming back to this, I am 👍🏻 on the __slots__, but 👎🏻 on adding a decorator to disable gc. I'm mixed on inlining `_make_path.

@scottshambaugh
Copy link
Contributor

I've picked this up in #29398, building off this PR (but without the garbage collection change). Thank you @apaszke for the initial work here!

@scottshambaugh scottshambaugh marked this pull request as ready for review January 7, 2025 22:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants