Store vertices as (masked) arrays in Poly3DCollection #16688


Closed
wants to merge 4 commits into from

Conversation

apaszke
Contributor

@apaszke apaszke commented Mar 5, 2020

PR Summary

This removes a bunch of very expensive list comprehensions, dropping
rendering times by half for some 3D surface plots.

Runtime of the following code drops from 16s to 10s after #16675, while stacking this PR on top of #16675 lowers the time further to around 5s. Another 3s are now spent in PolyCollection.set_verts which I'm hoping to improve later.

import numpy as np
import matplotlib.pyplot as plt

fig = plt.figure()
ax = fig.gca(projection='3d')

x = y = np.linspace(-1, 1, 400)
x, y = np.meshgrid(x, y)
z = x ** 2 + y ** 2

ax.plot_surface(x, y, z, rstride=1, cstride=1)
fig.savefig('dump.png')
plt.close()

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way


Parameters
----------
vecs : 3xN np.ndarray or np.ma.MaskedArray
Contributor

@eric-wieser eric-wieser Mar 11, 2020

It might make more sense for this to take and return (..., 3) array, which would mean your caller does not need to reshape, and would make the below...

(doing it this way is consistent with how gufuncs behave in numpy, in that they care only about the last axis by default)
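
A tiny sketch of the suggested convention (function name hypothetical, not matplotlib's actual API): accept `(..., 3)`, pad along the last axis, and let matmul broadcast over the leading axes, so callers never need to reshape:

```python
import numpy as np

def proj_transform(vecs, M):
    # Pad (..., 3) -> (..., 4) with homogeneous ones.
    ones = np.ones(vecs.shape[:-1] + (1,))
    vecs_pad = np.concatenate([vecs, ones], axis=-1)
    # matmul treats leading axes as a batch, gufunc-style.
    product = vecs_pad @ M.T
    # Dehomogenize back to (..., 3).
    return product[..., :3] / product[..., 3:]

pts = np.random.rand(5, 4, 3)        # (n_faces, n_verts, 3), no reshape needed
out = proj_transform(pts, np.eye(4))  # identity projection leaves pts unchanged
```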

Contributor Author

@apaszke apaszke Mar 11, 2020

Note that those reshapes should not trigger any reallocations, so I'm not sure if that would actually help. We could push the transposes into this function, but we cannot make them go away without changing the conventions around renderer.M. Still, I don't think they cause any harm here.

Contributor

@eric-wieser eric-wieser Mar 11, 2020

This suggestion is more based off simplicity and convention than it is performance.

You're correct that reshapes typically are free, and .T is always free.
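
This is easy to verify directly (a minimal standalone check, not PR code): both operations return views of the same buffer.

```python
import numpy as np

a = np.arange(12.0).reshape(3, 4)
# Transposition is always a stride trick, never a copy.
assert np.shares_memory(a, a.T)
# Reshaping a contiguous array is also just new strides on the same buffer.
assert np.shares_memory(a, a.reshape(-1, 2))
```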

Contributor Author

@apaszke apaszke Mar 11, 2020

I've tried your suggestions, and it seems that while the original version takes ~72ms on average, the proposed changes increase the runtime to ~127ms. This is likely due to strided memory patterns that start to pop up everywhere (padding the array, normalizing the product, reductions over the z axis). I do agree that the code is nicer though. Please let me know how to proceed.

Contributor

Thanks for profiling. Is that the runtime of the entire plot, or just this function?

Contributor

@eric-wieser eric-wieser Mar 11, 2020

Seems that something screwy is going on with matmul(A, B) vs matmul(B.T, A.T).T for 2d A and B. matmul is supposed to know they are equivalent and pick the faster one, but it seems to be doing the same thing for both.

Edit: numpy/numpy#15742, it only works with an explicit out argument.
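
For reference, the two formulations are mathematically identical, since `A @ B == (B.T @ A.T).T`; a quick standalone check, with `out` supplied explicitly as the linked issue suggests:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 4))
B = rng.standard_normal((4, 4))

expected = A @ B
# Writing through a transposed `out` view computes the same product;
# per numpy/numpy#15742, only the explicit-out form dispatches the
# faster loop for the transposed operands.
out = np.empty_like(expected)
np.matmul(B.T, A.T, out=out.T)
```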

Contributor Author

How much of the slowdown is caused by matmul picking the wrong kernel (or having a slow one) for this case, then?

Contributor

@eric-wieser eric-wieser Mar 11, 2020

I can get a speed up with this mess:

def _proj_transform_vectors_new(vecs, M):
    is_masked = isinstance(vecs, np.ma.MaskedArray)
    if is_masked:
        mask = vecs.mask
    # allocate with a convenient memory order
    vecs_pad = np.moveaxis(np.empty((vecs.shape[-1] + 1,) + vecs.shape[:-1]), 0, -1)
    vecs_pad[..., :-1] = vecs
    vecs_pad[..., -1] = 1
    product = np.empty_like(vecs_pad)
    np.matmul(vecs_pad, M.T, out=product)
    result = product[..., :3] / product[..., 3:]
    if is_masked:
        return np.ma.array(result, mask=mask)
    else:
        return result


def new():
    # reshape is for speed only, it works correctly without
    psegments = _proj_transform_vectors_new(segments.reshape(-1, 3), M).reshape(segments.shape)
    epilogue(psegments)
    return psegments

Contributor Author

This indeed yields a speedup, but a tiny one on my laptop:

New:		 42.17ms
Baseline:	 45.15ms
Diff:		 2.98ms

I'm ok with going either way, just let me know 🤷‍♂

Contributor

I think my request would perhaps be just to move the reshapes you have at the call site inside the function - that way you can keep the conventional (...,N) API, but the implementation can be the easy-to-read and efficient code you have now.

@apaszke
Contributor Author

apaszke commented Mar 21, 2020

It would be great if we could merge this, because it is necessary to actually benefit from the patches I'm preparing now.

@apaszke apaszke force-pushed the vectorize_3d_projection branch from 872818b to fbee126 Compare March 22, 2020 12:31
@apaszke
Contributor Author

apaszke commented Mar 22, 2020

@eric-wieser @timhoffm I've addressed your requests, rebased on top of master and squashed the commits.

Comment on lines 678 to 689
face_z = self._zsortfunc(psegments[..., 2], axis=-1)
if isinstance(face_z, np.ma.MaskedArray):
# NOTE: Unpacking .data is safe here, because every face has to
# contain a valid vertex.
face_z = face_z.data
face_order = np.argsort(face_z, axis=-1)[::-1]
Contributor

Is the point here that face_z has a single element per face, and is therefore safe?

Contributor Author

Exactly. My comment is based on the assumption that you don't have faces of length 0 (i.e. faces with all vertices masked). I think that's reasonable?

Contributor

Alternatively, you could use face_z.filled(-np.inf), to be safe.

Contributor Author

Now that I think about it, I don't think that it matters? Faces with no valid vertices should have all vertex coordinates filled with NaN values before drawing, making them invisible, so it doesn't matter where they end up in the face order.

Comment on lines 680 to 687
# NOTE: Unpacking .data is safe here, because every face has to
# contain a valid vertex.
Contributor

Are you confident enough about this to add

assert not face_z.mask.any()

?

If not, then it would be better to handle it.

Contributor Author

See my comment above. I would be comfortable with adding this assert if we're ok with assuming that you can't have faces of length 0. That certainly doesn't happen for plot_surface, and I don't see why you would do it in any other case either.

if self._codes3d is not None:
codes = [self._codes3d[idx] for z, s, fc, ec, idx in z_segments_2d]
segments_2d = [s.compressed().reshape(-1, 2)
Contributor

@eric-wieser eric-wieser Mar 22, 2020

compressed is going to do something bad if somehow only one component was masked (e.g., it was NaN at construction).

Safer would be:

Suggested change
segments_2d = [s.compressed().reshape(-1, 2)
segments_2d = [s.data[~s.mask.any(axis=-1)]

Contributor Author

How would you like to interpret the data if one component was masked? That makes no sense because you get one point with only a single coordinate.

Contributor Author

Oh, you mean that multiple similar errors like this could cancel out? That is true, but it's such a strong violation of an internal invariant that I'm not sure that what your solution does (i.e. just taking the masked data without asking too many questions) is better. Another alternative would be to do s.data[~s.mask[..., 0]], as the last dimension of the mask is always supposed to be constant.

Contributor

@eric-wieser eric-wieser Mar 22, 2020

You're right that it makes no sense, but the way you have it written, the behavior would be to interpret [[x1, y1], [x2, --], [x3, y3], [x4, --]] as [[x1, y1], [x2, x3], [y3, x4]], which is going to be very confusing to the user, as it combines adjacent points.

Using any as I do means that if a point is fully masked it's skipped, and otherwise it propagates to the plotting anyway. Using all would also be fine, if you want partial masks to skip plotting too. What's not ok is corrupting adjacent points which aren't bad.

If you really wanted to play it safe you could assert that np.all(mask.any(axis=-1) == mask.all(axis=-1)), but that would cost you a bunch of time.
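
The corruption mode is easy to reproduce (a standalone illustration, not matplotlib code), using the four-point example above:

```python
import numpy as np

# Four 2D points with the y-coordinate masked on points 2 and 4,
# mirroring [[x1, y1], [x2, --], [x3, y3], [x4, --]].
s = np.ma.MaskedArray(
    [[1., 10.], [2., 20.], [3., 30.], [4., 40.]],
    mask=[[False, False], [False, True], [False, False], [False, True]])

# compressed() flattens the surviving scalars, merging adjacent points:
bad = s.compressed().reshape(-1, 2)   # -> [[1., 10.], [2., 3.], [30., 4.]]
# Row-wise filtering keeps every remaining point intact:
good = s.data[~s.mask.any(axis=-1)]   # -> [[1., 10.], [3., 30.]]
```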

Contributor Author

Ok, I guess using mask.all might be more reasonable. Would that be ok with you?

Contributor Author

I've looked into it, and I'm not sure we should be doing anything like that. Note that right below this line there is another comprehension that builds up a list of codes, each with a length that is supposed to equal the length of its segment. So we can't just drop vertices arbitrarily, and the only way to make this sane is to assume that the invariant I mentioned holds. What do you think?

@timhoffm
Member

Please fix the flake8 errors:
./lib/mpl_toolkits/mplot3d/art3d.py:618:80: E501 line too long (83 > 79 characters)
./lib/mpl_toolkits/mplot3d/art3d.py:623:55: E231 missing whitespace after ','

@apaszke apaszke force-pushed the vectorize_3d_projection branch from 41f36ac to 1e08084 Compare March 22, 2020 17:24
if self._codes3d is not None:
codes = [self._codes3d[idx] for z, s, fc, ec, idx in z_segments_2d]
segments_2d = [s.data[~s.mask.all(axis=-1)]
Contributor

With my implementation in the constructor, you now have the invariant that mask.all(axis=-1) == mask[:, 0], which if you wanted to you could use for a very marginal speedup:

Suggested change
segments_2d = [s.data[~s.mask.all(axis=-1)]
# this invariant is ensured by the constructor
assert s.mask.strides[1] == 0
# which means that s.mask[:,i] is all the same location in memory, so there's no need to call `all`.
segments_2d = [s.data[~s.mask[:,0]]
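
The stride-0 property of a broadcast mask can be checked directly (a standalone sketch, not the constructor code itself):

```python
import numpy as np

# A per-vertex validity mask broadcast across the coordinate axis:
# every column aliases the same memory, so its stride is 0, and
# mask[:, 0] is equivalent to mask.all(axis=-1) without any reduction.
per_vertex = np.array([[False], [True], [False]])
mask = np.broadcast_to(per_vertex, (3, 2))
```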

Contributor Author

Or I could put in the assert and use compressed(), which avoids the unnecessary mask manipulation!

Contributor

@eric-wieser eric-wieser Mar 22, 2020

compressed is internally more complex than that operation anyway, though. You're comparing data[~mask[:, 0]] with data.ravel()[~mask.ravel()].reshape((-1, 2)).

Contributor

In fact, mask.ravel() is now a lot more expensive than mask[:,0], as it has to create a copy to handle the 0 stride.

Contributor Author

Ok, here's what I came up with:

@@ -667,6 +667,7 @@ class Poly3DCollection(PolyCollection):
             self._facecolors3d = self._facecolors 
  
         psegments = proj3d._proj_transform_vectors(self._segments, renderer.M)
+        is_masked = isinstance(psegments, np.ma.MaskedArray)
         num_faces = len(psegments)
  
         # This extra fuss is to re-order face / edge colors
@@ -681,19 +682,24 @@ class Poly3DCollection(PolyCollection):
                 cedge = cedge.repeat(num_faces, axis=0)
  
         face_z = self._zsortfunc(psegments[..., 2], axis=-1)
-        if isinstance(face_z, np.ma.MaskedArray): 
+        if is_masked:
             # NOTE: Unpacking .data is safe here, because every face has to
             #       contain a valid vertex.
             face_z = face_z.data
         face_order = np.argsort(face_z, axis=-1)[::-1]
  
+        segments_2d = psegments[face_order, :, :2]
         if self._codes3d is not None:
-            segments_2d = [s.data[~s.mask.all(axis=-1)]
-                           for s in psegments[face_order, :, :2]]
+            if is_masked:
+                # NOTE: We cannot assert the same on segments_2d, as it is a
+                #       result of advanced indexing, so its mask is a newly
+                #       allocated tensor. However, both of those asserts are
+                #       equivalent.
+                assert psegments.mask.strides[-1] == 0
+                segments_2d = [s.compressed().reshape(-1, 2) for s in segments_2d]
             codes = [self._codes3d[idx] for idx in face_order]
             PolyCollection.set_verts_and_codes(self, segments_2d, codes)
         else:
-            segments_2d = psegments[face_order, :, :2]
             PolyCollection.set_verts(self, segments_2d, self._closed)
  
         self._facecolors2d = cface[face_order]

Contributor Author

Those .ravel()s are no-ops, because segments_2d comes from advanced indexing anyway, so I still went with compressed().

This removes a bunch of very expensive list comprehensions, dropping
rendering times by half for some 3D surface plots.
@apaszke apaszke force-pushed the vectorize_3d_projection branch from 1e08084 to 8633bbd Compare March 22, 2020 18:35
@eric-wieser
Contributor

Restarting an earlier conversation:

I'm not convinced masked arrays are the right approach here. Component-wise masking feels a little awkward. Would it be better to store a _segment_lengths array separately?

It would be a better model of the data, but it wouldn't allow vectorized z-reductions, because there are no NumPy reduction kernels that understand per-segment lengths.

Where exactly does this vectorized reduction occur? An option might be to store:

self._segment_3d_data = ...   # (n_faces, max_verts, 3)
self._vertex_mask = ...  # (n_faces, max_verts, 1)

and reassemble them into a masked array only when needed. The assembly should be free, and via broadcast_to you can avoid _vertex_mask ever tripling in size. It also means you can easily maintain your invariants.
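
A rough sketch of that storage scheme (attribute names as suggested above; everything else is hypothetical glue, not the actual patch):

```python
import numpy as np

n_faces, max_verts = 100, 4
segment_3d_data = np.random.rand(n_faces, max_verts, 3)  # raw vertices
vertex_mask = np.zeros((n_faces, max_verts, 1), dtype=bool)
vertex_mask[0, 3] = True                                 # one invalid vertex

def as_masked(data, mask):
    # broadcast_to expands (n, m, 1) -> (n, m, 3) without copying,
    # so the mask never triples in size.
    return np.ma.MaskedArray(data, mask=np.broadcast_to(mask, data.shape))

# Reassemble into a masked array only where the vectorized
# z-reduction actually needs it.
face_z = as_masked(segment_3d_data, vertex_mask)[..., 2].max(axis=-1)
```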

@apaszke
Contributor Author

apaszke commented Mar 22, 2020

This is the reduction. I think that keeping the mask along with the array really does make the code simpler, and I don't think it should add a lot of performance penalties (it should be waaaay better than without this patch). I do agree however that it may be simpler to keep the invariants if we separate those. I'm ok with doing that if you think that's the way to go. No strong preference from my side anymore.

@eric-wieser
Contributor

eric-wieser commented Mar 22, 2020

If that's the only place, then you could consider

if isinstance(segments3d, np.ma.MaskedArray):
    self._segment_3d = segments3d.data # (n_faces, max_verts, 3)
    self._vertex_valid = ~segments3d.mask.any(axis=-1)  # (n_faces, max_verts)
elif isinstance(segments3d, np.ndarray):
    self._segment_3d = segments3d # (n_faces, max_verts, 3)
    self._vertex_valid = np.True_
else:
    # existing code without the broadcast_to, and an inverted condition
vertex_z = psegments[..., 2]
if self._vertex_valid is not np.True_:
    vertex_z = np.ma.array(psegments[..., 2], mask=~self._vertex_valid)
face_z = self._zsortfunc(vertex_z, axis=-1)
# the if is an optimization to avoid an unnecessary copy, it would work unconditionally
if self._vertex_valid is not np.True_:
    segments_2d = [s[self._vertex_valid, :] for s in segments_2d]

That second snippet becomes even simpler and 4x faster (on an array of 10000 elements) once the numpy version bumps:

vertex_z = psegments[..., 2]
face_z = self._zsortfunc(vertex_z, axis=-1, where=self._vertex_valid)

@apaszke
Contributor Author

apaszke commented Mar 22, 2020

Yeah I think that's fair and makes things nicer in the end. I'll update the code tomorrow!

@apaszke
Contributor Author

apaszke commented Mar 22, 2020

My only question is: why use np.True_ instead of True? Is it because True is not a valid mask argument to ma.MaskedArray?

@eric-wieser
Contributor

eric-wieser commented Mar 23, 2020

I guess there's no reason to use np.True_ any more. That's left from when I was thinking about np.False_, which has special meaning for masked arrays as np.ma.nomask. If anything, using np.True_ may not hit the fast-path of where=, so I wouldn't recommend using it!

@apaszke
Contributor Author

apaszke commented Mar 23, 2020

@eric-wieser Turns out it's not as simple as we thought — see the last commit. There's one extra reduction over the Z axis at the very end of the function, and we always have to take care of masking before we call super().set_verts. Still, this version is likely to be a little bit faster than the old one, so I'm leaving the choice up to you.

Comment on lines +677 to +679
pzs = pfaces[..., 2]
if needs_masking:
pzs = np.ma.MaskedArray(pzs, mask=self._invalid_vertices)
Contributor

Tempting to move this down to line 692, where it's used for the first time.

Contributor Author

@apaszke apaszke Mar 23, 2020

I felt like it's logically connected to pfaces and it's used on lines 692, 725 and 729, so I didn't want to make the first use special in any way.

Contributor

I suggested this because it might be worth adding an optimization today along the lines of:

if numpy_version >= LooseVersion(1.17.0):
   # use the where argument, skip the masked array entirely
else:
   ...

Happy to have that declared as out of scope

Contributor Author

I'm happy to do that, but I'm still a bit afraid that it will make the code even less readable. I'd have to insert the same if on lines 692 and 729. WDYT?

Contributor

Maybe try it and see if it actually helps with performance locally in any significant way? If not, then I think it's not worth it.

Contributor Author

I've tried this, and the where= reductions are indeed faster!

In [1]: import numpy as np                                            
                                                  
In [2]: maskf = np.zeros((399 * 399, 4), dtype=bool)                  
                                                  
In [3]: maskt = np.ones((399 * 399, 4), dtype=bool)                   
                         
In [4]: x = np.ones((399 * 399, 4), dtype=float)                      
                         
In [5]: mx = np.ma.MaskedArray(x, maskf)          
                                                  
In [6]: %timeit np.max(mx, axis=-1)                                   
10.3 ms ± 441 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [7]: %timeit np.max(x, axis=-1, where=maskt, initial=0)            
7.09 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

On the other hand, it looks like np.average doesn't support the where= argument, so I don't think we can do this...

Contributor

Oh, that's a good catch. You can almost get away with passing the mask into the weights argument of np.average, but that behaves poorly on NaNs.
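
The NaN failure mode is easy to demonstrate (standalone illustration, not PR code): the zero weight doesn't neutralize a NaN, because 0 * nan is still nan.

```python
import numpy as np

x = np.array([[1., 2., np.nan]])       # NaN sits in an invalid slot
valid = np.array([[True, True, False]])

# weights zeroes the invalid entry's contribution, but 0 * nan is nan,
# so the NaN still poisons the weighted average:
leaky = np.average(x, axis=-1, weights=valid)

# A genuinely masked average ignores the invalid entry entirely:
safe = np.ma.average(np.ma.MaskedArray(x, mask=~valid), axis=-1)
```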

Contributor Author

Hah, good point! And it looks like it still beats the masked array implementation, which is very surprising to me:

In [9]: %timeit np.average(x, axis=-1, weights=maskt)                                                
8.69 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [10]: %timeit np.average(mx, axis=-1)          
12.9 ms ± 370 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

But yes, there's an issue with invalid floating point values, and there's no default argument we can pass in. I guess the latter problem could be fixed by using a wrapper like lambda *args, **kwargs: np.average(*args, weights=kwargs.pop('where', 1.), **kwargs) in the dict defining the zsort functions.
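
Spelled out, that wrapper could look roughly like this (a hypothetical sketch with an invented name; it inherits the NaN caveat discussed above):

```python
import numpy as np

def average_zsort(a, axis=-1, where=None):
    # Translate the where= convention of the other reductions into
    # np.average's weights= argument; None means "use everything".
    weights = None if where is None else where.astype(float)
    return np.average(a, axis=axis, weights=weights)

zs = np.array([[1., 2., 4.]])
valid = np.array([[True, True, False]])
face_z = average_zsort(zs, where=valid)  # averages only the valid entries
```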

So what's the final decision?

apaszke and others added 2 commits March 23, 2020 14:05
Co-Authored-By: Eric Wieser <wieser.eric@gmail.com>
for idx, ((xs, ys, zs), fc, ec)
in enumerate(zip(xyzlist, cface, cedge))),
key=lambda x: x[0], reverse=True)
face_z = self._zsortfunc(pzs, axis=-1)
Contributor

@eric-wieser eric-wieser Mar 23, 2020

Perhaps a marker for future cleanup would be handy here.

Suggested change
face_z = self._zsortfunc(pzs, axis=-1)
# TODO: When we drop numpy < 1.17, investigate using the `where` argument to reductions
face_z = self._zsortfunc(pzs, axis=-1)

Member

Since matplotlib 3.5 we require numpy >= 1.17, so this can be updated immediately.

@QuLogic
Member

QuLogic commented Aug 7, 2020

I was thinking of rebasing #11577, but it looks like this PR chain might obsolete it. What is the status on this PR, besides needing a rebase?

@QuLogic
Member

QuLogic commented Sep 9, 2020

@apaszke Gentle ping?

@scottshambaugh
Contributor

scottshambaugh commented Jan 7, 2025

I've picked this up in #29397, building off this PR. Thank you @apaszke for the initial work here, it was super helpful in showing the way to some significant speedups!
