Store vertices as (masked) arrays in Poly3DCollection #16688
Conversation
lib/mpl_toolkits/mplot3d/proj3d.py (outdated)

    Parameters
    ----------
    vecs : 3xN np.ndarray or np.ma.MaskedArray
It might make more sense for this to take and return an (..., 3) array, which would mean your caller does not need to reshape, and would make the below...
(doing it this way is consistent with how gufuncs behave in numpy, in that they care only about the last axis by default)
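As an illustration of that convention, numpy's matmul broadcasts over any leading axes and contracts only the last one, so an (..., 3) layout needs no reshaping at the call site. This is a hedged sketch with made-up names (`pts`, `M`), not the PR's actual variables:

```python
import numpy as np

# A 4x4 homogeneous matrix applied to points stored as (..., 3) arrays.
M = np.eye(4)
pts = np.random.rand(5, 7, 3)  # any leading shape; last axis is xyz

# matmul broadcasts over the leading axes, so no reshape is needed:
out = pts @ M[:3, :3].T + M[:3, 3]
print(out.shape)  # (5, 7, 3)
```

With the identity matrix the transform is a no-op, which makes the broadcasting behavior easy to verify.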
Note that those reshapes should not trigger any reallocations, so I'm not sure if that would actually help. We could push the transposes into this function, but we cannot make them go away without changing the conventions around renderer.M. Still, I don't think they cause any harm here.
This suggestion is based more on simplicity and convention than on performance. You're correct that reshapes are typically free, and .T is always free.
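A quick illustrative check (not part of the PR) of why these operations are free: both return views as long as the memory layout allows it.

```python
import numpy as np

a = np.arange(12.0).reshape(3, 4)
assert np.shares_memory(a, a.T)            # transpose is always a view
assert np.shares_memory(a, a.reshape(-1))  # contiguous reshape is a view
# A reshape of a transposed (non-contiguous) array, however, must copy:
assert not np.shares_memory(a.T, a.T.reshape(-1))
```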
I've tried your suggestions, and it seems that while the original version takes ~72ms on average, the proposed changes increase the runtime to ~127ms. This is likely due to strided memory access patterns that start to pop up everywhere (padding the array, normalizing the product, reductions over the z axis). I do agree that the code is nicer, though. Please let me know how to proceed.
Thanks for profiling. Is that the runtime of the entire plot, or just this function?
Seems that something screwy is going on with matmul(A, B) vs matmul(B.T, A.T).T for 2d A and B. matmul is supposed to know they are equivalent and pick the faster one, but it seems to be doing the same thing for both.
Edit: numpy/numpy#15742, it only works with an explicit out argument.
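A minimal sketch of the equivalence being discussed (illustrative names and sizes only): the two formulations are mathematically identical, and per numpy/numpy#15742 the explicit `out` is what lets matmul choose a kernel based on the output layout.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 1000))
B = rng.standard_normal((1000, 4))

# Mathematically identical; only the memory access pattern differs,
# which is what the timings above measure.
lhs = np.matmul(A, B)
rhs = np.matmul(B.T, A.T).T
assert np.allclose(lhs, rhs)

# Supplying an explicit `out` gives matmul the output layout up front:
out = np.empty((4, 4))
np.matmul(A, B, out=out)
assert np.allclose(out, lhs)
```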
How large a part of the slowdown is then caused by matmul picking the wrong kernel (or having a slow kernel) for this case?
I can get a speed up with this mess:
def _proj_transform_vectors_new(vecs, M):
    is_masked = isinstance(vecs, np.ma.MaskedArray)
    if is_masked:
        mask = vecs.mask
    # allocate with a convenient memory order
    vecs_pad = np.moveaxis(
        np.empty((vecs.shape[-1] + 1,) + vecs.shape[:-1]), 0, -1)
    vecs_pad[..., :-1] = vecs
    vecs_pad[..., -1] = 1
    product = np.empty_like(vecs_pad)
    np.matmul(vecs_pad, M.T, out=product)
    result = product[..., :3] / product[..., 3:]
    if is_masked:
        return np.ma.array(result, mask=mask)
    else:
        return result

def new():
    # the reshape is for speed only; it works correctly without it
    psegments = _proj_transform_vectors_new(
        segments.reshape(-1, 3), M).reshape(segments.shape)
    epilogue(psegments)
    return psegments
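For context, the padding-with-ones and final division in `_proj_transform_vectors_new` above is the standard homogeneous-coordinate transform. Here is a self-contained sketch of just that idea, with illustrative names (the real code's `M` comes from renderer.M):

```python
import numpy as np

def proj_transform_sketch(vecs, M):
    """Apply a 4x4 homogeneous transform M to (..., 3) points."""
    ones = np.ones(vecs.shape[:-1] + (1,))
    homo = np.concatenate([vecs, ones], axis=-1)  # pad with w = 1
    product = homo @ M.T                          # (..., 4)
    return product[..., :3] / product[..., 3:]    # perspective divide

# With the identity matrix the transform is a no-op:
pts = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
assert np.allclose(proj_transform_sketch(pts, np.eye(4)), pts)
```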
This indeed yields a speedup, but a tiny one on my laptop:
New: 42.17ms
Baseline: 45.15ms
Diff: 2.98ms
I'm ok with going either way, just let me know 🤷♂
I think my request would perhaps be just to move the reshapes you have at the call site inside the function; that way you can keep the conventional (..., N) API, but the implementation can be the easy-to-read and efficient code you have now.
It would be great if we could merge this, because it is necessary to actually benefit from the patches I'm preparing now.
(force-pushed from 872818b to fbee126)
@eric-wieser @timhoffm I've addressed your requests, rebased on top of master and squashed the commits.
lib/mpl_toolkits/mplot3d/art3d.py (outdated)

    face_z = self._zsortfunc(psegments[..., 2], axis=-1)
    if isinstance(face_z, np.ma.MaskedArray):
        # NOTE: Unpacking .data is safe here, because every face has to
        #       contain a valid vertex.
        face_z = face_z.data
    face_order = np.argsort(face_z, axis=-1)[::-1]
Is the point here that face_z has a single element per face, and is therefore safe?
Exactly. My comment is based on the assumption that you don't have faces of length 0 (i.e. faces with all vertices masked). I think that's reasonable?
Alternatively, you could use face_z.filled(-np.inf), to be safe.
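A small illustrative example of the `.filled()` suggestion (hypothetical data, not the PR's): filling masked entries with a sentinel lets a fully-masked face sort last instead of propagating a masked value into the ordering.

```python
import numpy as np

face_z = np.ma.MaskedArray([3.0, 1.0, 2.0], mask=[False, True, False])
safe_z = face_z.filled(-np.inf)          # masked entry becomes -inf
order = np.argsort(safe_z)[::-1]         # descending z-order
print(order)  # [0 2 1] -- the masked face (index 1) ends up last
```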
Now that I think about it, I don't think that it matters? Faces with no valid vertices should get all vertex coordinates filled with NaN values before drawing, making them invisible, so it doesn't matter where they end up in the face order.
lib/mpl_toolkits/mplot3d/art3d.py (outdated)

    # NOTE: Unpacking .data is safe here, because every face has to
    #       contain a valid vertex.
Are you confident enough about this to add assert not face_z.mask.any()? If not, then it would be better to handle it.
See my comment above. I would be comfortable with adding this assert if we're ok with making the assumption that you can't have faces of length 0. This certainly doesn't happen for plot_surface, and I don't see why you would do that in any other case either.
lib/mpl_toolkits/mplot3d/art3d.py (outdated)

    if self._codes3d is not None:
        codes = [self._codes3d[idx] for z, s, fc, ec, idx in z_segments_2d]
        segments_2d = [s.compressed().reshape(-1, 2)
compressed is going to do something bad if somehow only one component was masked (e.g. it was nan at construction).
Safer would be:

- segments_2d = [s.compressed().reshape(-1, 2)
+ segments_2d = [s.data[~s.mask.any(axis=-1)]
How would you like to interpret the data if one component was masked? That makes no sense because you get one point with only a single coordinate.
Oh, you mean that multiple similar errors like this could cancel out? That is true, but it's such a strong internal invariant violation that I'm not sure if what your solution does (i.e. just take masked data without asking too many questions) is better? Another alternative would be to do s.data[~s.mask[..., 0]], as the last dimension of the mask is always supposed to be constant.
You're right that it makes no sense, but the way you have it written, the behavior would be to interpret [[x1, y1], [x2, --], [x3, y3], [x4, --]] as [[x1, y1], [x2, x3], [y3, x4]], which is going to be very confusing to the user, as it combines adjacent points.
Using any as I do means that if the point is fully masked, it's skipped; otherwise it propagates to the plotting anyway. Using all would also be fine, if you want partial masks to skip plotting too. What's not ok is corrupting adjacent points which aren't bad.
If you really wanted to play it safe you could assert that np.all(mask.any(axis=-1) == mask.all(axis=-1)), but that would cost you a bunch of time.
Ok, I guess using mask.all might be more reasonable. Would that be ok with you?
I've looked into it, and I'm not sure if we should be doing anything like that. Note that right below this line there is another comprehension that builds up a list of codes, each with a length that is supposed to equal the length of its segment. So we can't just drop vertices arbitrarily, and the only way to make this sane is to assume that the invariant I've mentioned holds. What do you think?
Please fix the flake8 errors:
(force-pushed from 41f36ac to 1e08084)
lib/mpl_toolkits/mplot3d/art3d.py (outdated)

    if self._codes3d is not None:
        codes = [self._codes3d[idx] for z, s, fc, ec, idx in z_segments_2d]
        segments_2d = [s.data[~s.mask.all(axis=-1)]
With my implementation in the constructor, you now have the invariant that mask.all(axis=-1) == mask[:, 0], which if you wanted to you could use for a very marginal speedup:

- segments_2d = [s.data[~s.mask.all(axis=-1)]
+ # this invariant is ensured by the constructor
+ assert s.mask.strides[1] == 0
+ # which means that s.mask[:, i] is all the same location in memory,
+ # so there's no need to call `all`.
+ segments_2d = [s.data[~s.mask[:, 0]]
Or I could put in the assert and use compressed(), which avoids the unnecessary mask manipulation!
compressed is internally more complex than that operation anyway, though. You're comparing data[~mask[:, 0]] with data.ravel()[~mask.ravel()].reshape((-1, 2)).
In fact, mask.ravel() is now a lot more expensive than mask[:, 0], as it has to create a copy to handle the 0 stride.
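An illustrative sketch of the 0-stride mask being discussed (hypothetical shapes, not the PR's constructor code): broadcasting a per-vertex mask across the coordinate axis gives that axis stride 0, so every column aliases the same memory, and ravel() is forced to copy.

```python
import numpy as np

vertex_masked = np.array([False, True, False])
# Broadcast one mask value per vertex across 2 coordinates:
mask = np.broadcast_to(vertex_masked[:, None], (3, 2))

assert mask.strides[1] == 0                        # coordinate axis is 0-stride
assert np.array_equal(mask[:, 0], mask.all(axis=-1))  # so `all` is redundant
# ravel() on a 0-stride axis cannot be a view; it must copy:
assert not np.shares_memory(mask, mask.ravel())
```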
Ok, here's what I came up with:
@@ -667,6 +667,7 @@ class Poly3DCollection(PolyCollection):
             self._facecolors3d = self._facecolors

         psegments = proj3d._proj_transform_vectors(self._segments, renderer.M)
+        is_masked = isinstance(psegments, np.ma.MaskedArray)
         num_faces = len(psegments)

         # This extra fuss is to re-order face / edge colors
@@ -681,19 +682,24 @@ class Poly3DCollection(PolyCollection):
             cedge = cedge.repeat(num_faces, axis=0)

         face_z = self._zsortfunc(psegments[..., 2], axis=-1)
-        if isinstance(face_z, np.ma.MaskedArray):
+        if is_masked:
             # NOTE: Unpacking .data is safe here, because every face has to
             #       contain a valid vertex.
             face_z = face_z.data
         face_order = np.argsort(face_z, axis=-1)[::-1]

+        segments_2d = psegments[face_order, :, :2]
         if self._codes3d is not None:
-            segments_2d = [s.data[~s.mask.all(axis=-1)]
-                           for s in psegments[face_order, :, :2]]
+            if is_masked:
+                # NOTE: We cannot assert the same on segments_2d, as it is a
+                #       result of advanced indexing, so its mask is a newly
+                #       allocated tensor. However, both of those asserts are
+                #       equivalent.
+                assert psegments.mask.strides[-1] == 0
+                segments_2d = [s.compressed().reshape(-1, 2)
+                               for s in segments_2d]
             codes = [self._codes3d[idx] for idx in face_order]
             PolyCollection.set_verts_and_codes(self, segments_2d, codes)
         else:
-            segments_2d = psegments[face_order, :, :2]
             PolyCollection.set_verts(self, segments_2d, self._closed)

         self._facecolors2d = cface[face_order]
Those .ravel()s are no-ops, because segments_2d comes from advanced indexing anyway, so I still went with compressed().
This removes a bunch of very expensive list comprehensions, dropping rendering times by half for some 3D surface plots.
(force-pushed from 1e08084 to 8633bbd)
Restarting an earlier conversation:
Where exactly does this vectorized reduction occur? An option might be to store:

    self._segment_3d_data = ...  # (n_faces, max_verts, 3)
    self._vertex_mask = ...      # (n_faces, max_verts, 1)

and reassemble them into a masked array only when needed. The assembly should be free, and via …
This is the reduction. I think that keeping the mask along with the array really does make the code simpler, and I don't think it should add a lot of performance penalties (it should be waaaay better than without this patch). I do agree, however, that it may be simpler to keep the invariants if we separate those. I'm ok with doing that if you think that's the way to go. No strong preference from my side anymore.
If that's the only place, then you could consider

    if isinstance(segments3d, np.ma.MaskedArray):
        self._segment_3d = segments3d.data  # (n_faces, max_verts, 3)
        self._vertex_valid = ~segments3d.mask.any(axis=-1)  # (n_faces, max_verts)
    elif isinstance(segments3d, np.ndarray):
        self._segment_3d = segments3d  # (n_faces, max_verts, 3)
        self._vertex_valid = np.True_
    else:
        # existing code without the broadcast_to, and an inverted condition

along with

    vertex_z = psegments[..., 2]
    if self._vertex_valid is not np.True_:
        vertex_z = np.ma.array(psegments[..., 2], mask=~self._vertex_valid)
    # the if is an optimization to avoid an unnecessary copy;
    # it would work unconditionally
    face_z = self._zsortfunc(vertex_z, axis=-1)

and

    if self._vertex_valid is not np.True_:
        segments_2d = [s[self._vertex_valid, :] for s in segments_2d]

That second snippet becomes even simpler and 4x faster (on an array of 10000 elements) once the numpy version bumps:

    vertex_z = psegments[..., 2]
    face_z = self._zsortfunc(vertex_z, axis=-1, where=self._vertex_valid)
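A self-contained sketch of the equivalence this suggestion relies on (illustrative data; assumes numpy >= 1.17 for the `where=` argument): a reduction with `where=` and a suitable `initial` matches the masked-array result without ever constructing a MaskedArray.

```python
import numpy as np

z = np.array([[1.0, 5.0, 2.0], [7.0, 3.0, 4.0]])
valid = np.array([[True, False, True], [True, True, False]])

# Masked-array route: mask out the invalid vertices, then reduce.
masked = np.ma.MaskedArray(z, mask=~valid).max(axis=-1)
# where= route: reduce only over valid entries, no MaskedArray needed.
direct = np.max(z, axis=-1, where=valid, initial=-np.inf)

assert np.array_equal(np.asarray(masked), direct)  # both give [2., 7.]
```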
Yeah, I think that's fair and makes things nicer in the end. I'll update the code tomorrow!
My only question is: why use …
I guess there's no reason to use …
@eric-wieser Turns out it's not as simple as we thought — see the last commit. There's one extra reduction over the Z axis at the very end of the function, and we always have to take care of masking before we call …
    pzs = pfaces[..., 2]
    if needs_masking:
        pzs = np.ma.MaskedArray(pzs, mask=self._invalid_vertices)
Tempting to move this down to line 692, where it's used for the first time.
I felt like it's logically connected to pfaces, and it's used on lines 692, 725 and 729, so I didn't want to make the first use special in any way.
I suggested this because it might be worth adding an optimization today along the lines of:

    if numpy_version >= LooseVersion("1.17.0"):
        # use the where argument, skip the masked array entirely
        ...
    else:
        ...

Happy to have that declared as out of scope.
I'm happy to do that, but I'm still a bit afraid that it will make the code even less readable. I'd have to insert the same if on lines 692 and 729. WDYT?
Maybe try it and see if it actually helps with performance locally in any significant way? If not, then I think it's not worth it.
I've tried this, and the where= reductions are indeed faster!
In [1]: import numpy as np
In [2]: maskf = np.zeros((399 * 399, 4), dtype=bool)
In [3]: maskt = np.ones((399 * 399, 4), dtype=bool)
In [4]: x = np.ones((399 * 399, 4), dtype=float)
In [5]: mx = np.ma.MaskedArray(x, maskf)
In [6]: %timeit np.max(mx, axis=-1)
10.3 ms ± 441 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [7]: %timeit np.max(x, axis=-1, where=maskt, initial=0)
7.09 ms ± 139 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
On the other hand, it looks like np.average doesn't support the where= argument, so I don't think we can do this...
Oh, that's a good catch. You can almost get away with passing the mask into the weights argument of np.average, but that behaves poorly on NaNs.
Hah, good point! And it looks like it still beats the masked array implementation, which is very surprising to me:
In [9]: %timeit np.average(x, axis=-1, weights=maskt)
8.69 ms ± 137 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [10]: %timeit np.average(mx, axis=-1)
12.9 ms ± 370 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
But yes, there's an issue with invalid floating point values, and there's no default argument we can pass in. I guess the latter problem could be fixed by using a wrapper like lambda *args, **kwargs: np.average(*args, weights=kwargs.pop('where', 1.), **kwargs) in the dict defining the zsort functions.
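An illustrative check of the weights-as-mask workaround and its NaN caveat (made-up data, not the PR's): 0/1 weights reproduce the masked mean for finite values, but a NaN behind a zero weight still poisons the result, unlike a masked array, which excludes it entirely.

```python
import numpy as np

z = np.array([[1.0, 5.0, 2.0]])
valid = np.array([[True, False, True]])

# Boolean weights act as a mask for finite data:
avg = np.average(z, axis=-1, weights=valid)
assert np.allclose(avg, [1.5])  # mean of 1.0 and 2.0

# ...but 0 * nan is still nan, so a NaN under a zero weight leaks through:
z_nan = np.array([[1.0, np.nan, 2.0]])
assert np.isnan(np.average(z_nan, axis=-1, weights=valid))

# A masked array excludes the entry before any arithmetic happens:
masked_avg = np.ma.MaskedArray(z_nan, mask=~valid).mean(axis=-1)
assert np.allclose(np.asarray(masked_avg), [1.5])
```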
So what's the final decision?
Co-Authored-By: Eric Wieser <wieser.eric@gmail.com>
                         for idx, ((xs, ys, zs), fc, ec)
                         in enumerate(zip(xyzlist, cface, cedge))),
                        key=lambda x: x[0], reverse=True)
    face_z = self._zsortfunc(pzs, axis=-1)
Perhaps a marker for future cleanup would be handy here.
- face_z = self._zsortfunc(pzs, axis=-1)
+ # TODO: When we drop numpy < 1.17, investigate using the `where`
+ #       argument to reductions
+ face_z = self._zsortfunc(pzs, axis=-1)
Since matplotlib 3.5 we require numpy >= 1.17, so this can be updated immediately.
I was thinking of rebasing #11577, but it looks like this PR chain might obsolete it. What is the status on this PR, besides needing a rebase?
@apaszke Gentle ping?
PR Summary
This removes a bunch of very expensive list comprehensions, dropping
rendering times by half for some 3D surface plots.
Runtime of the following code drops from 16s to 10s after #16675, while stacking this PR on top of #16675 lowers the time further to around 5s. Another 3s are now spent in PolyCollection.set_verts, which I'm hoping to improve later.

PR Checklist