-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Updated downsampling #8776
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updated downsampling #8776
Conversation
lib/matplotlib/lines.py
Outdated
@@ -790,6 +790,9 @@ def draw(self, renderer): | |||
cap = self._dashcapstyle | |||
join = self._dashjoinstyle | |||
else: | |||
print(tpath.vertices.size) | |||
tpath = affine.inverted().transform_path(affine.transform_path(tpath).cleaned(simplify=True)) | |||
print(tpath.vertices.size) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess what is happening is that you need to do the simplification in screen units, but the later draw routine needs the path in data units, so you need to convert one extra time back and forth? That seems inefficient, any chance you could either 1. do the simplification at the last minute (once you don't need to convert back and forth anymore), or at least 2. keep the simplified path in screen units, and set the transform to identity?
Also, instead of prints (which need to go anyways :-)), some tests (e.g. with paths for which you know what the result of simplification should be) would be welcome, to show that the simplifier is indeed working as expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copying a comment that I haven't pushed yet:
# the path simplification code expects the path
# to be in pixel space, not data space, but the
# renderer downstream expects the path to be
# in data space. The transformation `affine`
# will put the path into pixel space, and
# affine.inverted() will transform it back to
# data space. So, transform using affine, simplify
# the path, and transform using the affine inverted.
It is inefficient, but I see no way around it at the moment. Do you know what populates the renderer
argument? The only way to delay cleaning until it has been transformed is to do it inside this call: renderer.draw_path(gc, path, trans)
from
def _draw_solid(self, renderer, gc, path, trans):
gc.set_linestyle('solid')
gc.set_dashes(self._dashOffset, self._dashSeq)
renderer.draw_path(gc, path, trans)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Obviously print statements will be removed and tests will be added.
Testing in general will be a huge pain, since nearly every single image will fail the current image comparisons. We may have to change the default value of path.simplify
to False to avoid this...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it looks like renderer
is basically the backend. Without requiring changes to the backend API (so that it takes in an already transformed path, and I'm sure it was not designed that way initially also for performance reasons), I don't think we can avoid it. It's still faster than not doing the downsampling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may be able to get away with editing _classic_test.mplstyle, which is basically there for that reason (we try not to update all test images at once if possible, as this really leads to bloating of the repo size).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@anntzer are you saying if I edit _classic_test.mplstyle
to turn path simplification off, that all the figures created during testing will be made with simplification turned off?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you know if that behavior is documented anywhere? I've been reading through developer documentation but not finding anything.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope so, at least...
6555943
to
e76a19a
Compare
lib/matplotlib/lines.py
Outdated
# the path, and transform using the affine inverted. | ||
tpath = affine.inverted().transform_path( | ||
affine.transform_path(tpath).cleaned(simplify=True) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would something like
tpath = affine.transform_path(tpath).cleaned(simplify=True)
affine = IdentityTransform() # or whatever it is called
not work to avoid the back and forth discussed earlier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll play around with that. I'm a little worried about changing that since I don't know what downstream rendering code relies on (does it use the data-space to generate axes labels? scale text? etc.). It might be premature optimization at this point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really meant something like "Check if affine is used anywhere after. If not, do /previous message/. If it is, set a flag somewhere, and only use an identity affine when drawing the path".
Well perhaps the second one would be an premature optimization if needed, but please give the first one a try if possible, given that that would be fairly simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Initial tests just created a blank figure when doing that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry didn't see your 2nd message before posting. So I guess it looks like affine
is being used for something else later on, I'm guessing in the clipping routine:
<path_converters.h>
/*
This file contains a number of vertex converters that modify
paths. They all work as iterators, where the output is generated
on-the-fly, and don't require a copy of the full data.
Each class represents a discrete step in a "path-cleansing" pipeline.
They are currently applied in the following order in the Agg backend:
...
3. PathClipper: Clips line segments to a given rectangle. This is
helpful for data reduction, and also to avoid a limitation in
Agg where coordinates can not be larger than 24-bit signed
integers.
...
5. PathSimplifier: Removes line segments from highly dense paths
that would not have an impact on their appearance. Speeds up
rendering and reduces file sizes.
...
*/
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sad, but I can live with that. Perhaps also mention the issue in a comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a comment. It's possible that there may yet be a simple path forward (pun intended) on this. I'll take a look tomorrow with fresh eyes.
lib/matplotlib/path.py
Outdated
(self._codes is None or np.all(self._codes <= Path.LINETO))) | ||
len(self._vertices) >= 128 and | ||
(self._codes is None or np.all(self._codes <= Path.LINETO)) and | ||
self._simplify_threshold > 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the test on simplify_threshold should go above (say just after path.simplify), in order to avoid iterating on the codes if possible? (yes, I realize that's unlikely to be the bottleneck)
also it looks like we may be able to deprecate path.simplify and fold it into path.simplify_threshold being zero or not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea. I think combining the parameters makes a lot of sense. They are definitely redundant in purpose. I'll probably also suggest upping the default from 1/9 to something closer to 1/2. It turns out the simplify_threshold is squared in the simplification code, so 1/9 is actually only reducing paths that are 1/81 pixel away from being not parallel! Not sure what the original logic on that was. Perhaps original author was thinking of it in terms of square pixels difference...?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps that can go into a separate PR (linked to this discussion)? If you look around mpl/init.py and mpl/rcsetup.py you should see how rcparam deprecation is handled. That'll need a notice in api_changes too (this PR will probably benefit from an entry in the docs whatsnew too).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this PR will probably benefit from an entry in the docs whatsnew too
Yep, already have a draft in doc/users/whats_new/path_simplification_updates.rst
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Realized this morning that it looks like it's squared in simplification code to avoid having to compute sqrt step of norm calculation, so it should be behaving as advertised.
@@ -13,11 +15,13 @@ | |||
from matplotlib.path import Path | |||
|
|||
|
|||
# NOTE: All of these tests assume that path.simplify is set to True | |||
# (the default) | |||
# NOTE: All of these tests used to assume that path.simplify is set to True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps just remove the comment?
You may also be able to just use a module-level autouse fixture to set the rcparam (see https://docs.pytest.org/en/latest/fixture.html).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the tip!
Things I need help on:
|
0ac2765
to
906d2ef
Compare
# NOTE: All of these tests assume that path.simplify is set to True | ||
# (the default) | ||
@pytest.fixture | ||
def set_simplify_and_threshold(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason not to make this autoused? (perhaps, I'm asking)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is exactly one test that turns it off, but it's turned off manually in the test... so no, apart from me not reading pytest docs thoroughly :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. In fact, using scope='module'
makes some tests fail. Possibly b/c it's only run once in this module, meaning the test that turns off simplification would effect later tests? Don't know for sure. I'll leave it as is for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all your help so far @anntzer
I don't have the time right now to look at the test failures, but will try to come back to it at some point (no guarantees of schedule so if someone else volunteers, even better :-)). Overall the PR seems to be heading very much in the right direction, hopefully just needs some polishing. |
Anyone know the agg code well? I did some profiling to see if there were any easy wins and it turns out almost all of the time is being spent in |
You may have more luck asking on the mailing list(s) (...perhaps). |
b663d65
to
90872b5
Compare
So the most recent batch of commits should end up with about 9 image comparison failures. I see two ways to address this:
There's obvious reasons to not update expected images, but I also think adding an extra parameter that is really only used during testing is equally bad. Any thoughts? |
src/path_converters.h
Outdated
@@ -595,6 +609,10 @@ class PathSimplifier : protected EmbeddedQueue<9> | |||
return m_source->vertex(x, y); | |||
} | |||
|
|||
/* NOTE: the below comment is out of date. Lines < 1 pixel in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the comment is out of date, just remove it or update it accordingly.
Looking at it, I don't think it needs to be a full changelog of that code chunk, it can just describe what it's doing and mention the critical improvements, or ideas that have been tried and did or did not work.
@mdboom is probably the closest we have to an Agg expert. I am 👍 on updating those test images. The changes are very small and the speed is definitely worth it. I think a better test of this speed up is to use right click with the pan tool and zoom the x-range way in and out. As a side note, there is a |
src/path_converters.h
Outdated
m_lastMax = true; | ||
m_dnorm2ForwardMax = m_origdNorm2; | ||
m_dnorm2BackwardMax = 0.0; | ||
m_lastForwardMax = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should m_lastBackwardMax
be reset here as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure it can only be false at this point in the code. It wouldn't hurt though, so if you're worried about it I'm OK adding that in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is also helpful breadcrumbs for the next person to look at this code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added in latest commit
``path.simplify_threshold`` RC parameters, as long as other conditions | ||
such as not using dashed lines are met. Simplification will only happen | ||
when drawing the line segment portion of a path, if you are also drawing | ||
markers you should consider using the ``markevery`` option to ``plot``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a remark here about reasonable values and why you would want to use this?
As near as I can tell this feature is un-documented except for a comment in the change log.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was some discussion as to folding path.simplify and path.simplify_threshold into a single rcparam (zero vs nonzero) that got lost in the thread, too. @kbrose Do you plan to do it? It can be a separate PR of course.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm adding some more text for the whats new entry now. Do you think it's worth adding more (permanent) documentation somewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I think it would be cleaner as a separate PR (I'm willing to do it, but I'll be focusing on this and other performance optimizations first). @anntzer @tacaswell the idea was that path.simplify
could be removed because it is redundant with path.simplify_threshold == 0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No hurry.
Perhaps start a "performance" section in http://matplotlib.org/faq/index.html or http://matplotlib.org/faq/usage_faq.html?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I liked that idea, but usage faq has been removed from master? Looks like it happened in eaaa2b4. Guess I'll put somewhere else
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably got moved to 'tutorials/01_introductory/usage.py' but github is being too lazy about the diffs.
I think I understand this and am 👍 over all. Might be worth resetting a bit more of the internal state around line 708. If |
In [25]: p = mpath.Path(np.array([[0, 1, 3], [0, 0, 0]]).T)
In [26]: p.cleaned(simplify=True)
Out[26]:
Path(array([[ 0., 0.],
[ 3., 0.],
[ 3., 0.],
[ 0., 0.]]), array([1, 2, 2, 0], dtype=uint8)) This seems to pick up an extra degenerate point at the end, but that is also the behavior of the current code so probably ok to leave that for now. A recent-ish github feature is ability for maintainers to push commits to PR barnches. Do you mind if I push some more tests to your branch or would you prefer a PR from me into your branch? |
There seems to be a bug if all of the points lie on the vector: In [60]: p = mpath.Path(np.array([[0, 1, 3, -1, 1], [0, 0, 0, 0, 0]]).T)
In [61]: p.cleaned(simplify=True)
Out[61]:
Path(array([[ 0., 0.],
[ 3., 0.],
[ 1., 0.],
[ 0., 0.]]), array([1, 2, 2, 0], dtype=uint8)) |
@tacaswell please feel free to push tests directly. I'll look into that behavior. |
The tests I just pushed fail in the all-on-the-line case I mentioned above. |
@tacaswell that's a good catch, and I think I know why it happens. Digging in there also seems to show how those degenerate points are being added: matplotlib/src/path_converters.h Line 767 in fcce35d
I'm not sure why that's being done. I'll leave it in for now unless I can figure out why it was added in first place. edit: updated link to more permanent location. |
Most recent commit will end up with some more image failures. Most of them looked pretty small, except the clipping testing. I want to do more testing to be sure, but I think I fixed a long-standing bug that might actually mean the new version is the "correct" version: diff --git a/src/path_converters.h b/src/path_converters.h
...
@@ -702,9 +703,12 @@ class PathSimplifier : protected EmbeddedQueue<9>
m_dnorm2ForwardMax = m_origdNorm2;
m_dnorm2BackwardMax = 0.0;
m_lastForwardMax = true;
+ m_lastBackwardMax = false;
- m_nextX = m_lastWrittenX = m_lastx = *x;
- m_nextY = m_lastWrittenY = m_lasty = *y;
+ m_lastWrittenX = m_lastx;
+ m_lastWrittenY = m_lasty;
+ m_nextX = m_lastx = *x;
+ m_nextY = m_lasty = *y;
continue;
}
... The |
# using min/max will get the expected results out of order: | ||
# min/max for simplification code depends on original vector, | ||
# and if angle is outside above range then simplification | ||
# min/max will be opposite from actual min/max. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
@tacaswell Updated documentation and rebased. The actual commit can be found at kbrose@ec75517. |
Hmm. Docs build failed with unhelpful message
Building locally seems to give a little more info:
which is strange since I can open the link manually... |
Not sure that's the problem link, but please try rebasing against latest |
@QuLogic I rebased against master locally without luck. I updated that |
Oh, hmm, just realized I updated my install of the |
Oh, this is due to outdated sphinx-gallery; it should work on the latest version. I don't think it's the same problem as CircleCI. |
Ok... pip installing up to
Full error:
Are you able to successfully build on master? |
304eba1
to
c000219
Compare
You might have some outdated cached data in the doc directory; try deleting all of |
Looks like rebasing helped. Cleaning docs directory didn't help locally, but I can live with that for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really know the algorithm and if @mdboom is fine with it, then that's good enough for me.
Instead, I looked through the image changes and they look reasonable (though the patheffects collections change is a bit odd).
We need to be a bit careful of some renamed references when merging this and some other PRs.
scale = 5 | ||
np.random.seed(19680801) | ||
# get 15 random offsets | ||
# TODO: guarantee offset > 0 results in some offsets < 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is true at least for the given offsets and current seed:
>>> np.random.seed(19680801)
>>> np.random.rand(15) - 0.5
array([ 0.2003673 , 0.24275081, 0.20928001, 0.06674552, 0.47778533,
0.20633485, -0.25208424, -0.34211665, 0.19769852, 0.21995667,
-0.24225557, -0.15845322, 0.46876117, 0.1945071 , -0.03361674])
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep - I put that comment in as a warning to future dabblers
tutorials/01_introductory/usage.py
Outdated
# # Setup, and create the data to plot | ||
# y = np.random.rand(100000) | ||
# y[50000:] *= 2 | ||
# y[np.logspace(1,np.log10(50000),400).astype(int)] = -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add space after commas.
# interactive plotting (with maximal simplification) and another | ||
# style for publication quality plotting (with minimal | ||
# simplification) and activate them as necessary. See | ||
# :ref:`sphx_glr_tutorials_01_introductory_customizing.py` for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This link might get changed by #8975, so need to be wary of whichever gets merged first...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: Current version of that PR would change it to sphx_glr_tutorials_introductory_customizing.py
# plt.plot(y) | ||
# plt.show() | ||
# | ||
# mpl.rcParams['path.simplify_threshold'] = 1.0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This setting is ridiculously fast!
tutorials/01_introductory/usage.py
Outdated
# | ||
# The markevery argument allows for naive subsampling, or an | ||
# attempt at evenly spaced (along the *x* axis) sampling. See the | ||
# :ref:`sphx_glr_gallery_pylab_examples_markevery_demo.py` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ref will be has been outdated by #8983.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the new ref?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sphx_glr_gallery_lines_bars_and_markers_markevery_demo.py
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, that looks right.
# controlled by the ``path.simplify`` and | ||
# ``path.simplify_threshold`` parameters in your | ||
# ``matplotlibrc`` file (see | ||
# :ref:`sphx_glr_tutorials_01_introductory_customizing.py` for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This link might get changed by #8975, so need to be wary of whichever gets merged first...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: Current version of that PR would change it to sphx_glr_tutorials_introductory_customizing.py
I added a commit to fix the whitespace. Lets get this in ahead of the two docs PRs if we can. |
Too late...
|
I guess I'll try rebasing on master again? Also needed to update docs ref. @tacaswell I'll keep your commit separate for now but if you want that rebased in let me know. |
Or should I hold off on rebasing since @tacaswell has approved? |
@tacaswell's commit is pretty small; I'm sure he won't mind if you rebase and/or squash it. |
Feel fee to squash. |
I meant more that the changes as they exist were already approved, but rebasing makes it harder to verify nothing's been changed since that point. Regardless, I'll rebase to see if that helps the circleCI build. That will probably necessitate also updating the links @QuLogic pointed out. |
0d66374
to
66801d7
Compare
🎉 Glad this is in! Thanks and congratulations @kbrose ! |
Woohoo! Thanks for all the help everyone. Looking forward to snappier EKG plotting! |
PR Summary
Updates the downsampling of large lines. Specifically, this PR aims to
Line2D
andPath
to actually downsample when the RC parameter indicates it should. (Addressing The RC Parampath.simplify
does nothing. #8775.)PR Checklist