Add new downsample method for lines #7632


Closed
wants to merge 16 commits

Conversation

@kbrose (Contributor) commented Dec 17, 2016

Overview

This PR adds a new downsample method for lines. This is an improved version of the algorithm found here.

When the x-values of the line are monotonic, this downsampling method renders at most four vertices per pixel column, i.e. at most 4 times the width of the parent Axes in pixels. The downsampling is done such that the rendering of the downsampled plot looks very similar to the rendering of the non-downsampled plot. In fact, if there is no anti-aliasing and the plot uses only a solid line, the renderings are identical.

Possible Use Case(s)

I created this PR because I plot a lot of ECG data over large timespans. Typically, I might want to plot 12 or 24 hours of ECG data sampled at 125 Hz; 12 hours alone is 125 * 12 * 60 * 60 = 5,400,000 data points. This is painfully slow to render, and interaction is very slow. Further, naive downsampling (e.g. plt.plot(x[::100], y[::100])) is not sufficient in this case, as features of interest in this plot might last only a single sample.

Of course, this is just a specific instance, and any plotting of large datasets could benefit.

Algorithm Description

This algorithm exploits the fact that for large datasets, many x-values land on the same column of pixels. We really only need to draw the minimum and maximum y-values from all of the x-values that get mapped to the same pixel column. A little more bookkeeping is required to ensure that the connecting lines between two pixel columns render correctly, but this just amounts to also keeping the first and last points mapped to each column.

An image will probably help:

legend:        +--+
        pixels |  |
               +--+
        discarded data points  o
        kept data points       x
        
+------+------+------+------+
|      |      |      |      |
|      |      |      |      |
|      |      |      |      |
|   x  |      |      |      |
+------+------+------+------+
|  o   |      | x    |      |
| o    |      |      |      |
|x     |      |      |      |
|     x|      |x     |     x|
+------+------+------+------+
|      |      |      |    o |
|      |      |      |   o  |
|      |      |  o   |  o   |
|    x |      |     x| o    |
+------+------+------+------+
|      |      |      |      |
|      |      |   xo |x     |
|      |      |      |      |
|      |      |      |      |
+------+------+------+------+
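
In code, the idea pictured above might look roughly like this (a minimal, one-shot sketch; the function name and signature are hypothetical, and unlike this PR it does not re-downsample on zoom or resize):

    import numpy as np

    def downsample_line(x, y, ax):
        # Keep at most four vertices per pixel column (first, min, max,
        # last), assuming the x-values are monotonic.
        x, y = np.asarray(x), np.asarray(y)
        # Work in pixel space so "column" means a column of screen pixels.
        verts_px = ax.transData.transform(np.column_stack([x, y]))
        cols = np.floor(verts_px[:, 0]).astype(int)
        splits = np.flatnonzero(np.diff(cols)) + 1    # where the column changes
        starts = np.hstack((0, splits))
        ends = np.hstack((splits, len(verts_px)))
        keep = []
        for s, e in zip(starts, ends):
            ys = verts_px[s:e, 1]
            # First, min-y, max-y, and last index of this column,
            # deduplicated and kept in drawing order.
            keep.extend(sorted({s, s + ys.argmin(), s + ys.argmax(), e - 1}))
        keep = np.asarray(keep)
        return x[keep], y[keep]

One would call this after the Axes limits are set, e.g. ax.plot(*downsample_line(x, y, ax)).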

Caveats

This downsampling works best for plotting large amounts of data using a solid line and no markers.

If the x-values of the line are not monotonic, there is no guarantee that rendering will be any faster. The "less monotonic" (for some definition of "less monotonic") the x-values are, the less downsampling can occur.

Questions

I have marked a question with a TODO in the code: I am recomputing self.axes.transData.transform(verts) on each update (verts being the vertices of the path). Is that cached somewhere already?

TODO:

  • More complete testing when the axes are transformed (e.g. polar). This will probably end up with me restricting this downsampling so it is only guaranteed to work on axes with a known set of transformations.
  • More edge case testing.
  • Would be nice to have benchmarks.

@tacaswell added this to the 2.1 (next point release) milestone Dec 18, 2016
@@ -592,6 +599,59 @@ def get_markevery(self):
"""return the markevery setting"""
return self._markevery

def set_downsample(self, downsample):
Member

Can you also add a property for these?

Contributor Author

Hey @tacaswell, thanks for reviewing! This is my first time working with matplotlib code; do you mind clarifying what property you'd like me to add?

Member

I mean add downsample as a property (e.g. downsample = property(get_downsample, set_downsample)). In the medium term we are moving away from all of the getter/setter methods.

Contributor Author

Ok. Does it make sense to make the getter/setter private if I'm doing that?

Member

Do we need the get_/set_? Since this is new, can we just use the @property decorator?

Contributor Author

I'm down for whatever. I was using the markevery option as a guide for how to format the downsample option, but that was out of ignorance more than anything else. Just let me know what format you want it in and I'll refactor to match.
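
For reference, the two styles under discussion look roughly like this (a minimal sketch; class names and the attribute handling are illustrative):

    # Old style: explicit getter/setter pair, wrapped in a property.
    class OldStyle:
        _downsample = False

        def get_downsample(self):
            return self._downsample

        def set_downsample(self, downsample):
            self._downsample = bool(downsample)

        downsample = property(get_downsample, set_downsample)

    # New style: a plain @property, no public get_/set_ methods.
    class NewStyle:
        _downsample = False

        @property
        def downsample(self):
            return self._downsample

        @downsample.setter
        def downsample(self, value):
            self._downsample = bool(value)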

@tacaswell (Member)

Very interesting!

@tacaswell (Member)

I assume the performance win here comes from effectively dropping the anti-aliasing? I wonder if just turning it off on the line object would get you an equivalent speed up?

There already exists some path simplification code in the path drawing code (https://github.com/tacaswell/matplotlib/blob/master/src/path_converters.h#L528). How does this compare with that implementation?

@kbrose (Contributor, Author) commented Dec 18, 2016

I assume the performance win here comes from effectively dropping the anti-aliasing? I wonder if just turning it off on the line object would get you an equivalent speed up?

The speed-up comes from rendering far fewer points (e.g. on my 1080p screen, it renders at most ~8000 line segments, i.e. four per pixel column across ~1920 columns, regardless of how many data points there are). A short test I ran showed about 3.7 seconds to render with no downsampling, both antialiased and not, compared to 0.2 seconds with downsampling.

import matplotlib.pyplot as plt
import numpy as np
import time
x = np.random.randn(1000000)

# antialias, no downsample
fig, ax = plt.subplots()
ax.plot(x, antialiased=True, downsample=False)
tic = time.time()
fig.savefig('antialiased_no_downsample.png')
print('antialias, no downsample', time.time() - tic)

# antialias, downsample
fig, ax = plt.subplots()
ax.plot(x, antialiased=True, downsample=True)
tic = time.time()
fig.savefig('antialiased_downsample.png')
print('antialias, downsample', time.time() - tic)

# no antialias, no downsample
fig, ax = plt.subplots()
ax.plot(x, antialiased=False, downsample=False)
tic = time.time()
fig.savefig('no_antialiased_no_downsample.png')
print('no antialias, no downsample', time.time() - tic)

# no antialias, downsample
fig, ax = plt.subplots()
ax.plot(x, antialiased=False, downsample=True)
tic = time.time()
fig.savefig('no_antialiased_downsample.png')
print('no antialias, downsample', time.time() - tic)

# Prints out the following:
# antialias, no downsample 3.771287202835083
# antialias, downsample 0.2187511920928955
# no antialias, no downsample 3.7316763401031494
# no antialias, downsample 0.218766450881958

There already exists some path simplification code in the path drawing code (https://github.com/tacaswell/matplotlib/blob/master/src/path_converters.h#L528). How does this compare with that implementation?

I'll look into that and reply.

@kbrose (Contributor, Author) commented Dec 18, 2016

There already exists some path simplification code in the path drawing code (https://github.com/tacaswell/matplotlib/blob/master/src/path_converters.h#L528). How does this compare with that implementation?

It looks like these two algorithms accomplish different things -- in my opinion rendering would benefit from having both downsampling and simplification run (although I'm guessing the downsampling in this PR would leave little for the simplification to do). The path simplification code linked above says

        /* idea: we can skip drawing many lines: lines < 1 pixel in
           length, and we can combine sequential parallel lines into a
           single line instead of redrawing lines over the same
           points.  The loop below works a bit like a state machine,
           where what it does depends on what it did in the last
           looping. To test whether sequential lines are close to
           parallel, I calculate the distance moved perpendicular to
           the last line. Once it gets too big, the lines cannot be
           combined. */

The idea of simplifying almost-parallel lines is completely separate from this PR. Removing line segments that are < 1 pixel in length is similar -- I would say this PR is an extension of that idea, but instead of looking at just a single pixel, it simplifies the path along a whole column of pixels.

Was this simplification algorithm used in matplotlib versions 1.5.3 and lower? I haven't tested version 2.0 or higher -- if it was introduced in 2.0 or later, then perhaps just using the existing path simplification algorithm would improve my rendering time. But from the sound of it, there would still be some types of data that would not benefit from the simplification algorithm as much as from the downsampling proposed in this PR.

@tacaswell (Member)

The simplification code has been in the code base longer than I have been around, but it won't help much with really spiky data.

# verts_trans is the (N, 2) vertex array already transformed to pixel space.
# Find where the pixel column of the x data changes
split_indices = np.diff(np.floor(verts_trans[:, 0]).astype(int)) != 0
split_indices = np.where(split_indices)[0] + 1

Member

Maybe shorten the code and reduce the number of operations inside the loop (untested!):

# keep_inds is assumed preallocated as an (n_columns, 4) integer array of
# zeros; its columns end up as (first, argmin, argmax, last) per pixel column.
for i, y_pixel_col in enumerate(np.split(verts_trans[:, 1], split_indices)):
    keep_inds[i, 1:3] = np.argmin(y_pixel_col), np.argmax(y_pixel_col)
starts = np.hstack((0, split_indices))
ends = np.hstack((split_indices, verts_trans.shape[0]))
keep_inds[:, :3] += starts[:, np.newaxis]
keep_inds[:, 3] = ends - 1

Contributor Author

Good idea. Done. Your code worked without modification for me.

@codecov-io commented Dec 19, 2016

Current coverage is 62.07% (diff: 90.90%)

Merging #7632 into master will increase coverage by 0.02%

@@             master      #7632   diff @@
==========================================
  Files           174        174          
  Lines         56150      56182    +32   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          34844      34876    +32   
  Misses        21306      21306          
  Partials          0          0          

Powered by Codecov. Last update 63e58b3...b11984e

@kbrose (Contributor, Author) commented Dec 20, 2016

It looks like one of the SVG generation tests failed. Is this expected, or did my code break something?

@kbrose changed the title from "WIP: Add new downsample method for lines" to "Add new downsample method for lines" Dec 20, 2016
@NelleV (Member) left a comment

Thanks for the pull request. I think this is almost ready to be merged.
Can you add an entry to the what's new documentation?

@@ -0,0 +1,21 @@
import numpy as np
Member

Can you add a sphinx-gallery like docstring to explain why this example has been added to the gallery?
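
A sphinx-gallery style docstring is a module-level docstring at the top of the example file, with a reST title and a short explanation of the example; for reference, something like this (title and wording hypothetical):

    """
    Downsampling very large line plots
    ==================================

    Demonstrates the ``downsample`` option on a line with millions of
    points, where only a few vertices per pixel column are actually
    rendered.
    """
    import numpy as np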

@@ -592,6 +599,61 @@ def get_markevery(self):
"""return the markevery setting"""
return self._markevery

@property
def downsample(self):
"""Set the downsample property to subsample the plotted line segments.
Member

The current docstring is a mixture of our old and new docstring styles. Can you please stick to the new style? (I'm adding a couple of notes to explain what I mean.)

Recommended *only* when plotting with a solid line (e.g. no dashes,
no markers).

ACCEPTS: [True | False]
Member

Remove this (it duplicates the information in the Parameters section)


Parameters
----------
downsample: True | False
Member

Can you add a space after the colon? Otherwise it doesn't render properly with numpydoc.
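
For reference, the numpydoc form being requested looks like this (description wording hypothetical):

    Parameters
    ----------
    downsample : bool
        Whether to downsample the line before rendering.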


Notes
-----
If the x values of the line are monotonic, then the downsampling
Member

I think there is a dash missing between x and values.

@NelleV self-assigned this Dec 31, 2016
@NelleV (Member) commented Dec 31, 2016

@kbrose I suspect the failure is unrelated to your patch.

@anntzer (Contributor) commented Jan 8, 2017

Is there more to applying downsampling than the same logic that governs whether path simplification is suitable (i.e., plain linestyle, no markers)? I may really be missing something, but if not, it seems better to use a global switch and rely on the same logic as path simplification to decide whether to downsample individual paths.

@efiring (Member) commented Jan 8, 2017

Path simplification is general, works with any sort of path and backend; downsampling is specific to line plots of things like time series, where there are a very large number of points, x is monotonic, and y jumps up and down repeatedly over intervals of x smaller than a pixel.
I suspect the path simplification code could be modified to yield results more similar to what the downsampling does, but that would require immersion in the C++ extension code, first to understand the existing algorithm, and then to add to it.

@anntzer (Contributor) commented Jan 9, 2017

The current path simplification algorithm already handles the given test case reasonably well:

import numpy as np
from matplotlib.pyplot import plot

x = np.hstack([np.zeros(99), np.ones(99)])
y = np.hstack([2.0, 8.0, 3 * np.ones(95), 1.0, 5.0, 5.0, 5.5, 6 * np.ones(95), 8.0, 7.0])
l, = plot(x, y)
print(l._path.cleaned(simplify=True))

gives

Path(array([[ 0.,  2.],
       [ 0.,  8.],
       [ 0.,  8.],
       [ 0.,  8.],
       [ 0.,  1.],
       [ 0.,  5.],
       [ 1.,  5.],
       [ 1.,  8.],
       [ 1.,  7.],
       [ 0.,  0.]]), array([1, 2, 2, 2, 2, 2, 2, 2, 2, 0], dtype=uint8))


so there are a few duplicates, but I doubt they're the issue.

Rather, it appears that the path simplification code is not called at all in such a case (which can be checked by adding a std::cout << at the right place in path_converters.h).

Thus I'd rather first make sure that the OP's use case cannot already be covered by the path simplification algorithm, by clarifying the logic of where it is enabled.

@kbrose (Contributor, Author) commented Jan 9, 2017

@anntzer If I understand correctly, you're saying that the existing path simplification code is not actually called during plot, even if the rc parameter says to do so? Interesting, I did not know that.

Either way, if I understand your code correctly, then the following code should show that the proposed method is substantially different:

import matplotlib.pyplot as plt
import numpy as np
x = np.random.rand(1000000)

l, = plt.plot(x)

tpath, affine = l._get_transformed_path().get_transformed_path_and_affine()

print('raw path:', l._path.vertices[:,0].size)
print('current :', l._path.cleaned(simplify=True).vertices[:,0].size)
print('proposed:', l._downsample_path(tpath, affine).vertices[:,0].size)

shows the number of vertices in each of the paths

raw path: 1000000
current : 886627
proposed: 1984

@anntzer (Contributor) commented Jan 9, 2017

I agree with your observation. However, note that path_converters.h contains the following comment:

        /* idea: we can skip drawing many lines: lines < 1 pixel in
           length, and we can combine sequential parallel lines into a
           single line instead of redrawing lines over the same
           points.  The loop below works a bit like a state machine,
           where what it does depends on what it did in the last
           looping. To test whether sequential lines are close to
           parallel, I calculate the distance moved perpendicular to
           the last line. Once it gets too big, the lines cannot be
           combined. */

I would think that such an algorithm (ignoring the part about lines < 1px), if properly implemented, should have an effect very similar to what you are proposing, because combining the many ups and downs of the line (your proposal) is the same as squashing many parallel segments together (path simplification), but it covers more cases (e.g. when the lines are all "slanted"). Obviously that is not the case right now, but why?

I'd rather, once again, have path simplification fixed to handle your case properly, unless it is clear that this is not possible.

@kbrose (Contributor, Author) commented Jan 9, 2017

Have you seen this comment in path_converter.h?

            /* NOTE: We used to skip this very short segments, but if
               you have a lot of them cumulatively, you can miss
               maxima or minima in the data. */

            /* Don't render line segments less than one pixel long */
            /* if (fabs(*x - m_lastx) < 1.0 && fabs(*y - m_lasty) < 1.0) */
            /* { */
            /*     continue; */
            /* } */

However, even if this is done, I do not know whether it would be enough. It is quite possible for the data to be very "spiky", i.e. for consecutive y-values to be more than 1 pixel apart. The worst-case scenario is probably y-data that repeats [0, 1, 0, 1, ...]. I think I mentioned this near the top, but my real-world use case is ECG data. It is more than likely (though admittedly I have not tested it) that consecutive values are more than 1 pixel apart (and have a highly negative autocorrelation at lag 1).

@anntzer (Contributor) commented Jan 9, 2017

I think your point corresponds to the "skipping lines < 1px in length" part of the comment; squashing consecutive parallel or antiparallel lines is implemented separately below (perpdNorm2). (The fact that it is able to simplify, say, plot(range(10)), shows that it does handle some lines >1px.)

@kbrose (Contributor, Author) commented Jan 9, 2017

Do we know whether it does, or is supposed to, handle anti-parallel lines? That could be the problem.

Regardless, I am satisfied that my proposal handles my use case well and does not cause problems elsewhere in Matplotlib. I would not be able to say the same if I modified the C++ code (it's been quite a few years since I've touched any C-based language).

I'm just looking to get something into Matplotlib that will handle my use case ASAP. If updating the C++ algorithm would mean bumping the target version of this PR to 2.2 instead of 2.1, then I'd be in favor of keeping this proposal. If you think the C++ algorithm could be updated in time for 2.1 then that'd be even better.

Ultimately, I will leave the decision of whether this proposal should be scrapped in favor of updates to the C++ algorithm to the Matplotlib maintainers.

@anntzer (Contributor) commented Jan 9, 2017

I think it should handle antiparallel lines as parallelism is checked using projected distance.

I understand your interest in getting your PR in, but I'd rather have the existing code fixed than end up with two incomplete methods, plus an additional API that will take forever to deprecate if we do end up fixing path simplification. If you wish to use downsampling for your own projects, note that you can easily patch Line2D to include your code by applying the fix to the _draw_solid method (I don't think you want to patch the others, but you can, too): it currently reads

    def _draw_solid(self, renderer, gc, path, trans):
        gc.set_linestyle('solid')
        gc.set_dashes(self._dashOffset, self._dashSeq)
        renderer.draw_path(gc, path, trans)

(where trans is set to affine) so you can just have your own code do something like

from matplotlib.lines import Line2D

def _downsample_path(path, affine): <your implementation>

_old_draw_solid = Line2D._draw_solid

def _draw_solid(self, renderer, gc, path, trans):
    if getattr(self, "downsample", False):
        path = _downsample_path(path, trans)
    return _old_draw_solid(self, renderer, gc, path, trans)

Line2D._draw_solid = _draw_solid

(I've used similar patching methods myself for other changes...)
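
With a patch like that applied, usage might look like this (a hypothetical sketch; the downsample attribute is set dynamically and picked up by the getattr default above):

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.random.randn(1000000)
    line, = plt.plot(x)
    line.downsample = True   # read by the patched _draw_solid
    plt.savefig('downsampled.png')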

@kbrose (Contributor, Author) commented Jan 9, 2017

From what I've pieced together, it appears as if the code intentionally will not simplify anti-parallel lines. Consider the following code:

                if (totdot > 0.0) {
                    if (paradNorm2 > m_dnorm2Max) {
                        m_lastMax = true;
                        m_dnorm2Max = paradNorm2;
                        m_nextX = *x;
                        m_nextY = *y;
                    }
                } else {
                    _push(&m_lastx, &m_lasty);
                    _push(x, y);
                    break;
                }

Here, totdot is the dot product between the current path being built and the current line segment that may or may not be appended. The part to notice is that we check whether totdot > 0.0, and if it is not (as in the anti-parallel case), we push the end of the path (m_lastx and m_lasty) along with the beginning of the new one (x and y). If my understanding is correct, this is why anti-parallel lines are not simplified.

If we actually want to go the route of modifying the C++ code, it looks like the main problem will be

  1. when we start building a new path, we immediately push the starting x,y vertex to the queue, but;
  2. if we have anti-parallel lines that shoot back past the starting x,y points (e.g. the sequence of points (0,0), (1,1), (-1,-1)), there's nothing to do except start a new path since we've already added the starting x,y to the queue.

It would be a non-trivial modification to add code that essentially mirrors this proposal, but on almost-parallel paths instead of just almost-vertical ones: (a) keep track of the first point on the path that might get simplified, (b) find the min/max there (in this case treating positions along the path being built as the real line), (c) keep track of the last point on the path that might get simplified.

@kbrose (Contributor, Author) commented Jan 9, 2017

That being said, I see the utility in having one path simplification algorithm that returns a path whose shape is relatively preserved, e.g. for saving vector graphics files (zooming in will produce a line segment that is somewhat close to what it would be in the non-simplified version), and another path simplification that destroys shape (mainly used for rasterization). These could definitely be the same method, just parameterized.

@anntzer (Contributor) commented Jan 9, 2017

Good catch.

I went through the current path simplification algorithm and I think something like the following "should" work (a rough Python sketch follows the list):

  • push the first two vertices as begin and end of the candidate segment.
  • iterating over vertices from the third...
    • if the vertex is "close" (in perp. distance) to the current candidate segment, merge it into the segment, by lengthening, if necessary, the candidate segment up to the projection of the vertex on the segment (i.e., the original segment never changes in direction, in order to avoid slow drift accumulation).
    • otherwise, emit the current candidate segment, and restart from the previous vertex.
  • do not forget to emit the last segment in any case.
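
A rough sketch of those steps in Python (all names and the tolerance are hypothetical; I've also allowed the candidate segment to grow backwards, to cover the anti-parallel case discussed above; a real implementation would live in the C++ converter and emit a path rather than segment pairs):

    import numpy as np

    def merge_simplify(verts, tol=0.5):
        # verts: (N, 2) vertices in pixel space.  Returns an (M, 2) array in
        # which consecutive pairs of rows are the simplified segments.
        verts = np.asarray(verts, float)
        out = []
        seg_start, seg_end = verts[0], verts[1]   # candidate segment
        prev = verts[1]
        for v in verts[2:]:
            d = seg_end - seg_start
            norm = np.hypot(d[0], d[1])
            if norm == 0.0:                 # degenerate candidate: just extend
                seg_end = prev = v
                continue
            u = d / norm                    # direction; merging never rotates it
            w = v - seg_start
            perp = abs(w[0] * u[1] - w[1] * u[0])   # perpendicular distance of v
            if perp <= tol:                 # close enough: merge v into segment
                proj = w @ u
                if proj > norm:             # lengthen forward, if necessary
                    seg_end = seg_start + proj * u
                elif proj < 0.0:            # lengthen backward (anti-parallel)
                    seg_start = seg_start + proj * u
            else:                           # emit, restart from previous vertex
                out += [seg_start, seg_end]
                seg_start, seg_end = prev, v
            prev = v
        out += [seg_start, seg_end]         # do not forget the last segment
        return np.asarray(out)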

@kbrose (Contributor, Author) commented Jan 23, 2017

@NelleV @efiring Suggestions on what I can do to keep this PR moving?

@efiring (Member) commented Jan 23, 2017

@anntzer do you think you will be able to follow up on your ideas for achieving a similar gain via modifications to the path simplification code?

@anntzer (Contributor) commented Jan 23, 2017

Not right now (real life has caught up, and this is a nontrivial project). Unless this turns out much easier to do than expected I don't think I'll have time to work on it before, say, April.

A quick Google search showed that, not unexpectedly, this is a well-studied problem. See e.g. https://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm -- there are five other algorithms listed at the bottom. Whether any of them can handle the OP's use case, I don't know, but I'd suggest we take advantage of existing algorithms rather than our home-cooked thing. Perhaps make this a GSoC project? :) (attn @tacaswell)

I understand that the usage of the OP is a fairly common one, and the code is already there, so I won't oppose merging something like that. However, in that case I'd still suggest that control of whether the procedure is applied be folded into the path.simplify rcparam (with internal testing that the linestyle is indeed solid) rather than a kwarg on plot (if it does become a kwarg on plot, it should also be named simplify, as the downsampling is actually "adaptive", whereas the word "downsample" sounds, to me at least, as if it uses a fixed factor), as it would 1) probably only entail a very marginal slowdown for lines that do not satisfy the required conditions, 2) make things much easier for the user, and 3) allow us to swap in a better simplification algorithm later without worrying about backcompatibility.

@kbrose (Contributor, Author) commented Jan 27, 2017

@anntzer
What is GSOC?

The algorithm proposed here is O(n) time and, if the x-values are monotonic, constant space (determined by the rendering resolution). If the x-values are not monotonic, the space is at most O(n).

The Ramer–Douglas–Peucker algorithm is expected O(n log n) and worst case O(n^2), and it looks like the case of highly negative one-step autocorrelation (e.g. [-1, 1, -1, 1, ...]) will be closer to O(n^2) than to O(n log n).

The first additional algorithm listed on the Wikipedia page (Visvalingam–Whyatt) has a good description here. Since it depends on the area of triangles, it too looks ill-suited to datasets with highly negative one-step autocorrelation. Consider my prototypical example of [-1, 1, -1, 1, ...]: if the height of your axes were 100 pixels (quite small!), then setting the simplification parameter to less than 1 square pixel could leave you with up to 100 lines per column of pixels, compared to the maximum of 4 lines per column in the proposed algorithm. I have not investigated the other methods on that list.

I do not care strongly about the specifics of how the downsampling is turned on, or what it is called.

Also, congratulations, everyone, on releasing 2.0! Thanks for all your hard work!

@tacaswell (Member)

GSoC -> Google Summer of Code.

@kbrose (Contributor, Author) commented Feb 10, 2017

Sorry to keep pestering, but are there any thoughts on what to do with this?

@dopplershift (Contributor)

(I'm unable to comment on this in any depth, but wanted to clarify that your pinging on this after two weeks of silence is completely appropriate.)

@kbrose (Contributor, Author) commented Feb 18, 2017

@efiring @anntzer @NelleV
Thoughts?

@anntzer (Contributor) commented Feb 18, 2017

As mentioned earlier, I do not consider that this PR should be merged (on technical grounds -- do not take this personally :-)). If other devs want to push for it, I would appreciate it if my comment above (reproduced below) could be taken into account.

I'd still suggest that control of whether the procedure is applied be folded into the path.simplify rcparam (with internal testing that the linestyle is indeed solid) rather than a kwarg on plot (if it does become a kwarg on plot, it should also be named simplify, as the downsampling is actually "adaptive", whereas the word "downsample" sounds, to me at least, as if it uses a fixed factor), as it would 1) probably only entail a very marginal slowdown for lines that do not satisfy the required conditions, 2) make things much easier for the user, and 3) allow us to swap in a better simplification algorithm later without worrying about backcompatibility.

@tacaswell (Member)

I agree this should be merged with the existing path simplification, or at least put on the Path object. This will allow everything based on paths to benefit from this!

@kbrose (Contributor, Author) commented Mar 5, 2017

Ok. I have a couple of questions before I attempt anything else. Apologies in advance for all the notifications I'm about to generate; I want everyone's input before putting more work into this.

Is everyone in agreement that modifying the existing path simplification code is the way to go? To be clear, I'd be modifying the algorithm as @anntzer and I discussed above -- essentially removing the current constraint that in order for a vertex to be removed, it must have a non-negative dot product with the currently proposed simplification vector.

  • Would anyone consider it a backwards-incompatible change to modify the output of Path.cleaned()?
  • Does anyone see a reason to keep one version of path simplification that only removes vertices pointing in the same direction as the current vector we are simplifying, and another that does not? (I can expound on this if needed -- see the comments between @anntzer and me above.)
  • @anntzer voted for using a published algorithm. AFAIK neither the current code nor the proposed modifications I'd be making are equivalent to a published algorithm. Any objections to that?
    • I by no means did a complete literature search, but two of the more prominent published algorithms were not suitable for our use (see comments above).
  • Where is the path simplification code currently used? I see that you can get the simplified path by calling Path.cleaned() directly, but we found out when discussing this PR that the RC parameter path.simplify does not trigger any simplification when matplotlib.pyplot.plot is called, for example.
    • Relatedly, does anyone consider it a backwards-incompatible change for the RC parameter path.simplify to go from doing nothing on a call to plot to doing something?
  • How should @efiring's concerns about rasterized vs. vector backends be addressed? When we discussed this a few months ago, I think we concluded that we should add a __rasterized attribute to backends and key off of that. However, there was never a good conclusion on how to handle simplification on vector backends, since the DPI shown to the Path object is usually meaningless - and I think the only idea we did have (interpreting a number > 1 passed to the downsample parameter as the desired DPI) would not work if using the C++ code.
    • I'd be OK preventing any implicit simplification on vector backends for now despite the utility it could provide. I think it's a large complication for diminishing returns.
  • Anything else anyone wants to comment on?

@tacaswell
@efiring
@anntzer
@NelleV

@anntzer (Contributor) commented Mar 5, 2017

  1. It would by definition be backwards incompatible, but I'm fine with that (others may not, though...).
  2. No.
  3. The current algorithm actually looks fairly close to Reumann-Witkam (http://psimpl.sourceforge.net/reumann-witkam.html, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.95.5882&rep=rep1&type=pdf). In a quick search, I haven't found anything that seems to handle backwards-moving paths, so that may be that...
  4. That may be a bug, and I'd vote in favor of having the path.simplify rcparam turn on, well, path simplification (otherwise, what does it do?).
  5. I think we can do without simplification on vector backends for now. It can always be added later if it turns out there's real demand for it.

@kbrose (Contributor, Author) commented Apr 26, 2017

Does anyone else have any comments? I now have Linux installed once again so at least I can build from source.

But I am wondering -- I understand wanting to keep the source clean and to avoid having two very similar features implemented in different places. But if the C++ code is not actually used anywhere, why not just swap it out for a Python implementation? If it's found not to be fast enough, it could always be reimplemented in C++. So I'm asking: how would others feel about updating the proposed algorithm here to handle more than just vertical lines (essentially the C++ algorithm, but handling anti-parallel lines) in Python, and deprecating the C++ implementation?

I'm willing to put work into this, but am wary of moving forward without being relatively certain my efforts will be worth it.

@anntzer (Contributor) commented Apr 26, 2017

I doubt that a pure Python implementation can be fast enough, but I would certainly be willing to switch to one instead of the current C++ implementation if it handles more cases and is actually reasonably fast.

@kbrose (Contributor, Author) commented Apr 26, 2017

My current implementation is Python (not pure Python -- it uses numpy, of course) and has been fast enough for me as I've used it over the past few months (thanks to your monkey-patch suggestion!). It's orders of magnitude better than what currently exists...

@kbrose mentioned this pull request Jun 18, 2017
@kbrose (Contributor, Author) commented Jun 18, 2017

Closing in favor of #8776.

Thanks for all the comments everyone.

@kbrose closed this Jun 18, 2017