Add new downsample method for lines #7632
Conversation
@@ -592,6 +599,59 @@ def get_markevery(self):
        """return the markevery setting"""
        return self._markevery

    def set_downsample(self, downsample):
Can you also add a property for these?
Hey @tacaswell, thanks for reviewing! This is my first time working with matplotlib code, do you mind clarifying what property you'd like me to add?
I mean add downsample as a property (e.g. downsample = property(get_downsample, set_downsample)). We are, in the mid-term future, moving away from all of the getter/setter methods.
Ok. Does it make sense to make the getter/setter private if I'm doing that?
Do we need the get_/set_? Since this is new, can we just use the @property decorator?
I'm down for whatever. I was trying to use the markevery option as a guide for how to format the downsample option, but that was out of ignorance more than anything else. Just let me know what format you want it in and I'll refactor to that.
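For reference, a minimal sketch (on a stand-in class, not the PR's actual Line2D code) of what the @property style being discussed might look like; names here are purely illustrative:

# Hedged sketch only: a plain class demonstrating the @property style
# suggested above in place of a get_/set_ pair.
class LineLike:
    def __init__(self):
        self._downsample = False

    @property
    def downsample(self):
        """Whether to keep only per-pixel-column extrema when drawing."""
        return self._downsample

    @downsample.setter
    def downsample(self, value):
        self._downsample = bool(value)

line = LineLike()
line.downsample = True
print(line.downsample)  # True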
Very interesting!
I assume the performance win here comes from effectively dropping the anti-aliasing? I wonder if just turning it off on the line object would get you an equivalent speed up? There already exists some path simplification code in the path drawing code (https://github.com/tacaswell/matplotlib/blob/master/src/path_converters.h#L528). How does this compare with that implementation?
The speed up comes from the fact that you are rendering far fewer points (e.g. on my 1080p screen, it will render at most ~8000 line segments regardless of how many points of data there are). A short test I ran showed about 3.7 seconds to render when there was no downsampling, using both antialiased and not antialiased, compared to 0.2 seconds when there was downsampling.

import matplotlib.pyplot as plt
import numpy as np
import time
x = np.random.randn(1000000)
# antialias, no downsample
fig, ax = plt.subplots()
ax.plot(x, antialiased=True, downsample=False)
tic = time.time()
fig.savefig('antialiased_no_downsample.png')
print('antialias, no downsample', time.time() - tic)
# antialias, downsample
fig, ax = plt.subplots()
ax.plot(x, antialiased=True, downsample=True)
tic = time.time()
fig.savefig('antialiased_downsample.png')
print('antialias, downsample', time.time() - tic)
# no antialias, no downsample
fig, ax = plt.subplots()
ax.plot(x, antialiased=False, downsample=False)
tic = time.time()
fig.savefig('no_antialiased_no_downsample.png')
print('no antialias, no downsample', time.time() - tic)
# no antialias, downsample
fig, ax = plt.subplots()
ax.plot(x, antialiased=False, downsample=True)
tic = time.time()
fig.savefig('no_antialiased_downsample.png')
print('no antialias, downsample', time.time() - tic)
# Prints out the following:
# antialias, no downsample 3.771287202835083
# antialias, downsample 0.2187511920928955
# no antialias, no downsample 3.7316763401031494
# no antialias, downsample 0.218766450881958
I'll look into that and reply.
It looks like these two algorithms are accomplishing different things -- in my opinion rendering would benefit from having both downsampling/simplification run (although I'm guessing the downsampling in this PR would result in the simplification not doing too much). The path simplification code linked above says

/* idea: we can skip drawing many lines: lines < 1 pixel in
   length, and we can combine sequential parallel lines into a
   single line instead of redrawing lines over the same
   points. The loop below works a bit like a state machine,
   where what it does depends on what it did in the last
   looping. To test whether sequential lines are close to
   parallel, I calculate the distance moved perpendicular to
   the last line. Once it gets too big, the lines cannot be
   combined. */

The idea of simplifying almost-parallel lines is completely separate from this PR. Removing line segments that are < 1 pixel in length is similar -- I would say this PR is kind of an extension of that idea, but instead of just looking at a single pixel you try to simplify the path along a single column of pixels. Was this simplification algorithm being used in matplotlib versions 1.5.3 and lower? I haven't tested version 2.0 or higher -- if this was introduced in 2.0 or later then perhaps just using the existing path simplification algorithm would help my rendering time. But from the sound of it, there would still be some types of data that would not benefit from the simplification algorithm as much as they would benefit from the downsampling proposed in this PR.
The simplification code has been in the code base longer than I have been around, but it won't help much with really spiky data.
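To make the parallel-combining idea in the quoted comment concrete, here is a hedged, standalone sketch (not the actual path_converters.h logic); the function name and tolerance are illustrative:

import numpy as np

def can_merge(p_last, p_new, direction, tol_px=0.5):
    # Merge the new segment into the current run if the new point stays within
    # tol_px pixels of the line along "direction" (i.e. the segments are nearly
    # parallel), mirroring the "distance moved perpendicular" test quoted above.
    d = np.asarray(direction, dtype=float)
    d = d / np.hypot(*d)
    offset = np.asarray(p_new, dtype=float) - np.asarray(p_last, dtype=float)
    perp = offset - (offset @ d) * d   # component perpendicular to the run
    return np.hypot(*perp) < tol_px

print(can_merge((0, 0), (2, 0.1), direction=(1, 0)))  # True: nearly parallel
print(can_merge((0, 0), (2, 3.0), direction=(1, 0)))  # False: clearly bends away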
# Find where the pixel column of the x data changes
split_indices = np.diff(np.floor(verts_trans[:, 0]).astype(int)) != 0
split_indices = np.where(split_indices)[0] + 1
Maybe shorten the code and reduce the number of operations inside the loop (untested!):
for i, y_pixel_col in enumerate(np.split(verts_trans[:, 1], split_indices)):
    keep_inds[i, 1:3] = np.argmin(y_pixel_col), np.argmax(y_pixel_col)
starts = np.hstack((0, split_indices))
ends = np.hstack((split_indices, verts_trans.shape[0]))
keep_inds[:, :3] += starts[:, np.newaxis]
keep_inds[:, 3] = ends - 1
Good idea. Done. Your code worked without modification for me.
Current coverage is 62.07% (diff: 90.90%)

@@            master     #7632     diff @@
==========================================
  Files          174       174
  Lines        56150     56182      +32
  Methods          0         0
  Messages         0         0
  Branches         0         0
==========================================
+ Hits         34844     34876      +32
  Misses       21306     21306
  Partials         0         0
It looks like one of the SVG generation tests failed. Is this expected, or did my code break something?
Thanks for the pull request. I think this is almost ready to be merged.
Can you add an entry to the what's new documentation?
@@ -0,0 +1,21 @@
import numpy as np
Can you add a sphinx-gallery like docstring to explain why this example has been added to the gallery?
@@ -592,6 +599,61 @@ def get_markevery(self):
        """return the markevery setting"""
        return self._markevery

    @property
    def downsample(self):
        """Set the downsample property to subsample the plotted line segments.
The current docstring is a mixture of our old and new style of docstring. Can you please stick to the new style? (I'm adding a couple notes to explain what I mean).
        Recommended *only* when plotting with a solid line (e.g. no dashes,
        no markers).

        ACCEPTS: [True | False]
Remove this (it duplicates the information in the Parameters section)
        Parameters
        ----------
        downsample: True | False
Can you add a space after the colon? Otherwise it doesn't render properly with numpydoc.
        Notes
        -----
        If the x values of the line are monotonic, then the downsampling
I think there is a dash missing between x and values.
@kbrose I suspect the failure is unrelated to your patch.
Is there more to applying downsampling than the same logic that governs whether path simplification is suitable (i.e., plain linestyle, no markers)? I may really be missing something, but if not, it seems better to use a global switch and rely on the same logic as path simplification to choose whether to downsample individual paths.
Path simplification is general, works with any sort of path and backend; downsampling is specific to line plots of things like time series, where there are a very large number of points, x is monotonic, and y jumps up and down repeatedly over intervals of x smaller than a pixel.
The current path simplification algorithm already handles the given test case reasonably well:

[code and its output elided]

so there are a few duplicates but I doubt they're the issue. Rather, it appears that the path simplification code is not called at all in such a case (which can be checked by adding a …).

Thus I'd rather be sure that the OP's use case cannot, indeed, already be covered by the path simplification algorithm by clarifying the logic of where it is enabled.
@anntzer If I understand correctly, you're saying that the existing path simplification code is not actually called during … Either way, if I understand your code correctly, then the following code should show that the proposed method is substantially different:

import matplotlib.pyplot as plt
import numpy as np

x = np.random.rand(1000000)
l, = plt.plot(x)
tpath, affine = l._get_transformed_path().get_transformed_path_and_affine()
print('raw path:', l._path.vertices[:,0].size)
print('current :', l._path.cleaned(simplify=True).vertices[:,0].size)
print('proposed:', l._downsample_path(tpath, affine).vertices[:,0].size)

which shows the number of vertices in each of the paths.
I agree with your observation. However, note that …

I would think that such an algorithm (ignore the part about lines < 1px), if properly implemented, should have an effect very similar to what you are proposing (because combining the many ups-and-downs of the line (your proposal) is the same as squashing many parallel segments together (path simplification)), but covers more cases (e.g. when the lines are all "slanted"). Obviously it is not the case right now, but why? I'd rather, once again, have path simplification fixed to handle your case properly, unless it is clear that this is not possible.
Have you seen this comment in path_converter.h?

/* NOTE: We used to skip this very short segments, but if
   you have a lot of them cumulatively, you can miss
   maxima or minima in the data. */
/* Don't render line segments less than one pixel long */
/* if (fabs(*x - m_lastx) < 1.0 && fabs(*y - m_lasty) < 1.0) */
/* { */
/*     continue; */
/* } */

However, even if this is done I do not know if it would be enough. It is quite possible for the data to be very "spiky", i.e. for consecutive y-values to be further than 1 pixel apart. The worst case scenario is probably having your y-data be repeated …
I think your point corresponds to the "skipping lines < 1px in length" part of the comment; squashing consecutive parallel or antiparallel lines is implemented separately below (…).
Do we know whether it does, or is supposed to, handle anti-parallel lines? That could be the problem. Regardless, I am satisfied that my proposal here handles my use case well, and does not cause problems elsewhere in Matplotlib. I'd not be able to say the same if I modified the C++ code (it's been quite a few years since I've touched any C-based language). I'm just looking to get something into Matplotlib that will handle my use case ASAP. If updating the C++ algorithm would mean bumping the target version of this PR to 2.2 instead of 2.1, then I'd be in favor of keeping this proposal. If you think the C++ algorithm could be updated in time for 2.1 then that'd be even better. Ultimately, I will leave the decision of whether this proposal should be scrapped in favor of updates to the C++ algorithm to the Matplotlib maintainers.
I think it should handle antiparallel lines, as parallelism is checked using projected distance. I understand your interest in getting your PR in, but I'd rather have the already-present code fixed than end up with two incomplete methods, plus an additional API that's going to take forever to deprecate if we end up fixing path simplification.

If you wish to use downsampling for your own projects, note that you can easily patch … (where …).

(I've used similar patching methods myself for other changes...)
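The patch itself is not reproduced above; purely as an illustration of the kind of monkey patching being suggested, here is a hedged sketch that wraps Line2D.draw and temporarily swaps in a per-pixel-column min/max envelope for lines tagged with a hypothetical downsample_columns attribute (assumes evenly spaced, monotonic x; intended for static rendering such as savefig):

import numpy as np
import matplotlib.lines as mlines

_original_draw = mlines.Line2D.draw

def _draw_with_downsampling(self, renderer):
    n_cols = getattr(self, "downsample_columns", None)  # hypothetical attribute
    if not n_cols:
        return _original_draw(self, renderer)
    x = np.asarray(self.get_xdata())
    y = np.asarray(self.get_ydata())
    # Assign each sample to a pixel column (assumes evenly spaced samples).
    cols = (np.arange(len(x)) * n_cols) // len(x)
    starts = np.r_[0, np.where(np.diff(cols) != 0)[0] + 1]
    ends = np.r_[starts[1:], len(x)]
    # Keep first, min, max, and last y per column; reuse the column's first x.
    yfirst = y[starts]
    ylast = y[ends - 1]
    ymin = np.minimum.reduceat(y, starts)
    ymax = np.maximum.reduceat(y, starts)
    xd = np.repeat(x[starts], 4)
    yd = np.column_stack([yfirst, ymin, ymax, ylast]).ravel()
    self.set_data(xd, yd)
    try:
        return _original_draw(self, renderer)
    finally:
        self.set_data(x, y)  # restore the full-resolution data afterwards

mlines.Line2D.draw = _draw_with_downsampling

With such a patch in place, setting, say, line.downsample_columns = 2000 on a Line2D would make subsequent savefig calls render the reduced envelope rather than every raw point.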
From what I've pieced together, it appears as if the code intentionally will not simplify anti-parallel lines. Consider the following code:

if (totdot > 0.0) {
    if (paradNorm2 > m_dnorm2Max) {
        m_lastMax = true;
        m_dnorm2Max = paradNorm2;
        m_nextX = *x;
        m_nextY = *y;
    }
} else {
    _push(&m_lastx, &m_lasty);
    _push(x, y);
    break;
}

there, … If we actually want to go the route of modifying the C++ code, it looks like the main problem will be …
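For concreteness, a tiny numeric illustration of that branch (this is my hedged reading of the C++ above: totdot behaves like the dot product of the current run direction with the candidate segment):

run = (0.0, 10.0)        # current almost-vertical run, heading up
candidate = (0.0, -9.0)  # next segment doubles straight back down (anti-parallel)
totdot = run[0] * candidate[0] + run[1] * candidate[1]
print(totdot)  # -90.0: negative, so the simplifier takes the else branch and
               # pushes the points instead of merging the spike away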
It would be a non-trivial modification to add code to essentially mirror what's in this proposal but on almost-parallel paths instead of just almost-vertical paths; to (a) keep track of the first point on the path that might get simplified, (b) find the min/max there (but in this case the min/max found by treating the path you are building as the reals), (c) keep track of the last point on the path that might get simplified.
That being said, I see the utility in having one path simplification algorithm that returns a path whose shape is relatively preserved, e.g. for saving vector graphics files (zooming in will produce a line segment that is somewhat close to what it would be in the non-simplified version) and another path simplification that will destroy shape (mainly used for rasterization). These could definitely be the same method, just parameterized.
Good catch. I went through the current path simplification algorithm and I think something like the following "should" work: …
@anntzer do you think you will be able to follow up on your ideas for achieving a similar gain via modifications to the path simplification code?
Not right now (real life has caught up, and this is a nontrivial project). Unless this turns out much easier to do than expected I don't think I'll have time to work on it before, say, April.

A quick googling pointed out that, not unexpectedly, this is a well-studied problem. See e.g. https://en.wikipedia.org/wiki/Ramer%E2%80%93Douglas%E2%80%93Peucker_algorithm -- there are five other algorithms listed at the bottom. Whether any of them can handle the use case of the OP, I don't know, but I'd suggest we take advantage of the existing algorithms rather than our home-cooked thing. Perhaps make this a GSOC? :) (attn @tacaswell)

I understand that the usage of the OP is a fairly common one, and the code is already there, so I won't oppose merging something like that. However, in that case I'd still suggest that control of whether the procedure is applied be folded into the path.simplify rcparam (with internal testing that the linestyle is indeed solid) rather than a kwarg on …
@anntzer The algorithm proposed here is … The Ramer–Douglas–Peucker algorithm is expected …

The first additional algorithm (Visvalingam–Whyatt) listed on the wikipedia page has a good description here. Since it depends on the area of triangles, it looks like it too is not well suited to datasets with highly negative one-step autocorrelation. Consider my prototypical example of …

I do not care strongly about the specifics of how the downsampling is turned on, or what it is called.

Also, congratulations, everyone, on releasing 2.0! Thanks for all your hard work!
GSoC -> Google Summer of Code.
Sorry to keep pestering, but are there any thoughts on what to do with this?
(I'm unable to comment on this in any depth, but wanted to clarify that your pinging on this after two weeks of silence is completely appropriate.)
As mentioned earlier I do not consider that this PR should be merged (based on technical grounds -- do not take this personally :-)). If other devs want to push for it I would appreciate it if my comment above (which I reproduce below) could be taken into account.
I agree this should be merged with the existing path simplification, or at least put on the …
Ok. I have a couple of questions before I attempt anything else. Apologies in advance for all the notifications I'm about to generate. I want to get everyone's input before putting more work into this. Is everyone in agreement that modifying the existing path simplification code is the way to go? To be clear, I'd be modifying the algorithm as @anntzer and I discussed above -- essentially removing the current constraint that in order for a vertex to be removed, it must have a non-negative dot product with the currently proposed simplification vector.
Does anyone else have any comments? I now have Linux installed once again, so at least I can build from source. But I am wondering -- I understand wanting to keep source code clean and wanting to avoid having two very similar ways to accomplish something implemented in different places. But if the C++ code is not actually used anywhere, why not just swap it out with a Python implementation? If it's found to not be fast enough it could always be reimplemented in C++. So I'm asking, how would others feel about updating the proposed algorithm here to handle more than just vertical lines (essentially the C++ algorithm but handling anti-parallel lines) in Python, and deprecating the C++ implementation? I'm willing to put work into this, but am wary of moving forward without being relatively certain my efforts will be worth it.
I doubt that a pure Python implementation can be fast enough, but would certainly be willing to switch to one instead of the current C++ implementation if it handles more cases and is actually reasonably fast.
My current implementation is Python (not pure Python -- it uses numpy of course) and has been fast enough for me as I've been using it the past few months (thanks to your monkey-patch suggestion!). It's orders of magnitude better than what currently exists...
Closing in favor of #8776. Thanks for all the comments, everyone.
Overview
This PR adds a new downsample method for lines. This is an improved version of the algorithm found here.
When the x-values of the line are monotonic, this downsampling method will render at most 4 times (the width of the parent Axes in pixels) vertices. The downsampling is done such that the rendering of the downsampled plot looks very similar to the rendering of the non-downsampled plot. In fact, if there is no anti-aliasing and plotting uses a solid line only, then the renderings will be identical.

Possible Use Case(s)
I created this PR because I plot a lot of ECG data over large timespans. Typically, I might want to plot 12 or 24 hours of ECG data sampled at 125 Hz, resulting in 125 * 12 * 60 * 60 = 5400000 points of data. This is painfully slow to render, and interaction is very slow. Further, naive downsampling (e.g. plt.plot(x[::100], y[::100])) is not sufficient in this case, as artifacts of interest in this plot might only last 1 sample. Of course, this is just a specific instance, and any plotting of large datasets could benefit.
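A minimal illustration (synthetic data, not real ECG) of why the naive decimation above can hide a one-sample artifact:

import numpy as np

fs = 125                          # Hz, as in the ECG example above
y = np.zeros(12 * 60 * 60 * fs)   # 12 hours of flat signal: 5,400,000 samples
y[1_000_001] = 5.0                # a single-sample artifact
print(y.max())                    # 5.0 -- present in the full data
print(y[::100].max())             # 0.0 -- the spike is decimated away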
Algorithm Description
This algorithm makes use of the fact that for large data-sets, lots of x-values will lie on the same column of pixels. We really only need to draw the minimum and maximum y-values from all of the x-values that get mapped to the same column of pixels. A little more book-keeping is required to ensure that the connecting lines between two columns of pixels render correctly, but this just amounts to keeping the first and last pairs that are mapped to that column of pixels.

An image will probably help:
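The image is not reproduced here; in its place, a minimal sketch of the idea described above (the function name, the n_columns parameter, and the way x is mapped to pixel columns are all illustrative assumptions, not this PR's actual code):

import numpy as np

def downsample_minmax(x, y, n_columns):
    """Keep the first, min, max, and last point of every pixel column."""
    # Map every x value to a pixel-column index (x assumed monotonic).
    cols = np.floor(np.interp(x, (x[0], x[-1]), (0, n_columns - 1))).astype(int)
    # Index where each run of equal column indices starts/ends.
    starts = np.r_[0, np.where(np.diff(cols) != 0)[0] + 1]
    ends = np.r_[starts[1:], len(x)]
    keep = []
    for s, e in zip(starts, ends):
        block = y[s:e]
        keep.extend([s,                          # first point entering the column
                     s + int(np.argmin(block)),  # column minimum
                     s + int(np.argmax(block)),  # column maximum
                     e - 1])                     # last point leaving the column
    keep = np.unique(keep)                       # sort and drop duplicate indices
    return x[keep], y[keep]

# A million noisy points collapse to roughly 4 vertices per pixel column.
x = np.arange(1_000_000)
y = np.random.randn(1_000_000)
xd, yd = downsample_minmax(x, y, n_columns=1920)
print(len(xd))  # a few thousand, independent of the original size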
Caveats
This downsampling works best for plotting large amounts of data using a solid line and no markers.
If the x-values of the line are not monotonic, then there are no guarantees that rendering will be any faster. The "less monotonic" (for some definition of "less monotonic") the x-values are, the less downsampling will be able to occur.

Questions
I have marked with a TODO in the code a question I had. I am recomputing self.axes.transData.transform(verts) each update (verts is the vertices of the path). Is that cached somewhere already?

TODO: