Stem performance boost #9565

dstansby · 2017-10-24T19:43:59Z

Instead of adding each line individually, add them as a line collection. This massively improves performance. Fixes #7969

Changes what is stored in StemContainer from a list of Line2D to a LineCollection (have added API change to reflect this)
There's no way of setting the linemarker this way

WeatherGod · 2017-10-24T21:31:34Z

Looks like some extra files got included in this commit?

WeatherGod · 2017-10-24T21:34:12Z

lib/matplotlib/container.py

+        markerline_stemlines_baseline : tuple
+            Tuple of ``(markerline, stemlines, baseline)``.
+            ``markerline`` contains the `LineCollection` of the markers,
+            ``stemlines`` is a list of the `Line2D` of each line,


Should this doc entry be updated to say LineCollection? Does it matter? Should we coerce to one type or another?

jklymak · 2017-10-24T22:21:31Z

@dstansby you are failing on legend_demo. Can we merge #9324 first to make sure it has fixed whatever the error is? I'd hate for you to chase down errors in the argument handling in code that will change soon (I hope).

jklymak · 2017-10-24T22:23:04Z

BTW, not saying it will be fixed - it looks like legend_demo wants a list of lines....

dstansby · 2017-10-25T17:55:08Z

lib/matplotlib/legend_handler.py

-            l = Line2D([thisx, thisx], [bottom, thisy])
-            leg_stemlines.append(l)
-
-        for lm, m in zip(leg_stemlines, stemlines):


As far as I can tell these two lines aren't needed; not 100% sure though...

dstansby · 2017-10-25T18:01:03Z

I've added a legend to the image test just so stem legends are covered during testing and not just during the doc build.

tacaswell · 2017-10-25T18:42:10Z

doc/api/api_changes/2017-10-24-DS.rst

+-------------------------------------------
+
+`StemContainer` objects now store a `LineCollection` object instead of a list
+of `Line2D` objects for stem lines plotted using `ax.stem`.


Can you expand on the consequences of this a bit? I would have to go look up the code to see how to do a migration.

Adding a note about the speed up would help mollify the angry user whose code we just broke ;)

I've had a go, but not too sure on what to put - if anyone has a better suggestion feel free to push to my branch.

jklymak · 2017-10-30T16:27:03Z

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0.1, 2 * np.pi, 100)
fig, ax = plt.subplots()
ax.stem(x, np.cos(x), linefmt='C0-', markerfmt='k+', basefmt='C1-.',
        label='Stem')
ax.stem(x, np.sin(x), linefmt='k-', markerfmt='r+', basefmt='C1-.',
        label='Sine')
ax.legend()

Has a slight blemish in the legend in that the Sine line has a blue stem instead of a black one...

dstansby · 2017-11-14T11:14:17Z

lib/matplotlib/legend_handler.py

@@ -619,6 +624,13 @@ def create_artists(self, legend, orig_handle,

        return artists

+    def _copy_collection_props(self, legend_handle, orig_handle):


I managed to fix the legend properties by adding this function, which kind of works.

dstansby · 2017-11-14T11:15:40Z

I've made a slightly hacky fix for the legend.

What I really want to do is find a way of extracting a property cycle from the LineCollection object, and then apply each set of properties to a Line2D object; does anyone know how to do this?

anntzer · 2018-01-04T01:57:11Z

doc/api/api_changes/2017-10-24-DS.rst

+large performance boost to displaying and moving `ax.stem` plots.
+
+Line segments can be extracted from the `LineCollection` using
+`LineCollection.get_segements()`. See the `LineCollection` documentation for


typo segments

anntzer · 2018-01-04T01:58:42Z

Can you clarify what you need, and why you're not satisfied with your current implementation of the legend handler?

dstansby · 2018-01-04T10:26:20Z

Hmm, I'm not sure anymore, maybe this way is fine.

QuLogic · 2018-02-03T00:52:18Z

So the legend is slightly off in that the elements are drawn in the wrong order; can that be fixed?

This also needs a rebase.

jklymak · 2018-07-08T22:09:10Z

#9565 (comment) still stands - the stem and marker are drawn in the wrong zorder in the legend. Otherwise this looks fine to me.

jklymak · 2018-07-13T22:37:59Z

I don't recall the exact conversation, but I think it was considered worth breaking the backwards compatibility.

I guess what would help in the API change note is a quick summary of before and after of how to do something like change the line width... I can't imagine it's too painful - i.e. old [l.set_linewidth(2) for l in lines] versus new linecol.set_linewidth(2).
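A rough before/after sketch of that migration, assuming the stem lines are either a list of `Line2D` (old) or a single `LineCollection` (new). The artists are built by hand here rather than through `ax.stem`, and the names `old_stemlines` and `new_stemlines` are illustrative only.

```python
from matplotlib.collections import LineCollection
from matplotlib.lines import Line2D

# Three vertical stems from y=0 up to y=1, 2, 3 (made-up data).
xs, ys = [0.0, 1.0, 2.0], [1.0, 2.0, 3.0]

# Old style: one Line2D per stem, so properties are set in a loop.
old_stemlines = [Line2D([x, x], [0.0, y]) for x, y in zip(xs, ys)]
for line in old_stemlines:
    line.set_linewidth(2)

# New style: a single LineCollection, so one call covers every stem.
new_stemlines = LineCollection([[(x, 0.0), (x, y)] for x, y in zip(xs, ys)])
new_stemlines.set_linewidth(2)
```

Code that only loops over the old list to set a property would shrink to a single call, but anything that indexes individual stems would need a different approach.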

NelleV · 2018-07-13T23:13:43Z

I think we should consider making the transition smoother. Maybe an easy way to provide a two cycle warning is to duplicate the code into a new function, provide the fast version under a new name, and do a two cycle deprecation of this one? I don't know where the name stem comes from, and how easy it would be to come up with a new meaningful name. I also don't know whether we could come up with a better way to ensure a two release cycle warning of API change (I didn't give it much thought and just wrote down the first solution I came up with).

ImportanceOfBeingErnest · 2018-07-13T23:28:08Z

> ... to duplicate the code into a new function, provide the fast version under a new name ...

Just to mention, this is exactly the way you end up with things like pcolor, pcolormesh, pcolorfast and in the end no one will know which one is the one to actually use.

WeatherGod · 2018-07-14T02:09:39Z

eh, not exactly. pcolor and pcolormesh are two different things and one was not intended to replace the other. pcolormesh is faster for a subset of things, so we couldn't get rid of something that works all the time. Now, pcolorfast() was a bad design choice because it was trying to figure out which one to use in what situation. What we have here is something that will always be faster and more efficient and we want to deprecate the old approach. We could call this stem2(), or stemfast() perhaps. And then go a few cycles and get rid of stem(). A few cycles later, we can take back stem().


NelleV · 2018-07-14T03:08:28Z

The solution I proposed is also the first thing that came to my mind. There may be a better way to ensure backward compatibility for a few release cycles. If a LineCollection can be converted on the fly (easily or not) to a list of Line2D, then we may be able to exploit that: a solution would then be to use an attribute name other than stemlines to store the LineCollection (let's call it stemlines2 for the sake of this explanation). When someone attempts to access stemlines, convert the LineCollection to a list of Line2D and raise the deprecation warning. The two attributes stemlines and stemlines2 would be incompatible with one another, but this would give people time to update their code. Anyone using the stemlines attribute would also find it significantly slower, but I don't think this matters.

(To be clear, I am not suggesting to use the name stemlines2)
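The lazy-conversion idea above could look roughly like the sketch below. It is pure Python with stand-in data types: `StemContainerSketch` is hypothetical and is not matplotlib's `StemContainer`, the collection is modelled as a list of point pairs, and `stemlines2` is only the placeholder name used in the comment.

```python
import warnings


class StemContainerSketch:
    """Hypothetical container; not matplotlib's StemContainer."""

    def __init__(self, segments):
        # Stand-in for a LineCollection: a list of ((x0, y0), (x1, y1)) pairs.
        self.stemlines2 = segments

    @property
    def stemlines(self):
        # Old attribute: warn, then build the per-line objects on demand.
        warnings.warn(
            "stemlines is deprecated; use the collection attribute instead",
            DeprecationWarning,
            stacklevel=2,
        )
        # Each "line" is modelled as (xdata, ydata), standing in for Line2D.
        return [([p0[0], p1[0]], [p0[1], p1[1]]) for p0, p1 in self.stemlines2]


container = StemContainerSketch([((0, 0), (0, 1)), ((1, 0), (1, 2))])
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lines = container.stemlines  # triggers the warning and the conversion
```

Because the list is rebuilt on every access, edits made to the converted lines would not propagate back to the stored collection, which is one reading of the two attributes being "incompatible with one another".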

dstansby · 2018-07-14T05:11:32Z

I don't have any bandwidth to work out how to do the deprecation properly, so anyone feel free to push to my branch (please don't rewrite what's already there though)

tacaswell · 2018-07-15T07:30:22Z

Punting to 3.1 to give time to sort out this discussion / put in a more graceful change strategy.

dstansby · 2018-07-26T16:51:52Z

Had a little think about this; as I've said in the API note,

Line segments can be extracted from the `LineCollection` using `LineCollection.get_segements()`.

Which is me trying to say that this is how you get the old return type from the new return type, though re-reading it it's not that clear so I will re-word it.
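As a hedged illustration of getting something like the old return type back, the sketch below rebuilds a list of `Line2D` from a hand-built `LineCollection` via `get_segments()`. The data is made up, and styling (colour, width, dashes) is not copied across, so this is not a complete round trip.

```python
from matplotlib.collections import LineCollection
from matplotlib.lines import Line2D

# Hand-built collection standing in for the stem lines (made-up data).
stemlines = LineCollection([[(0.0, 0.0), (0.0, 1.0)], [(1.0, 0.0), (1.0, 2.0)]])

# get_segments() returns one (N, 2) array of vertices per line.
segments = stemlines.get_segments()

# Rebuild a list of Line2D objects; styling is not copied in this sketch.
lines = [Line2D(seg[:, 0], seg[:, 1]) for seg in segments]
```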

I think @ImportanceOfBeingErnest has the best suggestion: add a kwarg that changes the return type, set as default to the current return type, and warn that the return type will change in 2 release cycles.

NelleV · 2018-07-26T17:01:34Z

The backward compatible kwarg option would only work if by default, stem was not using the optimized version of the code.

I'm not so concerned about providing the user with an option to switch back to the old API, but about giving proper warning to the user on the API breakage.

jklymak · 2018-09-28T05:00:03Z

This PR is still lingering - I don't really know how we give warning that we will deprecate the returned type of a method. Don't we just change it and in the API note give the way to get the old API back?

NelleV · 2018-09-28T15:43:01Z

@jklymak I don't think this solution has any advantages. If you are going to break someone's code by default and they have to change it anyways to have their code run, you might as well force them to use the new API.
The proper way is to deprecate the old method. There are several ways to do this, and it is just a question of taking the time to do it.

When your library is used by hundreds of thousands of users, you need to be careful about these small modifications that seem minor because you are a core developer and following closely the development of Matplotlib. As a reminder, we broke people's code several times by removing unused module imports in our submodules: while I think we made the right move in removing those unused imports, it also underlines the importance of backward compatibility and how small changes affect our users.

The minimal amount of backward compatibility we should aim for is to be able to run the tests and build the documentation from two versions ago with the current version without the tests crashing (understand error, not failures).

jklymak · 2018-09-28T15:53:42Z

I've put this on the agenda for Monday's call... I'm not advocating for what @dstansby suggested - I'm advocating we just break the API. If there is a good way to do that smoothly, I'm all for that.

anntzer · 2018-10-01T10:06:10Z

> I don't think this solution has any advantages. If you are going to break someone's code by default and they have to change it anyways to have their code run, you might as well force them to use the new API.

I don't agree with this point.

- If they were just using stem() to plot, well, a stem plot and not do any customization on the artist, then changing the return type doesn't change anything for them. I'd bet this covers >95% of the use cases. (Of course, 5% of hundreds of thousands of users is still a lot.)
- If they did, then they'll get a crash (because the returned type is quite different) (unless they're exactly using the artist API that both the old and new return type share, in which case things should hopefully not change, but if the APIs are not consistent that should be seen as an opportunity to rationalize our API). But then they can just replace stem(...) by stem(..., use_old_api=True) as a quickfix (until we kill the old return type), rather than figuring out how to replace the old API by the new one.

In other words, it gives the users a simple way to temporarily use the old API. You can even make this an rcparam if you want...

> The proper way is to deprecate the old method. There are several ways to do this, and it is just a question of taking the time to do it.

You proposed #9565 (comment), which is essentially (sorry if I misunderstood the proposal?) saying provide stem_v2 (name up to bikeshedding) with the new semantics and deprecate stem; I guess we'd later rename stem_v2 to stem? Frankly I don't like this approach at all; if you don't ever rename the function to the old name you end up with silly names such as EnumCalendarInfoExEx (and stem_v123 isn't really helpful to the code reader either) and if you do that makes the diligent programmer (the one who actually follows each upgrade and cleans up after each deprecation warning) do two changes (stem->stem_v2->stem) instead of a single one (take care of the API change).

Edit: Apologies, I missed the proposal in #9565 (comment), which may or may not work in this specific case, but more generally there are other proposed API changes (e.g. #5665) where shimming the entire old API just doesn't make sense; we should just accept that at some point an argument's default value switches from one value to another.

WeatherGod · 2018-10-01T13:45:08Z

We could always name the new function `stemplot()`. That won't clutter up the API with silly looking function calls. We have been bitten several times before with attempts to change the return dtype. There are several other libraries that are built on top of us that provide extra introspective plotting features that depend heavily on the return type of our functions.


anntzer · 2018-10-01T13:54:07Z

Having subtly different stem() and stemplot() functions that do nearly the same thing but not really is something that I would definitely consider a poor API...

ImportanceOfBeingErnest · 2018-10-01T14:01:06Z

We have kind of a similar thing going on in the current versions.

If you try to create an axes at the same subplot position as an existing axes, it will fall back to this existing axes.

fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = fig.add_subplot(111)
print(ax1 == ax2)  # prints True

There is a warning associated with this though:

MatplotlibDeprecationWarning:
Adding an axes using the same arguments as a previous axes currently reuses the earlier instance. 
In a future version, a new instance will always be created and returned.  Meanwhile, this warning
can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.

Which means if you want the (expected and useful) behaviour of creating a new axes at the same position, you need to add a label (or in fact any other unique argument) to the creating function.

fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = fig.add_subplot(111, label="useless label to get a new axes instance")
print(ax1 == ax2)  # prints False

So here the old behaviour is kept, but a strategy of using the new behaviour is presented, i.e. by using a kwarg. To apply this concept here (and in contrast to my previous API breaking proposal) one could add a return_type argument as follows:

- return_type="lines": Old behaviour (lines), no warning.
- return_type="collection": New behaviour (collection), no warning.
- return_type=None: The default for now. Sets to "lines". Raises a warning, like "In a future version the return type will change and stem will return a LineCollection. To suppress this warning explicitly state the return_type".
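The three cases listed above can be sketched in plain Python. Everything here is hypothetical: `stem_sketch` is a stand-in and not matplotlib's `stem`, the returned objects are simple lists and dicts rather than artists, and the choice of `FutureWarning` is an assumption.

```python
import warnings


def stem_sketch(x, y, return_type=None):
    """Hypothetical stand-in for stem(); only the return-type switch is modelled."""
    if return_type is None:
        # Default for now: warn, then fall back to the old behaviour.
        warnings.warn(
            "In a future version stem will return a collection; "
            "pass return_type explicitly to suppress this warning.",
            FutureWarning,
            stacklevel=2,
        )
        return_type = "lines"
    if return_type not in ("lines", "collection"):
        raise ValueError(f"unknown return_type: {return_type!r}")

    segments = [((xi, 0), (xi, yi)) for xi, yi in zip(x, y)]
    if return_type == "lines":
        # Old behaviour: one object per stem.
        return list(segments)
    # New behaviour: a single object holding every stem.
    return {"segments": segments}


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    default = stem_sketch([0, 1], [1, 2])  # warns, old behaviour
    explicit = stem_sketch([0, 1], [1, 2], return_type="collection")  # silent
```

Only the call that leaves `return_type` unset warns, so users who opt in to either behaviour explicitly see no noise during the transition.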

NelleV · 2018-10-01T14:23:49Z

@anntzer We can immediately deprecate stem. That's a fairly common practice, and provides backward compatibility.

I'm also fine with the solution proposed by @ImportanceOfBeingErnest .

anntzer · 2018-10-01T14:31:31Z

I'd much rather do what @ImportanceOfBeingErnest proposed just above.

Edit: see also https://gitter.im/matplotlib/matplotlib?at=5bb250d81c100a4f291607c7 re: backcompat policy. Reproduced here:

I assume @NelleV is raising the point wrt #9565. My personal opinion is that while 2 minor versions' (or 1y) worth of deprecation warnings is fine in general, we should also weigh the relative costs of actually putting in the relevant deprecation machinery vs the gains that the API changes get us, rather than seeing this as an absolutely strict rule.
When it's just a matter of deprecating an API, we have the relevant decorators for that so it's easy (and even then, we don't bother emitting a warning when just accessing the API instead of calling it, so someone who e.g. just passes a function as a callback will not see the deprecation warning).
When it's changing a return/attribute type, that's much trickier. Yes, sometimes it's possible to keep both the old and new return/attribute types available by adding an additional kwarg to the function to switch between the two of them, but for example in #11530 the type of LocationEvent.y was changed to int to be consistent with LocationEvent.x (previously, it was float); clearly that change is an improvement especially now that numpy insists on using ints as indices. Well, that change actually broke the mplcursors test suite (anntzer/mplcursors@e8b2807) which was previously relying on being able to generate LocationEvents at arbitrary positions on the canvas. Of course I could have insisted that there be a kwarg to switch whether LocationEvent.y is an int or a float (after all it is a backcompat break), but I don't think that would actually have been a reasonable request (especially from a new contributor such as the author of #11530, but even from an experienced contributor).
And there are cases such as #11944 where the underlying library (wxPython) changes its own API, and we need to update the code we use to talk to it. In that specific case, the PR gets rid of the gui_repaint() method on wx Canvases, and that'll break mplcairo's wx backend module. Again, I could have requested that the (highly experienced) PR author shims the relevant parts of the wx rendering call stack to not break mplcairo.wx, but is that really how we want to spend our developer time? I don't think so.
TLDR: Let's also think about how hard it is to actually put in the deprecation warnings, and weigh that against our best guess of how many people will be affected by the API change, as well as whether they'll see a hard exception (well, at least they know something went wrong) or an incorrect result (though note that #11530 was actually in that category -- silently changing some results, due to the additional truncation to int).

timhoffm · 2018-10-01T20:59:54Z

I'm also -1 on creating a new function with a modified name and deprecating stem() just because we want a different return value.

Also, changing the return value without a prior warning is a massive break.

The return_type variant seems the most reasonable, even though that means having both code paths around for some time.

QuLogic · 2018-10-02T00:15:51Z

Sorry to bikeshed a bit, but if it's going to change only the return value, then it can be called return_type. But if it is also changing what gets added to the plot, then it should be named something else like plotting_method (or something briefer but as clear). I am not clear on whether the suggested way forward will be the former or the latter.

ImportanceOfBeingErnest · 2018-10-02T00:55:48Z

Well, of course it changes what's in the plot. Two important consequences of this:

- the name of a possible kwarg should be different, and also of course one can think about not using strings but booleans for better accessibility/memorizability like draw_lines=False or so. I think all this can be discussed once a general solution is agreed upon.
- More importantly, just deprecating the return type will not work; the API change in that sense is that the function draws something completely different (but visually identical) to the axes. This makes @NelleV's suggestion quite impractical, because users could equally try to access the stem elements via ax.get_lines().
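To make the `ax.get_lines()` point concrete, here is a small sketch that adds the same stems to two axes, once as individual `Line2D` artists and once as a single `LineCollection`. It uses `add_line` and `add_collection` directly rather than `stem`, since what `stem` itself draws is exactly what was under discussion; the data is made up.

```python
from matplotlib.collections import LineCollection
from matplotlib.figure import Figure
from matplotlib.lines import Line2D

segments = [[(0.0, 0.0), (0.0, 1.0)], [(1.0, 0.0), (1.0, 2.0)]]

# Old-style drawing: each stem is its own Line2D on the axes.
fig_old = Figure()
ax_old = fig_old.add_subplot(111)
for (x0, y0), (x1, y1) in segments:
    ax_old.add_line(Line2D([x0, x1], [y0, y1]))

# New-style drawing: all stems live in one LineCollection.
fig_new = Figure()
ax_new = fig_new.add_subplot(111)
ax_new.add_collection(LineCollection(segments))

# Where the stems can be found afterwards.
old_counts = (len(ax_old.get_lines()), len(ax_old.collections))
new_counts = (len(ax_new.get_lines()), len(ax_new.collections))
```

In the first case the stems are reachable through `ax.get_lines()`; in the second they are not, and appear under `ax.collections` instead, so code that walks the axes' lines would silently stop finding them.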

Also replicating a recent comment in chat here for completeness:
I did try to search github for ".stem(" to find out how many people might be affected by this. But the search ignores the . and the ( so you end up with massive amounts of results from people visualizing their stem cell research or their scanning transmission electron microscope images. Is there a way to search for literal dots . with GitHub?

dopplershift · 2018-10-02T22:00:02Z

We can't just make this change with no deprecation period. Full stop. This is not fixing a bug, or making a minor improvement that changes how the end image appears somewhat. This is a significant change to how the library operates with regard to the call to stem(), and it is readily observed through the return type and public API, like get_lines().

The performance implications are worth making the change, certainly. But, IMO, we don't get to just throw up our hands and say "Doing a deprecation period is too hard". It's possible that code that did all the right things (in the standard code path) with 3.0 will no longer work with 3.1--and will end up in a traceback. As a concrete example, this would completely obliterate code that was making an animated stem plot. That's user-hostile behavior, IMO. If somebody made a similar change to barbs and broke MetPy, I'd be seriously pissed. (I may be a tad sensitive as I clean up the breakage from matplotlib 3.0.)

What exactly is wrong with:

import warnings

def stem(self, *args, use_new_behavior=None, **kwargs):
    # Keyword-only flag must come before **kwargs to be valid Python.
    if use_new_behavior is None:
        warnings.warn("we're changing this")
        use_new_behavior = False
    if use_new_behavior:
        return self._new_stem(*args, **kwargs)
    else:
        return self._old_stem(*args, **kwargs)

(naming stuff aside) ?

dopplershift

We need to do something more about the API change rather than just a note in the docs. This isn't changing plot appearance; this has the ability to make existing, working scripts traceback.

anntzer · 2018-10-03T04:31:22Z

@dopplershift That's literally what was proposed in #9565 (comment), and that was deemed not good enough in #9565 (comment), at which point I did "throw up our hands and say 'Doing a deprecation period is too hard'". But I admittedly misread both the comment and its implication that #9565 (comment) would be OK (sorry about that).

Not to derail this discussion, but the recent hold debacle (#12274, both re: cartopy and scipy) did make me wonder what's the point of spending so much energy on this, if the process flow is "we put in the deprecation warning, we remove the feature two minor releases later, downstream libs complain, we put it back, and remove it again later"... looks like we should just shunt the first two steps :)

dstansby · 2018-10-03T09:03:08Z

Since the conversation here is getting a bit ridiculous, I've opened a new PR at #12380

dstansby added this to the v2.2 milestone Oct 24, 2017

WeatherGod reviewed Oct 24, 2017

dstansby commented Oct 25, 2017

dstansby force-pushed the stem-speedup branch from c68e2c8 to b7cdd7f Compare October 25, 2017 18:00

dstansby force-pushed the stem-speedup branch from b7cdd7f to 6002b62 Compare October 25, 2017 18:08

tacaswell reviewed Oct 25, 2017

tacaswell approved these changes Oct 29, 2017

tacaswell added the API: changes label Oct 29, 2017

dstansby added the status: needs revision label Nov 2, 2017

dstansby force-pushed the stem-speedup branch from 4d7ff73 to 86de3d4 Compare November 14, 2017 11:12

dstansby commented Nov 14, 2017

anntzer reviewed Jan 4, 2018

dstansby removed the status: needs revision label Jan 4, 2018

anntzer mentioned this pull request Jan 4, 2018

Adapt stem plot #10165

Closed

dstansby added the status: needs rebase label Feb 1, 2018

anntzer added the Performance label Mar 19, 2018

dstansby force-pushed the stem-speedup branch from 86de3d4 to 16d88d8 Compare June 28, 2018 09:29

tacaswell modified the milestones: needs sorting, v3.0 Jun 28, 2018

tacaswell added the Release critical label Jun 28, 2018

tacaswell modified the milestones: v3.0, v3.1 Jul 15, 2018

dopplershift requested changes Oct 2, 2018

dstansby mentioned this pull request Oct 3, 2018

Stem speedup2 #12380

Merged

dstansby closed this Oct 3, 2018

dstansby deleted the stem-speedup branch October 3, 2018 09:03
