Skip to content

Stem performance boost #9565

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Conversation

dstansby
Copy link
Member

Instead of adding each line individually, add them as a line collection. This massively improves performance. Fixes #7969

  • Changes what is stored in StemContainer from a list of Line2D to a LineCollection (have added API change to reflect this)
  • There's no way of setting the linemarker this way

@dstansby dstansby added this to the v2.2 milestone Oct 24, 2017
@WeatherGod
Copy link
Member

Looks like some extra files got included in this commit?

markerline_stemlines_baseline : tuple
Tuple of ``(markerline, stemlines, baseline)``.
``markerline`` contains the `LineCollection` of the markers,
``stemlines`` is a list of the `Line2D` of each line,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this doc entry be updated to say LineCollection? Does it matter? Should we coerce to one type or another?

@jklymak
Copy link
Member

jklymak commented Oct 24, 2017

@dstansby you are failing on legend_demo. Can we merge #9324 first to make sure it has fixed whatever the error is? I'd hate for you to chase down errors in the argument handling in code that will change soon (I hope).

@jklymak
Copy link
Member

jklymak commented Oct 24, 2017

BTW, not saying it will be fixed - it looks like legend_demo wants a list of lines....

l = Line2D([thisx, thisx], [bottom, thisy])
leg_stemlines.append(l)

for lm, m in zip(leg_stemlines, stemlines):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell these two lines aren't needed; not 100% sure though...

@dstansby
Copy link
Member Author

I've added a legend to the image test just so stem legends are covered during testing and not just during the doc build.

-------------------------------------------

`StemContainer` objects now store a `LineCollection` object instead of a list
of `Line2D` objects for stem lines plotted using `ax.stem`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expand on the consequences of this a bit? I would have to go look up the code to see how to do a migration.

Adding a note about the speed up would help mollify the angry user who's code we just broke ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've had a go, but not too sure on what to put - if anyone has a better suggestion feel free to push to my branch.

@jklymak
Copy link
Member

jklymak commented Oct 30, 2017

x = np.linspace(0.1, 2 * np.pi, 100)
fig, ax = plt.subplots()
ax.stem(x, np.cos(x), linefmt='C0-', markerfmt='k+', basefmt='C1-.',
            label='Stem')
ax.stem(x, np.sin(x), linefmt='k-', markerfmt='r+', basefmt='C1-.',
            label='Sine')
ax.legend()

Has a slight blemish in the legend in that the Sine line has a bue stem instead of a black one...

figure_1

@@ -619,6 +624,13 @@ def create_artists(self, legend, orig_handle,

return artists

def _copy_collection_props(self, legend_handle, orig_handle):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I managed to fix the legend properties by adding this function, which kind of works.

@dstansby
Copy link
Member Author

I've made a slightly hacky fix for the legend.

What I really want to do is find a way of extracting a property cycle from the LineCollection object, and then apply each set of properties to a Line2D object; does anyone know how to do this?

large performance boost to displaying and moving `ax.stem` plots.

Line segments can be extracted from the `LineCollection` using
`LineCollection.get_segements()`. See the `LineCollection` documentation for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo segments

@anntzer
Copy link
Contributor

anntzer commented Jan 4, 2018

Can you clarify what you need, and why you're not satisfied with your current implementation of the legend handler?

@dstansby
Copy link
Member Author

dstansby commented Jan 4, 2018

Hmm, I'm not sure anymore, maybe this way is fine.

@QuLogic
Copy link
Member

QuLogic commented Feb 3, 2018

So the legend is slightly off in that they elements are drawn in the wrong order; can that be fixed?

This also needs a rebase.

@tacaswell tacaswell modified the milestones: needs sorting, v3.0 Jun 28, 2018
@tacaswell tacaswell added the Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. label Jun 28, 2018
@jklymak
Copy link
Member

jklymak commented Jul 8, 2018

figure_1

#9565 (comment) still stands - the stem and marker are drawn in the wrong zorder in the legend. Otherwise this looks fine to me.

@jklymak
Copy link
Member

jklymak commented Jul 13, 2018

I don't recall the exact conversation, but I think it was considered worth breaking the backwards compatibility.

I guess what would help in the API change note is a quick summary of before and after of how to do something like change the line width... I can't imagine its too painful - i.e. old [l.set_linewidth(2) for l in lines] versus new linecol.set_linewidth(2).

@NelleV
Copy link
Member

NelleV commented Jul 13, 2018

I think we should consider making the transition more smoothly. Maybe an easy way to provide a two cycle warning is to duplicate the code into a new function, provide the fast version under a new name, and do a two cycle deprecation of this one? I don't know where the name stem comes from, and how easy it would be to come up with a new meaningful name. I also don't know whether we could come up with a better way to ensure a two release cycle warning of API change (I didn't give it much thought and just wrote down the first solution I came up with)

@ImportanceOfBeingErnest
Copy link
Member

... to duplicate the code into a new function, provide the fast version under a new name ...

Just to mention, this is exactly the way you end up with things like pcolor, pcolormesh, pcolorfast and in the end noone will know which one is the one to actually use.

@WeatherGod
Copy link
Member

WeatherGod commented Jul 14, 2018 via email

@NelleV
Copy link
Member

NelleV commented Jul 14, 2018

The solution I proposed is also the first thing that came to my mind. There may be a better way to ensure backward compatibility for a few release cycles. If LineCollections can be easily (or not) converted on the fly to list of Line2D, then we may be able to exploit that: a solution would then be to use another argument name than stemlines to store the LineCollections (let's call than stemlines2 for the sake of this explanation). When someone attempts to access stemlines, convert the LineCollections to a list of Line2D and raise the deprecation warning. The two attributes stemlines and stemlines2 would be incompatible with one another, but this would give time for people to update their code. In the case where someone would use the stemlines attribute, it would also cause this to be significantly slower, but I don't think this matters.

(To be clear, I am not suggesting to use the name stemlines2)

@dstansby
Copy link
Member Author

I don't have any bandwidth to work out how to do the deprecation properly, so anyone feel free to push to my branch (please don't rewrite what's already there though)

@tacaswell tacaswell modified the milestones: v3.0, v3.1 Jul 15, 2018
@tacaswell
Copy link
Member

Punting to 3.1 to give time to sort out this discussion / put in a more gracefully change strategy.

@dstansby
Copy link
Member Author

Had a little think about this; as I've said in the API note,

Line segments can be extracted from the `LineCollection` using `LineCollection.get_segements()`.

Which is me trying to say that this is how you get the old return type from the new return type, though re-reading it it's not that clear so I will re-word it.

I think @ImportanceOfBeingErnest has the best suggestion, add a kwarg that changes the return type, set as default to the current return type, and warn that the return type will change in 2 release cycles.

@NelleV
Copy link
Member

NelleV commented Jul 26, 2018

The backward compatible kwarg option would only work if by default, stem was not using the optimized version of the code.

I'm not so concerned about providing the user with an option to switch back to the old API, but about giving proper warning to the user on the API breakage.

@jklymak
Copy link
Member

jklymak commented Sep 28, 2018

This PR is still lingering - I don't really know how we give warning that we will deprecate the returned type of a method. Don't we just change it and in the API note give the way to get the old API back?

@NelleV
Copy link
Member

NelleV commented Sep 28, 2018

@jklymak I don't think this solution has any advantages. If you are going to break someone's code by default and they have to change it anyways to have their code run, you might as well force them to use the new API.
The proper way is to deprecate the old method. There are several ways to do this, and it is just a question of taking the time to do it.

When your library is used by hundreds of thousands of users, you need to be careful about these small modifications that seem minor because you are a core developer and following closely the development of Matplotlib. As a reminder, we broke people's code several time by removing unused module imports in our submodules: while I think we made the right move in removing those unused import, it also underlies the importance of backward compatibility and how small changes affect our users.

The minimal amount of backward compatibility we should aim for is to be able to run the tests and build the document from two version ago with the current version without the tests crashing (understand error, not failures).

@jklymak
Copy link
Member

jklymak commented Sep 28, 2018

I've put on the agenda for Monday's call... I'm not advocating for what @dstansby suggested - I'm advocating we just break the API. If there is a good way to do that smoothly, I'm all for that.

@anntzer
Copy link
Contributor

anntzer commented Oct 1, 2018

I don't think this solution has any advantages. If you are going to break someone's code by default and they have to change it anyways to have their code run, you might as well force them to use the new API.

I don't agree with this point.

  • If they were just using stem() to plot, well, a stem plot and not do any customization on the artist, then changing the return type doesn't change anything for them. I'd bet this covers >95% of the use cases. (Of course, 5% of hundreds of thousands of users is still a lot.)
  • If they did, then they'll get a crash (because the returned type is quite different) (unless they're exactly using the artist API that both the old and new return type share, in which case things should hopefully not change, but if the APIs are not consistent that should be seen as an opportunity to rationalize our API). But then they can just replace stem(...) by stem(..., use_old_api=True) as a quickfix (until we kill the old return type), rather than figuring out how to replace the old API by the new one.

In other words, it gives the users a simple way to temporarily use the old API. You can even make this an rcparam if you want...

The proper way is to deprecate the old method. There are several ways to do this, and it is just a question of taking the time to do it.

You proposed #9565 (comment), which is essentially (sorry if I misunderstood the proposal?) saying provide stem_v2 (name up to bikeshedding) with the new semantics and deprecate stem; I guess we'd later rename stem_v2 to stem? Frankly I don't like this approach at all; if you don't ever rename the function to the old name you end up with silly names such as EnumCalendarInfoExEx (and stem_v123 isn't really helpful to the code reader either) and if you do that makes the diligent programmer (the one who actually follows each upgrades and cleans up after each deprecation warning) do two changes (stem->stem_v2->stem) instead of a single one (take care of the API change).


Edit: Apologies, I missed the proposal in #9565 (comment), which may or may not work in this specific case, but more generally there are other proposed API changes (e.g. #5665) where shimming the entire old API just doesn't make sense; we should just accept that at some point an argument's default value switches from one value to another.

@WeatherGod
Copy link
Member

WeatherGod commented Oct 1, 2018 via email

@anntzer
Copy link
Contributor

anntzer commented Oct 1, 2018

Having subtly different stem() and stemplot() functions that do nearly the same thing but not really is something that I would definitely consider a poor API...

@ImportanceOfBeingErnest
Copy link
Member

We have kind of a similar thing going on in the current versions.

If you try to create an axes at the same subplot position as an existing axes, it will fall back to this existing axes.

fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = fig.add_subplot(111)
print(ax1 == ax2)  # prints True

There is a warning associated with this though:

MatplotlibDeprecationWarning:
Adding an axes using the same arguments as a previous axes currently reuses the earlier instance. 
In a future version, a new instance will always be created and returned.  Meanwhile, this warning
can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.

Which means if you want the (expected and useful) behaviour of creating a new axes at the same position, you need to add a label (or in fact any other unique argument) to the creating function.

fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = fig.add_subplot(111, label="useless label to get a new axes instance")
print(ax1 == ax2)  # prints False

So here the old behaviour is kept, but a strategy of using the new behaviour is presented, i.e. by using a kwarg. To apply this concept here (and in contrast to my previous API breaking proposal) one could add return_type argument as follows:

  • return_type="lines": Old behaviour (lines), no warning.
  • return_type="collection": New behaviour (collection), no warning.
  • return_type=None: The default for now. Sets to "lines". Raises a warning, like "In a future version the return type will change and stem will return a LineCollection. To suppress this warning explicitely state the return_type".

@NelleV
Copy link
Member

NelleV commented Oct 1, 2018

@anntzer We can immediately deprecate stem. That's a fairly common practice, and provides backward compatibility.

I'm also fine with the solution proposed by @ImportanceOfBeingErnest .

@anntzer
Copy link
Contributor

anntzer commented Oct 1, 2018

I'd much rather do what @ImportanceOfBeingErnest proposed just above.

Edit: see also https://gitter.im/matplotlib/matplotlib?at=5bb250d81c100a4f291607c7 re: backcompat policy. Reproduced here:

I assume @NelleV is raising the point wrt #9565. My personal opinion is that while 2 minor versions' (or 1y) worth of deprecation warnings is fine in general, we should also weight the relative costs of actually putting in the relevant deprecation machinery vs the gains that the API changes gets us, rather than seeing this as an absolutely strict rule.
When it's just a matter of deprecating an API, we have the relevant decorators for that so it's easy (and even then, we don't bother emitting a warning when just accessing the API instead of calling it, so someone who e.g. just passes a function as a callback will not see the deprecation warning).
When it's changing a return/attribute type, that's much trickier. Yes, sometimes it's possible to keep both the old and new return/attribute types available by adding an additional kwarg to the function to switch between the two of them, but for example in #11530 the type of LocationEvent.y was changed to int to be consistent with LocationEvent.x (previously, it was float); clearly that change is an improvement especially now that numpy insists on using ints as indices. Well, that change actually broke the mplcursors test suite (anntzer/mplcursors@e8b2807) which was previously relying on being able to generate LocationEvents at arbitrary positions on the canvas. Of course I could have insisted that there be a kwarg to switch whether LocationEvent.y is an int or a float (after all it is an backcompat break), but I don't think that would actually have been a reasonable request (especially from a new contributor such as the author of #11530, but even from an experienced contributor).
And there are cases such as #11944 where the underlying library (wxPython) changes its own API, and we need to update the code we use to talk to it. In that specific case, the PR gets rid of the gui_repaint() method on wx Canvases, and that'll break mplcairo's wx backend module. Again, I could have requested that the (highly experienced) PR author shims the relevant parts of the wx rendering call stack to not break mplcairo.wx, but is that really how we want to spend our developer time? I don't think so.
TLDR: Let's also think about how hard it is to actually put in the deprecation warnings, and weight that about our best guess of how many people will be affected by the API change, as well as whether they'll see a hard exception (well, at least they know something went wrong) or an incorrect result (though note that #11530 was actually in that category -- silently changing some results, due to the additional truncation to int).

@timhoffm
Copy link
Member

timhoffm commented Oct 1, 2018

I'm also -1 on creating a new function with a modified name and deprecating stem() just because we want a different return value.

Also, changing the return value without a prior warning is a massive break.

The return_type variant seems the most reasonable, even though that means to have both code paths around for some time.

@QuLogic
Copy link
Member

QuLogic commented Oct 2, 2018

Sorry to bikeshed a bit, but if it's going to change only the return value, then it can be called return_type. But if it is also changing what gets added to the plot, then it should be named something else like plotting_method (or something briefer but as clear). I am not clear on whether the suggested way forward will be the former or the latter.

@ImportanceOfBeingErnest
Copy link
Member

Well, of course it changes what's in the plot. Two important consquences of this:

  • the name of a possible kwarg should be different, and also of course one can think about not using strings but booleans for better accessibility/memorizability like draw_lines=False or so. I think all this can be discussed once a general solution is agreed upon.
  • More importantly, just deprecating the return type will not work; the API change in that sense is that the function draws something completely different (but visually identical) to the axes. This makes @NelleV's suggestion quite impractical, because users could equally try to access the stem elements via ax.get_lines().

Also replicating a recent comment in chat here for completeness:
I did try to search github for ".stem(" to find out how many people might be affected by this. But the search ignores the . and the ( so you end up with massive amounts of results from people visualizing their stem cell research or their scanning transmission electron microscope images. Is there a way to search for literal dots . with GitHub?

@dopplershift
Copy link
Contributor

We can't just make this change with no deprecation period. Full stop. This is not fixing a bug, or making a minor improvement and changes how the end image appears somewhat. This is a significant change to how the library is operating in regards to how the call to stem() operates, and is readily observed through the return type and public API, like get_lines().

The performance implications are worth making the change, certainly. But, IMO, we don't get to just throw up our hands and say "Doing a deprecation period is too hard". It's possible that code that did all the right things (in the standard code path) with 3.0 will no longer work with 3.1--and will end up in a traceback. As a concrete example, this would completely obliterate code that was making an animated stem plot. That's user-hostile behavior, IMO. If somebody made a similar change to barbs and broke MetPy, I'd be seriously pissed. (I may be a tad sensitive as I clean up the breakage from matplotlib 3.0.)

What exactly is wrong with:

def stem(self, *args, **kwargs, use_new_behavior=None):
    if use_new_behavior is None:
        warnings.warn("we're changing this")
        use_new_behavior = False
    if use_new_behavior:
        self._new_stem(*args, **kwargs)
    else:
        self._old_stem(*args, **kwargs)

(naming stuff aside) ?

Copy link
Contributor

@dopplershift dopplershift left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to do something more about the API change rather than just a note in the docs. This isn't changing plot appearance, this has the ability to making existing, working scripts, traceback.

@anntzer
Copy link
Contributor

anntzer commented Oct 3, 2018

@dopplershift That's literally what was proposed in #9565 (comment), and that was deemed no-good-enough in #9565 (comment), at which point I did "throw up our hands and said "Doing a deprecation period is too hard"." But I admittedly misread both the comment and its implication that #9565 (comment) would be OK (sorry about that).


Not to derail this discussion, but the recent hold debacle (#12274, both re: cartopy and scipy) did make me wonder what's the point of spending so much energy on this, if the process flow is "we put in the deprecation warning, we remove the feature two minor releases later, downstream libs complain, we put it back, and remove it again later"... looks like we should just shunt the first two steps :)

@dstansby dstansby mentioned this pull request Oct 3, 2018
@dstansby
Copy link
Member Author

dstansby commented Oct 3, 2018

Since the conversation here is getting a bit ridiculous, I've opened a new PR at #12380

@dstansby dstansby closed this Oct 3, 2018
@dstansby dstansby deleted the stem-speedup branch October 3, 2018 09:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
API: changes Performance Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants