Skip to content

step-between as drawstyle [Alternative approach to #15019] #15065

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 36 commits into from

Conversation

andrzejnovak
Copy link
Contributor

@andrzejnovak andrzejnovak commented Aug 16, 2019

PR Summary

Alternative approach to #15019 where a new where='between' kwarg option is implemented as a Line2D drawstyle, based on comments from @ImportanceOfBeingErnest. This in the end requires a bit of hackery as well, because it seems matplotlib was always expects len(x) == len(y) Notably I am not sure how to properly modify the recache() fcn.

If this is preferable, we can pursue this instead.

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@andrzejnovak
Copy link
Contributor Author

Alright, so I took a shot at fixing fill_between as well. 'between' is now a linestyle applicable to both step() and fill_between(), 'edges' is a convenience wrapper in step() for side edges. Step "switches" orientation based on if len(x) + 1 == len(y) or len(x) == len(y) + 1 like in the old #5855 because it came for free with the linestyle implementation.

So we have

import matplotlib.pyplot as plt
import numpy as np
x, y = [0, 1, 2, 3, 4, 5], [1, 2, 3, 4, 5]

Step

f, ax = plt.subplots()
ax.step(x, y, where='between');

image

f, ax = plt.subplots()
ax.step(y, x, where='between');

image

f, ax = plt.subplots()
ax.step(x,y, where='edges');

image

f, ax = plt.subplots()
ax.step(y, x, where='edges');

image

f, ax = plt.subplots()
ax.step(y, x, where='edges', bottom=-5);

image

Fill_between

f, ax = plt.subplots()
ax.fill_between(x, y, step='between');

image

f, ax = plt.subplots()
ax.fill_betweenx(x, y, step='between');

image

You can even match both at once:

f, ax = plt.subplots()
ax.fill_between([1, 2, 3], [1, 2], step='between');
ax.fill_betweenx([0, 1, 2, 3], [1, 2, 3], step='between');

image

@andrzejnovak
Copy link
Contributor Author

So the problem as I advertised when I opened this is lies here https://github.com/matplotlib/matplotlib/blob/master/lib/matplotlib/lines.py#L676 which shows up in tests and transforms. It looks like a lot of the code is just based on the assumption len(x) == len(y) and no matter how I cut it at some point the 'missing' data point needs to be added in. I tried to be clever and add a nan that gets replaced only when constructing the drawstyle, but it ends up screweing up the transforms. @ImportanceOfBeingErnest what do you think?

@ImportanceOfBeingErnest
Copy link
Member

Mhh, I guess one could rewrite that code to use x and y seperately throughout. Or is there anything forbidding this?
The worrying thing is that so far .get_data() would return a 2D array, and any rewrite of this, returning two separate arrays, would be an API change. If we only return separate arrays in case of this particular linestyle it would be backwards compatible, but we loose the advantage of a single, consistent, hysteresis-free interface.

@jklymak
Copy link
Member

jklymak commented Aug 17, 2019

Perhaps a new method that returns a subclass of Line2D?

@andrzejnovak
Copy link
Contributor Author

"use x and y seperately" i am not sure if all the transforms would still work for uneven sized x,y. In a sense I would say biting the bullet on returning an extra datapoint would be the least invasive way to go.

@tacaswell tacaswell added this to the v3.3.0 milestone Aug 19, 2019
@andrzejnovak
Copy link
Contributor Author

@ImportanceOfBeingErnest @tacaswell Can you check this implementation and advise on what's failing the docs?

@andrzejnovak
Copy link
Contributor Author

andrzejnovak commented Aug 19, 2019

@ImportanceOfBeingErnest @anntzer @tacaswell Could you review and tell me if anything is missing?

@andrzejnovak andrzejnovak force-pushed the ModLine2D branch 2 times, most recently from 2838448 to 1790bd1 Compare August 23, 2019 16:57
@andrzejnovak
Copy link
Contributor Author

bump

@ImportanceOfBeingErnest
Copy link
Member

I would expect the following to give two identical curves

x = np.array([1,2,3,4])
y = np.array([3,2,4])
plt.plot(x,[0,*y,0], drawstyle="steps-between")
plt.step(x,y, where="edges", color="red", linestyle="--")
plt.show()

but what I get is this:

image

Is there a reason "edges" is ignored for the second one?

@andrzejnovak
Copy link
Contributor Author

Whoops, good catch. Forgot to parse kwargs to the edge lines.

@ImportanceOfBeingErnest
Copy link
Member

I think it's a bit suboptimal that those edges are individual lines. I'm not sure in how far "edges" is really needed, but if it is, would it make sense to give it its own drawstyle? I suppose the problem in that case is the "bottom", right? But I wonder if the edges' y coordinates could be set to -infinity such that they always reach the bottom. I think there must be some similar logic for log scales (as seen from the result of plt.semilogy([1,2,3,4],[0,.1,1,10])).
If there really is no other solution, those lines would at least need to be returned from the step function, together with the main line.

@andrzejnovak andrzejnovak force-pushed the ModLine2D branch 2 times, most recently from 59a3067 to 17edc41 Compare August 26, 2019 23:15
@andrzejnovak
Copy link
Contributor Author

Based on the previous discussions it seemed the 'edges' functionality was requested by a large pool of people. I didn't make it into it's own drawstyle (trying to keep as close as possible to the get_data comment from earlier), because it doesn't fundamentally carry more information, it's just a design choice. Also it's not clear what that option would do for fill_between().

I wonder if the edges' y coordinates could be set to -infinity

That's what I tried to do originally, but I didn't find a way to do a mixed coordinate transform for a point. Such line would need two points with [data_coord, data_coord,], [data_coord, axes_coord,], which did not seem feasible.

The return can be fixed.

@ImportanceOfBeingErnest
Copy link
Member

I think I would try to find out how the vertical line in logscales works. And yes, fill_between might need a different approach, but it is very different anyways.

@andrzejnovak
Copy link
Contributor Author

Fill_between() is using the same drawStyle function and actually it's quite convenient that these two are tied together, as we need this behaviour there as well.

The log scale seems to me to work as expected, cutting off at ~0. I don't think it would affect the functionality much to not have the option in order to prevent kwarg bloat, but imo it's a nice helper. For the histogram use case in the current implementation the edges always go to the plot edge. If it was some non-hist shape like described in #6669, the bottom kwarg allows to extend the shape at least outside of logscale.

Log scale test

f, ax = plt.subplots()
ax.step(x, y, where='edges', bottom=-5);

image

f, ax = plt.subplots()
ax.step(x, y, where='edges', bottom=-5)
ax.semilogy();

image

f, ax = plt.subplots()
ax.step(y, x, where='edges', bottom=-5);

image

f, ax = plt.subplots()
ax.step(y, x, where='edges', bottom=-5);
ax.semilogx();

image

If nothing else, the default edges behaviour corresponds to analogous hist() calls

f, ax = plt.subplots()
h, bins, _ = ax.hist(x[:-1], bins=x, weights=y, histtype='step')
f, ax = plt.subplots()
h, bins, _ = ax.hist(x[:-1], bins=x, weights=y, histtype='step')
ax.semilogy();

The bottom kwarg is different by design as there's no point parsing a *(value) kwarg to step, when the same could be achieved by multiplying directly the input array.

@andrzejnovak
Copy link
Contributor Author

lines would at least need to be returned from the step function, together with the main line

Now it would return [<matplotlib.lines.Line2D at 0x7f3c09480c88>, <matplotlib.lines.Line2D at 0x7f3c09485b00>, <matplotlib.lines.Line2D at 0x7f3c094850f0>] is that what you had in mind?@ImportanceOfBeingErnest

@ImportanceOfBeingErnest
Copy link
Member

ImportanceOfBeingErnest commented Aug 27, 2019

By logscale I meant to see how the line in plt.semilogy([1,2,3,4],[0,.1,1,10]) achieves to be vertical.

It doesn't look like fill_between would make use of "edges" at all, so I think this is really independent.

@andrzejnovak
Copy link
Contributor Author

andrzejnovak commented Aug 27, 2019

I am not sure what you mean, or rather what kind of modification you're after? The current implementation effectively forwards the info to plot in the same way as semilogy does.

@ImportanceOfBeingErnest
Copy link
Member

I am trying to evaluate if it is possible to get rid of the extra lines on the sides by making "edges" a drawstyle as well.

@andrzejnovak
Copy link
Contributor Author

andrzejnovak commented Aug 27, 2019

Yeah, that can be done if we ditch the bottom kwarg functionality, because that would need to get parsed quite deep.

I suppose we can just not allow edges as a key to be parsed in fillbetween.

@andrzejnovak
Copy link
Contributor Author

@ImportanceOfBeingErnest Is this what you had in mind?

@ImportanceOfBeingErnest
Copy link
Member

Maybe I'm missing something, but I don't think that works because it looks like it replaces all nans. But what if user data contains nans?

@andrzejnovak
Copy link
Contributor Author

@ImportanceOfBeingErnest Ok, I didn't think anybody would be doing that, but at least it allows for the fill_between(where=) functionality by passing nans directly in the input.

Any idea what's tripping up sphinx? The error message is not very illuminating.

@jklymak
Copy link
Member

jklymak commented Mar 11, 2020

If you can accept the underlying implementation

The underlying implementation is my major concern: I don't think Line2D should get new draw styles. I don't think it should have had draw styles in the first place, but at least they are somewhat justified because x and y at least have the same length and there are no other arguments that it needs. But I don't think we should compound the error here where x and y do not even have the same length and we now need another argument to Line2D (bottom) just for that case.

@andrzejnovak
Copy link
Contributor Author

Ok, thanks for the clarification. I thought that was the case, but wasn't sure since your last post was mostly dealing with the public api.

@timhoffm
Copy link
Member

@tacaswell I actually think that trying to cram this into Line2D via drawstyle is a major flaw.

(Hist)step lines are conceptually different from plain lines. While you can appoximate stepwise with a drawstyle, the underlying data model of steps is different (unequal number of step positions and plateau values, orientation, baseline). IMHO a new HistLine2D( or StepLine2D) should implement that behavior. I‘d leave the drawstyle out of this, because it does not really fit our needs here.

I agree with @jklymak here:

The simplest solution is two new draw_hist() and draw_hist_fill() methods with mild Line2D and Patch subclasses respectively to allow animation. I don't feel this is major surgery compared to whats being proposed here at all. To the contrary, it would be the least intrusive solution.

I did not think this through in detail, but I anticipate that a proper class relation to Line2D is feasible (either subclass or pull out a common base class).

@tacaswell
Copy link
Member

We already support mixed length inputs to plot

ax.plot([0, 1, 2], 5)

broadcasts up "as you expect".

I think it would be a mistake if we said "ah, these are different lengths there for we will pick a different draw style", but if the user gives us the miss-matched length and asks for the draw style it is not ambiguous.

It would be better to special case the handler generation to change the draw style extracted from the Line2D object than to have the step generation code fall back (that keeps the special casing well contained and avoids issues with the fallback masking user errors else where).

We either can infer the "direction" from the length miss-match and need an extra parameter of the "zero-point" or we could have a parameter for the "direction" and, if the dependent data is longer, use the first and last values as the "zero point(s)". On a bit more consideration, the second option is worse in several ways: there is still a hard-coded default (0), having different zero points seems cute but is probably not something that is worth the complexity to get, and the API no longer directly matches the returns of np.hist.

I'm not sure if we were starting from scratch we would put drawstyle on Line2D, but that boat has left the harbor. We are now left with the choice of doing this variation on step the same way or a different way. I think it is better to keep things consistent (put everything in Line2D), but to be very cranky about exceptions if things are out of whack.

I argue that this is adding "flat" to plot. "nearest" is steps-mid, and 'steps-pre' and 'steps-post' are the (now deprecated) behavior of "flat" which does not really show the last (first) data point and "gourand" is the normal plot draw style. I originally meant the pcolormesh as a bit of a throw away comment or a "rhymes with" reference, but the more I think about it the more analagous it seems. If we are going to lump the "center-centric" and "edge-centric" plotting together in pcolormesh we should do it here too. At a minimum, step should allow both (but that would require a bit of jiggery-pokery around Axes.plot

On a bit more consideration of what could be done in a sub-class, we could have a none of this state in Line2D, however that would probably require making most of plot private, adding the ability to pass a "line_class" into it and then make plot also a thin wrapper that inject Line2D into Axes._plot.


So I think there are a couple of mostly lightly coupled decisions to make. I'm focusing on the Line2D case, but expect it will propagate to the fill_between case. On the public API side:

  • do we add a miss-match only method? (I say no due to pcolormesh ryhme and API simplicity)
  • does step allow miss match lengths or not? (I say yes due to being forced by the choice above)
  • does plot allow the edce draw styles to be passed? (I say yes, but barely, to rhyme with steps- and for simplifying the implementation)

On the Artist side:

  • does Line2D get new draw style or do we do it in a sub-class? (I say in Line2D, again barely, to rhyme with steps and simplifying the implementation. A subclass makes more sense if you said yes to the first choice in the public API).

A more drastic step if we want to go with (Yes, No, No, SubClass) is that we should then push steps- to rhyme with the new scheme, but that feels like API thrashing for the sake of purity rather than actual user benefit. At the end of the day, I think it is this argument (steps:nearest :: edges: flat, that steps / edges and nearest / flat should both internally rhyme and the cost of doing the deprecations to pcolormesh and step is too high) that pushed me to my current position.


I have been seeing a bunch of complaints about Matplotlib being inconsistent recently so maybe that is nagging at the back of my mind, but I think it is better to remain consistent with a slightly dodgy design choice [1] than for parts of the library to have inconsistent design choices [2].

[1] the fallout and true trade offs are always clearer in hindsight :(
[2] We also have some of that issue too, but I digress

@timhoffm
Copy link
Member

timhoffm commented Mar 11, 2020

I see your arguments, but still come to the different conclusion that dedicated methods and classes and no interference with step draw styles would be cleaner.

Let the slightly odd step draw styles and functions just live on as they are. Once there‘s a clean histogram line-plotting functionality, there‘s one less reason to use them.

I don‘t have the time to follow this discussion in detail. But for what it‘s worth, I‘d like to have noted that I support @jklymak ‘s point-of-view.

@tacaswell
Copy link
Member

@timhoffm Despite your comment going up about an hour before mine, I had not read it yet while writing my (too long) comment, sorry about that.

@jklymak
Copy link
Member

jklymak commented Mar 12, 2020

Just to be concrete about my proposal. In lines.py:

class HistLine(Line2D):
    """
    Subclass of `.Line2D` that implements stepped drawing for histograms.
    """
    def __init__(self, bins, vals, *,
                 orientation='horizontal', baseline=0, **kwargs):
        self.baseline = baseline
        self.orientation = orientation
        self._bins = bins
        self._vals = vals
        x, y = self._update_data()
        print(x, y)
        super(HistLine, self).__init__(x, y , **kwargs)

    def _update_data(self):
        if self._bins.size - 1 != self._vals.size:
            raise ValueError('the length of the bins is wrong')
        x = np.vstack((self._bins, self._bins)).T.flatten()
        y = np.vstack((self._vals, self._vals)).T.flatten()
        y = np.hstack((self.baseline, y, self.baseline))
        if self.orientation == 'horizontal':
            return x, y
        else:
            return y, x

    def set_bins(self, bins):
        self._bins = bins
        self.set_data(self._update_data())

    def set_vals(self, vals):
        self._vals = vals
        self.set_data(self._update_data())

    def set_bins_vals(self, bins, vals):
        self._vals = vals
        self._bins = bins
        self.set_data(self._update_data())

and in _axes.py

    def hist_draw(self, bins, vals, *,
                  orientation='horizontal', baseline=0, **kwargs):
        # convert stuff in here...
        # ...
        line = mlines.HistLine(bins, vals, orientation=orientation, baseline=baseline, **kwargs)
        self.add_artist(line)
        return line

This works, but is obviously not finished, with proper error checking and unit conversion, but just to give an idea of the structure. Obviously if we also did hist_draw_fill there would need to be some shared helpers to make the arrays.

@efiring
Copy link
Member

efiring commented Mar 16, 2020

I disagree with the "boat has sailed" argument in this case. Doubling down on a bad choice doesn't make it better, it makes it worse. Long-term, we can, and do, use deprecations to move towards better design.

@andrzejnovak
Copy link
Contributor Author

@jklymak Nice, few questions on the rest of the implementation though.

  • Legend handle seems to be missing. I couldn't find an example where it's added manually though
  • I can easily extend this for edges and allowed Nan values in vals, it would be a bit more code to allow passing multiple histograms at the same time, is this desirable in the way other functions are able to do?
  • Nan values would result in splitting of the lines, so the defined set_vals would likely not work properly
  • The big thing this does not solve is the fill_between. And I am not sure how it could, without a good amount of duplication.

@jklymak
Copy link
Member

jklymak commented Mar 17, 2020

We did discuss this yesterday on the call, for quite a while. @tacaswell was still mulling the approaches.

Just to be clear, the above was a working prototype for thought. I didn't look into detail at all the things that would need to be customized.

Presumably the duplication between line- and filled-histograms would be factored out in a helper that makes the x/y. I guess some of the get/set_ methods would be pretty similar, but I don't really consider that a problem. As discussed above, we could pack both types of histograms into the same class, but we do't do that with lines and "filled" plots, so I don't see why we would do it here.

I think the balance here is between code and API complexity and some cut and paste of straight-forward helper methods on some sub classes.

@tacaswell
Copy link
Member

I'm pretty sold on needing new API and sub-classes for this. I think the implementation is still up in the air (using drawstyle in a way that does not leak into Line2D directly vs doing the transformation as set time and buffering theget_ methods to hide that fact from the users).

I previously said that I would make executive decisions today, but I think the discussion is still moving along OK so going to not do that until y'all ask me to ;)

@tacaswell
Copy link
Member

I am going to move this to 3.4 as it still needs some work and discussion. Sorry this has gotten bogged down.

@andrzejnovak
Copy link
Contributor Author

@tacaswell Been a bit busy lately, when do you close 3.3?

@tacaswell
Copy link
Member

The original deadline for closing 3.3 was the beginning of May (which we obviously missed), but as soon as possible.

@jklymak
Copy link
Member

jklymak commented Aug 14, 2020

I've moved this to draft. I actually think it should be closed, not because its not nice work, but because I think the underlying premise of making this a draw style is flawed. Thanks for your eagerness to work on this @andrzejnovak and sorry that it got stalled out. I think the underlying issue does require a solution.

@jklymak jklymak added the stale label Aug 14, 2020
@andrzejnovak
Copy link
Contributor Author

¯_(ツ)_/¯ if merging a drawstyle based implementation is a no go, then might as well, since the alternative is likely to need entirely different code.

Maybe it's my limited FOSS experience, but I still think was a strange call. The PR solved a couple of long-standing issues, the magnitude of at least one of them being illustrated by how popular https://github.com/scikit-hep/mplhep got in the hep community, not doing that much more than passing a np_r[y, y[-1]] to plt.step. The solution was a logical extension of existing functions and likewise was implemented by extending existing logic. To my knowledge, at least at the time, that logic was not being reworked indicating that it would be faulty. IIRC there weren't any outstanding conflict or mishandlings for nans, zooming, legend, and the like (though the branch is by now obviously out of date). The x,y asymmetry was also not quite new given the edge handling in pcolormesh.

I would say that in the absence of a clear spec for a new API the choices made were quite logical at least in that they were fixing current pain points (Someone trying to plot a prebinned step hist is most likely hacking plt.step and for a filled hist plt.fill_between.

Given all that I would expect this to pass the "question isn't if it's the perfect solution, but if it's a solution you can live with" test. Moreover so, if the API is acceptable, the underlying implementation could be changed at any later point.

I would still like to see this functionality, but since I don't see a clear way to do this based on the draft above and the fact that there still isn't clear agreed-upon spec, I am not likely to come back to this any time soon.

@timhoffm
Copy link
Member

timhoffm commented Aug 15, 2020

@andrzejnovak Sorry this got discussed to death. I believe it was a fruitful discussion nevertheless because we have realized what we want and need to make histogram-like line plots.

Moreover so, if the API is acceptable.

IMHO the API is faulty: This is trying to cram something into plot() for which it was not designed. np_r[y, y[-1]] is something that can be decided on and done on the user side, but we as an use case agnostic library shouldn't try to guess that in case the user passes one more x than y.

I would still like to see this functionality, but since I don't see a clear way to do this based on the draft above and the fact that there still isn't clear agreed-upon spec, I am not likely to come back to this any time soon.

I agree that a completely new approach is necessary. But the way seems pretty clear to me:

  • Add a function stepline(edges, values, *, baseline=None, orientation='vertical', filled=false).
  • If we want to keep it simple, that would create and return a Polygon. Disadvantage: manipulating values and baseline later is not trivial as they are not explicitly in the object. If we want to do it right, create a new StepLine artist.*

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants