plt.step(..., where="auto") #5855

Closed
anntzer opened this issue Jan 15, 2016 · 28 comments
Labels
Difficulty: Medium https://matplotlib.org/devdocs/devel/contribute.html#good-first-issues New feature

@anntzer
Contributor

anntzer commented Jan 15, 2016

I would like to suggest adding a fourth option for the where kwarg of plt.step: where="auto". This would take x and y arguments which differ by 1 (in either way) in length, and make a step plot that either has horizontal segments at both ends (if len(x) == len(y) + 1) or has vertical segments at both ends (if len(y) == len(x) + 1).

As far as I know, currently, to make such a plot one must manually duplicate either the first or the last value in the shorter of the two arrays, which is a bit awkward. Additionally, "auto"-mode would make it much easier to rotate such a plot by 90° (in the "vertical" direction): you'd just have to switch the x and y arguments, without worrying about whether you need to change where or which element to duplicate.

Thoughts?

@tacaswell
Member

I am so-so on this. I assume the use case here is histograms? In that case I think we need a set of histogram (n vals, n+1 edges) high-level plotting tools.

Embrace the unix philosophy of small tools that do one thing well rather than the app philosophy of one tool that does a lot of things so-so.

Put another way, step is not quite the right abstraction/semantics for plotting a histogram. Instead of shoehorning the functionality in via a kwarg plus heuristics, just write the opinionated function with the correct semantics. It will be clearer for users (assuming we give it a reasonable name) and easier on mpl (as we can be very specific about the inputs and what assumptions we are making about them).

@anntzer
Contributor Author

anntzer commented Jan 15, 2016

Actually the use case is time series with measurements corresponding to intervals (e.g., average velocity between two times), rather than to discrete times. So semantically I'd think it belongs in step.

@ImportanceOfBeingErnest
Member

I suppose one major problem here is to define where the vertical lines should start/end. Is it safe to assume without loss of generality that this would always be 0 in data coordinates? Or -inf?

@anntzer
Contributor Author

anntzer commented Feb 18, 2019

if your x is [1, 2, 3] and y is [6, 7, 8, 9], then you have
vertical from (1, 6) to (1, 7)
horizontal from (1, 7) to (2, 7)
vertical from (2, 7) to (2, 8)
horizontal from (2, 8) to (3, 8)
vertical from (3, 8) to (3, 9)

conversely if x is longer than y then you start with horizontal, but all ends are well defined.
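The vertex ordering described above can be sketched in a few lines of plain Python. `interlaced_vertices` is a hypothetical helper name used only for illustration; it is not a matplotlib function, and the choice to raise `ValueError` for other length combinations is an assumption consistent with the later discussion.

```python
def interlaced_vertices(x, y):
    """Return the vertices of the proposed interlaced step line.

    Hypothetical helper for illustration only, not part of matplotlib.
    """
    if abs(len(x) - len(y)) != 1:
        raise ValueError("x and y must differ in length by exactly one")
    if len(y) == len(x) + 1:
        # One more y than x: the line starts and ends with a vertical segment.
        # Vertex k pairs x[k // 2] with y[(k + 1) // 2].
        return [(x[k // 2], y[(k + 1) // 2]) for k in range(2 * len(x))]
    # One more x than y: the line starts and ends with a horizontal segment.
    # Vertex k pairs x[(k + 1) // 2] with y[k // 2].
    return [(x[(k + 1) // 2], y[k // 2]) for k in range(2 * len(y))]


# The example from the comment above: x = [1, 2, 3], y = [6, 7, 8, 9].
print(interlaced_vertices([1, 2, 3], [6, 7, 8, 9]))
# The transposed case, with one more x than y.
print(interlaced_vertices([1, 2, 3, 4], [6, 7, 8]))
```

Swapping the two arguments swaps the roles of the axes, which is the "rotate by 90°" property mentioned in the original proposal.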

@ImportanceOfBeingErnest
Member

Makes sense, I misread the vertical/horizontal above. Still maybe something like "auto0", with step([1,2,3], [7,8], where="auto0"), which would be interpreted as step([1,2,3], [0,7,8,0], where="auto") would be equally useful.

@anntzer
Contributor Author

anntzer commented Feb 18, 2019

step(x, [0, *y, 0], where="auto") is not too bad to write (and then you don't need to cater to whoever asks for auto+inf and auto-inf)

@tacaswell tacaswell modified the milestones: needs sorting, v3.2.0 Feb 18, 2019
@tacaswell tacaswell added the Difficulty: Medium https://matplotlib.org/devdocs/devel/contribute.html#good-first-issues label Feb 18, 2019
@jklymak
Member

jklymak commented Feb 18, 2019

Is there some advantage to auto versus just checking the vector sizes?

@tacaswell
Member

I am 👍 on this, but it should be opt-in via "auto".

"interlaced" might be a better value? We should think about the name a bit.

@jklymak
Member

jklymak commented Feb 19, 2019

where='edges'? where='bin-edges'?

@timhoffm
Member

+1 for „edges“.

„bin-edges“ would already be an assumption on the semantics of the data. „Interlaced“ feels a bit more vague.

@anntzer
Contributor Author

anntzer commented Feb 21, 2019

It's actually not clear at all to me why one may want to call this "edges"? (I... don't see the semantic association.)

@jklymak
Member

jklymak commented Feb 21, 2019

"where do you want the line at y[i] to extend to?" "pre" means before x[i], "post" means after, "mid" means centered on x[i], "edges" mean treat x[i] and x[i+1] as edges of where to draw....

But I'm not strongly against "auto"; it's just not clear that the way to get "auto" is to provide N+1 x-values.
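For readers less familiar with the three existing modes, here is a minimal pure-Python sketch of the polylines that "pre", "post" and "mid" describe for equal-length inputs. `step_vertices` is a hypothetical name, and the code is meant to reproduce the drawn shape, not matplotlib's internal implementation.

```python
def step_vertices(x, y, where="pre"):
    """Return polyline vertices for an equal-length step plot.

    Hypothetical illustration of the 'pre' / 'post' / 'mid' semantics.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if where not in ("pre", "post", "mid"):
        raise ValueError(f"unknown where={where!r}")
    if not x:
        return []
    verts = [(x[0], y[0])]
    for i in range(1, len(x)):
        if where == "pre":
            # y[i] extends back to x[i - 1]: jump first, then run to x[i].
            verts += [(x[i - 1], y[i]), (x[i], y[i])]
        elif where == "post":
            # y[i - 1] extends forward to x[i]: run first, then jump.
            verts += [(x[i], y[i - 1]), (x[i], y[i])]
        else:
            # 'mid': the jump happens halfway between x[i - 1] and x[i].
            mid = (x[i - 1] + x[i]) / 2
            verts += [(mid, y[i - 1]), (mid, y[i])]
    if where == "mid":
        verts.append((x[-1], y[-1]))
    return verts


for mode in ("pre", "post", "mid"):
    print(mode, step_vertices([1, 2, 3], [6, 7, 8], where=mode))
```

In all three modes one end of the line is a free horizontal stub or a bare point, which is the boundary behaviour the proposal wants to control through the array lengths.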

@anntzer
Contributor Author

anntzer commented Feb 21, 2019

But it's not necessarily n+1 x values: the proposal is to make this work both when there is one more x value than y values (then the first and last lines are horizontal) and when there is one more y value than x values (then the first and last lines are vertical).

@timhoffm
Member

What would “auto” do if x and y have the same length? “edges” would clearly raise an error.

@anntzer
Contributor Author

anntzer commented Feb 21, 2019

It would raise an error; it's not clear what's "clear"(er) in "edges" raising an error...
I personally like either "auto" or "interlaced".

@jklymak
Member

jklymak commented Feb 21, 2019

> But it's not necessarily n+1 x values: the proposal is to make this work both when there is one more x value than y values (then the first and last lines are horizontal) and when there is one more y value than x values (then the first and last lines are vertical).

Yeah, sure, transpose by whatever vector is longer...

@ImportanceOfBeingErnest
Member

Let me also throw in two words: "enclosed" or "contained".

I think that, prior to naming, one should decide to what extent this new functionality is to be a drawstyle, and hence part of the Line2D machinery, or just limited to the step function. The former would be more general and could be applied to more use cases, yet it causes trouble because lines up to now need equal-length data.

@anntzer
Contributor Author

anntzer commented Feb 21, 2019

I think(?) it's not worth trying to shoehorn this into drawstyles/Line2D because of the mismatch between x and y sizes. I would just make step output a Line2D with one entry for each visual vertex (unlike "steps-pre"/"steps-mid"/"steps-post" which has one entry for every other visual vertex, and creates the "actual" vertices of the drawn line on-the-fly).

I feel like most of the names proposed here are too semantically linked to the idea of using this to draw histograms; while I am fine with that application (despite the preference of putting that functionality in hist()) this was really not the original motivation of this proposal.

@jklymak
Member

jklymak commented Feb 21, 2019

But the original motivation was the same value over an averaging interval. Why wouldn't the interval be specified by "edges"?

@ImportanceOfBeingErnest
Member

Well, the "enclosed" or "contained" proposal was more motivated by having the y values enclose the x values, or the x values being contained in/by the y values. For the other case, with more x values than y values, this would not make much sense; there, "open(ed)" or similar would be the fitting term. Separating the two cases would in principle still allow using a drawstyle with equal-length data and then ignoring one of the respective x or y coordinates at draw time.

@anntzer
Contributor Author

anntzer commented Feb 21, 2019

Note that the original motivation was also about being able to switch the x and y axes (if your time is on the vertical axis), in which case the meaning of "enclosed" and "opened" would be switched.
The problem with "edges" is that steps-pre and steps-post also specify edges after all; it's only the boundary conditions that change. (I guess they're also "interlaced"...) FWIW the idea behind "auto" was "automatically determine whether x's go between y's or y's go between x's".

@ImportanceOfBeingErnest
Member

The problem with auto is really that it is associated with something automatically working - while in this case it would automatically fail for equal length input. I think it is hard to communicate that some auto-setting is not going to work with the default inputs.

The case discussed here is probably more an edgecase, so "edges" would fit!

Concerning (en)closed, one could also interpret this differently, namely that you have no free ends on the line in either or both directions. This would fit for either direction.

@agbuckley

> I am so-so on this. I assume the use case here is histograms? In that case I think we need a set of histogram (n vals, n+1 edges) high-level plotting tools.

It was this use-case that brought me here: I frequently have lists of n+1 bin edges and n bin values -- e.g. from numpy.histogram or from an incremental histogramming tool -- and want to plot them in a (maybe filled) "stepped histogram" format.

Unless I'm missing something, this isn't currently easy in matplotlib: I have to create an extra duplicate value on one end of the values array, remind myself of the semantics of the "where" argument, explicitly type that somewhat cryptic (from a histogramming mindset) extra arg for all my data lines, etc. This feels a long way from the "make simple things simple" philosophy :-)

Automatically inferring a sensible rendering behaviour for n_y = n_x - 1 would be one way to do it, hence my finding this ticket... but a dedicated set of histogram rendering commands would be even better. Is there any concrete plan or ticket for this? I'd be happy to contribute...

@andrzejnovak
Contributor

What is the status of this? Based on #6669 it seems we would at least want a where="step" option, which would be a good start.

If I understand it correctly what we want is essentially the outcome of
ax.step(edges, np.r_[bin_values, bin_values[-1]], where='post')

but to be able to get it as
ax.step(edges, bin_values, where='edges')

Is that correct? If so, are there any guidelines on how this should be implemented, other than hiding it directly in step and raising an error if one tries to supply len(x) == len(y) with `where='edges'`?
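The workaround quoted above can be wrapped as follows. This is only a sketch of the manual padding approach using the existing `where='post'` mode; `pad_for_post` and `step_from_edges` are hypothetical names, not matplotlib functions, and `ax` is assumed to be a matplotlib `Axes`.

```python
def pad_for_post(values):
    """Repeat the last value so the result has one entry per edge."""
    values = list(values)
    if not values:
        raise ValueError("need at least one bin value")
    return values + [values[-1]]


def step_from_edges(ax, edges, values, **kwargs):
    """Draw n values over n + 1 edges with the existing where='post' mode.

    Hypothetical wrapper: `ax` is assumed to be a matplotlib Axes.
    """
    if len(edges) != len(values) + 1:
        raise ValueError("expected len(edges) == len(values) + 1")
    # With where='post', value i runs from edges[i] to edges[i + 1];
    # the duplicated last value only supplies the final vertex.
    return ax.step(edges, pad_for_post(values), where="post", **kwargs)


print(pad_for_post([5, 7, 6]))
```

Given `fig, ax = plt.subplots()`, a call such as `step_from_edges(ax, [0, 1, 2, 3], [5, 7, 6])` draws three horizontal segments joined by two vertical ones. Note that the line returned by `ax.step` holds the padded values, not the original ones.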

@agbuckley

That's my understanding of the desired outcome, too: this would be really nice to have, rather than having to hack an obfuscatory duplication of the last counts entry every time.

@andrzejnovak
Contributor

@agbuckley Check #15019

@ImportanceOfBeingErnest
Member

ImportanceOfBeingErnest commented Aug 16, 2019

After a short look at #15019 I would like to reiterate the idea of making this a drawstyle.

If you let step manipulate the data and let it add or remove values at the beginning or end, the result is that the data you put in is not the data you get out.

line, = plt.step(*data.T)
# -> line.get_data() != data
# also
line2, = plt.step(*line.get_data().T)
# -> line != line2

This would be a rather unfortunate design, because there is no way to e.g. update such line.
However, I'd imagine being able to update those lines is pretty useful; imagine e.g. a live plot of detector events, either in histogram mode, or as a moving window average.

In contrast, if the Line2D itself could just take unequal-sized arrays as input and use them according to its drawstyle, the above cyclic behaviour could be guaranteed.
Also, we would not be in the awkward situation where depending on the argument to where you would get the actual data or some manipulated data back. So the user can be sure to get back what they put in.
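The round-trip property argued for above can be illustrated with a small container that stores the unequal-length arrays as given and derives the drawn vertices only on demand. `EdgeSteps` is a hypothetical class written for this sketch, not a matplotlib artist, and it covers only the case of one more edge than value.

```python
class EdgeSteps:
    """Hypothetical container: keeps the input data, derives vertices lazily."""

    def __init__(self, edges, values):
        self.set_data(edges, values)

    def set_data(self, edges, values):
        if len(edges) != len(values) + 1:
            raise ValueError("expected len(edges) == len(values) + 1")
        self._edges = list(edges)
        self._values = list(values)

    def get_data(self):
        # Returns exactly what was put in; nothing is padded or dropped.
        return list(self._edges), list(self._values)

    def vertices(self):
        # Each value spans the interval between two consecutive edges.
        verts = []
        for left, right, value in zip(self._edges, self._edges[1:], self._values):
            verts += [(left, value), (right, value)]
        return verts


steps = EdgeSteps([0, 1, 2, 3], [5, 7, 6])
copy = EdgeSteps(*steps.get_data())      # data out can be fed straight back in
print(copy.vertices() == steps.vertices())
steps.set_data([0, 1, 2], [1, 2])        # updating needs no manual padding
print(steps.vertices())
```

Because the stored data is never modified, updating a live plot reduces to calling `set_data` with the new arrays.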

@anntzer
Contributor Author

anntzer commented Feb 28, 2021

Closed by #18275.

@anntzer anntzer closed this as completed Feb 28, 2021
@QuLogic QuLogic modified the milestones: needs sorting, v3.4.0 Mar 1, 2021