Skip to content

Add step option where='edges' to facilitate pre-binned hist plots #15019

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

andrzejnovak
Copy link
Contributor

PR Summary

First attempt to address one of the requests raised in #6669 simplifying the api of step for pre-binned histogram plotting like

data = np.random.normal(4,1, 200)
y, x = np.histogram(data, bins=np.linspace(0,8,9))

from

fig, ax = plt.subplots()
ax.step(x,np.r_[y, y[-1]], where='post');

to

fig, ax = plt.subplots()
ax.step(x, y, where='edges');

The consensus seems to have landed on no lines at the end, but this could easily be switched.
image
In this specific case I am also not sure whether 'edges' is the best keyword, maybe 'bins' would be better if we don't have the end lines go to -inf.

I'll be happy about feedback on the actual implementation.

Also it would be good to know if it's better to address the various enhancements in #6669 one by one or to combine things into less PRs.

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

Copy link
Member

@dstansby dstansby left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like a nice new feature! I've left a comment, and additionally

  • I think having edges would be better
  • This could do with a test, you could see if any existing tests in lib/matplotlib/tests/test_axes.py can be adapted to exercise the new option.

@@ -2089,6 +2089,8 @@ def step(self, x, y, *args, where='pre', data=None, **kwargs):
every *x* position, i.e. the interval ``[x[i], x[i+1])`` has the
value ``y[i]``.
- 'mid': Steps occur half-way between the *x* positions.
- 'edges': Expects dim(x) = dim(y) + 1, steps have y[i] value on
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think len(x) instead of dim(x) here?

@timhoffm
Copy link
Member

I assume, if the fist y is at 50, there would be no vertical line y=0 to y=50? E.g. something like (not tested with your code)

grafik

Is that intentional? I could imagine many people would want that (optionally or by default). Otherwise this is not similar to a histogram.

@timhoffm
Copy link
Member

Closes #5855.

@andrzejnovak
Copy link
Contributor Author

andrzejnovak commented Aug 10, 2019

Yes, that's intentional based on the consensus in @6669. I would be ok with having both options, as we might want the above implementation to signify that we have no info about the rest of the distribution, but other might want the same results as with ax.hist(data, bins=x, histtype='step') just with preprocessed data.

Naively I would expect "edges" to actually refer to the second option with closed lines, so it would make more sense to call the above "bins" or just descriptively "between" (which is even a naturally sounding answer to where).

@anntzer
Copy link
Contributor

anntzer commented Aug 10, 2019

As a reminder, there's also #5855 where I suggested allowing both len(x) == len(y)-1 (the case here) and len(y) == len(x)+1 (with "edges", or altenatively this can be seen as a histogram flipped by 90° -- indeed, my intent was to make it easy to switch between one orientation and the other).

@andrzejnovak
Copy link
Contributor Author

Tbh not sure how to implement that, but also I don't really like the idea of implicitly flipping x and y.

@anntzer
Copy link
Contributor

anntzer commented Aug 10, 2019

It does not have to be implicit, but let's say I would prefer it not to be much harder to do a horizontal histogram rather than a vertical one.

@andrzejnovak
Copy link
Contributor Author

On that note, there seems to be some inconsistency on how np.inf is handled.

fig, ax = plt.subplots()
ax.step(np.r_[x, x[-1]],[-np.inf,*y, -np.inf], where='pre');
ax.set_ylim(1,50)
ax.set_yscale('log')

and

fig, ax = plt.subplots()
ax.step([*x, x[-1]],[-np.inf,*y, -np.inf], where='pre');
ax.set_ylim(1,50);

Either show or don't show the last edge.

@andrzejnovak
Copy link
Contributor Author

The issues with having the closing edges is that the only way to make it work seems if one could specify transform point by point. Right now it looks like it's possible to have x in data coords and y in axes coords, but I don't see how to do it for an line with coords [data, data] [data, axes]. If there's no better solution I'll make it default to a small nonzero number.

@anntzer
Copy link
Contributor

anntzer commented Aug 12, 2019

you can (should be able to... no guarantees) be able to make it zero and the log-scale handler will clip it itself to a small nonzero number.
but note that I am not suggesting to add closing edges yourself, instead they should come from the user (as the first/last y-values).

@andrzejnovak
Copy link
Contributor Author

fig, ax = plt.subplots()
ax.step(x, y, where='edges');

image

fig, ax = plt.subplots()
ax.step(x, y, where='edges')
ax.semilogy();

image

fig, ax = plt.subplots()
ax.step(x,y, where='between');

image

@tacaswell tacaswell added this to the v3.3.0 milestone Aug 12, 2019
@andrzejnovak
Copy link
Contributor Author

@anntzer I added a flipxy bool to do what you wanted in #5855, could you take a look if that implementation works for you?

@anntzer
Copy link
Contributor

anntzer commented Aug 13, 2019

I guess a variant would be to have orientation="horizontal"/"vertical" which is consistent with the rest of matplotlib (whereas flipxy exists nowhere else). Probably needs some discussion...

@andrzejnovak
Copy link
Contributor Author

I had that at first, but apart from being a handful to write I got confused which way was ''vertical" and which way was "horizontal". I don't have a strong feeling either way. Would be great to get a consensus, so I can start putting in tests.

@anntzer
Copy link
Contributor

anntzer commented Aug 13, 2019

The orientation would be the same as for hist(). I agree it's not clear this API would be best, just suggesting it.

@andrzejnovak
Copy link
Contributor Author

andrzejnovak commented Aug 13, 2019

@dstansby I've added tests, could you review and tell me what's missing?

@anntzer
Copy link
Contributor

anntzer commented Aug 13, 2019

In general I like the idea, but I won't have the time to properly review this in the coming days at least. I'm not claiming exclusive rights to this PR either :)

@anntzer
Copy link
Contributor

anntzer commented Aug 14, 2019

Although a quick thought is that after all, it may not be such a great idea to have an 'orientation' kwarg which only has an effect when 'where' has a certain value ("edges"), so perhaps instead having where="edges-horizontal"/"edges-vertical" (or some better name, but basically folding this into 'where') may be better.
Not really sure here, just some thoughts.

@andrzejnovak
Copy link
Contributor Author

Well it works for both new symmetric where options. I didn't modify the rest of them because its not clear to me what the behavior of pre and post should look like.

Tbh I'd be curious to know what fraction of step calls is not plotting histograms.

@anntzer
Copy link
Contributor

anntzer commented Aug 15, 2019

At least when I first proposed the idea in #5855 the goal was not histograms (see #5855 (comment) and #5855 (comment)).

@andrzejnovak
Copy link
Contributor Author

andrzejnovak commented Aug 15, 2019

Apart from the obvious I fail to see what's different about this comment and as implemented here where='between' (or that but horizontal). Feel free to put up an image of what you mean and I can try to work it in.

As for this comment we're planning to make something at least for mpl-hep of the signature histplot(values, bin_edges, histtype=<outline/filled/errors>) which will under the hood call step()/fill_between()/errorbar() or their combinations. If interested we'd be happy to work it into matplotlib itself. But this PR already should simplify our life quite a bit.

@ImportanceOfBeingErnest
Copy link
Member

I left my problem with this implementation as a comment to #5855.

f"{y.shape}")

if orientation == 'horizontal':
y = np.r_[y[0], y]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see #16142 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright., btw this PR is superseded by #15065, I should close it

@jklymak
Copy link
Member

jklymak commented Jan 7, 2020

See #15065

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants