Skip to content

Add hist.histtype rcParam. #12493

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from
Closed

Add hist.histtype rcParam. #12493

wants to merge 2 commits into from

Conversation

anntzer
Copy link
Contributor

@anntzer anntzer commented Oct 11, 2018

PR Summary

histtype='stepfilled' is significantly faster than the current default
('bar') for large numbers of bins, but changing the default would cause
backcompat problems.

The 'hist.histtype' rcParam is introduced with the explicit intent of
preparing this transition (later making 'stepfilled' (or perhaps 'step')
the default and then deprecating the rcParam, to avoid neverending
growth of the rcParam list).

(The first commit is just docstring reformatting, the interesting part is the second commit.)
See discussion in #7121.

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

histtype='stepfilled' is significantly faster than the current default
('bar') for large numbers of bins, but changing the default would cause
backcompat problems.

The 'hist.histtype' rcParam is introduced with the explicit intent of
preparing this transition (later making 'stepfilled' (or perhaps 'step')
the default and then deprecating the rcParam, to avoid neverending
growth of the rcParam list).
@ImportanceOfBeingErnest
Copy link
Member

👍 on adding the rcParameter. Equally, a lines.drawstyle rcParams would be useful.

👎 on changing the default to step or stepfilled. There is no use in plotting 10000 bars, but neither 10000 steps. For any reasonable number of bins, the bars are fast enough. If the bin width is smaller than what is resolvable on a printer or monitor, people can just use a line plot. hist is merely a convenience wrapper, which is very functional for educational reasons. It is in this context quite unfortunate that the default patches do not have any dark edgecolor any more, but that can be corrected via rcParams. I sincerely hope that high-energy physicists do a bit more with their raw data then just plotting it via hist, so this special group who may be in need of thousands of bins (without directly depicting a kernel density estimate) can be expected to type one parameter more.

@anntzer
Copy link
Contributor Author

anntzer commented Oct 11, 2018

Equally, a lines.drawstyle rcParams would be useful.

Actually, I'm against the uncontrolled multiplication of rcParams (that's how you end up with 36(!) rcParams just for boxplot()...).
Especially in the case of lines.drawstyle, I doubt that people are switching from "default" to "steps" that often as a style change.

There is no use in plotting 10000 bars, but neither 10000 steps.

Actually, just today :) I was looking at quantization artefacts on a 14 bit camera over long movie stacks, so I was actually plotting a 16384-bin histogram (and then zooming over the relevant parts...) (and that's typically a case where you explicitly don't want to look at a KDE) (amazingly, a bunch of the PRs I submitted recently are actually directly related to my work...).

Running

from pylab import *
from time import perf_counter
rcdefaults()
t = np.random.randn(100000)
for histtype in ["bar", "stepfilled"]:
    fig, ax = plt.subplots()
    ax.hist(t, 16384, histtype=histtype)
    for _ in range(3):
        __start = perf_counter()
        fig.canvas.draw()
        print(perf_counter() - __start)
    plt.close("all")

shows that "bar" plots this in ~2.5s, whereas "stepfilled" takes ~50ms. That's not a small gain.

people can just use a line plot.

Well, that's exactly what 'step' does...

@ImportanceOfBeingErnest
Copy link
Member

So you are saying that typing the histtype="stepfilled" is too much to be asked for a scientist, but it's ok to let all the students wondering why on earth matplotlib would be the only plotting program that doesn't show histograms as bars.

36 boxplot parameters don't hurt anyone. Not having the one parameter you need, e.g. for the case where you would like to use a stepped line, can however be very frustrating.
Introducing hist.histtype, but not lines.drawstyle means that we value hist higher than plot. That should not happen, given that plot is at the heart of matplotlib functionality, and hist being just a convenience wrapper.

@@ -481,6 +481,7 @@
#hist.bins : 10 ## The default number of histogram bins.
## If Numpy 1.11 or later is
## installed, may also be `auto`
#hist.histtype : bar ## The default histtype.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to list the options here

## The default histtype. One of 'bar', 'barstacked', 'step', 'stepfilled'.

@anntzer
Copy link
Contributor Author

anntzer commented Oct 11, 2018

So you are saying that typing the histtype="stepfilled" is too much to be asked for a scientist, but it's ok to let all the students wondering why on earth matplotlib would be the only plotting program that doesn't show histograms as bars.

We could actually add another histtype (e.g. "barstepfilled") which still draw the patch as a single line, but with a longer-winded line that actually takes care of going over the edges of all bars. Then you get the best of both worlds.
Please remember that matplotlib is not just a teaching tool.

36 boxplot parameters don't hurt anyone.

Clearly you haven't attempted to refactor rcsetup.py...

Not having the one parameter you need, e.g. for the case where you would like to use a stepped line, can however be very frustrating.
Introducing hist.histtype, but not lines.drawstyle means that we value hist higher than plot.

Why exactly can't you use step() instead of plot()?

That should not happen, given that plot is at the heart of matplotlib functionality, and hist being just a convenience wrapper.

If you think it's a convenience wrapper around bar(), then you're stuck with slow plotting. If you look at it as a wrapper to generate barstepfilled plots (as proposed above), it's not that trivial...

@jklymak
Copy link
Member

jklymak commented Oct 11, 2018

I think adding the rcparam is fine. I’m not personally a fan of bar plot histograms and wouldn’t object to the default being changed but my objection is aesthetic rather than performance related. I think performance is not a strong argument because anyone who has 16,000 bins can be smart enough to set the right kwarg.

@ImportanceOfBeingErnest
Copy link
Member

This is why I think histograms should be "bars":

image

Some replies on the above (hidden to not distract from the main issue here)

If you think it's a convenience wrapper around bar()...

I meant a convenience wrapper around np.histogram followed by whichever plotting function you like.
In that sense a plotting function, which is capable of filling below a step line may be beneficial. It's always a bit problematic if some specialized function is capable of doing things you cannot easily do with the basic functions (e.g. hexbin can create hexagonal gridded plots of accumulated cartesian data, but matplotlib has no functionality to plot data which is already defined on a hexagonal grid).

Please remember that matplotlib is not just a teaching tool.

Please remember that matplotlib is also a teaching tool. From a scientist we can assume to have read the documentation and, if speed matters to them, use histtype="step*". I guess I forgot to mention, also 👍 for adding the respective sentence in the docstring. That is sure very useful.

Why exactly can't you use step() instead of plot()?

If I know from the beginning that I need a step line, I might do that (though I would probably opt for plot(.., drawstyle="steps-pre"), which is merely a question of personal taste). rcParams are useful to change the parameters without altering the actual code. Real world use case: Plotting spectrogram data; some people prefer it in a "binned" style, others in a "points-connected-by-lines" style.

Clearly you haven't attempted to refactor rcsetup.py...

If the actual number of parameters makes rcsetup.py hard to refactor, that would be a good indication for it to be in need of a refactor.

@anntzer
Copy link
Contributor Author

anntzer commented Oct 12, 2018

This is why I think histograms should be "bars":

Again, that's possible using the "single-line" drawing method. Right now I'm not volunteering to implement this though, especially as (sadly, as usual...) there are combos of kwargs for which the expected behavior is unclear (#12494).

(e.g. hexbin can create hexagonal gridded plots of accumulated cartesian data, but matplotlib has no functionality to plot data which is already defined on a hexagonal grid).

I don't use hexbin myself, but looking at its docstring:

    If *C* is *None*
    (the default), this is a histogram of the number of occurrences
    of the observations at (x[i],y[i]).
    
    If *C* is specified, it specifies values at the coordinate
    (x[i], y[i]). These values are accumulated for each hexagonal
    bin and then reduced according to *reduce_C_function*, which
    defaults to `numpy.mean`. (If *C* is specified, it must also
    be a 1-D sequence of the same length as *x* and *y*.)

so setting C appears to do what you want.

if speed matters to them, use histtype="step*"

It would be nice if we could have both speed and usability. I am always in favor of making things accessible, but users should (if possible) not have to jump through hoops to get performant code.
I guess the optimal solution (if we're willing to switch back to black-outline bars as default, which, as a default-change, is another backcompat problem) would be to implement the fast-singleline-stepfille-with-edges method as proposed above, and ultimately make it the default. After all, I doubt that beginners are fiddling with properties on individual patches of a histogram.

If the actual number of parameters makes rcsetup.py hard to refactor, that would be a good indication for it to be in need of a refactor.

#9528 for the relevant issue.

@jklymak jklymak modified the milestones: v3.1.0, v3.2.0 Feb 27, 2019
@tacaswell tacaswell modified the milestones: v3.2.0, v3.3.0 Sep 5, 2019
@QuLogic QuLogic modified the milestones: v3.3.0, v3.4.0 May 2, 2020
@QuLogic QuLogic modified the milestones: v3.4.0, v3.5.0 Jan 22, 2021
@jklymak jklymak marked this pull request as draft March 27, 2021 17:48
@QuLogic QuLogic removed this from the v3.5.0 milestone Aug 21, 2021
@QuLogic QuLogic added this to the v3.6.0 milestone Aug 21, 2021
@timhoffm timhoffm modified the milestones: v3.6.0, unassigned Apr 30, 2022
@story645 story645 modified the milestones: unassigned, needs sorting Oct 6, 2022
@anntzer
Copy link
Contributor Author

anntzer commented May 23, 2023

It's probably better to just retrain muscle memory to use plt.stairs(*np.histogram(...)) instead.

@anntzer anntzer closed this May 23, 2023
@anntzer anntzer deleted the histtype branch May 23, 2023 09:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants