make histtype='step' default #7121

Closed
tlatorre-uchicago opened this issue Sep 15, 2016 · 17 comments

@tlatorre-uchicago

I'm using matplotlib version 1.5.1 on Scientific Linux 6 (RHEL clone). I am using the matplotlib provided in the anaconda install.

I would like to propose that histograms be drawn by default with histtype='step'. I've been using matplotlib for a long time, and it is very common (at least in high energy physics) to plot histograms with a large number of bins. With the default drawing behavior, such plots become very unresponsive. For example:

import numpy as np
import matplotlib.pyplot as plt
plt.hist(np.random.exponential(size=1000000), bins=10000)
plt.show()

produces a histogram that takes ~15 seconds to draw and between 5 and 10 seconds to redraw after panning or zooming. In contrast, plotting with histtype='step':

plt.hist(np.random.exponential(size=1000000), bins=10000, histtype='step')
plt.show()

plots almost instantaneously and can be panned and zoomed with no delay.
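
A likely reason for the difference is the number of artists each mode creates: the default 'bar' type returns one Rectangle per bin, while 'step' returns a single Polygon. A minimal sketch, assuming the non-interactive Agg backend and a smaller dataset than above so it runs quickly (the variable names are illustrative):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.exponential(size=100_000)
n_bins = 1_000

fig, (ax_bar, ax_step) = plt.subplots(1, 2)

# Default histtype='bar': the third return value is a container of Rectangles.
_, _, bar_patches = ax_bar.hist(data, bins=n_bins)

# histtype='step': the third return value is a list holding one Polygon.
_, _, step_patches = ax_step.hist(data, bins=n_bins, histtype="step")

n_bar_artists = len(bar_patches)
n_step_artists = len(step_patches)
print(n_bar_artists, n_step_artists)
```

Every extra artist has to be created, transformed, and drawn on each redraw, which would be consistent with the slowdown seen at 10000 bins.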

The step option also follows the least ink principle which Edward Tufte recommends in his book "The Visual Display of Quantitative Information".

At a minimum, maybe the step option could be included in the histogram_demo.py file. I'd be happy to do this if it's a welcome addition.

@efiring
Member

efiring commented Sep 15, 2016

I agree that 'step' or 'stepfilled' makes good sense for a histogram. Are there any of our gallery examples or common use cases for which 'step' would work badly? It looks like 'step' is much better as a default than 'stepfilled' because the latter works badly when there is more than one series in the histogram.
Even if the default is not changed, the hist docstring could note the superior speed of 'step'.

@efiring added this to the 2.0 (style change major release) milestone Sep 15, 2016
@anntzer
Contributor

anntzer commented Sep 16, 2016

mwaskom/seaborn#704 / #4065 is an issue about the default appearance of step histograms when using the seaborn styles, although it could be argued this is not a problem of matplotlib itself.

@story645
Member

I'd vote for including it in the demo, but against making it the default. The current default is what's shown in just about every basic stats/research methods book, so I think deviating from it would be massively confusing to a lot of people.

@efiring
Member

efiring commented Sep 19, 2016

@tlatorre-uchicago, please proceed with a PR to add a 'step' hist example ASAP, and consider including a line somewhere in the hist docstring pointing out the superior speed of step when the number of bins is large. At this point in our development cycle, these changes are helpful and safe; changing this particular default now would evidently be controversial and perhaps undesirable to a major fraction of the user community.

@tlatorre-uchicago
Author

Sure, can you recommend which histogram example I should edit? The example that pops up on Google when I search for "matplotlib histogram" is http://matplotlib.org/1.2.1/examples/pylab_examples/histogram_demo.html, but that doesn't appear to be in the documentation anymore.

@efiring
Member

efiring commented Sep 19, 2016

The docs are in a state of flux, with a trend toward moving things from the pylab section to more specific sections. The latest version actually includes a step-filled example: http://matplotlib.org/devdocs/examples/statistics/histogram_demo_histtypes.html. Source code: https://github.com/matplotlib/matplotlib/blob/master/examples/statistics/histogram_demo_histtypes.py. Maybe we don't need more than that? Or do you want to add a plain step example?

@tacaswell
Member

@tlatorre-uchicago You might also be interested in http://matplotlib.org/devdocs/examples/api/filled_step.html You will have to call np.histogram and step/fill_between, but you will get more control + Line2D objects.
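
The approach described above can be sketched roughly as follows. This is a minimal illustration rather than the code from the linked example; the data, bin count, and variable names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

rng = np.random.default_rng(0)
data = rng.exponential(size=100_000)

# Bin the data with NumPy instead of letting hist() do it.
counts, edges = np.histogram(data, bins=50)

# Pad the counts with a copy of the last value so there is one y per bin edge.
heights = np.append(counts, counts[-1])

fig, ax = plt.subplots()
# step() returns a list of Line2D objects; unpack the single line.
(outline,) = ax.step(edges, heights, where="post")
# Optional shading under the outline, using the same step convention.
ax.fill_between(edges, heights, step="post", alpha=0.3)

outline.set_linewidth(1.5)  # ordinary Line2D styling is now available
```

Because the outline is a regular Line2D, it can be restyled, added to a legend, or updated in place like any other line.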

@tlatorre-uchicago
Author

Ah, yeah that looks good. I'm not sure what the best course of action is then. I guess there are a couple of options:

  • just add a note to the docstring (should it be in the Notes section at the bottom or in the main paragraph at the top?)
  • add a note to one of the existing examples (histogram_demo_features or histogram_demo_histtypes?)
  • add a new example for "histogramming large datasets" or something
  • add a warning if bins > 1000 and histtype='bar'

The last option seems a bit hacky from a purist standpoint, but pragmatically that would have alerted me to the problem the quickest.
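
For illustration, the warning idea could be tried on the user side without touching matplotlib. The sketch below is hypothetical: hist_with_warning and BIN_WARN_THRESHOLD are made-up names, not matplotlib API, and the threshold of 1000 is simply the number from the list above.

```python
import warnings

import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

BIN_WARN_THRESHOLD = 1000  # hypothetical cutoff, taken from the proposal above


def hist_with_warning(ax, x, bins=10, histtype="bar", **kwargs):
    """Hypothetical wrapper around Axes.hist; not part of matplotlib."""
    if isinstance(bins, (int, np.integer)):
        n_bins = int(bins)
    elif isinstance(bins, str):
        n_bins = None  # named binning rule: bin count unknown up front
    else:
        n_bins = len(bins) - 1  # explicit sequence of bin edges

    if histtype == "bar" and n_bins is not None and n_bins > BIN_WARN_THRESHOLD:
        warnings.warn(
            f"{n_bins} bins with histtype='bar' may be slow to draw; "
            "consider histtype='step'.",
            UserWarning,
            stacklevel=2,
        )
    return ax.hist(x, bins=bins, histtype=histtype, **kwargs)


fig, ax = plt.subplots()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    hist_with_warning(ax, np.arange(10.0), bins=2000)

messages = [str(w.message) for w in caught]
```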

I'm happy to do any/all of these. I guess the main objective should be that, whatever is changed, it's obvious to anyone in the future that plotting histograms in matplotlib is not slow if you use the histtype='step' argument.

@tacaswell
Member

Adding a note to Notes and extending the histogram_demo_histtypes are probably the best two options.

👎 on adding a warning, it assumes too much about user intent.

As a side note, what group are you with at uchicago? I graduated from there a few years ago.

@story645
Member

I think you can use #7096 as a model.

@tlatorre-uchicago
Author

OK, I'll add a note and update the histtypes demo. I'm in Dr. Blucher's group working on SNO+.

@tlatorre-uchicago
Author

I'm having some trouble building the HTML documentation.

  1. I have to run python make.py --allowsphinxerrors because there are a lot of sphinx errors. Is this expected?
  2. After building for a while I get the exception 'ascii' codec can't encode character u'\U0001f603' in position 16: ordinal not in range(128)

The exception comes from line 699 in plot_directive.py: code = textwrap.dedent("\n".join(map(str, content))). I can send the whole traceback if you want.
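
The message is consistent with Python 2's str() implicitly encoding a unicode object as ASCII. A minimal Python 3 sketch that reproduces the same class of failure with an explicit encode (the sample string is an assumption, chosen so the offending character lands at position 16):

```python
# 16 ASCII characters followed by the emoji named in the error message.
text = "x" * 16 + "\U0001f603"

try:
    text.encode("ascii")  # ASCII cannot represent code points above 127
    error = None
except UnicodeEncodeError as exc:
    error = exc

print(error)
```

In Python 3, str is already Unicode, so map(str, content) performs no such encoding step.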

I believe I have all of the prerequisites. I've followed the instructions in doc/README.txt. Anything else I should know?

@tacaswell
Member

Try building the docs with python 3.

There should be no sphinx warnings (which we test for on travis).

@tlatorre-uchicago
Author

Tried with python 3, still got lots of warnings, but finally got it to compile using python make.py html --allowsphinxwarnings. However I don't see my edits showing up in the Notes section. Something is adding a new Notes section to the bottom of the docstring when the documentation is built. You can even see this on the development docs:

See the docstring here:

versus

http://matplotlib.org/2.0.0b4/api/pyplot_api.html#matplotlib.pyplot.hist

Inspecting with python shows:

>>> help(plt.hist)
[...]
    See also
    --------
    hist2d : 2D histograms

    Notes
    -----
    * Until numpy release 1.5, the underlying numpy histogram function was
      incorrect with `normed`=`True` if bin sizes were unequal.  MPL
      inherited that error.  It is now corrected within MPL when using
      earlier numpy versions.

    * Plotting histograms with a large number of bins can be exceptionally
      slow when using the default `histtype='bar'`. In these cases, it is
      much faster to plot using `histtype='step'`.

    Examples
    --------
    .. plot:: mpl_examples/statistics/histogram_demo_features.py

    Notes
    -----

    In addition to the above described arguments, this function can take a
    **data** keyword argument. If such a **data** argument is given, the
    following arguments are replaced by **data[<arg>]**:

    * All arguments with the following names: 'weights', 'x'.

@tacaswell
Member

What warnings are you getting? Sometimes it helps to do a git clean -xfd.

There is already an issue for tracking the Note clobbering behavior: #7095

@efiring modified the milestones: 2.0.1 (next bug fix release), 2.0 (style change major release) Oct 10, 2016
@QuLogic modified the milestones: 2.0.1 (next bug fix release), 2.0.2 (next bug fix release) May 3, 2017
@tacaswell modified the milestones: 2.1.1 (next bug fix release), 2.2 (next feature release) Oct 9, 2017
@QuLogic modified the milestones: v2.2, v3.0 Feb 10, 2018
@tacaswell removed this from the v3.0 milestone Aug 11, 2018
@anntzer
Contributor

anntzer commented May 23, 2023

I suspect changing the default will not happen (other than with some kind of large backcompat-maintaining PR similar to how #25427 does it for contour(), which is definitely doable, just a fair bit of work), and we should just try to add references to stairs(*histogram()) (which is arguably better, though it certainly doesn't replace all use cases) throughout the docs.
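
A minimal sketch of the stairs(*histogram()) pattern, assuming Matplotlib 3.4 or later (where Axes.stairs is available) and the Agg backend; the data and bin count are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.patches import StepPatch

rng = np.random.default_rng(0)
data = rng.exponential(size=100_000)

fig, ax = plt.subplots()

# np.histogram returns (counts, edges), which is the argument order
# stairs() expects, so ax.stairs(*np.histogram(...)) also works.
counts, edges = np.histogram(data, bins=1_000)
step_patch = ax.stairs(counts, edges)  # draws a single StepPatch artist
```

Like histtype='step', this draws the whole outline as one artist, and it keeps the binning step separate from the plotting step.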

@timhoffm
Member

The documentation has been updated. I'm closing this (i.e., the request to change the default) as "not planned".

@timhoffm closed this as not planned May 23, 2023