-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Add hist.histtype rcParam. #12493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add hist.histtype rcParam. #12493
Conversation
histtype='stepfilled' is significantly faster than the current default ('bar') for large numbers of bins, but changing the default would cause backcompat problems. The 'hist.histtype' rcParam is introduced with the explicit intent of preparing this transition (later making 'stepfilled' (or perhaps 'step') the default and then deprecating the rcParam, to avoid neverending growth of the rcParam list).
👍 on adding the rcParameter. Equally, a 👎 on changing the default to step or stepfilled. There is no use in plotting 10000 bars, but neither 10000 steps. For any reasonable number of bins, the bars are fast enough. If the bin width is smaller than what is resolvable on a printer or monitor, people can just use a line plot. |
Actually, I'm against the uncontrolled multiplication of rcParams (that's how you end up with 36(!) rcParams just for boxplot()...).
Actually, just today :) I was looking at quantization artefacts on a 14 bit camera over long movie stacks, so I was actually plotting a 16384-bin histogram (and then zooming over the relevant parts...) (and that's typically a case where you explicitly don't want to look at a KDE) (amazingly, a bunch of the PRs I submitted recently are actually directly related to my work...). Running
shows that "bar" plots this in ~2.5s, whereas "stepfilled" takes ~50ms. That's not a small gain.
Well, that's exactly what 'step' does... |
So you are saying that typing the 36 boxplot parameters don't hurt anyone. Not having the one parameter you need, e.g. for the case where you would like to use a stepped line, can however be very frustrating. |
@@ -481,6 +481,7 @@ | |||
#hist.bins : 10 ## The default number of histogram bins. | |||
## If Numpy 1.11 or later is | |||
## installed, may also be `auto` | |||
#hist.histtype : bar ## The default histtype. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be useful to list the options here
## The default histtype. One of 'bar', 'barstacked', 'step', 'stepfilled'.
We could actually add another histtype (e.g. "barstepfilled") which still draw the patch as a single line, but with a longer-winded line that actually takes care of going over the edges of all bars. Then you get the best of both worlds.
Clearly you haven't attempted to refactor rcsetup.py...
Why exactly can't you use
If you think it's a convenience wrapper around bar(), then you're stuck with slow plotting. If you look at it as a wrapper to generate barstepfilled plots (as proposed above), it's not that trivial... |
I think adding the rcparam is fine. I’m not personally a fan of bar plot histograms and wouldn’t object to the default being changed but my objection is aesthetic rather than performance related. I think performance is not a strong argument because anyone who has 16,000 bins can be smart enough to set the right kwarg. |
This is why I think histograms should be "bars": Some replies on the above (hidden to not distract from the main issue here)
I meant a convenience wrapper around
Please remember that matplotlib is also a teaching tool. From a scientist we can assume to have read the documentation and, if speed matters to them, use
If I know from the beginning that I need a step line, I might do that (though I would probably opt for
If the actual number of parameters makes rcsetup.py hard to refactor, that would be a good indication for it to be in need of a refactor. |
Again, that's possible using the "single-line" drawing method. Right now I'm not volunteering to implement this though, especially as (sadly, as usual...) there are combos of kwargs for which the expected behavior is unclear (#12494).
I don't use hexbin myself, but looking at its docstring:
so setting
It would be nice if we could have both speed and usability. I am always in favor of making things accessible, but users should (if possible) not have to jump through hoops to get performant code.
#9528 for the relevant issue. |
It's probably better to just retrain muscle memory to use |
PR Summary
histtype='stepfilled' is significantly faster than the current default
('bar') for large numbers of bins, but changing the default would cause
backcompat problems.
The 'hist.histtype' rcParam is introduced with the explicit intent of
preparing this transition (later making 'stepfilled' (or perhaps 'step')
the default and then deprecating the rcParam, to avoid neverending
growth of the rcParam list).
(The first commit is just docstring reformatting, the interesting part is the second commit.)
See discussion in #7121.
PR Checklist