-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
A reasonable default width for bar() with units? #13236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I think this is reasonable, but a bit mysterious to only apply to unitized data. Doesn’t obviate the need for the other PRs, but may simplify them somewhat. |
Good catch, A fixed "deunitized default" does not make sense. However, the topic is a bit complex. IMHO the unit-free numeric values corresponding to unitized data is semi-public. For simple cases, people should not have to know about it (i.e. plotting bars based on categoricals or dates). However, we cannot completely hide it, because people might want to draw additional elements to the Axes and thus might need to know about the actual data-space of the Axes. Some general thoughts:
|
The paradigm for units is to allow the converter to decide if it likes the non-unitized float and what to do with it. |
The problem of having "auto" mean 0.8*(the mean spacing) for unitized data but otherwise requiring unitized widths for unitized data is that this does not allow adjusting the bar widths for categorical data, because for those you can really only do the computation afer deunitizing (an my guess is that quite a few people want to use width=1 in that case). |
See #13254 for an implementation of the proposal. |
I see the appeal of using a relative meaning for width, but I'm really not wild about the idea that for this one corner (call it 25% of options) it means relative width, but for the rest of cases it means absolute. This kind of guessing is what usually gets us into long term maintenance troubles. While I get purity arguments here, do we have a test case that actually shows problematic behavior in the current implementation WRT units and bar width? I get it might be suboptimal, but is it bone-headedly stupid? |
You can’t do datetime + float. It fails right now on 3.0.2 unless you specify width in timedelta |
Btw I agree that if we change to a relative default for some cases it should be relative for all cases. |
The current default (as of master) is basically a bar of 0.8 days, because datetimes are converted to scalars on a 1-day scale. So you'll get something reasonably nice looking if your x's are ~1 day apart, but not if they're ~1y apart. For scalars, I guess that's typically not a problem because most bar plots effectively use the x-axis as ordinals (and thus consecutive integers)? |
Well, unit support enables doing the same operation across values given with different scaling/offsets. What you're proposing is doing a completely different operation by default. IIUC:
While it could make for better default plotting out of the box, it means some users will be confused by the change in the plot when they switch from plotting on a date time axis to one on plain floats. I'm not full out against it, but it does make me pause. And it does seem to be against the general grain of matplotlib development later where we're moving away from such magical, polymorphic behavior. |
I guess I just have to accept I fundamentally don't understand what is the general philosophy behind unit support in Matplotlib (we already had this debate some time ago and frankly, every time I try to understand it, I fail... no need to relitigate the whole thing here); in the meantime, just documenting the actual behavior (0.8-of-whatever-your-units-converts-to) seems good enough. |
Are we sure this is not release critical? 3.1 will include the changes for unit-handling from #12903. I'm not recalling the complete discussion anymore, but would it make sense to require a width argument for unitized data (maybe except categoricals, which are essentially integer units and can do with the global default of 0.8). That would at least be a clean API and error out with a reasonable error message. We can always add more clever defaults later on. |
Ping @anntzer @jklymak @tacaswell concerning my previous comment. A side effect of #12903 is that unitized x values will generate a working plot in 3.1. However, this uses the default bar width of 0.8 which is in generally not suitable. E.g.
will result in extremely small bars: Is that behavior ok for 3.1 or do we have to do something about it (a general default seems difficult, so the best option might be to error out for unitized-x without width. |
I prefer this behaviour compared to an error. For better for worse 0.8 is a valid matplotlib datetime according to the converter. |
Honestly, I don't really care anymore at that point because (per my comment above) I just don't understand what is the coherent plan behind unit support; as long as it doesn't make unitless plotting more difficult I'm fine with everything. |
Maybe, overall this is not too critical. We're going from an error to a bad default, which people will likely not rely on. So it's still realtively harmless to change to a better default or raise an error again in future versions. |
I don't think this errored before, did it? What errored was specifying width as a timedelta... |
There doesn't seem to be any difference between unsuitable data with and without units. You have the same problem of unsuitable bar widths when plotting |
@ImportanceOfBeingErnest, good point. Not sure what we should do about it or the history behind the style change. |
A simple approximate solution would be to check dx / width. If that‘s larger than a certain factor, say 200, issue a warning. |
Can we do something about this? I just ran the following:
and got what I thought was a blank plot; I spent the next 25 minutes trying to figure out what I did wrong. In my view, the sensible default would be 0.8 x the distance between each bar, even if the bars are different distances apart. Then if the user thinks it's ugly he can set the bar width (or height in the case of |
Maybe the minimal thing we should do here is to set the edge color on
has all of the bars visible. |
diff --git a/lib/matplotlib/axes/_axes.py b/lib/matplotlib/axes/_axes.py
index 40e48fd55..7354ba152 100644
--- a/lib/matplotlib/axes/_axes.py
+++ b/lib/matplotlib/axes/_axes.py
@@ -2275,7 +2275,7 @@ class Axes(_AxesBase):
color = kwargs.pop('color', None)
if color is None:
color = self._get_patches_for_fill.get_next_color()
- edgecolor = kwargs.pop('edgecolor', None)
+ edgecolor = kwargs.pop('edgecolor', color)
linewidth = kwargs.pop('linewidth', None)
# Because xerr and yerr will be passed to errorbar, most dimension is enough to make that change, but it changes a lot of test images. |
Changing the style just to work around this issue feels wrong. It would be sort of ok to enable edges globally, but not as a workaround in a single function. Moreover, the result is most likely still not what the user wants. So this workaround would only be an indirect way of hinting what is going wrong. In descending priority a fix should be:
|
I think trying to detect zero-width draws is going to be both very expensive computationally and very hard to actually get right. For the vector outputs we will just output the points to draw and will be at the mercy of the final renderer to determine how many pixels will be on the screen. Because the user can change the dpi / axes size / figure size / axes limits, how much of data space is covered by a pixel can be changed wildly between when we create the Maybe we could warn from inside our Agg wrappers? |
For raster output this can probably be done without too much difficulty within the path snapper. Or we can just not snap paths in that case... |
Something to consider is that there are presumable lots of cases where you do want to have a patch not being drawn if it is less than a pixel, and not get a warning. E.g. a map with the world's countries as patches shouldn't give a warning by default, even though it contains the Principality of Monaco. |
I would suggest only doing so when the snapping causes the path to exactly come back on itself (while the path was not coming back over itself in the absence of snapping). |
Personally think the way forward is a width=“auto” that tries to do the best it can and then an rcParam to set the default width. |
I would appreciate an easy way to get the unit used by width, after a plot has been made, when handling time series data. I had this issue yesterday, when plotting data with timestamps 10 minutes apart. Matplotlib bar evidently sees the unit as days, something I had to trial and error to find out, guessing my way to a correct width. |
You should be able to set the width with a timedelta or Timedelta64. It’s not trivial to explain how bar widths interact with all the units If you had a suggestion for where it would have been helpful to see this in the docs that would be very helpful |
An advantage of the currently implemented way is that it's totally deterministic. Units are always data units. And width is 0.8 data units, unless changed by the user. The same is actually true for pandas (though it might be 0.6 instead of 0.8), but the confusing bit with pandas is that barplot units are categorical: each bar is 1 unit away from the next, independent on what the formatter says. In that sense pandas plots might result in completely confusing output when an unequally spaced timeseries is plotted, such that bars are equal in width in display space, but look unequal in width in data space because the formatter suggests different axis units. |
In master, after #12903 the current default is a bare 0.8 float with no unit information, not 0.8 in "data units" (I don't think it was ever 0.8 in "data units"). If x0 has units, and If anyone wants to have a unitized bar that does something else, i.e. |
We call the units of the axis scale "data" throughout, e.g. |
That is confusing though, because an axis has "units" that get converted by the axis' unit converter to floats. I believe what you are referring to are data co-ordinates. |
I'm no native speaker, but I believe you would call the distance between 2 and 1 on an axis "1 (data) unit" not "1 (data) coordinate"? |
Just to clarify: The problem with our units handling as pointed out by @snowzky, is that at the moment is that the user has no way to know what "0.8" means relative to the units they are plotting in. i.e. imagine if units were lengths, and they passed in I'm not sure what the way forward is. Making the width the median distance between |
I'm tempted to go with a new default
I assume that all use-cases that use the current default and that would be affected by the proposed change are more or less heavily broken with too small or broad bars. Breaking them is a penalty we may consider bearable. |
This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help! |
Thanks to the ping from the bot, I tested my example from above today with version 3.7.1: from matplotlib import pyplot as plt
x = [3858500.0, 3858000.0, 3857500.0, 3857000.0, 3856500.0, 3856000.0, 3855500.0, 3855000.0, 3854500.0, 3854000.0]
y = [105.0, 205.0, 130.0, 190.0, 145.0, 155.0, 125.0, 160.0, 115.0, 170.0]
plt.barh(x, y)
plt.show() Are we satisfied with this? The bars are visible now, which means people like me who are confused why their plot is blank will have a better idea of what's happening. But obviously it's an unusable default. |
I thon @timhoffm 'auto' proposal is a good one. Someone just needs to implement it in a reasonable way. |
In #12903 we merged a PR fixing the handling of the default
width
(0.8) whenbar
(barh
,broken_barh
) is called with unitized data (I was among the PR approvers; see also #13187 as followup). Briefly, the idea is to support both unitless and unitized widths, so that one can doThe main trickiness is that the default width is unitless (0.8), but one can also pass a unitful width, so it's not clear how to add it to the unitized
x
; moreover, note that the unit ofx
can either correctly handle addition (time + timedelta), or not (categoricals, which need to go through the converter first). So the end design is something likewhere orig_x and orig_dx are the user inputs, either unitized or deunitized.
However, I now feel like this is not really looking at the problem under the right angle: if the user input is unitized, what's the chance that 0.8 "deunitized-units" is a reasonable default width? It is rather low, I would guess. Instead, it would seem that the intent of the default would/should be "0.8 times the spacing between the bars" (in the common case of evenly spaced bars).
Hence, a solution may perhaps be to change the interpretation of a unitless
width
in the case of unitized data, to have it mean (e.g.) a fraction of the (mean) spacing between the bars (i.e.(max(x) - min(x)) / (n - 1)
). Note that becausebar
did not support unitized data up to now, there are no backcompat concerns.Mostly @jklymak I guess. Allowing myself to mark this as release critical mostly so that we don't have to go through a painful deprecation period if we agree on this, but feel free to untag.
The text was updated successfully, but these errors were encountered: