
Handle NaN in bar labels and error bars #22929


Merged
merged 1 commit into from
May 24, 2022
22 changes: 12 additions & 10 deletions lib/matplotlib/axes/_axes.py
@@ -1083,10 +1083,10 @@ def hlines(self, y, xmin, xmax, colors=None, linestyles='solid',
        lines._internal_update(kwargs)

        if len(y) > 0:
            minx = min(xmin.min(), xmax.min())
            maxx = max(xmin.max(), xmax.max())
            miny = y.min()
            maxy = y.max()
            minx = min(np.nanmin(xmin), np.nanmin(xmax))
Contributor

Since these are already combining/using masks above, would it make sense to add a masked_invalid() call earlier?

Member Author

I do not really know how masks actually work (which is a bit ironic, considering the comment I made on your PR about masks just before seeing this...), although I realize that it may be related. I just noted that this didn't work correctly when trying out the other fixes.

I may give it a go, though, either here or as a follow-up.

Contributor

👍 There is a cbook.safe_masked_invalid if that would help at all. Masking the invalid locations just ignores them in the computations (well, the details are more involved, but that is the easy way of thinking about it).
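For reference (not part of the PR), a minimal sketch of the behaviour being described: masking the invalid locations makes them invisible to reductions such as min and max, which is also what the nanmin/nanmax calls in this diff achieve for plain float arrays.

import numpy as np
from numpy import ma

x = np.array([1.0, np.nan, 3.0])

# Masking the invalid (NaN) location excludes it from reductions.
masked = ma.masked_invalid(x)
print(masked.min(), masked.max())   # 1.0 3.0

# The nan-aware reductions used in the diff give the same result here.
print(np.nanmin(x), np.nanmax(x))   # 1.0 3.0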

Member Author

I've played around with it and cannot get it to work. The reason seems to be that the array has dtype=object (although it is float when created), and therefore I get runtime warnings on min and max.

Using masked_invalid gives a masked array with a fill_value of ?, while cbook.safe_masked_invalid just returns an identical array.

If the array were float it would probably work, but I have no idea where it becomes dtype=object. Maybe related to datetime support?

(It is not really clear to me why nanmin and nanmax are worse than adding masked conversions.)

Contributor

I don't think there is necessarily anything wrong with it if we are expecting NaNs for some reason. To me, this just seemed like it was potentially hiding something else, which, as you've noted here, only happens for dtype=object. If you are passing in a float array, I really don't think we should be making it an object array 😟 Perhaps this is because these error bars are generally used for categoricals, so something is converting to object arrays early because of that?

Member

We do this because we don't want to drop units.

            maxx = max(np.nanmax(xmin), np.nanmax(xmax))
            miny = np.nanmin(y)
            maxy = np.nanmax(y)

            corners = (minx, miny), (maxx, maxy)

@@ -1162,10 +1162,10 @@ def vlines(self, x, ymin, ymax, colors=None, linestyles='solid',
        lines._internal_update(kwargs)

        if len(x) > 0:
            minx = x.min()
            maxx = x.max()
            miny = min(ymin.min(), ymax.min())
            maxy = max(ymin.max(), ymax.max())
            minx = np.nanmin(x)
            maxx = np.nanmax(x)
            miny = min(np.nanmin(ymin), np.nanmin(ymax))
            maxy = max(np.nanmax(ymin), np.nanmax(ymax))

            corners = (minx, miny), (maxx, maxy)
            self.update_datalim(corners)
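As context for the changes above (not part of the diff): with a plain min/max, a single NaN propagates into the computed data limits, whereas the nan-aware reductions ignore it. A minimal illustration, assuming a plain float array:

import numpy as np

y = np.array([1.0, np.nan, 3.0])
print(y.min(), y.max())             # nan nan, so the data limits would be lost
print(np.nanmin(y), np.nanmax(y))   # 1.0 3.0, the NaN entry is ignored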
@@ -2674,7 +2674,7 @@ def sign(x):
                extrema = max(x0, x1) if dat >= 0 else min(x0, x1)
                length = abs(x0 - x1)

            if err is None:
            if err is None or np.size(err) == 0:
                endpt = extrema
            elif orientation == "vertical":
                endpt = err[:, 1].max() if dat >= 0 else err[:, 1].min()
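For context (not part of the PR): once NaN rows have been masked out of the error data, err can end up as an empty array, and reducing an empty array raises, which is why the extra np.size(err) == 0 guard falls back to extrema. A minimal sketch:

import numpy as np

err = np.empty((0, 2))        # what can remain once every row is masked out
print(np.size(err) == 0)      # True, so the new guard uses `extrema` instead
try:
    err[:, 1].max()
except ValueError as exc:
    print(exc)                # reducing an empty array has no identity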
@@ -3504,7 +3504,9 @@ def apply_mask(arrays, mask): return [array[mask] for array in arrays]
f"'{dep_axis}err' (shape: {np.shape(err)}) must be a "
f"scalar or a 1D or (2, n) array-like whose shape matches "
f"'{dep_axis}' (shape: {np.shape(dep)})") from None
if np.any(err < -err): # like err<0, but also works for timedelta.
Contributor

Could you check for NaN in the conditional?
(err < -err) & ~np.isnan(err)

Member Author

np.isnan doesn't work for dtype=object (https://stackoverflow.com/questions/36198118/np-isnan-on-arrays-of-dtype-object), and casting to float will break timedelta.

There may be a better way to write it, but that is the reason for the current approach.
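A minimal sketch (not part of the PR) of the constraint described here: np.isnan rejects dtype=object arrays, while an elementwise self-comparison still flags NaN, because NaN is the only value that is not equal to itself:

import numpy as np

err = np.array([np.nan, -0.1], dtype=object)
try:
    np.isnan(err)
except TypeError as exc:
    print(exc)         # ufunc 'isnan' is not supported for object arrays
print(err != err)      # only the NaN entry compares unequal to itself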

Contributor

Maybe & err != err?

Member Author

Ah, there will be runtime warnings that will fail the tests.

err = np.array([np.nan, -0.1], dtype=object)
np.any(err < -err)

gives

/tmp/ipykernel_32424/1840248036.py:1: RuntimeWarning: invalid value encountered in less
  np.any(err < -err)

independent of what is added (unless something like err != err is added before, but then it will not test properly).

Member Author

(It took me quite some time to get a solution that worked... Not that this is necessarily a good indication, but at least I failed with quite a few alternatives...)

Member

> But it is numpy? Although with dtype=object. Or am I missing something?

Yeah, but dtype=object isn't guaranteed to do anything under mathematical operations (of particular relevance, < is not guaranteed to work). It's basically just a list.

> The PR solves #22910 without introducing any new issues (as far as I can tell, all existing tests pass anyway). Although it may be that lots of other related things also should be fixed.

Fair, maybe this is a tangent of sorts, but I think we need to fix the underlying problems rather than add more bandaids on top.

> It would really help to get some examples of things that don't work though.

I'm 99% sure that if you passed a unit-full pint object in as err this would probably fail. Maybe it failed before as well?

I'm not blocking on this, but just pointing out that we still have substantial work to do on cleaning up the engineering here.

Member Author

I agree that it is a bandaid, for sure.

Thanks! I wasn't aware of pint (although I've seen it mentioned). This is what happens on both main and this branch when using errorbar:

/local/data1/matplotlib/lib/matplotlib/axes/_axes.py:3329: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  y = np.asarray(y, dtype=object)
/local/data1/matplotlib/lib/matplotlib/axes/_axes.py:3374: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  return np.asarray(err, dtype=object)

[image: resulting plot]

and when using bar:

/local/data1/miniconda3/lib/python3.8/site-packages/numpy/lib/stride_tricks.py:536: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  args = [np.array(_m, copy=False, subok=subok) for _m in args]
Traceback (most recent call last):

  File "/local/data1/miniconda3/lib/python3.8/site-packages/pint/quantity.py", line 901, in __float__
    raise DimensionalityError(self._units, "dimensionless")

DimensionalityError: Cannot convert from 'meter' to 'dimensionless'


The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "/tmp/ipykernel_13475/2206117657.py", line 1, in <cell line: 1>
    plt.bar([1, 2], base, yerr=err)

  File "/local/data1/matplotlib/lib/matplotlib/pyplot.py", line 2392, in bar
    return gca().bar(

  File "/local/data1/matplotlib/lib/matplotlib/__init__.py", line 1412, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)

  File "/local/data1/matplotlib/lib/matplotlib/axes/_axes.py", line 2369, in bar
    x, height, width, y, linewidth, hatch = np.broadcast_arrays(

  File "<__array_function__ internals>", line 5, in broadcast_arrays

  File "/local/data1/miniconda3/lib/python3.8/site-packages/numpy/lib/stride_tricks.py", line 536, in broadcast_arrays
    args = [np.array(_m, copy=False, subok=subok) for _m in args]

  File "/local/data1/miniconda3/lib/python3.8/site-packages/numpy/lib/stride_tricks.py", line 536, in <listcomp>
    args = [np.array(_m, copy=False, subok=subok) for _m in args]

ValueError: setting an array element with a sequence.

(empty plot).

Member

Fair, so the point is you are definitely not making it worse here, but the problem still remains.

Member Author

I guess that a quick bandaid for bar and units is to use this approach there as well:

# Casting to object arrays preserves units.
if not isinstance(x, np.ndarray):
    x = np.asarray(x, dtype=object)
if not isinstance(y, np.ndarray):
    y = np.asarray(y, dtype=object)

Member

No, that is what I'm fundamentally objecting to - just making an object array doesn't make any sense!

            res = np.zeros_like(err, dtype=bool)  # Default in case of nan
            if np.any(np.less(err, -err, out=res, where=(err == err))):
                # like err<0, but also works for timedelta and nan.
                raise ValueError(
                    f"'{dep_axis}err' must not contain negative values")
            # This is like
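To illustrate how the added check behaves (a sketch with a plain float array; the dtype=object case is what the PR actually has to handle): the where= mask skips the NaN entries, so they are never compared and keep the default False, while genuinely negative values are still caught.

import numpy as np

err = np.array([0.2, np.nan, -0.1])
res = np.zeros_like(err, dtype=bool)   # stays False wherever err is NaN
np.less(err, -err, out=res, where=(err == err))
print(res)           # [False False  True]
print(np.any(res))   # True, so the ValueError for negative values is raised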
20 changes: 20 additions & 0 deletions lib/matplotlib/tests/test_axes.py
@@ -7549,6 +7549,26 @@ def test_bar_label_nan_ydata_inverted():
    assert labels[0].get_va() == 'bottom'


def test_nan_barlabels():
    fig, ax = plt.subplots()
    bars = ax.bar([1, 2, 3], [np.nan, 1, 2], yerr=[0.2, 0.4, 0.6])
    labels = ax.bar_label(bars)
    assert [l.get_text() for l in labels] == ['', '1', '2']
    assert np.allclose(ax.get_ylim(), (0.0, 3.0))

    fig, ax = plt.subplots()
    bars = ax.bar([1, 2, 3], [0, 1, 2], yerr=[0.2, np.nan, 0.6])
    labels = ax.bar_label(bars)
    assert [l.get_text() for l in labels] == ['0', '1', '2']
    assert np.allclose(ax.get_ylim(), (-0.5, 3.0))

    fig, ax = plt.subplots()
    bars = ax.bar([1, 2, 3], [np.nan, 1, 2], yerr=[np.nan, np.nan, 0.6])
    labels = ax.bar_label(bars)
    assert [l.get_text() for l in labels] == ['', '1', '2']
    assert np.allclose(ax.get_ylim(), (0.0, 3.0))


def test_patch_bounds():  # PR 19078
    fig, ax = plt.subplots()
    ax.add_patch(mpatches.Wedge((0, -1), 1.05, 60, 120, 0.1))