Skip to content

Handle NaN in bar labels and error bars #22929

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 24, 2022

Conversation

oscargus
Copy link
Member

@oscargus oscargus commented Apr 28, 2022

PR Summary

Closes #22910

Also handles nan in vlines and hlines better. Earlier it "worked", but with a run-time warning. The bounding box was not scale correctly though.

Not sure if the way bounds are tested here is the best possible.

PR Checklist

Tests and Styling

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

  • [N/A] New features are documented, with examples if plot related.
  • New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • [N/A] API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
  • [N/A] Documentation is sphinx and numpydoc compliant (the docs should build without error).

@oscargus oscargus marked this pull request as draft April 28, 2022 16:00
@oscargus oscargus force-pushed the nanbarlabel branch 2 times, most recently from 3a195da to 7ada6b8 Compare April 30, 2022 10:19
@oscargus oscargus marked this pull request as ready for review April 30, 2022 10:25
@@ -3504,7 +3504,9 @@ def apply_mask(arrays, mask): return [array[mask] for array in arrays]
f"'{dep_axis}err' (shape: {np.shape(err)}) must be a "
f"scalar or a 1D or (2, n) array-like whose shape matches "
f"'{dep_axis}' (shape: {np.shape(dep)})") from None
if np.any(err < -err): # like err<0, but also works for timedelta.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you check for the nan in the conditional?
(err < -err) & ~np.isnan(err)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isnan doesn't work for dtype=object https://stackoverflow.com/questions/36198118/np-isnan-on-arrays-of-dtype-object and casting to float will break timedelta.

There may be a better way to write it, but that is the reason for the current approach.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe & err != err?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, there will be runtime warnings that will fail the tests.

err = np.array([np.nan, -0.1], dtype=object)
np.any(err < -err)

gives

/tmp/ipykernel_32424/1840248036.py:1: RuntimeWarning: invalid value encountered in less
  np.any(err < -err)

independent of what is added (unless something like err != err is added before, but then it will not test properly).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(It took me quite some time to get a solution that worked... Not that it may be a good indication, but at least I failed with quite a few alternatives...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it is numpy? Although with dtype=object. Or am I missing something?

Yeah, but dtype object isn't guaranteed to do anything under mathematical operations (of particular relevance < is not guaranteed to work). Its basically just a list.

The PR solves #22910 without introducing any new issues (as far as I can tell, all existing tests pass anyway). Although it may be that lots of other related things also should be fixed.

Fair, maybe this is a tangent of sorts, but I think we need to fix the underlying problems rather than add more bandaids on top.

It would really help to get some examples of things that doesn't work though.

I'm 99% sure that if you passed a unit-full pint object in as err this would probably fail. Maybe it failed before as well?

I'm not blocking on this, but just pointing out that we still have substantial work to do on cleaning up the engineering here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it is bandaid for sure.

Thanks! Wasn't aware of pint (although I've seen it mentioned). This is what happens on both main and this when using errorbar:

/local/data1/matplotlib/lib/matplotlib/axes/_axes.py:3329: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  y = np.asarray(y, dtype=object)
/local/data1/matplotlib/lib/matplotlib/axes/_axes.py:3374: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  return np.asarray(err, dtype=object)

image

and when using bar:

/local/data1/miniconda3/lib/python3.8/site-packages/numpy/lib/stride_tricks.py:536: UnitStrippedWarning: The unit of the quantity is stripped when downcasting to ndarray.
  args = [np.array(_m, copy=False, subok=subok) for _m in args]
Traceback (most recent call last):

  File "/local/data1/miniconda3/lib/python3.8/site-packages/pint/quantity.py", line 901, in __float__
    raise DimensionalityError(self._units, "dimensionless")

DimensionalityError: Cannot convert from 'meter' to 'dimensionless'


The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "/tmp/ipykernel_13475/2206117657.py", line 1, in <cell line: 1>
    plt.bar([1, 2], base, yerr=err)

  File "/local/data1/matplotlib/lib/matplotlib/pyplot.py", line 2392, in bar
    return gca().bar(

  File "/local/data1/matplotlib/lib/matplotlib/__init__.py", line 1412, in inner
    return func(ax, *map(sanitize_sequence, args), **kwargs)

  File "/local/data1/matplotlib/lib/matplotlib/axes/_axes.py", line 2369, in bar
    x, height, width, y, linewidth, hatch = np.broadcast_arrays(

  File "<__array_function__ internals>", line 5, in broadcast_arrays

  File "/local/data1/miniconda3/lib/python3.8/site-packages/numpy/lib/stride_tricks.py", line 536, in broadcast_arrays
    args = [np.array(_m, copy=False, subok=subok) for _m in args]

  File "/local/data1/miniconda3/lib/python3.8/site-packages/numpy/lib/stride_tricks.py", line 536, in <listcomp>
    args = [np.array(_m, copy=False, subok=subok) for _m in args]

ValueError: setting an array element with a sequence.

(empty plot).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair, so the point is you are definitelynot making it worse here, but the problem still remains.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess that a quick bandaid for bar and units is to use this approach there as well:

# Casting to object arrays preserves units.
if not isinstance(x, np.ndarray):
x = np.asarray(x, dtype=object)
if not isinstance(y, np.ndarray):
y = np.asarray(y, dtype=object)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that is what I'm fundamentally objecting to - just making an object array doesn't make any sense!

maxx = max(xmin.max(), xmax.max())
miny = y.min()
maxy = y.max()
minx = min(np.nanmin(xmin), np.nanmin(xmax))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since these are already combining/using masks above, would it make sense to add a masked_invalid() call earlier?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not really know how masks actually works (which is a bit ironically, considering my comment on your PR about masks I did just before seeing this...), although I realize that it may be related. Just noted that this didn't work correctly when trying out the other fixes.

I may give it a do though, either here or as a follow-up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 There is a cbook.safe_masked_invalid if that would help at all. Masking the invalid locations just ignores those in the computations (well the details are more involved, but that is the easy way of thinking about it).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've played around with it and cannot get it to work. The reason seems to be that the array has dtype=object (although it is float when created) and therefore I get runtime warnings on min and max.

Using masked_invalid gives a masked array with a fill_value of ?, while cbook.safe_masked_invalid just returns an identical array.

If the array was float it would probably work though, but I have no idea where it is becoming dtype=object. Maybe related to datetime support?

(Not really clear to me why nanmin and nanmax is worse than adding masked conversions.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there is necessarily anything wrong with it if we are expecting nan's for some reason. To me, this just seemed like it was potentially hiding something else, which as you've noted here this only happens for dtype=object. If you are passing in a float array, I really don't think we should be making it an object array 😟 Perhaps this is because these error bars are generally used for categoricals so something is converting to objects fast because of that?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do this because we don't want to drop units.

Copy link
Member

@jklymak jklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine for the purpose of getting this to work.

@jklymak jklymak requested a review from greglucas May 24, 2022 11:00
@greglucas greglucas merged commit ca4958b into matplotlib:main May 24, 2022
@oscargus oscargus deleted the nanbarlabel branch May 24, 2022 14:24
@QuLogic QuLogic added this to the v3.6.0 milestone May 24, 2022
@pllim
Copy link

pllim commented May 25, 2022

Hello! A test in astropy started failing with matplotlib dev; it has something to do with errorbar, so I am suspecting it might be caused by this PR. Can someone please have a look at astropy/astropy#13276 and advise? Thanks!

@greglucas
Copy link
Contributor

I think this is on us @pllim

Looks like it is coming with this error:
TypeError: Cannot store non-quantity output from less function in <class 'astropy.units.quantity.Quantity'> instance
which I think is due to specifying the out here:

            res = np.zeros_like(err, dtype=bool)  # Default in case of nan
            if np.any(np.less(err, -err, out=res, where=(err == err))):

@oscargus, I wonder if the simple fix is to go to np.zeros(err.shape, dtype=bool) instead of like?

Although, I am wondering if this is going to hit other interesting cases too and we should back this commit out until we can solve the root of the issue with the units stuff.

@jklymak
Copy link
Member

jklymak commented May 26, 2022

I think unit conversion has to happen earlier here. We are trying to not do it until draw time because these are Line2d objects, but that is too hard to thread through for compound types.

@jklymak
Copy link
Member

jklymak commented May 26, 2022

Btw maybe let's open a new issue. We now have a Quantity class in testing that can be referenced for the error without recourse to astropy

@oscargus
Copy link
Member Author

np.zeros(err.shape, dtype=bool) instead of like?

This seems to do it. One would think (at least I did) that zeros_like(a, dtype=bool) was just a wrapper for zeros(a.shape, dtype=bool), but apparently not...

Or rather, there seems to be an issue with either numpy or astropy here (but we can make sure it doesn't affect us, I'll provide a PR). Providing the dtype argument to zeros_like says

dtype data-type, optional
    Overrides the data type of the result.
    New in version 1.6.0.

although if the argument is a Quantity-vector it will return a Quantity-vector.

I've opened a bug at numpy/numpy#21603 to avoid that more people fall in the same trap.

(This reminds me that it would be nice to have a test suite with optional dependencies that runs every week or so. Although in this case we got the error back earlier than that.)

@jklymak our Quantity class is not evolved enough to provide this problematic behavior. Running zeros_like with an instance of that gives an nparray, not a Quantity-vector.

@pllim
Copy link

pllim commented May 26, 2022

Thank you for the quick response!

Re: Quantity -- FWIW the full implementation is at https://github.com/astropy/astropy/blob/f0e2129aad460ea0db00730f10f98fe799aefb2f/astropy/units/quantity.py#L239

Re: Testing -- We test against a few dev versions of upstream in CI (for better or worse), so that is how we caught this quickly. We are also trying to figure out how to test against our own downstream packages as well. @astrofrog is putting something together at https://github.com/astropy/integration-testing (not deployed yet) if you are interested.

@jklymak
Copy link
Member

jklymak commented May 26, 2022

Yes, thanks for testing our dev build.

Does astropy test that changing axis units works? Or do you just need the first-pass conversion?

@pllim
Copy link

pllim commented May 26, 2022

Does astropy test that changing axis units works? Or do you just need the first-pass conversion?

Does this count as unit conversion? The test here plots one thing in one unit, and then a second thing in another compatible unit:

https://github.com/astropy/astropy/blob/f0e2129aad460ea0db00730f10f98fe799aefb2f/astropy/visualization/tests/test_units.py#L22-L38

@jklymak
Copy link
Member

jklymak commented May 26, 2022

@pllim Well that counts as first-pass conversion. i.e. both could be converted at the start of their method. But you never use axis.set_units() to change the units after the fact? That's what is advertised to work, and works for plot but doesn't work for anything else.

@pllim
Copy link

pllim commented May 26, 2022

I cannot find any .set_units() usage in the test suite or docs. Let me ping the astropy.visualization maintainers about this: @larrybradley @Cadair @astrofrog

@greglucas
Copy link
Contributor

One would think (at least I did) that zeros_like(a, dtype=bool) was just a wrapper for zeros(a.shape, dtype=bool), but apparently not...

I can see the confusion! zeros_like refers to the class/object type, and dtype in the constructor to the datatype. So, we have a Quantity class that can have many different datatypes, or a MaskedArray class with multiple datatypes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: bar_label fails with nan errorbar values
5 participants