Skip to content

MNT: fix __array__ to numpy #22975

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 18, 2022

Conversation

jklymak
Copy link
Member

@jklymak jklymak commented May 4, 2022

PR Summary

Second attempt: closes #22973

Arrays sometimes don't have all the methods arrays should have, so
add another check here. Plot requires both ndim and shape and this
will extract the numpy array if x does not have those attributes.
Otherwise leave the object alone, because unit support (currently only
in plot) requires the object to retain the unit info.

PR Checklist

Tests and Styling

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

  • New features are documented, with examples if plot related.
  • New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).

@jklymak
Copy link
Member Author

jklymak commented May 4, 2022

This fails the unit handling...

@mhvk
Copy link
Contributor

mhvk commented May 4, 2022

That's tricky: so it seems that the Quantity class being tested is not an ndarray, but tries to behave similarly so it can be used as an array. But it also has an __array__ method as well which converts it to a plain ndarray. I guess the problem is that the presence of __array__ can mean both "I'm not an array, so if you want an array, use this" or "I'm an array duck type, but if you want a real ndarray, use this". I think ideally this whole routine would be something like

return x if isinstance(x, (np.ndarray, ndarray_ducktype)) else np.asarray(x)

But the real question is how to determine whether something is an ndarray_ducktype...

@jklymak jklymak force-pushed the mnt-fix-array-numpy-second-try branch 2 times, most recently from 085b1f9 to 6ccd2c1 Compare May 5, 2022 04:38
@@ -1333,7 +1333,9 @@ def _check_1d(x):
"""Convert scalars to 1D arrays; pass-through arrays as is."""
# Unpack in case of e.g. Pandas or xarray object
x = _unpack_to_numpy(x)
if not hasattr(x, 'shape') or len(x.shape) < 1:
if (not hasattr(x, 'shape') or
not hasattr(x, 'ndim') or
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to fix the problem. and see the test. However I am not sure if this is really on us, or the Kernel, which is missing ndim. Or maybe we should not be relying on ndim.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is ndim always len(shape)? (I guess not, but otherwise it make make sense to just use that instead?)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are 120 ndim in the Python code base (although some are in tests or non-Python-code), so even though we probably should fix it here, in the long run it may be better that Kernel gets ndim.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that Kernel does not really properly behave like an ndarray, so it does need to be converted to an array here. We're (ab)using the fact that it does not define ndim as a proxy for "I'm not a proper duck type, just happen to have shape". Really one would like the np.asduckarray that has been discussed in numpy...

@timhoffm
Copy link
Member

timhoffm commented May 5, 2022

But the real question is how to determine whether something is an ndarray_ducktype ...

That cannot be answered generally. Duck-typing is an operational concept: If it provides the functions I need, it's a duck-type to me. Duck types are never perfect (at the very least they'd fail isinstance() behavior), but if I only need objects that quack like a duck I don't care whether it walks like a duck as well.

The only thing we could do is hard-code a list of known good-enough duck-types. Otherwise we'd have to query the object upfront for all the behavior we want which is practially impossible.
A third theoretical option would be to fully embrace ducktyping: Try with the object as is, if that fails convert the object to a numpy array and retry. However, that can lead to performance issues and more importantly our code is not designed for retries, so we might get into consistency issues if the first try partially modifies some state.

@jklymak
Copy link
Member Author

jklymak commented May 6, 2022

Just writing as I sort this out in my head.

Most of our methods do something like

        x, y = self._process_unit_info([("x", x), ("y", y)], kwargs)
        x = np.ma.ravel(x)  # or alternately np.asarray(x)
        y = np.ma.ravel(y)

by which point we have set the units, and x and y are now type ndarray.

The ax.plot method, on the other hand, needs to unpack x and y into individual arguments so we do:

        x = _check_1d(xy[0])
        y = _check_1d(xy[1])
        self.axes.xaxis.update_units(x)
        self.axes.yaxis.update_units(y)

However, we do not want to lose units yet, and indeed we do not, at least according to test_units.py::test_numpy_facade, which claims we have the ability to transform units at any time with ax.xaxis.set_units. However, note that ax.xaxis.set_units absolutely fails with scatter (see #9713 (comment)) so dragging around the unit information only works for plot currently.

Hence we are in a bit of a Gordian knot of compatibility. We have enshrined pint as something we support, but only for plot apparently. We allow __array__ but currently depend on it having ndim and shape. If an array doesn't have shape then we np.atleast_1d it. I suppose we can do the same thing to arrays without ndim.

@jklymak jklymak marked this pull request as ready for review May 6, 2022 21:10
@jklymak
Copy link
Member Author

jklymak commented May 6, 2022

This works, but I really feel this is all a mess. Hopefully one of our RSE's can work on this...

@jklymak jklymak added this to the v3.5.3 milestone May 6, 2022
@jklymak jklymak added the Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. label May 6, 2022
Arrays sometimes don't have all the methods arrays should have, so
add another check here.  Plot requires both ndim and shape and this
will extract the numpy array if x does not have those attributes.
Otherwise leave the object alone, because unit support (currently only
in plot) requires the object to retain the unit info.
@jklymak jklymak force-pushed the mnt-fix-array-numpy-second-try branch from e2da0f9 to 29ce2c2 Compare May 6, 2022 21:14
@mhvk
Copy link
Contributor

mhvk commented May 6, 2022

@jklymak - indeed, also for astropy unit support works really well for plots, but not for images (cannot give units for an extent). Would be nice if that could also be exchanged. I think for the particular bug that triggered all this, we can also adjust on our side -- it is not something used all the time.

@jklymak
Copy link
Member Author

jklymak commented May 6, 2022

Just to clarify my comment, again while I am thinking about it - ideally plot would do the same thing as the other methods and convert to numpy immediately after the units have been determined. But currently we cannot because plot (and only plot) tries to preserve unit information.

In my mind what the proper way to do this would be to

  1. determine units of the object and pass to the axis
  2. use the units to covert the unit-full object to numpy into some base unit for the axis (so if the converter allows 'feet', 'miles', and 'meters', convert to floats in meters).
  3. Allow the axis to scale its data in the units that the user wants (so if they specify 'feet', add a transform from meters to feet to the displayed data, but keep the underlying data in meters).

This way Matplotlib wouldn't need to drag around the extra units internally, it would just have a transform on each axis.

@mhvk
Copy link
Contributor

mhvk commented May 6, 2022

yes, that makes sense

@dopplershift
Copy link
Contributor

I think the reason it's currently set up this way with Pint and plot() is because plot() is the only one I've personally needed to adjust axis units--whereas most of my use of other plot units is spatially, where units only matter for the z/image axis (e.g. imshow, contour).

I'm not sure how you can tweak the current conversion interface without breaking backwards compatibility...

@jklymak
Copy link
Member Author

jklymak commented May 7, 2022

I think my proposal would keep the same public user interface but change the internal representation of data passed in w units.

I think it could also be done so that most features work for downstream libraries the way they do now for scatter (not able to change axis units properly) but to have the plot-like full feature downstream libraries would need to add more logic to their locators and formatters.

@jklymak
Copy link
Member Author

jklymak commented May 8, 2022

See #23015 for more discussion of the convert-immediately strategy for dealing with units.

@jklymak
Copy link
Member Author

jklymak commented May 18, 2022

Despite the discussion above, I think this should go in as-is for 3.5.3....

Copy link
Member

@oscargus oscargus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should indeed go in for 3.5.3 to fix the regression.

@timhoffm timhoffm merged commit da84e38 into matplotlib:main May 18, 2022
meeseeksmachine pushed a commit to meeseeksmachine/matplotlib that referenced this pull request May 18, 2022
@jklymak jklymak deleted the mnt-fix-array-numpy-second-try branch May 18, 2022 12:12
QuLogic added a commit that referenced this pull request May 19, 2022
…975-on-v3.5.x

Backport PR #22975 on branch v3.5.x (MNT: fix __array__ to numpy)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions. topic: units and array ducktypes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: v3.5.2 causing plot to crash when plotting object with __array__ method
6 participants