-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
FIX: clean up unit conversion unpacking of data, particularly for dates and pandas series #11664
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
12b3b7f
to
445047c
Compare
lib/matplotlib/axes/_axes.py
Outdated
@@ -3644,6 +3644,8 @@ def dopatch(xs, ys, **kwargs): | |||
datashape_message = ("List of boxplot statistics and `{0}` " | |||
"values must have same the length") | |||
# check position | |||
if hasattr(positions, 'values'): | |||
positions = positions.values |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any better ways to check if something is a pandas Series
?
In this case the following (admittedly very constructed) example would fail:
import numpy as np
import matplotlib.pyplot as plt
class Years(list):
values = True
data = np.random.rand(5,2)
years = Years([1999,2003])
plt.figure()
plt.boxplot(data, positions=years)
plt.show()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, its a damned if you do, damned if you don't situation.
Other data types use the values
convention as well (xarray for instance). The goal here was to unpack the pandas series w/o having to know what a pandas series is (there is no pandas import in the main library).
If someone tries to pass a class that has values
meaning something else, then I guess they will have to unpack before they plot.
FWIW, my preference would be that we only support numpy arrays and basic python lists, and let users and/or downstream libraries do their own unpacking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't like the idea at all that only because some object has a values
attribute, matplotlib now thinks that it should use it (and not the actual data in the object, e.g. provided by some iterator).
Or, if you think this is what should always happen, it should actually always happen. As it stands, matplotlib would use the iterator for usual numbers, but the values
attribute for dates (#10638) or in boxplots (this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, agreed. Stand by for a more holistic solution - though potentially more controversial...
445047c
to
fb7eb0b
Compare
b84c4b5
to
95e4e52
Compare
d87ac91
to
db394cb
Compare
ping @jorisvandenbossche as this is a redo of a PR you helped with previously... |
Maybe closes #5896 |
db394cb
to
b16c311
Compare
Rebased, this is still ready to go and waiting for review. |
Looks okish, but I'm not deep enough into the numpy/pandas data formats to judge the implications. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just a very minor change and some explanation or additional comments.
@@ -6048,7 +6048,7 @@ def pcolormesh(self, *args, alpha=None, norm=None, cmap=None, vmin=None, | |||
|
|||
# convert to one dimensional arrays | |||
C = C.ravel() | |||
coords = np.column_stack((X, Y)).astype(float, copy=False) | |||
coords = np.column_stack((X, Y)).astype(float) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see any point in removing the "copy=False" kwarg. It saves time, and it is logically what is wanted here. A copy was already made via the column_stack
call, so nothing done to coords will affect X or Y.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure. Changed back
if munits.ConversionInterface.is_numlike(x): | ||
return x | ||
if isinstance(x, list) or isinstance(x, numbers.Number): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the rationale for returning a list as-is instead of running it through np.asarray
? Just that the latter is unnecessary? Or are there cases where the conversion would be an error?
return x | ||
if isinstance(x, list) or isinstance(x, numbers.Number): | ||
return x | ||
if issubclass(type(x), np.ma.MaskedArray): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this using issubclass(type(x)...
instead of isinstance
?
return x | ||
if issubclass(type(x), np.ma.MaskedArray): | ||
return np.ma.asarray(x) | ||
return np.asarray(x) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the point of this whole addition above to short-circuit some of what would otherwise happen in get_converter
? I'm not opposed, just confused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems worse, because calling np.asarray
can drop information in some cases, making it unavailable in the call to convert
below. Unless I'm missing something...
if ((isinstance(d, np.ndarray) and | ||
np.issubdtype(d.dtype, np.datetime64)) or | ||
isinstance(d, np.datetime64)): | ||
return _dt64_to_ordinalf(d) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What case is this whole block covering? Again, I'm confused.
data = np.random.rand(5, 2) | ||
years = pd.date_range('1/1/2000', | ||
periods=2, freq=pd.DateOffset(years=1)).year | ||
# Does not work |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another place I am confused. Is the comment obsolete?
@@ -1477,9 +1479,14 @@ def have_units(self): | |||
return self.converter is not None or self.units is not None | |||
|
|||
def convert_units(self, x): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be converting anything numlike, which includes things like pint.Quantity
, into an array/masked array before calling convert
. That seems like it would break pint
, though I will note that our unit tests seem to be passing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, I’ll check vs pint when I look at the other comments above. Thanks for the reminder...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I guess I still don't understand how all of this works. 😉 So is_numlike
only returns true
for things that are basic numbers (or contain them) and don't need converting. I think I'm ok with this code then. Tests passing is always a good thing too. 😄
OK, somewhat confused about everything I was thinking here, and some of the type checking has moved on since this was written. So I'm going to close. Note #13304 deregisters the pandas converters which I think was an important contribution here. Sorry for the reviewer effort on this... |
PR Summary
Closes #10022 and #11391
This does one small thing (run units conversion on the position argument of
bxp/boxplot
) and two big things:np.asarray
on incoming objects to unpack numpy arrays from them. Previously in order to get native pandas support we were unpacking thevalues
field, but runningnp.asarray
does the same thing and allows the packed values to not necessarily be stored invalues
. i.e.returns:
This re-does #10638, which unpacked pandas py checking for and using
values
.date
handling should be able to strip pandas series of their date information and convert. This passes all the pandas tests, but I admit this may be a controversial step.Immediate fix code:
failed on master. This fixes...
PR Checklist