Skip to content

Fix for plt.plot() does not support structured arrays as data= kwarg #10810

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 23, 2018

Conversation

zhangeugenia
Copy link
Contributor

PR Summary

Fix for issue #8818 - plot() is now be able to plot structured arrays passed in as data=... kwarg without incorrectly interpreting the data as a format string.

A small simple test case is included with this PR to check that plotting a structured array will not cause and exception (without the use of image comparison since it felt unneeded).

If any changes/modifications are needed, please let me know. Thank you!

@zhangeugenia zhangeugenia force-pushed the bugfix-for-issue-8818 branch from 707ca97 to 73101a2 Compare March 16, 2018 09:48
Copy link
Member

@jklymak jklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works as advertised, and I don't see that its breaking anything...

efiring
efiring previously approved these changes Mar 16, 2018
Copy link
Member

@efiring efiring left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@efiring
Copy link
Member

efiring commented Mar 16, 2018

@tacaswell, do you want to stick one of the version 2 milestones on this? It is addressing a bug milestoned for 2.2, and it looks like a clean and simple fix.

@tacaswell tacaswell added this to the v3.0 milestone Mar 16, 2018
@tacaswell
Copy link
Member

Inclined to not backport, this never worked (so not fixing a regression), is not a failure to import / segfault, and the user has a clear work around (just don't use the data kwarg with structured arrays).

The issue here is that structured arrays support __getitem__ but not __contains__ so this only checks in if the key is in the data if it also parses as a color? There will still be a slight behavior difference between structure arrays and other containers in that structured arrays will not get the warning in the case of a conflict and (I think) the color behaviour will win!

Does this change how things fail if you pass in keys that are not a color or in the container?

@efiring
Copy link
Member

efiring commented Mar 16, 2018

@tacaswell, thank you, I had not looked into the situation in enough detail. It sounds like the real solution for fully supporting structured arrays would be to make _preprocess_data substitute a minimal wrapper for a structured 'data' array that would supply the missing __contains__, or, to handle at least the plot case, to put in special-case handling of the 'in' test if 'data' is a structured array.

Copy link
Member

@jklymak jklymak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

... as per comments bu @tacaswell, this looks to need more thought. However, if it breaks data kwarg handling and passes our current tests, it means we are missing tests on this functionality.

@tacaswell
Copy link
Member

Thanks for working on this @zhangeugenia . Despite my seemingly negative comments, I think this is definitely going in the right direction and appreciate your work!

@zhangeugenia
Copy link
Contributor Author

Hello! Thank you all for your time and feedback, I greatly appretiate it (especially since plot() is a very important function)!

As @tacaswell mentioned, this PR does indeed change the behaviour of _plot_args_replacer. In the PR, the function now depends on "if args[1] is a valid format string" determine the arguments (instead of depending on "if args[1] is in the container or not") when there are 2 arguments passed in. I am assuming that it would be best to keep the original behaviour to keep things consistent, so the ordering/behaviour will be reverted back to how it was originally in the changes.

Out of the two methods that @efiring proposed to support structured arrays, the latter (to check the type of data, and invoke __getitem__ in place of __contains__ if data is a structured array in plot()) would be the quick and easy fix. Personally, I'm more inclined to the former (adding a wrapper class to a structured array to implement __contains__) since it could be reused in future development. After a bit of looking around the Numpy documentation I think that we can identify a structured array if it is a ndarray and it has dtype.fields, but I think I may need to look into the source code for the object in more detail.

I do have a question regarding testing though - once the support for structured arrays has been implemented, if I were to add tests to test _plot_args_replacer directly would it still go under test_axes.py? Currently, _plot_args_replacer seems to be lightly tested in test_preprocess_data.py.

Thank you!

@jklymak jklymak dismissed their stale review March 18, 2018 16:16

I'm not filly parsing the issue here, so will drop out of review. However, I can look into it if core devs too busy. Just ping issue again

@efiring efiring dismissed their stale review March 18, 2018 20:48

I'm expecting major changes.

@efiring
Copy link
Member

efiring commented Mar 18, 2018

@zhangeugenia I think I would put direct tests of _plot_args_replacer either in test_axes.py or, if they are extensive, in a new "test_plot_args_replacer.py".

Going the subclass route might not be a good idea--I suspect it would be overkill, and it is probably much trickier than one might expect. Is there likely to be any other place where it would help enough to justify the fuss?

All you need for option 1 is something like

if not (args[1] in data
        or hasattr(data, 'dtype') and args[1] in data.dtype.names):

To be even safer, in case something comes through with a 'dtype' that is not a standard numpy dtype, you could insert and hasattr(data.dtype, 'names') in the middle of the second line above.

@efiring
Copy link
Member

efiring commented Mar 18, 2018

Additional note: ndarrays, structured or not, do have a contains method, but it doesn't do what we need here.

@tacaswell
Copy link
Member

I like @efiring 's suggestion.

I am hesitant to add direct tests of _plot_args_replacer, it is better to test it through usage of the public API. I suspect that in the near future the decorator will be made python3 only and made public.

@zhangeugenia
Copy link
Contributor Author

Thank you for your guidance! Indeed, a wrapper class would be a lot of work, I just thought it would reduce issues with using in on stuctured arrays if any new features in the future required using in on data. It seems like using in on data is fairly uncommon, so the lack of use certainally would not justify the wrapper class in this case.

I will take @efiring 's suggestion and make the needed changes to _plot_args_replacer, and test it via a public method as @tacaswell suggested. Thank you so much once again!

@zhangeugenia zhangeugenia force-pushed the bugfix-for-issue-8818 branch 9 times, most recently from fb20640 to 73101a2 Compare March 22, 2018 05:13
@zhangeugenia
Copy link
Contributor Author

zhangeugenia commented Mar 23, 2018

Updated to use @efiring 's suggestion! A check for data.dtype.names was added into the conditional as well, since a non-structured numpy array would give data.dtype.names = None. Please let me know if there are any further changes needed/a squish needed, or if I should remove the test from test_axis. Thank you!

@tacaswell
Copy link
Member

Looks good to me!

@efiring efiring merged commit 5669261 into matplotlib:master Mar 23, 2018
@efiring
Copy link
Member

efiring commented Mar 23, 2018

Thank you, @zhangeugenia!

@@ -55,7 +55,11 @@ def _plot_args_replacer(args, data):
return ["y"]
elif len(args) == 2:
# this can be two cases: x,y or y,c
if not args[1] in data:
if (not args[1] in data and
not (hasattr(data, 'dtype') and
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This check is correct, but a bit hard to read. I would just try ... except. The cleanest way would be a little helper function:

def _has_name(data, name):
    try:
        return name in data or name in data.dtype.names
    except (AttributeError, TypeError):
        return False

and then

if _has_name(data, args[1]):
    return ["y", "c"]

@tacaswell
Copy link
Member

Thanks @zhangeugenia and congratulations on what is (I think?) your first merged Matplotlib PR 🎉 Hopefully we will hear from you again!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants