Skip to content

ENH: plotting methods can unpack labeled data #4829

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 63 commits into from
Sep 13, 2015

Conversation

jankatins
Copy link
Contributor

This is an alternative implementation of a labeled data decorator (see #4787)

Todo

  • add a whatsnew entry in doc/users/whats_new
  • address comments from the code reviews
  • plot() document the problems with 'color spec or next x' ambiguity when the third argument in *args is in data.
  • regenerate pyplot so that the decorated methods gain a data kwarg (or better get all reduced to plt.func(*args, **args) as thats the signature of the decorator.
  • Get someone to review the decorated functions and the code
  • make unittests for all decorated functions
  • get the unittests to run (mule isn't found?)
  • Check all decorated functions if the implicit label_namer="y" is ok. In most cases it probably is not...
  • Check all decorated functions with *args, **kwargs if it would be better to supply argument names
  • check all decorated functions if a rename_names list is better (i.e. don't replace everything but only a few arguments)
  • special case plot(...)
  • update docs with some information about the changed behaviour if data=data is passed in?
  • change label_namer="y" to label_namer=None? None is probably much more used than 'y'...
  • how to handle variable length *args positional_parameter_names={(#args: ["names"]}?

@jankatins
Copy link
Contributor Author

Note: this PR will fail as the decorated functions probably need some adjustments in the decorator call (see https://github.com/matplotlib/matplotlib/pull/4829/files#diff-0347a3e5b820bc05115b3c9cc6f122f9R49 -> test_compiletime_checks for the tested problems. It's mostly to do with not found label names)

@tacaswell
Copy link
Member

👍 Thanks! Either doing this my self and faking the commit author or asking you to do this was on my to-do list.

@jankatins jankatins force-pushed the unpack_labeled_data_alternative branch from c48414a to 9411f86 Compare July 30, 2015 20:35
@jankatins
Copy link
Contributor Author

slightly reordered (the .index commit is now first) and updated compile time check to hopefully get less travis failures... Also now includes a better commit message and a .gitignore update for pycharm

@@ -979,6 +981,7 @@ def hlines(self, y, xmin, xmax, colors='k', linestyles='solid',

return coll

@unpack_labeled_data()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this probably needs a label_namer="x"

@tacaswell tacaswell added this to the next point release milestone Jul 31, 2015
@jankatins
Copy link
Contributor Author

I started to go thought the decorated functions to add the right decorator arguments, but this is slow going, as in some case I don't know the plotting function at all :-(

@jankatins
Copy link
Contributor Author

Uih, and this is currently not working:

  quiver(U, V, **kw)
  quiver(U, V, C, **kw)
  quiver(X, Y, U, V, **kw)
  quiver(X, Y, U, V, C, **kw)

-> the names of the individual args depend on the total number of args

BTW: this will currently fail if you call the function with `quiver(U=[...], V, **kw)

@efiring
Copy link
Member

efiring commented Aug 2, 2015

Yes, these sorts of signatures date from early Matlab compatibility
considerations. There are several functions like this.

On 8/2/15, Jan Schulz notifications@github.com wrote:

Uih, and this is currently not working:

  quiver(U, V, **kw)
  quiver(U, V, C, **kw)
  quiver(X, Y, U, V, **kw)
  quiver(X, Y, U, V, C, **kw)

-> the names of the individual args depend on the total number of args


Reply to this email directly or view it on GitHub:
#4829 (comment)

@jankatins
Copy link
Contributor Author

Ok, I went through all the decorated function. The differences between the APIs for each function are kind of horrible :-( for quite a lot of the functions I had to remove the decorator again (just comemtned out for now), as they take data which cannot easily gotten from a pandas.DataFrame (aka 2d+ data -> df["name"] will return a 1d Series)

I also added a "replace_all_args" kwarg to the decorator, which handles the variable length *args. After seeing what is used here, I think that adding the possibility to pass in a dict as 'positional_parameter_names' which mals number_of_args -> ["list", "of", "names"] is probably the better solution...

@jankatins jankatins force-pushed the unpack_labeled_data_alternative branch from 47268fb to a71d16e Compare August 2, 2015 23:29
@tacaswell
Copy link
Member

Can you leave it in cases where a data frame would not work? There are other data structures (like h5py objects or a dict of ndarrays) which conform to the required signatures which can return 2D data. That feature is actually one of the points that did the most to convince me that this was worth doing.

@jankatins
Copy link
Contributor Author

In my understanding the main case for this is:

(df.pipe(h)
   .pipe((plt.plot, "data"), x="a", y="b" )
)

I wouldn't use the data=dict(...) case as one which should be optimized (or even catered) for because then we would need to keep the "replace all" (at least for isinstance(data, dict)), as a dict can also include scalars or ENUMs or whatever fancy argument a function can take. If you construct a dict it's IMO easier to use **dict(...) (or *list(...)). I'm also not sure how to tell the user that all in all the "normal" plot calls, a DataFrame works as data, but here you have to do some additional construction work.

Maybe one of the original proponents can chime in here: @jakevdp @fperez @mrocklin @ellisonbg @pzwang @mwaskom @jreback @andrewcollette

@tacaswell
Copy link
Member

If the point was primarily to support df.pipe, then the pandas folks should just implement pipe to be more flexible.

The protocol that was agreed on (to my understanding) at pydata Seattle was that data is any obotject that support __getitem__ with a string and returns something that np.asarray works on. From this point of view a dataframe is just a (restricted) dict of ndarrays and the base target implementation is isinstance(data, dict).

@JanSchulz I don't really understand your concern. If anything, you are making the case that we can go back to using a simpler decorator that tries to replace everything and make sure that the auto-labeler works (or just black list replacing label) as it adds some nice functionally is the simplest case and in cases where users pass custom objects.

@tacaswell
Copy link
Member

Sorry I am cranky, I should not leave comments before breakfast.

@blink1073
Copy link
Member

Witness @tacaswell, the grumpy plotting custodian.

@jankatins
Copy link
Contributor Author

@tacaswell No problem :-) I've no problem putting it back in, it's just my (biased?) impression that this functionality will be most used by pandas users (in whatever form: pipe or not) and I feel that making that case ("mostly") foolproof is a nice goal in terms of maintenance costs (stakoverflow Qs,...).

BTW: should this do even more magic and detect if we are called from pandas.DataFrame.pipe() and in that case accept the first argument as 'data'? would make it even cleaner for pandas users (no need for the ugly tuple: df.pipe((plt.plot, "data"), args) becomes df.pipe(plt.plot, args). CC: @TomAugspurger, @shoyer @jreback

@clarkfitzg
Copy link

Happy to see this going in. We've been adding plotting to xray recently. Most of our focus was on 2d data. Adding labels to the axes was important for us.

@jankatins
Copy link
Contributor Author

Ok, seems that we are down to pep8 problems. Yay :-)

@@ -2236,6 +2251,7 @@ def barh(self, bottom, width, height=0.8, left=None, **kwargs):
bottom=bottom, orientation='horizontal', **kwargs)
return patches

# @unpack_labeled_data() # not df["name"] getable...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be uncommented.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done, added with label_namer=None

tacaswell and others added 2 commits September 8, 2015 12:31
IPython ships a version of `Signature` and friends that runs on python
>=2.7.  Fall back to this version of things for python <=3.2.
MNT: use IPython's signature if needed + available
new_sig = None
ver_info = sys.version_info
# py2.6 compatible version, use line below as soon as we can
python_has_signature = ver_info[0] > 2 and ver_info[1] > 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very minor, but I keep tripping over this:

  • major and minor1 are already defined on line 194, so you don't need ver_info
  • These version checks are incorrect (python 4, anyone?) and confusing. They would be clearer as
python_has_signature = (major == 3 and minor1 >= 3)
python_has_wrapped = (major == 3 and minor1 >= 2)

Then you can remove all the comments in the vicinity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I didn't know we had those variables.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a PR into @JanSchulz 's branch to fix this and to guard against importing IPython in not already imported.

``pandas.DataFrames`` easier:

* For plotting methods which understand a ``label`` keyword argument but the
user does not supply such an argument, this is now implicitly set by either
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which of these two methods for looking up a label takes priority? It's not clear to me from the description here....

@shoyer shoyer mentioned this pull request Sep 11, 2015
@tacaswell
Copy link
Member

@matplotlib/developers I am planning to fix up the last few issues (comments from me, @pelson, and @shoyer ) and then merge this today.

@pelson We can fix up the testing in later PRs and I am hoping my claim that get_label is a place holder for future work is convincing.

@tacaswell
Copy link
Member

@JanSchulz I am happy to fix those as we go.

tacaswell added a commit that referenced this pull request Sep 13, 2015
ENH: plotting methods can unpack labeled data
@tacaswell tacaswell merged commit f8ceb85 into matplotlib:master Sep 13, 2015
macdems pushed a commit to macdems/matplotlib that referenced this pull request Sep 13, 2015
Arbitrary long args, i.e., plot("x","y","r","x2","y2","b") were problematic if
used with a data kwarg and the color spec ("r", "b") was included in data: this
made it impossible in some cases to determine what the user wanted to plot:

plot("x","y","r","x2","y2","b", data={..., "r":..., "b":...)

could be interpreted as

plot(x, y, "r") # all points red
plot(x2, y2, "b") # all points black

or

plot(x,y)
plot(r,x2)
plot(y2,b)

This could lead to hard to debug problems if both styles result in a similar plot
(e.g. if all are values in the same range). Therefore it was decided to remove
this possibility so that the usere gets a proper error message instead:

matplotlib#4829 (comment)

There is still a case of ambiguity (plot("y", "ro", data={"y":..., "ro":...),
which is now detected and a warning is issued. This detection could theoretically
be used to detect the above case as well, but there were so many corner cases,
that the checks became too horrible and I took that out again.

Note that passing in data directly (without a data kwarg) is unaffected, it still
accepts arbitrary long args.
macdems pushed a commit to macdems/matplotlib that referenced this pull request Sep 13, 2015
Arbitrary long args, i.e., plot("x","y","r","x2","y2","b") were problematic if
used with a data kwarg and the color spec ("r", "b") was included in data: this
made it impossible in some cases to determine what the user wanted to plot:

plot("x","y","r","x2","y2","b", data={..., "r":..., "b":...)

could be interpreted as

plot(x, y, "r") # all points red
plot(x2, y2, "b") # all points black

or

plot(x,y)
plot(r,x2)
plot(y2,b)

This could lead to hard to debug problems if both styles result in a similar plot
(e.g. if all are values in the same range). Therefore it was decided to remove
this possibility so that the usere gets a proper error message instead:

matplotlib#4829 (comment)

There is still a case of ambiguity (plot("y", "ro", data={"y":..., "ro":...),
which is now detected and a warning is issued. This detection could theoretically
be used to detect the above case as well, but there were so many corner cases,
that the checks became too horrible and I took that out again.

Note that passing in data directly (without a data kwarg) is unaffected, it still
accepts arbitrary long args.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Release critical For bugs that make the library unusable (segfaults, incorrect plots, etc) and major regressions.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants