Enh mappable remapper #4490
Conversation
Draft of decorators to automatically map DataFrame columns to base objects for plotting.
@jenshnielsen You are a hero for sorting out all of the doc-related formatting issues.
@matplotlib/developers I would like to talk about this in my SciPy talk; any feedback would be appreciated.
def apply_args_mapping(ax, func, data, map_targets, *args, **kwargs):
    args = tuple(data[k] for k in map_targets) + args
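The diff above is an excerpt; a minimal self-contained sketch of how such a helper might be used follows. The call through to `func` and the `fake_plot` stand-in are my assumptions, not code from the PR:

```python
import numpy as np

def apply_args_mapping(ax, func, data, map_targets, *args, **kwargs):
    # Look up each named column in `data` and prepend the resulting
    # sequences to the positional arguments before calling through.
    args = tuple(data[k] for k in map_targets) + args
    return func(ax, *args, **kwargs)

# Hypothetical stand-in for an Axes plotting method.
def fake_plot(ax, x, y):
    return list(zip(x, y))

# A dict of arrays standing in for a DataFrame.
data = {'time': np.arange(3), 'signal': np.array([1.0, 4.0, 9.0])}
pairs = apply_args_mapping(None, fake_plot, data, ('time', 'signal'))
```

Because the lookup is plain `data[k]`, anything mapping-like (DataFrame, dict of arrays) works here.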
Why are you using the .values attribute for the kwargs cases but not for the args cases?
Oversight, but I am not fully sure which way is better. .values gets us a numpy array full-stop, but just [] gets us a Series, which more-or-less behaves like a numpy array.
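To make the distinction concrete, here is a small illustration (the throwaway `df` is my example, not code from the PR):

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})

s = df['a']           # plain indexing -> a Series (keeps the index)
arr = df['a'].values  # .values        -> a bare numpy ndarray
```

For most plotting purposes the Series behaves just like the array, which is why either choice works downstream.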
Not using .values also means that these decorators will work with dicts of numpy arrays. It might be better to use np.asarray.
Omitting .values also means it will work with a numpy structured array, but I'm not sure if there would be any point in this. Maybe use np.asanyarray in case something might yield a masked array? I presume with a DataFrame, missing data will end up as NaN after conversion, correct?
I think it is always NaN internally in pandas; they don't do masking/sparse at all.
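A quick check of the point above (my example, not code from the PR): missing data in a Series converts to NaN, not to a masked array, even under np.asanyarray:

```python
import numpy as np
import pandas as pd

# None becomes NaN inside the float Series.
s = pd.Series([1.0, None, 3.0])

# asanyarray would pass a masked array through unchanged,
# but pandas never produces one, so we get a plain ndarray.
arr = np.asanyarray(s)
```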
These decorators + #4488 should make it easy to wrap non-pandas-aware functions to play nicely with their new
Use `np.asarray` to ensure that the contents of the mapping are suitable to pass into mpl functions. This allows these functions to work with DataFrames, with dict-likes of lists/arrays, and (presumably) with xray objects.
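A sketch of what that normalization buys (hypothetical `data` dict; not code from the commit): `np.asarray` turns whatever the mapping holds, lists or arrays alike, into plain ndarrays before they reach mpl.

```python
import numpy as np

# A dict-like "data" mixing a plain list and an ndarray.
data = {'x': [0, 1, 2], 'y': np.array([0.0, 1.0, 4.0])}

# Normalize every column to an ndarray, as the commit message describes.
cols = {k: np.asarray(v) for k, v in data.items()}
```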
Any chance of some tests?
So requests for tests mean people like this idea? ;)
I have no idea, as I don't use pandas. The only concern I have comes from making mpl more difficult to contribute to; it adds an extra layer for developers to test...
These should be stand-alone helper functions; I do not see these getting
Ha! Caught me. I've just finished reading that thread 30 mins down...
My interest is piqued, though. I've not spent 30 seconds thinking about it, but I'm not convinced that the best solution was found in the discussion. A SciPy discussion in and of itself?
Had a chat with @ellisonbg about how to do this; he is suggesting an API more like:
which is what other plotting packages that take labeled data use. The idea is that we can implement a decorator that goes through the args and tries to replace any strings with the columns of the data. This has the advantage of not requiring a hard pandas dependency and will work with things like hdf5/netcdf4 objects or dicts of arrays. I am not a huge fan of this, as it just feels wrong that the (second) most important thing goes last. It also runs the risk of not doing this right due to the constraints of not using the fact that we know the input is a DataFrame, and providing a broken API is worse than not providing one. attn @efiring
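A rough sketch of the string-replacement idea described above. The decorator name, the `data` keyword, and `fake_plot` are my assumptions for illustration, not the PR's code:

```python
import functools
import numpy as np

def with_data(func):
    """When a `data` mapping is given, replace any positional string
    argument with the corresponding column of `data`."""
    @functools.wraps(func)
    def wrapper(*args, data=None, **kwargs):
        if data is not None:
            args = tuple(data[a] if isinstance(a, str) else a
                         for a in args)
        return func(*args, **kwargs)
    return wrapper

@with_data
def fake_plot(x, y):
    # Stand-in for a plotting call; just normalizes its inputs.
    return np.asarray(x), np.asarray(y)

d = {'time': [0, 1, 2], 'value': [1, 4, 9]}
x, y = fake_plot('time', 'value', data=d)
```

Since the lookup only needs `data[key]`, any dict-like (DataFrame, h5py/netCDF4 group, plain dict) works, which is the no-hard-pandas-dependency advantage mentioned above.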
The @ellisonbg suggestion looks good to me, in the sense that I think it provides an interface that will make sense to users and be easy and natural to use. I like the generality of allowing data to be anything that acts like a dict of arrays. Your "feels wrong" sensation is coming from having the
Closing this as cute, but not useful.
This came out of the discussion at pandas-dev/pandas#10129.
I think this will make the lives of pandas users much better (along with buckets and buckets of documentation).