Skip to content

Response to Feature Request: draw percentiles in violinplot #8532 #8585

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed

Response to Feature Request: draw percentiles in violinplot #8532 #8585

wants to merge 5 commits into from

Conversation

ghost
Copy link

@ghost ghost commented May 7, 2017

PR Summary

This pull request addresses feature request #8532 to draw percentile on violin plot. However, while implementing, I believe there is no reason to restrict violin-plot to fixed number of summary statistics. Therefore, I implemented a very general extension where user can potentially pass any number of any custom statistics function to be rendered on the violin plot. Examples (used in testing) including percentiles, mean+/- standard deviation. The unit tests included in test_axes.py shows some potential usage for the new extension. More testings and/or documentation (or examples) can be provided if required.

#8532

PR Checklist

  • Has Pytest style unit tests
  • Code is PEP 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/whats_new.rst if major new feature
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@ghost ghost changed the title Violin custom stat Feature Request: draw percentiles in violinplot #8532 May 7, 2017
@ghost ghost changed the title Feature Request: draw percentiles in violinplot #8532 Response to Feature Request: draw percentiles in violinplot #8532 May 7, 2017
widths=widths, showmeans=showmeans,
showextrema=showextrema, showmedians=showmedians)

def violin(self, vpstats, positions=None, vert=True, widths=0.5,
def violin(self, vpstats, custom_stat_vals, custom_stat_alias_list,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the custom stat-related values required parameters now? They should be be optional, IMO

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, they should be optional

Copy link
Member

@phobson phobson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many more thoughts on this, but here are some comments to get things started.

@@ -88,6 +88,26 @@ def _plot_args_replacer(args, data):
"multiple plotting calls instead.")


class ViolinStatFunc:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should be private

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clarification: maybe I should not put this class in this file. I meant for it to be public so that user can easily enter input. They have more than just the callable: e.g. percentile requires 1) an array of data and 2) and int. While the array data is drawn from main input , we need to put the integer argument in this object. Furthermore, each function drawn on the violin has to have a corresponding artist object returned by the function _Axes.violin() so we need an "alias" there as well. While each function input alias should be unique, I tried to be defensive by dealing with duplicating alias as discussed below. In short, this is a wrapper object that specifies everything that a callable to be drawn on the violin should know (apart from the array data input, which is drawn elsewhere). Happy to discuss/modify more if required.

3) a list of additional arguments. This list does not contain the
aforementioned compulsory 1-d list of data.
"""
def __init__(self, func_callable, **kargs):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using "kwargs" instead of "kargs" would be more consistent with the rest of the MPL code base

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted

self.func_callable = func_callable
self.alias = kargs.pop('alias', func_callable.__name__)
self.optional_args = kargs.pop('args', [])
if not isinstance(self.optional_args, list):
Copy link
Member

@phobson phobson May 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any iterable should be fine. Or, at the very least, tuples should be valid as well.

So something like:

if np.isscalar(self.optional_args):
    self.optional_args = [self.optional_args]

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted.

for func_obj in func_obj_list:
while func_obj.alias in unique_alias_set:
func_obj.alias += 'x'
unique_alias_set.add(func_obj.alias)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this need to return unique_alias_set?

Copy link
Author

@ghost ghost May 11, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. unique_alias_set keeps track of all alias so far. The actual alias (modified so that they are unique) is in the ViolinStatFunc object itself. I think this defensive programming mechanism helps safe-guarding against user inputs where multiple input functions have repeating aliases. These aliases later on become the keys in the dict of artist object returned by violin function. While never experimented with it myself, I assume these artist object allows user to modify colors, shape and other visual effect of the line drawn on the violin. Thoughts?

Copy link
Member

@phobson phobson May 12, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unique_alias_set isn't used or defined anywhere else, so I'm not sure why we even need the aliases. Why can't the user just pass a list of functions that accept an array as input and returns an array or scalar, and draw lines at each value? For instance, the user would provide:

from functools import partial
stat_fxn = [partial(np.percentile, q=[5, 25, 50, 75, 95])]
# OR
stat_fxns = [np.mean, np.median, partial(np.percentile, q=95)]

## Then somewhere in the new code, we'd do something like:
results = np.flatten([stat(data) for stat in stat_fxns])  # not sure that flatten is the right function

should work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, I guess I'm saying I'd like to see an API that let's the users work with familiar functions, rather than having to the learn the nuances of a new class

Copy link
Author

@ghost ghost May 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah. I see where you are coming from. Will remove this new class. I'm only worried about the very final output i.e. the artist dictionary output of _Axes.violin(). Where I'm coming from is that the _Axes.violin() outputs a dictionary of artist, and we need unique dictionary keys to represent the artist corresponding to the custom statistics function.
These artist objects allow user to change color, shape of the line etc... and the dictionary requires a unique key i.e. my current implemented alias. We have three ways of going around this:

  1. (non-user friendly) make user input a list of dictionary where each dictionary has a callable and an alias and constraint that all aliases in the list of dict are unique
  2. (slightly better) make user input the same list of dictionary but alias doesn't need to be unique, we modify the duplicating alias internally (as I did above with the resolve_duplicate_aliases and let them know we modify some aliases.
  3. (probably fits where you are going with) don't ask for alias at all, and we make our own alias internally in the order that user input, but we still have to let them know the aliases.
    Let me know what you think. I'm open for suggestions.


violin_stat_func_obj_list = []
for func in statistics_function_list:
if not isinstance(func, ViolinStatFunc):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if _ViolinStatFunc is private, then we can only deal with callables and simplify this logic.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed above.

@ghost
Copy link
Author

ghost commented May 19, 2017

temporary close this PR to improve on the code

@ghost ghost closed this May 19, 2017
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant