-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Response to Feature Request: draw percentiles in violinplot #8532 #8585
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
widths=widths, showmeans=showmeans, | ||
showextrema=showextrema, showmedians=showmedians) | ||
|
||
def violin(self, vpstats, positions=None, vert=True, widths=0.5, | ||
def violin(self, vpstats, custom_stat_vals, custom_stat_alias_list, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are the custom stat-related values required parameters now? They should be be optional, IMO
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, they should be optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many more thoughts on this, but here are some comments to get things started.
@@ -88,6 +88,26 @@ def _plot_args_replacer(args, data): | |||
"multiple plotting calls instead.") | |||
|
|||
|
|||
class ViolinStatFunc: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be private
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarification: maybe I should not put this class in this file. I meant for it to be public so that user can easily enter input. They have more than just the callable: e.g. percentile requires 1) an array of data and 2) and int. While the array data is drawn from main input , we need to put the integer argument in this object. Furthermore, each function drawn on the violin has to have a corresponding artist object returned by the function _Axes.violin() so we need an "alias" there as well. While each function input alias should be unique, I tried to be defensive by dealing with duplicating alias as discussed below. In short, this is a wrapper object that specifies everything that a callable to be drawn on the violin should know (apart from the array data input, which is drawn elsewhere). Happy to discuss/modify more if required.
3) a list of additional arguments. This list does not contain the | ||
aforementioned compulsory 1-d list of data. | ||
""" | ||
def __init__(self, func_callable, **kargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using "kwargs" instead of "kargs" would be more consistent with the rest of the MPL code base
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted
self.func_callable = func_callable | ||
self.alias = kargs.pop('alias', func_callable.__name__) | ||
self.optional_args = kargs.pop('args', []) | ||
if not isinstance(self.optional_args, list): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any iterable should be fine. Or, at the very least, tuples should be valid as well.
So something like:
if np.isscalar(self.optional_args):
self.optional_args = [self.optional_args]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted.
for func_obj in func_obj_list: | ||
while func_obj.alias in unique_alias_set: | ||
func_obj.alias += 'x' | ||
unique_alias_set.add(func_obj.alias) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this need to return unique_alias_set
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No. unique_alias_set keeps track of all alias so far. The actual alias (modified so that they are unique) is in the ViolinStatFunc object itself. I think this defensive programming mechanism helps safe-guarding against user inputs where multiple input functions have repeating aliases. These aliases later on become the keys in the dict of artist object returned by violin function. While never experimented with it myself, I assume these artist object allows user to modify colors, shape and other visual effect of the line drawn on the violin. Thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unique_alias_set
isn't used or defined anywhere else, so I'm not sure why we even need the aliases. Why can't the user just pass a list of functions that accept an array as input and returns an array or scalar, and draw lines at each value? For instance, the user would provide:
from functools import partial
stat_fxn = [partial(np.percentile, q=[5, 25, 50, 75, 95])]
# OR
stat_fxns = [np.mean, np.median, partial(np.percentile, q=95)]
## Then somewhere in the new code, we'd do something like:
results = np.flatten([stat(data) for stat in stat_fxns]) # not sure that flatten is the right function
should work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general, I guess I'm saying I'd like to see an API that let's the users work with familiar functions, rather than having to the learn the nuances of a new class
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah. I see where you are coming from. Will remove this new class. I'm only worried about the very final output i.e. the artist dictionary output of _Axes.violin()
. Where I'm coming from is that the _Axes.violin()
outputs a dictionary of artist, and we need unique dictionary keys to represent the artist corresponding to the custom statistics function.
These artist objects allow user to change color, shape of the line etc... and the dictionary requires a unique key i.e. my current implemented alias. We have three ways of going around this:
- (non-user friendly) make user input a list of dictionary where each dictionary has a callable and an alias and constraint that all aliases in the list of dict are unique
- (slightly better) make user input the same list of dictionary but alias doesn't need to be unique, we modify the duplicating alias internally (as I did above with the
resolve_duplicate_aliases
and let them know we modify some aliases. - (probably fits where you are going with) don't ask for alias at all, and we make our own alias internally in the order that user input, but we still have to let them know the aliases.
Let me know what you think. I'm open for suggestions.
|
||
violin_stat_func_obj_list = [] | ||
for func in statistics_function_list: | ||
if not isinstance(func, ViolinStatFunc): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if _ViolinStatFunc
is private, then we can only deal with callables and simplify this logic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed above.
temporary close this PR to improve on the code |
PR Summary
This pull request addresses feature request #8532 to draw percentile on violin plot. However, while implementing, I believe there is no reason to restrict violin-plot to fixed number of summary statistics. Therefore, I implemented a very general extension where user can potentially pass any number of any custom statistics function to be rendered on the violin plot. Examples (used in testing) including percentiles, mean+/- standard deviation. The unit tests included in
test_axes.py
shows some potential usage for the new extension. More testings and/or documentation (or examples) can be provided if required.#8532
PR Checklist