Response to Feature Request: draw percentiles in violinplot #8532 #8585
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -88,6 +88,26 @@ def _plot_args_replacer(args, data): | |
"multiple plotting calls instead.") | ||
|
||
|
||
class ViolinStatFunc: | ||
""" | ||
The :class:`ViolinStatFunc` contains: | ||
1) a callable whose compulsory first argument is the 1-d list of data | ||
that is used to plot the violin. The caller does not supply this first | ||
argument; it is taken from the data passed to violinplot. | ||
2) an alias for this callable. When violinplot outputs the dictionary of | ||
artists, this alias is used to identify the artist object corresponding to | ||
this callable | ||
3) a list of additional arguments. This list does not contain the | ||
aforementioned compulsory 1-d list of data. | ||
""" | ||
def __init__(self, func_callable, **kargs): | ||
Review comment: Using "kwargs" instead of "kargs" would be more consistent with the rest of the MPL code base.
Reply: Noted. |
||
self.func_callable = func_callable | ||
self.alias = kargs.pop('alias', func_callable.__name__) | ||
self.optional_args = kargs.pop('args', []) | ||
if not isinstance(self.optional_args, list): | ||
Review comment: Any iterable should be fine. Or, at the very least, tuples should be valid as well. So something like:
    if np.isscalar(self.optional_args):
        self.optional_args = [self.optional_args]
Reply: Noted. |
||
raise ValueError('args has to be a list') | ||
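The reviewer's point about accepting any iterable can be sketched without NumPy. This is a minimal illustration, not the code that ended up in the PR: `normalize_args` is a hypothetical helper name, and treating strings as single values is an assumption made here because a string is iterable but is almost certainly meant as one argument.

```python
from collections.abc import Iterable


def normalize_args(args):
    """Return ``args`` as a list (hypothetical helper, not part of the PR).

    A lone scalar becomes a one-element list; lists, tuples and other
    iterables are copied into a new list.
    """
    if args is None:
        return []
    # Strings and bytes are iterable, but treat them as a single argument.
    if isinstance(args, (str, bytes)) or not isinstance(args, Iterable):
        return [args]
    return list(args)
```

With this, `args=95`, `args=(95,)` and `args=[95]` would all be accepted and stored in the same form.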
|
||
|
||
# The axes module contains all the wrappers to plotting functions. | ||
# All the other methods should go in the _AxesBase class. | ||
|
||
|
@@ -7277,7 +7297,7 @@ def matshow(self, Z, **kwargs): | |
@_preprocess_data(replace_names=["dataset"], label_namer=None) | ||
def violinplot(self, dataset, positions=None, vert=True, widths=0.5, | ||
showmeans=False, showextrema=True, showmedians=False, | ||
points=100, bw_method=None): | ||
points=100, bw_method=None, statistics_function_list=[]): | ||
""" | ||
Make a violin plot. | ||
|
||
|
@@ -7324,6 +7344,13 @@ def violinplot(self, dataset, positions=None, vert=True, widths=0.5, | |
callable, it should take a `GaussianKDE` instance as its only | ||
parameter and return a scalar. If None (default), 'scott' is used. | ||
|
||
statistics_function_list: a list of callables or ViolinStatFunc objects. | ||
Each element of this list can be any custom summary statistic to be | ||
displayed on the violin plot (with one constraint: the first | ||
argument of each function has to be the input data of the violin plot, | ||
i.e. dataset if dataset is 1-d, or an element of dataset if dataset is | ||
2-d). | ||
|
||
Returns | ||
------- | ||
|
||
|
@@ -7369,12 +7396,44 @@ def _kde_method(X, coords): | |
kde = mlab.GaussianKDE(X, bw_method) | ||
return kde.evaluate(coords) | ||
|
||
vpstats = cbook.violin_stats(dataset, _kde_method, points=points) | ||
return self.violin(vpstats, positions=positions, vert=vert, | ||
def _resolve_duplicate_alias(func_obj_list): | ||
unique_alias_set = set() | ||
for func_obj in func_obj_list: | ||
while func_obj.alias in unique_alias_set: | ||
func_obj.alias += 'x' | ||
unique_alias_set.add(func_obj.alias) | ||
Review comment: does this need to return
Reply: No. unique_alias_set keeps track of all aliases so far. The actual aliases (modified so that they are unique) are stored in the ViolinStatFunc objects themselves. I think this defensive mechanism helps safeguard against user input in which multiple functions have the same alias. These aliases later become the keys in the dict of artist objects returned by the violin function. While I have never experimented with it myself, I assume these artist objects allow the user to modify the color, shape, and other visual properties of the lines drawn on the violin. Thoughts?
Review comment:
    from functools import partial
    stat_fxns = [partial(np.percentile, q=[5, 25, 50, 75, 95])]
    # OR
    stat_fxns = [np.mean, np.median, partial(np.percentile, q=95)]
    ## Then somewhere in the new code, we'd do something like:
    results = np.flatten([stat(data) for stat in stat_fxns])  # not sure that flatten is the right function, but something like it should work
Review comment: In general, I guess I'm saying I'd like to see an API that lets users work with familiar functions, rather than having to learn the nuances of a new class.
Reply: Ah. I see where you are coming from. Will remove this new class. I'm only worried about the very final output i.e. the artist dictionary output of
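The reviewer was unsure whether `flatten` is the right function; NumPy has no top-level `np.flatten` (it exists only as an array method). A minimal runnable sketch of the same idea follows, assuming each statistic returns either a scalar or a 1-d array. The sample `data` and the percentile levels are made up for illustration.

```python
from functools import partial

import numpy as np

# Summary statistics expressed with familiar NumPy callables; partial()
# binds the extra arguments (here the percentile levels).
stat_fxns = [np.mean, np.median, partial(np.percentile, q=[5, 95])]

data = [1, 2, 3, 4, 5]

# Each callable returns either a scalar or an array, so promote every
# result to 1-d before joining them into one flat array.
results = np.concatenate([np.atleast_1d(stat(data)) for stat in stat_fxns])
```

The flat array loses track of which value came from which callable, which is the concern the author raises about the keys of the returned artist dictionary.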
|
||
|
||
violin_stat_func_obj_list = [] | ||
for func in statistics_function_list: | ||
if not isinstance(func, ViolinStatFunc): | ||
Review comment: if
Reply: Discussed above. |
||
if callable(func): | ||
violin_stat_func_obj_list.append(ViolinStatFunc(func)) | ||
else: | ||
raise ValueError( | ||
'Optional argument has to be a callable ' + | ||
'or a ViolinStatFunc object') | ||
else: | ||
violin_stat_func_obj_list.append(func) | ||
|
||
custom_stat_alias_list = [func_obj.alias for func_obj in | ||
violin_stat_func_obj_list] | ||
if len(custom_stat_alias_list) > len(set(custom_stat_alias_list)): | ||
_resolve_duplicate_alias(violin_stat_func_obj_list) | ||
# remake alias list based on updated unique aliases | ||
custom_stat_alias_list = [func_obj.alias for func_obj in | ||
violin_stat_func_obj_list] | ||
|
||
vpstats, custom_stat_vals = \ | ||
cbook.violin_stats(dataset, _kde_method, | ||
violin_stat_func_obj_list, points=points) | ||
|
||
return self.violin(vpstats, custom_stat_vals, custom_stat_alias_list, | ||
positions=positions, vert=vert, | ||
widths=widths, showmeans=showmeans, | ||
showextrema=showextrema, showmedians=showmedians) | ||
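The duplicate-alias handling above mutates the `ViolinStatFunc` objects in place, which is why the helper returns nothing. As a rough sketch of the same renaming rule applied to plain strings, the hypothetical function below returns a new list instead; the name `resolve_duplicate_aliases` and the string-only interface are assumptions for illustration.

```python
def resolve_duplicate_aliases(aliases):
    """Return a copy of ``aliases`` in which every entry is unique.

    A repeated alias gets 'x' appended until it no longer collides,
    mirroring the rule used in the diff.
    """
    seen = set()
    unique = []
    for alias in aliases:
        while alias in seen:
            alias += "x"
        seen.add(alias)
        unique.append(alias)
    return unique
```

Returning a new list keeps the caller's input untouched, at the cost of the caller having to pair the new names with the callables again.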
|
||
def violin(self, vpstats, positions=None, vert=True, widths=0.5, | ||
def violin(self, vpstats, custom_stat_vals, custom_stat_alias_list, | ||
Review comment: Are the custom stat-related values required parameters now? They should be optional, IMO.
Reply: Sure, they should be optional. |
||
positions=None, vert=True, widths=0.5, | ||
showmeans=False, showextrema=True, showmedians=False): | ||
"""Drawing function for violin plots. | ||
|
||
|
@@ -7511,7 +7570,9 @@ def violin(self, vpstats, positions=None, vert=True, widths=0.5, | |
|
||
# Render violins | ||
bodies = [] | ||
for stats, pos, width in zip(vpstats, positions, widths): | ||
custom_vals = {} | ||
for stats, pos, width, stat_val_dict in zip(vpstats, positions, | ||
widths, custom_stat_vals): | ||
# The 0.5 factor reflects the fact that we plot from v-p to | ||
# v+p | ||
vals = np.array(stats['vals']) | ||
|
@@ -7525,6 +7586,11 @@ def violin(self, vpstats, positions=None, vert=True, widths=0.5, | |
mins.append(stats['min']) | ||
maxes.append(stats['max']) | ||
medians.append(stats['median']) | ||
for alias in custom_stat_alias_list: | ||
if alias not in custom_vals: | ||
custom_vals[alias] = [] | ||
custom_vals[alias].append(stat_val_dict[alias]) | ||
|
||
artists['bodies'] = bodies | ||
|
||
# Render means | ||
|
@@ -7547,6 +7613,12 @@ def violin(self, vpstats, positions=None, vert=True, widths=0.5, | |
pmins, | ||
pmaxes, | ||
colors=edgecolor) | ||
# Render custom statistics | ||
for alias in custom_stat_alias_list: | ||
artists['custom_' + alias] = perp_lines(custom_vals[alias], | ||
pmins, | ||
pmaxes, | ||
colors=edgecolor) | ||
|
||
return artists | ||
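The bookkeeping in the hunks above turns one dictionary of statistics per violin into one list of values per alias, then stores each list under a `'custom_' + alias` key. A small standalone sketch of that regrouping, with the drawing step left out, might look like this; `group_by_alias` is a hypothetical name and the sample values are invented.

```python
def group_by_alias(custom_stat_vals, aliases):
    """Regroup per-violin stat dicts into per-alias lists.

    ``custom_stat_vals`` holds one dict per violin, keyed by alias.
    The result maps 'custom_<alias>' to the values across all violins,
    matching the key naming used for the artists in the diff.
    """
    grouped = {}
    for stat_val_dict in custom_stat_vals:
        for alias in aliases:
            grouped.setdefault("custom_" + alias, []).append(stat_val_dict[alias])
    return grouped


# Two violins, each with a mean and a 95th percentile (made-up numbers).
per_violin = [{"mean": 1.0, "p95": 2.0}, {"mean": 3.0, "p95": 4.0}]
grouped = group_by_alias(per_violin, ["mean", "p95"])
```

In the PR, each of these lists is then passed to `perp_lines` to draw one line per violin.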
|
||
|
Review comment: I think this should be private.
Reply: Clarification: maybe I should not put this class in this file. I meant for it to be public so that users can easily provide input. They need more than just the callable: e.g. percentile requires 1) an array of data and 2) an int. While the array data is drawn from the main input, we need to put the integer argument in this object. Furthermore, each function drawn on the violin has to have a corresponding artist object returned by _Axes.violin(), so we need an "alias" there as well. While each input function's alias should be unique, I tried to be defensive by handling duplicate aliases, as discussed in the thread on _resolve_duplicate_alias. In short, this is a wrapper object that specifies everything that a callable to be drawn on the violin should know (apart from the array data input, which is drawn elsewhere). Happy to discuss/modify more if required.
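One practical wrinkle with aliases: `functools.partial` objects do not get a `__name__` attribute automatically, so the default `func_callable.__name__` in the class above would fail for them. The sketch below shows one possible way to derive a label from either a plain function or a partial. `stat_label` is a hypothetical helper, the label format is an arbitrary choice, and standard-library statistics functions stand in for the NumPy ones.

```python
from functools import partial
import statistics


def stat_label(func):
    """Return a readable label for a callable (hypothetical helper)."""
    if isinstance(func, partial):
        # Use the wrapped function's name and append the bound keyword
        # arguments so that variants of one function stay distinguishable.
        # Positionally bound arguments are ignored in this sketch.
        base = stat_label(func.func)
        kw = ",".join(f"{k}={v}" for k, v in sorted(func.keywords.items()))
        return f"{base}({kw})" if kw else base
    return getattr(func, "__name__", type(func).__name__)


stat_fxns = [statistics.mean, statistics.median,
             partial(statistics.quantiles, n=4)]
labels = [stat_label(f) for f in stat_fxns]
```

Labels built this way could serve as dictionary keys for the returned artists without requiring a wrapper class, though collisions would still need handling.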