-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
Updated violin plot example as per suggestions in issue #7251 #7360
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
lav = vals[0] | ||
if lav > q1: | ||
lav = q1 | ||
lav = np.clip(lav, q1, vals[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you might have these two backwards.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please comment a bit more on what you mean by backwards?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
q1
is the maximum and vals[0]
is the minimum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am sorry, I do have them backwards. Changing that now
# medians | ||
med = [np.percentile(sarr, 50) for sarr in dat] | ||
# inter-quartile ranges | ||
iqr = [[np.percentile(sarr, 25), np.percentile(sarr, 75)] for sarr in dat] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can use the axis
parameter and multiple q
to compute all three of these at the same time without the list comprehension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To compute all of those in one call (without list comprehensions), I could do something like:
for sarr in dat:
median, q1, q3 = np.percentile(sarr, [50, 25, 75])
med.append(median)
iqr.append([q1, q3])
avs.append(adjacent_values(sarr))
Is that fine?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can pass the entire array and use the axis
parameter to get it to work on the right dimension. avs
would probably need to stay as the list comprehension.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it.
I could replace
med = [np.percentile(sarr, 50) for sarr in dat]
with
med = np.percentile(dat, 50, axis=1)
Since iqr is a list of tuples (or rather, list of lists of size 2), I am unable to see how we could do all 3 percentiles with a single call. I could still replace
iqr = [[np.percentile(sarr, 25), np.percentile(sarr, 75)] for sarr in dat]
with
iqr = zip(*(np.percentile(dat, [25, 75], axis=1)))
if that is fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for the zip
; just use .T
to swap the axes. You could do it in one call with a temporary variable, but I'm not sure it's any better looking:
tmp = np.percentile(dat, [25, 50, 75], axis=1))).T
med = tmp[:, 1]
iqr = tmp[:, [0, 2]]
|
||
q1, q3 = np.percentile(vals, [25, 75]) | ||
# inter-quartile range iqr | ||
iqr = q3 - q1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about removing this comment and instead changing to clearer variable names:
interquartile_range = q3 - q1
|
||
q1, q3 = np.percentile(vals, [25, 75]) | ||
# inter-quartile range iqr | ||
iqr = q3 - q1 | ||
# upper adjacent values | ||
uav = q3 + iqr * 1.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same, change to
upper_adajcent_values = q3 - interquartile_range*1.5
ax2.plot([i + 1, i + 1], avs[i], '-', color='black', linewidth=1) | ||
# quartiles | ||
ax2.plot([i + 1, i + 1], iqr[i], '-', color='black', linewidth=5) | ||
# medians |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doesn't need this comment 'cause clear variable name. Also, still in favor of vlines for this (like in #6814) because I think that's a clearer indication of what's being plotted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@story645 👍 for better variable names, I have made changes and pushed a commit. I also intend to include the change to vlines in a separate commit, if we go ahead with that.
# inter-quartile ranges | ||
iqr = [[np.percentile(sarr, 25), np.percentile(sarr, 75)] for sarr in dat] | ||
# upper and lower adjacent values | ||
avs = [adjacent_values(sarr) for sarr in dat] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since this is the whiskers, why not change the variable name to whiskers? and again, then removing the comment
|
||
# customize colors | ||
for pc in parts['bodies']: | ||
pc.set_facecolor('#D43F3A') | ||
pc.set_edgecolor('black') | ||
pc.set_alpha(1) | ||
|
||
ax1.set_ylabel('Observed values') | ||
medians = np.percentile(data, 50, axis=1) | ||
inter_quartile_ranges = list(zip(*(np.percentile(data, [25, 75], axis=1)))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't need the list(zip
; use .T
as noted before.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed 👍 , have made the change as suggested in latest commit
|
||
# customize colors | ||
for pc in parts['bodies']: | ||
pc.set_facecolor('#D43F3A') | ||
pc.set_edgecolor('black') | ||
pc.set_alpha(1) | ||
|
||
ax1.set_ylabel('Observed values') | ||
tmp = (np.percentile(data, [25, 50, 75], axis=1)).T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you actually need a transpose here or would untransposed be the same and then: median = quantiles[1], inter quartile range = qauntiles[[0,2]]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without using the transpose would work for the medians. For the inter quartile ranges we do need a transpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes no sense. Either the rows are turned into columns or they aren't. I could be doing the selection wrong on the rows for inter quartile range though. What's the shape of quantiles before the transpose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The list inter_quartile_ranges is supposed to be a list of pairs.
Before the transpose, the shape of quantiles would be 3 x (some N). Let's say these three rows are numbered 0, 1, 2. Row 1 would be the medians, no problem there. If we do
inter_quartile_ranges = tmp[[0, 2]]
(without the initial transpose), it's shape would be 2 x (some N). We need it's shape to be (some N) x (2) for it to be a list of pairs.
Please correct me if I'm wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, then maybe save the transpose to inter_quartile_ranges = tmp[[0,2]].T
? And I think tmp should be named qaurtiles
since that's what they are. That being said, if you go the vlines approach, it simplifies to quartile1, median, quartile3 = np.percentile(data, [25, 50, 75], axis=1) where qaurtile1 is iqrmin, etc.
Also, currently iqr is getting computed both here and in adjacent values and that should probably be cleaned up. My first pass suggestion is change the function sig of adjacent_values to adjacent_vals(vals, q1, q3)
, drop the first two lines in adjacent_vals, and the call is
whiskers = [adjacent_values(sorted_array, q1, qr ) for sorted_array, q1, q3 in zip(data, quartile1, quartile3)]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, makes sense. Firstly, I have changed the variable name to 'quartiles' and altered the transpose calculation, as suggested. I'm working on the vlines approach and the adjacent_values() function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@story645 I could do something like this, as per your code in the original issue and your suggestions above:
quartile1, medians, quartile3 = np.percentile(data, [25, 50, 75], axis=1)
inter_quartile_ranges = np.vstack([quartile1, quartile3]).T
whiskers = [
adjacent_values(sorted_array, q1, q3)
for sorted_array, q1, q3 in zip(data, quartile1, quartile3)]
whiskersMin, whiskersMax = list(zip(*whiskers))
# plot medians as points,
# whiskers as thin lines, quartiles as fat lines
inds = np.arange(1, len(medians) + 1)
ax2.scatter(inds, medians, marker='o', color='white', s=30, zorder=3)
ax2.vlines(inds, quartile1, quartile3, color='k', linestyle='-', lw=5)
ax2.vlines(inds, whiskersMin, whiskersMax, color='k', linestyle='-', lw=1)
I think this looks good. Any comments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I committed the changes for now. If you have any suggestions, please put them in a review, thanks!
Considering you are working on an example, can you please add a docstring with a title + a description while you are at it? """
===================
Saving an animation
===================
This example showcases the same animations as `basic_example.py`, but instead
of displaying the animation to the user, it writes to files using a
MovieWriter instance.
"""``` |
Hi @NelleV , there's already a docstring in the example, written by the author. Is that fine? |
@DrNightmare yes, it is perfect… |
|
||
# customize colors | ||
for pc in parts['bodies']: | ||
pc.set_facecolor('#D43F3A') | ||
pc.set_edgecolor('black') | ||
pc.set_alpha(1) | ||
|
||
ax1.set_ylabel('Observed values') | ||
quartile1, medians, quartile3 = np.percentile(data, [25, 50, 75], axis=1) | ||
inter_quartile_ranges = np.vstack([quartile1, quartile3]).T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pretty sure this variable is no longer used and so can be removed
adjacent_values(sorted_array, q1, q3) | ||
for sorted_array, q1, q3 in zip(data, quartile1, quartile3)] | ||
whiskersMin, whiskersMax = list(zip(*whiskers)) | ||
# plot medians as points, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can remove these comments, probably
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm right, will remove these comments and the unused variable
|
||
lower_adjacent_value = q1 - (q3 - q1) * 1.5 | ||
lower_adjacent_value = np.clip(lower_adjacent_value, vals[0], q1) | ||
return [lower_adjacent_value, upper_adjacent_value] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probabky doesn't need to be bracketed [], wonder if it can be used to remove zip* stuff in the unpacking...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To avoid the unpacking, I could also optionally make 'whiskers' an np.array and then do:
whiskersMin, whiskersMax = whiskers[:, 0], whiskers[:, 1]
Irrespective of whether I do this or not, the brackets can be removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then I think it makes sense to do that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, updated in latest commit
I'm happy and okay with merging this |
ax2.plot([i + 1, i + 1], iqr[i], '-', color='black', linewidth=5) | ||
ax2.plot(i + 1, med[i], 'o', color='white', | ||
markersize=6, markeredgecolor='none') | ||
# customized violin |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
stray redundant comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, right. Guess the title we set for the axes are self explanatory.
Updated with comments removed @story645
ax2.set_title('Customized violin plot') | ||
parts = ax2.violinplot( | ||
data, showmeans=False, showmedians=False, | ||
showextrema=False) | ||
|
||
# customize colors |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably also unnecessary comment
ax2.vlines(inds, whiskersMin, whiskersMax, color='k', linestyle='-', lw=1) | ||
|
||
# set style for the axes | ||
labels = ['A', 'B', 'C', 'D'] # labels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
variable name = comment
@DrNightmare Thanks! These improvements are much appreciated. |
Updated violin plot example as per suggestions in issue #7251
Backported to |
@DrNightmare Thanks, hopefully we will hear from you again! |
@tacaswell Sure :) |
🍾 |
@DrNightmare the cool thing about about contributing to the docs is that the benefits are (nearly) immediately available: |
As per issue #7251 , there were certain changes required in the customized violin plot example.
Here is a list of changes made:
Minor changes:
4. Moved the styling of axes to a function
5. Rearranged overall code and added comments to improve readability, made some code more pythonic, renamed certain variables