Skip to content

Updated violin plot example as per suggestions in issue #7251 #7360

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Oct 31, 2016

Conversation

DrNightmare
Copy link
Contributor

As per issue #7251 , there were certain changes required in the customized violin plot example.

Here is a list of changes made:

  1. Replaced the defined percentile function with a call to np.percentile
  2. Used np.clip as suggested
  3. Used list comprehensions, this is more pythonic and more readable

Minor changes:
4. Moved the styling of axes to a function
5. Rearranged overall code and added comments to improve readability, made some code more pythonic, renamed certain variables

lav = vals[0]
if lav > q1:
lav = q1
lav = np.clip(lav, q1, vals[0])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you might have these two backwards.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please comment a bit more on what you mean by backwards?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

q1 is the maximum and vals[0] is the minimum.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am sorry, I do have them backwards. Changing that now

# medians
med = [np.percentile(sarr, 50) for sarr in dat]
# inter-quartile ranges
iqr = [[np.percentile(sarr, 25), np.percentile(sarr, 75)] for sarr in dat]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use the axis parameter and multiple q to compute all three of these at the same time without the list comprehension.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To compute all of those in one call (without list comprehensions), I could do something like:

for sarr in dat:
    median, q1, q3 = np.percentile(sarr, [50, 25, 75])
    med.append(median)
    iqr.append([q1, q3])
    avs.append(adjacent_values(sarr))

Is that fine?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can pass the entire array and use the axis parameter to get it to work on the right dimension. avs would probably need to stay as the list comprehension.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it.
I could replace
med = [np.percentile(sarr, 50) for sarr in dat]
with
med = np.percentile(dat, 50, axis=1)

Since iqr is a list of tuples (or rather, list of lists of size 2), I am unable to see how we could do all 3 percentiles with a single call. I could still replace
iqr = [[np.percentile(sarr, 25), np.percentile(sarr, 75)] for sarr in dat]
with
iqr = zip(*(np.percentile(dat, [25, 75], axis=1)))
if that is fine.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the zip; just use .T to swap the axes. You could do it in one call with a temporary variable, but I'm not sure it's any better looking:

tmp = np.percentile(dat, [25, 50, 75], axis=1))).T
med = tmp[:, 1]
iqr = tmp[:, [0, 2]]


q1, q3 = np.percentile(vals, [25, 75])
# inter-quartile range iqr
iqr = q3 - q1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about removing this comment and instead changing to clearer variable names:
interquartile_range = q3 - q1


q1, q3 = np.percentile(vals, [25, 75])
# inter-quartile range iqr
iqr = q3 - q1
# upper adjacent values
uav = q3 + iqr * 1.5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same, change to
upper_adajcent_values = q3 - interquartile_range*1.5

ax2.plot([i + 1, i + 1], avs[i], '-', color='black', linewidth=1)
# quartiles
ax2.plot([i + 1, i + 1], iqr[i], '-', color='black', linewidth=5)
# medians
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

doesn't need this comment 'cause clear variable name. Also, still in favor of vlines for this (like in #6814) because I think that's a clearer indication of what's being plotted.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@story645 👍 for better variable names, I have made changes and pushed a commit. I also intend to include the change to vlines in a separate commit, if we go ahead with that.

# inter-quartile ranges
iqr = [[np.percentile(sarr, 25), np.percentile(sarr, 75)] for sarr in dat]
# upper and lower adjacent values
avs = [adjacent_values(sarr) for sarr in dat]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

since this is the whiskers, why not change the variable name to whiskers? and again, then removing the comment


# customize colors
for pc in parts['bodies']:
pc.set_facecolor('#D43F3A')
pc.set_edgecolor('black')
pc.set_alpha(1)

ax1.set_ylabel('Observed values')
medians = np.percentile(data, 50, axis=1)
inter_quartile_ranges = list(zip(*(np.percentile(data, [25, 75], axis=1))))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need the list(zip; use .T as noted before.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed 👍 , have made the change as suggested in latest commit


# customize colors
for pc in parts['bodies']:
pc.set_facecolor('#D43F3A')
pc.set_edgecolor('black')
pc.set_alpha(1)

ax1.set_ylabel('Observed values')
tmp = (np.percentile(data, [25, 50, 75], axis=1)).T
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you actually need a transpose here or would untransposed be the same and then: median = quantiles[1], inter quartile range = qauntiles[[0,2]]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without using the transpose would work for the medians. For the inter quartile ranges we do need a transpose.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes no sense. Either the rows are turned into columns or they aren't. I could be doing the selection wrong on the rows for inter quartile range though. What's the shape of quantiles before the transpose?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The list inter_quartile_ranges is supposed to be a list of pairs.
Before the transpose, the shape of quantiles would be 3 x (some N). Let's say these three rows are numbered 0, 1, 2. Row 1 would be the medians, no problem there. If we do
inter_quartile_ranges = tmp[[0, 2]]
(without the initial transpose), it's shape would be 2 x (some N). We need it's shape to be (some N) x (2) for it to be a list of pairs.
Please correct me if I'm wrong.

Copy link
Member

@story645 story645 Oct 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, then maybe save the transpose to inter_quartile_ranges = tmp[[0,2]].T? And I think tmp should be named qaurtiles since that's what they are. That being said, if you go the vlines approach, it simplifies to quartile1, median, quartile3 = np.percentile(data, [25, 50, 75], axis=1) where qaurtile1 is iqrmin, etc.
Also, currently iqr is getting computed both here and in adjacent values and that should probably be cleaned up. My first pass suggestion is change the function sig of adjacent_values to adjacent_vals(vals, q1, q3), drop the first two lines in adjacent_vals, and the call is
whiskers = [adjacent_values(sorted_array, q1, qr ) for sorted_array, q1, q3 in zip(data, quartile1, quartile3)]

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, makes sense. Firstly, I have changed the variable name to 'quartiles' and altered the transpose calculation, as suggested. I'm working on the vlines approach and the adjacent_values() function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@story645 I could do something like this, as per your code in the original issue and your suggestions above:

quartile1, medians, quartile3 = np.percentile(data, [25, 50, 75], axis=1)
inter_quartile_ranges = np.vstack([quartile1, quartile3]).T
whiskers = [
    adjacent_values(sorted_array, q1, q3)
    for sorted_array, q1, q3 in zip(data, quartile1, quartile3)]
whiskersMin, whiskersMax = list(zip(*whiskers))
# plot medians as points,
# whiskers as thin lines, quartiles as fat lines
inds = np.arange(1, len(medians) + 1)
ax2.scatter(inds, medians, marker='o', color='white', s=30, zorder=3)
ax2.vlines(inds, quartile1, quartile3, color='k', linestyle='-', lw=5)
ax2.vlines(inds, whiskersMin, whiskersMax, color='k', linestyle='-', lw=1)

I think this looks good. Any comments?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I committed the changes for now. If you have any suggestions, please put them in a review, thanks!

@NelleV
Copy link
Member

NelleV commented Oct 30, 2016

Considering you are working on an example, can you please add a docstring with a title + a description while you are at it?
Here is an example of a docstring:

"""
 ===================
 Saving an animation
 ===================

 This example showcases the same animations as `basic_example.py`, but instead
 of displaying the animation to the user, it writes to files using a
 MovieWriter instance.
 """```

@DrNightmare
Copy link
Contributor Author

Hi @NelleV , there's already a docstring in the example, written by the author. Is that fine?

@NelleV
Copy link
Member

NelleV commented Oct 30, 2016

@DrNightmare yes, it is perfect…
Sorry, I should have looked more closely!


# customize colors
for pc in parts['bodies']:
pc.set_facecolor('#D43F3A')
pc.set_edgecolor('black')
pc.set_alpha(1)

ax1.set_ylabel('Observed values')
quartile1, medians, quartile3 = np.percentile(data, [25, 50, 75], axis=1)
inter_quartile_ranges = np.vstack([quartile1, quartile3]).T
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty sure this variable is no longer used and so can be removed

adjacent_values(sorted_array, q1, q3)
for sorted_array, q1, q3 in zip(data, quartile1, quartile3)]
whiskersMin, whiskersMax = list(zip(*whiskers))
# plot medians as points,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can remove these comments, probably

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm right, will remove these comments and the unused variable


lower_adjacent_value = q1 - (q3 - q1) * 1.5
lower_adjacent_value = np.clip(lower_adjacent_value, vals[0], q1)
return [lower_adjacent_value, upper_adjacent_value]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probabky doesn't need to be bracketed [], wonder if it can be used to remove zip* stuff in the unpacking...

Copy link
Contributor Author

@DrNightmare DrNightmare Oct 31, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid the unpacking, I could also optionally make 'whiskers' an np.array and then do:
whiskersMin, whiskersMax = whiskers[:, 0], whiskers[:, 1]

Irrespective of whether I do this or not, the brackets can be removed

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I think it makes sense to do that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, updated in latest commit

@story645
Copy link
Member

I'm happy and okay with merging this

@story645 story645 changed the title Updated violin plot example as per suggestions in issue #7251 Updated violin plot example as per suggestions in issue #7251 [MRG+1] Oct 31, 2016
ax2.plot([i + 1, i + 1], iqr[i], '-', color='black', linewidth=5)
ax2.plot(i + 1, med[i], 'o', color='white',
markersize=6, markeredgecolor='none')
# customized violin
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stray redundant comment

Copy link
Contributor Author

@DrNightmare DrNightmare Oct 31, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, right. Guess the title we set for the axes are self explanatory.
Updated with comments removed @story645

ax2.set_title('Customized violin plot')
parts = ax2.violinplot(
data, showmeans=False, showmedians=False,
showextrema=False)

# customize colors
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably also unnecessary comment

ax2.vlines(inds, whiskersMin, whiskersMax, color='k', linestyle='-', lw=1)

# set style for the axes
labels = ['A', 'B', 'C', 'D'] # labels
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

variable name = comment

@NelleV NelleV changed the title Updated violin plot example as per suggestions in issue #7251 [MRG+1] [MRG+2] Updated violin plot example as per suggestions in issue #7251 Oct 31, 2016
@phobson phobson merged commit f5c1f00 into matplotlib:master Oct 31, 2016
@phobson phobson changed the title [MRG+2] Updated violin plot example as per suggestions in issue #7251 Updated violin plot example as per suggestions in issue #7251 Oct 31, 2016
@DrNightmare DrNightmare deleted the violinPlotExample branch October 31, 2016 19:18
@phobson phobson added this to the 2.0.1 (next bug fix release) milestone Oct 31, 2016
@phobson phobson self-assigned this Oct 31, 2016
@phobson phobson added Documentation MEP: MEP12 gallery and examples improvements labels Oct 31, 2016
@phobson
Copy link
Member

phobson commented Oct 31, 2016

@DrNightmare Thanks! These improvements are much appreciated.

@DrNightmare
Copy link
Contributor Author

@phobson You're welcome! I got to learn a lot, this is my first PR in Matplotlib :)
BTW I think this closes #7251

phobson added a commit that referenced this pull request Oct 31, 2016
Updated violin plot example as per suggestions in issue #7251
@phobson
Copy link
Member

phobson commented Oct 31, 2016

Backported to v2.x as 1c6b59a

@phobson phobson mentioned this pull request Oct 31, 2016
3 tasks
@tacaswell
Copy link
Member

@DrNightmare Thanks, hopefully we will hear from you again!

@DrNightmare
Copy link
Contributor Author

@tacaswell Sure :)

@NelleV
Copy link
Member

NelleV commented Oct 31, 2016

🍾

@phobson
Copy link
Member

phobson commented Oct 31, 2016

@DrNightmare the cool thing about about contributing to the docs is that the benefits are (nearly) immediately available:
http://matplotlib.org/devdocs/examples/statistics/customized_violin_demo.html

@QuLogic QuLogic modified the milestones: 2.0.1 (next bug fix release), 2.0 (style change major release) Dec 7, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation MEP: MEP12 gallery and examples improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants