
Violin plot inconsistency between list and numpy #12178


Closed
RaphP opened this issue Sep 20, 2018 · 6 comments
Labels
Documentation, Good first issue, status: closed as inactive, status: inactive

Comments

@RaphP

RaphP commented Sep 20, 2018

Bug report

When the same data is passed to violinplot as a list and as a numpy array, the results differ.

Code for reproduction

```python
import matplotlib.pyplot as plt
import numpy as np

data = [[1, 2, 3], [4, 5, 6]]
data_np = np.array(data)

f, (ax1, ax2) = plt.subplots(2)

ax1.violinplot(data)
ax1.set_title('list')

ax2.set_title('numpy array')
ax2.violinplot(data_np)  # Same data, but as a numpy array

plt.show()
```

Actual outcome

[Screenshot: the "list" and "numpy array" panels show different violins.]
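A quick way to see the difference without the screenshot is to count the violins in each panel. `Axes.violinplot` returns a dict whose `"bodies"` entry holds one patch collection per violin. This is a minimal sketch assuming the non-interactive `Agg` backend; the expected counts follow from the docstring rule (one violin per inner list, one per array column).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no window is opened
import matplotlib.pyplot as plt
import numpy as np

data = [[1, 2, 3], [4, 5, 6]]
data_np = np.array(data)  # shape (2, 3): 2 rows, 3 columns

fig, (ax1, ax2) = plt.subplots(2)
parts_list = ax1.violinplot(data)      # list form: one violin per inner list
parts_array = ax2.violinplot(data_np)  # array form: one violin per column

n_list = len(parts_list["bodies"])
n_array = len(parts_array["bodies"])
print(n_list, n_array)  # 2 3
```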

Expected outcome
Both should be the same, shouldn't they?

Actually, the docs specify, in terms I didn't understand at first, that it will
"Make a violin plot for each column of dataset or each vector in sequence dataset. "

It is clearer in the hist docs, which say at the end:

"Note that the ndarray form is transposed relative to the list form."
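The same split applies to hist. When several datasets are given, the first value returned by `Axes.hist` holds one histogram per dataset, so its length shows how the input was interpreted. A minimal sketch, again assuming the `Agg` backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

data = [[1, 2, 3], [4, 5, 6]]
data_np = np.array(data)

fig, (ax1, ax2) = plt.subplots(2)
# hist returns (counts, bin_edges, patches); with several datasets,
# counts contains one histogram per dataset.
counts_list, _, _ = ax1.hist(data)      # 2 datasets: [1, 2, 3] and [4, 5, 6]
counts_array, _, _ = ax2.hist(data_np)  # 3 datasets: columns [1, 4], [2, 5], [3, 6]

print(len(counts_list), len(counts_array))  # 2 3
```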

The difference between the two outcomes comes, I think, from how one thinks about the data: as vectors in a matrix, or as lists in a list (which correspond to the columns or the rows of an array, respectively).
I discussed this with some colleagues who work with Matlab, and they do think of the vector as the base unit of a 2D matrix. For me, the base unit is a row of a 2D array.
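Under the rule quoted above, the two ways of thinking map onto each other by a transpose. A small sketch, assuming you want an array to behave like the list form (one dataset per row): pass the transposed array, or turn the array into a list of its rows. The `Agg` backend is an assumption so it runs headless.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

data = [[1, 2, 3], [4, 5, 6]]
data_np = np.array(data)

fig, ax = plt.subplots()
# Transposing turns each original row into a column, so each row
# becomes its own dataset, matching the list-of-lists behaviour.
n_transposed = len(ax.violinplot(data_np.T)["bodies"])
# Iterating over a 2D array yields its rows; list() gives a list of 1D
# arrays, which is handled like a list of lists.
n_rows = len(ax.violinplot(list(data_np))["bodies"])
print(n_transposed, n_rows)  # 2 2
```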

When plotting [ [1,2,3], [4,5,6] ] with the plot function, we get 3 curves, so I think the matrix/vector way of thinking is predominant in matplotlib.
So should we change the violinplot and hist behavior for lists to match the behavior for arrays?
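For comparison, `Axes.plot` treats a 2D y argument column by column, so `[[1, 2, 3], [4, 5, 6]]` becomes three lines of two points each. A minimal check, assuming the `Agg` backend; it reads back each line's y values with `Line2D.get_ydata`:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

data = [[1, 2, 3], [4, 5, 6]]

fig, ax = plt.subplots()
lines = ax.plot(data)  # y is treated as shape (2, 3): one line per column
ydata = [np.asarray(line.get_ydata()).tolist() for line in lines]
print(len(lines), ydata)
```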

After writing this, I believe the best solution is just to document this behavior more clearly in the violinplot docstring, without changing the code. I can do this, but I would like some feedback from different points of view first.

Best,
RP

@anntzer
Contributor

anntzer commented Sep 20, 2018

See also #8092 for the same issue with boxplots. (I agree that the behavior is... awkward.)
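The boxplot case behaves the same way: `Axes.boxplot` returns a dict whose `"boxes"` entry has one artist per box. A minimal sketch with the data from this issue, assuming the `Agg` backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt
import numpy as np

data = [[1, 2, 3], [4, 5, 6]]
data_np = np.array(data)

fig, (ax1, ax2) = plt.subplots(2)
n_list = len(ax1.boxplot(data)["boxes"])      # one box per inner list
n_array = len(ax2.boxplot(data_np)["boxes"])  # one box per column
print(n_list, n_array)  # 2 3
```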

@jklymak
Member

jklymak commented Sep 20, 2018

I personally think this difference shouldn't have happened. I think we should change hist and violinplot to be consistent with plot, where, if I recall correctly, each data set is in a row and time increases as you move across columns.

However, I suspect this behavior was introduced a long time ago, so I'm not sure how easy it would be to change without breaking a lot of downstream code. Certainly it should be explained better in the docstring.

@ImportanceOfBeingErnest
Member

This feels a bit like déjà vu, but the main point is that you usually have data in columns. However, datasets of different lengths cannot be stored in a matrix, hence the option to use a list of (possibly unequal-length) datasets.
As for plot, it only ever plots columns, even when given a list of lists. That makes sense, because plot cannot have unequal-length datasets.
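To illustrate the point about unequal lengths: a list of lists can hold datasets of different sizes, which a regular 2D numeric array cannot, and each inner list still becomes one violin. A minimal sketch, assuming the `Agg` backend:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

# Two datasets of different lengths: they cannot form a regular 2D array,
# but a list of lists is accepted, with one violin per inner list.
ragged = [[1, 2, 3], [4, 5, 6, 7, 8]]

fig, ax = plt.subplots()
parts = ax.violinplot(ragged)
print(len(parts["bodies"]))  # 2
```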

@jklymak
Member

jklymak commented Sep 20, 2018

Oops, I got it backwards in my post above.

OK, it's fair to say that each element of a list contains a distinct data set. That's really different from saying one form is transposed relative to the other. I appreciate that if you make a matrix from the list of lists it will be transposed, but that description misses the point. Changing all the docstrings that use these dual conventions to explain it the way you nicely did above would be very helpful. The current docstrings are far too inscrutable.

@jklymak jklymak added Documentation Good first issue labels Sep 27, 2018
@jklymak jklymak added this to the v3.1 milestone Sep 27, 2018
@nunocalaim

@github-actions

This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!

@github-actions github-actions bot added the status: inactive label May 16, 2023
@github-actions github-actions bot added the status: closed as inactive label Jun 16, 2023
@github-actions github-actions bot closed this as not planned Jun 16, 2023
6 participants