Violin plot inconsistency between list and numpy #12178
Comments
See also #8092 for the same issue with boxplots. (I agree that the behavior is... awkward.)
I personally think this difference shouldn’t have happened. I think we should change hist and violinplot to be consistent with plot, which, if I recall correctly, treats each row as a dataset, with time increasing as you move across the columns. However, I suspect this behavior was introduced a long time ago, so I’m not sure how easy it would be to change without breaking a lot of downstream code. At the very least, it should be explained better in the docstring.
This feels a bit like déjà vu, but the main point is that data usually comes in columns. However, datasets of different lengths cannot be stored in a matrix, hence the option to pass a list of (possibly unequal-length) datasets.
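A minimal sketch of the unequal-length case, assuming random sample data and the non-interactive Agg backend (neither comes from the thread): NumPy refuses to build a rectangular float array from ragged datasets, while `Axes.violinplot` accepts the same ragged list directly and draws one violin per element.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
# Three illustrative datasets with different lengths
ragged = [rng.normal(size=n) for n in (20, 35, 50)]

try:
    np.array(ragged, dtype=float)  # ragged rows cannot form a 2D float array
    stacked = True
except ValueError:
    stacked = False

fig, ax = plt.subplots()
parts = ax.violinplot(ragged)  # a list of 1D datasets is accepted as-is
plt.close(fig)

print(stacked, len(parts["bodies"]))  # False 3
```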
Oops, I got it backwards in my post above. OK, it’s fair to say that each element of a list contains a distinct dataset. That’s really different from saying one form is transposed relative to the other. I appreciate that if you build a matrix from the list of lists it will be transposed, but that description misses the point. Changing all the docstrings that use these dual conventions to explain it the way you nicely did above would be very helpful. The current docstrings are far too inscrutable.
This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!
Bug report
When the same data is passed to violinplot as a list or as a NumPy array, the results are not the same.
Code for reproduction
Actual outcome
Expected outcome
Shouldn't both give the same result?
Actually, the documentation states (in terms I didn't understand at first) that it will "Make a violin plot for each column of dataset or each vector in sequence dataset."
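To make that docstring sentence concrete, here is a minimal sketch (assuming random sample data and the Agg backend, not the reporter's original reproduction code) that counts the violin bodies `Axes.violinplot` returns for the same numbers passed as a 2D array, as a list of rows, and as a list of columns:

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(10, 3))  # illustrative: 10 rows x 3 columns

fig, ax = plt.subplots()

# 2D ndarray: one violin per COLUMN
n_from_array = len(ax.violinplot(data)["bodies"])

# Same numbers as a list of rows: one violin per list ELEMENT
n_from_rows = len(ax.violinplot(data.tolist())["bodies"])

# A list of columns (transpose first) matches the ndarray result
n_from_cols = len(ax.violinplot(data.T.tolist())["bodies"])

plt.close(fig)
print(n_from_array, n_from_rows, n_from_cols)  # 3 10 3
```

The array and the list of rows hold identical numbers, yet they produce 3 and 10 violins respectively; only the transposed list reproduces the array result.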
This is clearer in the hist docstring, which ends with:
"Note that the ndarray form is transposed relative to the list form."
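A small check of that note, as a sketch under assumed inputs (random data, fixed bin edges chosen here so both calls bin identically): passing a 2D array to `Axes.hist` gives the same per-dataset counts as passing the list of its columns.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
arr = rng.normal(size=(200, 2))  # illustrative: two datasets stored as columns
bins = np.linspace(-4, 4, 17)    # 16 fixed bins shared by both calls

fig, ax = plt.subplots()
counts_arr, _, _ = ax.hist(arr, bins=bins)           # datasets = array columns
counts_list, _, _ = ax.hist(list(arr.T), bins=bins)  # datasets = list elements
plt.close(fig)

print(np.shape(counts_arr), np.array_equal(counts_arr, counts_list))  # (2, 16) True
```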
I think the difference between the two outcomes comes down to how one thinks about the data: as vectors in a matrix, or as lists in a list (corresponding to the columns or to the rows of an array, respectively).
I discussed this with some colleagues who work with MATLAB, and they think of the (column) vector as the basic unit of a 2D matrix. For me, the basic unit is a row of a 2D array.
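The two mental models map onto two different NumPy constructions. As a Matplotlib-independent sketch (the example lists are illustrative), `np.array` turns each inner list into a row, whereas `np.column_stack` turns each inner list into a column, which is the layout violinplot and hist read datasets from when given an array:

```python
import numpy as np

lists = [[1, 2, 3], [4, 5, 6]]  # two datasets, three values each

rows = np.array(lists)         # each inner list becomes a ROW
cols = np.column_stack(lists)  # each inner list becomes a COLUMN

print(rows.shape, cols.shape)                  # (2, 3) (3, 2)
print(rows[0].tolist(), cols[:, 0].tolist())   # [1, 2, 3] [1, 2, 3]
```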
With plot, passing [[1, 2, 3], [4, 5, 6]] produces 3 curves (one per column), so I think the matrix/vector way of thinking is the predominant one in Matplotlib.
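For contrast, a minimal sketch (Agg backend assumed) showing that `Axes.plot` treats the nested list and the equivalent array identically: both are read column-wise, giving 3 lines, the first of which traces the first column.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt
import numpy as np

nested = [[1, 2, 3], [4, 5, 6]]

fig, ax = plt.subplots()
lines_list = ax.plot(nested)             # nested list, converted to a 2x3 array
lines_array = ax.plot(np.array(nested))  # the same data as an explicit array
plt.close(fig)

print(len(lines_list), len(lines_array))  # 3 3
# The first line traces the first column, i.e. the values 1 and 4
print(np.asarray(lines_list[0].get_ydata(), dtype=float).tolist())  # [1.0, 4.0]
```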
So should we change the behavior of violinplot and hist for lists so that it matches the behavior for arrays?
Having written all this, I now believe the best solution is simply to document this behavior more clearly in the violinplot docstring rather than change the code. I can do this, but I would like feedback from different points of view first.
Best,
RP