[ENH]: Adding jitter mode in scatter
#27935
Comments
This has come up and was rejected at least once before: #2750

If we did take some version of this functionality, it should be its own function/method rather than being tacked onto scatter. This really only makes sense in the narrow case of one of x being categorical, so we should let it work only for categorical data (or at least require integers; or maybe it takes a scalar x and a list of y, or a sequence of x and a sequence of sequences of y (e.g. the groupby); or match the API of boxplot and violinplot).

I also suspect that there is going to be a large number of knobs people will want to have added to control how the jitter works, which should not be shoe-horned into the (already too complicated) signature of

I suspect that a majority of these cases are within the realm of what seaborn targets and in that case, we should point people at

This sort of thing could be made easier by the work in https://github.com/matplotlib/data-prototype where this could be implemented as a step in the processing pipeline.

That said, I wonder if this can be achieved by writing a very custom Transform (https://matplotlib.org/stable/api/transformations.html#matplotlib.transforms.Transform) that is offered the full set of x, y and returns the "jittered" x, y. If you write that, then I think you can do

In summary my views on the ways this can be addressed:
Thanks for the response @tacaswell, that's useful and clearly articulated. I like the
First, hard agree this cannot be tacked onto
I think that is going to be very hard, as you are going to want the jitter in screen space, and that information is not something we naturally pipe down to the Transform. Not impossible, but it's not going to be straightforward. I think you'd be better off writing a wrapper and not trying to overload
We have transforms in the annotation context that do absolute offsets, so it can be done. I agree it's not easy, but I think it will be the fun kind of hard.
Throwing in my +1 for some sort of native jittering; it's something I reach for fairly often, only to rediscover that it's not implemented. I was reading https://nightingaledvs.com/ive-stopped-using-box-plots-should-you/ (and the somewhat related https://nightingaledvs.com/color-jitter/) this morning and am coming around to seeing it as a fairly fundamental data viz.
What about adding a

ETA: The thinking is that an example reminds folks that seaborn does this but also shows a custom option. And we can maybe use intersphinx with seaborn to maintain the links.
I'm fine with an example or even a method - I just don't think the API should be an option to

As for boxplots, it is hard to come up with a reason why you would ever want to use those, but I'm not convinced "jitter" plots are the right way to go either. Multiple histograms or 2-D histograms seem much better than a weird extra dimension where you have to artistically pick the density out by eye.
Or violin plots, but swarm and stripplots can be nice for showing how the underlying data yields the distribution and for showing discontinuities that may not be apparent in a continuous mapping.

ETA: also when the data distribution can't be fitted to a standard distribution. Either way, @scottshambaugh's right that they're a pretty standard technique - ggplot and seaborn implement it, and Nightingale is a serious (if not quite academic) viz publication.
I'm softening a bit from my comment in March and am now neutral on adding a method with an API that rhymes with boxplot/violinplot that makes the "stripplot" from the Nightingale article or "bee plots" (sp? where the jitter range depends on the local density). Anything we do needs to make sure that the jitter is only applied to "categorical like" data. Another (maybe bad) idea would be to inject the jitter into the categorical unit handlers.
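To make the "jitter range depends on the local density" idea concrete, here is a minimal sketch in plain Python. It is not an existing Matplotlib or seaborn API: the function name `strip_positions` and its `width`, `bins`, and `seed` parameters are hypothetical, the density estimate is a crude per-group histogram, and the jitter is computed in data space rather than screen space. The input follows the "sequence of sequences of y" shape mentioned above for boxplot/violinplot.

```python
import random
from collections import Counter


def strip_positions(groups, positions=None, width=0.4, bins=10, seed=0):
    """Return (xs, ys) lists for a jittered strip plot (hypothetical helper).

    groups    : sequence of sequences of y values, one per category
    positions : x position of each category (defaults to 1, 2, ...)
    width     : total jitter width at the densest part of a group
    bins      : number of y bins used for the crude density estimate
    seed      : seed for the random generator, so the jitter is reproducible
    """
    rng = random.Random(seed)
    if positions is None:
        positions = range(1, len(groups) + 1)

    xs, ys = [], []
    for pos, group in zip(positions, groups):
        group = list(group)
        if not group:
            continue
        lo, hi = min(group), max(group)
        span = (hi - lo) or 1.0  # avoid dividing by zero for constant groups

        def bin_of(y):
            # Map y to a histogram bin index in [0, bins - 1].
            return min(int((y - lo) / span * bins), bins - 1)

        counts = Counter(bin_of(y) for y in group)
        peak = max(counts.values())
        for y in group:
            # The jitter half-width scales with the bin count, so sparse
            # regions stay close to the category's centre line.
            half = 0.5 * width * counts[bin_of(y)] / peak
            xs.append(pos + rng.uniform(-half, half))
            ys.append(y)
    return xs, ys
```

The returned lists can be passed straight to `Axes.scatter(xs, ys)`. Because the categories are reduced to numeric positions before jittering, this only makes sense for "categorical like" x data, which matches the restriction discussed above.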
swarmplots
Was kinda thinking if
Also, if something gets implemented in mpl, @mwaskom what kind of things could make the seaborn implementations cleaner?
Scatter plots already handle categorical data seamlessly, so we already have the "stripplot" functionality. I agree with @tacaswell that it makes sense to restrict this to categorical data. Perhaps adding on to
Seaborn's
And more info on their abstractions of categorical plots:
Problem
Is there a plan to add a 'jitter' functionality to scatter, so that markers are re-positioned to avoid overlap? An example of this idea is described here.
Proposed solution
Provided that the number of overlapping markers isn't too large, the re-positioning can be achieved by minimizing a measure of distance over the shape centers (for circles). A basic implementation could be based on this discussion.
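As a rough illustration of overlap-avoiding placement for circular markers, the sketch below uses a simple greedy layout instead of the distance minimization suggested above: points are handled in order of y, and each one takes the x offset closest to the centre line that does not collide with any circle already placed. The function name `swarm_offsets` is hypothetical, and the marker diameter is assumed to be in the same units as y. Real markers are sized in points, so an actual implementation would have to work in screen space.

```python
import math


def swarm_offsets(ys, diameter):
    """Greedy 1-D swarm layout (hypothetical helper).

    Returns one x offset per y value so that circles of the given diameter,
    centred at (offset, y), do not overlap.
    """
    order = sorted(range(len(ys)), key=lambda i: ys[i])
    placed = []                    # (x, y) centres already positioned
    offsets = [0.0] * len(ys)
    limit = diameter ** 2 * (1 - 1e-9)  # small tolerance for rounding

    for i in order:
        y = ys[i]
        # Only neighbours closer than one diameter in y can collide.
        near = [(px, py) for px, py in placed if abs(y - py) < diameter]

        # Candidates: the centre line, plus the two x positions at which
        # the new circle would just touch each neighbour.
        candidates = [0.0]
        for px, py in near:
            dx = math.sqrt(diameter ** 2 - (y - py) ** 2)
            candidates += [px - dx, px + dx]

        def clear(x):
            # True if a circle at (x, y) overlaps none of the neighbours.
            return all((x - px) ** 2 + (y - py) ** 2 >= limit
                       for px, py in near)

        # The outermost candidate is always clear, so this never fails.
        x = min((c for c in candidates if clear(c)), key=abs)
        placed.append((x, y))
        offsets[i] = x
    return offsets
```

For a single category at position `pos`, the markers would be drawn at `pos + offset`. The cost grows quadratically with the number of nearby points, which is consistent with the caveat above that the number of overlapping markers should not be too large.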