Jitter plot #2750

Closed
yoavram opened this issue Jan 22, 2014 · 6 comments

yoavram commented Jan 22, 2014

It would be great if we could have either a jitter command or a jitter flag in scatter.
I'm referring to something like ggplot2's geom_jitter.

A basic solution:

import numpy as np
from matplotlib.pyplot import scatter

def rand_jitter(arr):
    # Gaussian noise with a standard deviation of 1% of the array's maximum.
    stdev = arr.max()/100.
    return arr + np.random.randn(len(arr)) * stdev

def jitter(x, y, **kwargs):
    # Forward the caller's styling options (s, c, marker, ...) to scatter
    # instead of overriding them with hard-coded defaults.
    return scatter(rand_jitter(x), rand_jitter(y), **kwargs)

Of course, the definition of stdev needs to be better; this will only work in some cases.

The idea is from this Stack Overflow post.

@tacaswell
Member

I am very against adding a jitter function, I see it as crossing the line to unethical data mangling/falsification (but I know opinions differ). If I saw a colleague doing this I would call them out in very strong terms.

If people really want to do this, then they should do their data modification explicitly, outside of mpl, so we never get users who are surprised that the library changed their data (in a random way!) before plotting it.

Leaving this issue open just to make sure I am not way out in left field on this.
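
A minimal sketch of what doing the jitter explicitly, outside of mpl, could look like. The helper name `add_jitter`, its `scale` and `seed` parameters, and the choice of Gaussian noise scaled to the data range are assumptions made for illustration; none of them is part of matplotlib. The seed makes the perturbation reproducible, and the original array is left untouched.

```python
import numpy as np

def add_jitter(values, scale=0.01, seed=None):
    """Return a jittered copy of `values`; the input is not modified.

    Hypothetical helper: the noise is Gaussian with a standard deviation
    of `scale` times the data range (max - min).
    """
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    spread = values.max() - values.min()  # 0 if all values are equal
    return values + rng.normal(0.0, scale * spread, size=values.shape)

# Made-up data with repeated values.
x = np.array([1, 1, 2, 2, 2, 3])
x_jittered = add_jitter(x, seed=0)  # same seed -> same jitter on every run
```

The jittered arrays would then be passed to an ordinary `scatter(x_jittered, y_jittered)` call, so the plotting library itself never alters the data.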

@WeatherGod
Member

I will say that there are some very limited situations where a "jitter"
is a good thing (it can help to avoid over-fitting models on low-precision
data and such), but I absolutely do agree that such things need to be done
explicitly prior to plotting. One of matplotlib's philosophies is to keep
it simple by just plotting the data as given. Failure to separate
the responsibility of prepping the data from plotting the data results in
abominations like the histogram functions.

@tacaswell
Member

However, in those cases I assume that a) you have a decent model of what the noise should be, b) you run the jitter + fit routine a bunch of times to make sure you get consistent results, and c) you are very forthcoming about exactly what you did and why you did it.

Plotting the jittered data gives a false sense of precision.
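
Point b) above can be sketched roughly as follows. This is an illustration under stated assumptions, not a recommended procedure: the function name `jittered_slopes` is hypothetical, the data are made up, the noise model is assumed to be uniform within half a rounding step, and a straight-line fit via `np.polyfit` stands in for whatever model is actually being fitted.

```python
import numpy as np

def jittered_slopes(x, y, half_width, n_repeats=200, seed=0):
    """Fit a line to `n_repeats` independently jittered copies of (x, y).

    Assumed noise model: uniform on [-half_width, half_width], e.g. half
    the rounding step of the recorded values. Returns the fitted slopes.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    slopes = np.empty(n_repeats)
    for i in range(n_repeats):
        y_jittered = y + rng.uniform(-half_width, half_width, size=y.shape)
        slopes[i], _intercept = np.polyfit(x, y_jittered, deg=1)
    return slopes

# Made-up example: a line recorded only to the nearest integer.
x = np.arange(20)
y = np.round(0.5 * x + 1.0)
slopes = jittered_slopes(x, y, half_width=0.5)
print(f"slope = {slopes.mean():.3f} +/- {slopes.std():.3f}")
```

A small spread across repeats suggests the fit does not depend on one particular random draw; a large spread would be a sign that the jitter itself is driving the result.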

@WeatherGod
Member

Absolutely agree with those points. However, if you are going to use
jittered data, then I suggest making clear that the data was jittered by
plotting the un-jittered data on top of the jittered data; that way you
are completely transparent about it.
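
One way the overlay suggested above might look, as a sketch only. The function name `scatter_with_originals` and the styling choices are assumptions for illustration; `ax` is expected to be a matplotlib `Axes`, and the jittered arrays are computed by the caller beforehand.

```python
def scatter_with_originals(ax, x, y, x_jittered, y_jittered):
    """Draw jittered points faintly, with the recorded values on top.

    Hypothetical helper: `ax` is a matplotlib Axes. The higher zorder
    keeps the recorded data visible above the jittered cloud.
    """
    jittered = ax.scatter(x_jittered, y_jittered, s=15, alpha=0.3,
                          label="jittered (display only)", zorder=1)
    recorded = ax.scatter(x, y, s=40, marker="x", color="k",
                          label="recorded data", zorder=2)
    ax.legend()  # the legend states explicitly which points were altered
    return jittered, recorded
```

With `fig, ax = plt.subplots()` from `matplotlib.pyplot`, this would be called as `scatter_with_originals(ax, x, y, x_jittered, y_jittered)`.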

@efiring
Member

efiring commented Jan 22, 2014

I agree with the arguments against including a jitter in mpl. The primary legitimate use would appear to be to provide a way of visualizing data sets with duplicate values. Maybe a better way would be to process the data such that the number of duplicates of a given x,y pair would be used to set a color or a symbol size in scatter. In any case, the processing, whether a duplicate count or a jitter addition, can be left as the responsibility of the user, not mpl.
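
The duplicate-count idea above could be sketched as follows. The helper name `count_duplicates` and the sample data are made up for illustration, and the approach assumes that duplicates are exactly equal values (as with integer or categorical data); nearly equal floats would need binning first.

```python
import numpy as np

def count_duplicates(x, y):
    """Collapse repeated (x, y) pairs; return unique x, unique y, counts."""
    points = np.column_stack([x, y])
    unique_points, counts = np.unique(points, axis=0, return_counts=True)
    return unique_points[:, 0], unique_points[:, 1], counts

# Made-up data with repeated pairs.
x = np.array([1, 1, 2, 2, 2, 3])
y = np.array([1, 1, 2, 2, 2, 1])
ux, uy, counts = count_duplicates(x, y)
# ux -> [1 2 3], uy -> [1 2 1], counts -> [2 3 1]
```

The counts could then be passed to an ordinary scatter call, for example `scatter(ux, uy, s=30 * counts)` for symbol size or `scatter(ux, uy, c=counts)` for color, without altering any data value.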

efiring closed this as completed Jan 22, 2014

@yoavram
Author

yoavram commented Jan 27, 2014

Thanks for the informative discussion! I didn't think about it that way and I agree with your comments.
