correct reference to 68-95-99.7% rule #19074

Merged
merged 6 commits into from
Dec 11, 2020

CarstenSchelp
Contributor

PR Summary

When I first created this example, I stated that a plotted ellipse of, e.g., 3 standard deviations would contain 99.7% of the total observations if the data were normally distributed.
However, this reference to the 68-95-99.7 rule does not hold in more than one dimension. I have corrected this statement and added a very brief explanation.
If you want to read more on this, please look at the comments on my GitHub Gist on the same subject. The comments there pointed me to the issue in the first place.

PR Checklist

  • Has pytest style unit tests (and pytest passes).
  • Is Flake 8 compliant (run flake8 on changed files to check).
  • [N/A] New features are documented, with examples if plot related.
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).
  • Conforms to Matplotlib style conventions (install flake8-docstrings and run flake8 --docstring-convention=all).
  • [N/A] New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • [N/A] API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).

@jklymak
Member

jklymak commented Dec 6, 2020

That explanation could be expanded, or find a reference.

In my understanding a confidence ellipse is made by applying an SVD to the data and fitting normal distributions to the new independent axes. I agree with you that if you make the major axes of the confidence ellipse 3 standard deviations in the "x" of the SVD, and the minor axes 3 standard deviations in the "y" of the SVD, then you will indeed have less than 99.7 % of the data. But that is just by definition of what you mean by "3". You could have meant "3" to mean 99.7% of the data and made your ellipses a bit larger to compensate. If one or the other is standard, it would be good to specify and cite.
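The two readings of "3" described above can be compared numerically. The following is a minimal sketch, not part of the PR: it assumes the data are exactly bivariate normal with known mean and covariance, so that the squared Mahalanobis distance follows a chi-squared distribution with 2 degrees of freedom, whose CDF is 1 - exp(-x/2). The function names `ellipse_coverage` and `n_std_for_coverage` are hypothetical and chosen only for illustration.

```python
import math


def ellipse_coverage(n_std: float) -> float:
    """Probability mass of a bivariate normal inside the ellipse whose
    semi-axes are n_std standard deviations along the principal axes.

    Assumption: mean and covariance are known exactly (no sampling error).
    """
    return 1.0 - math.exp(-0.5 * n_std ** 2)


def n_std_for_coverage(probability: float) -> float:
    """Inverse of ellipse_coverage: the scaling needed so that the
    ellipse contains the requested probability mass (0 <= p < 1)."""
    return math.sqrt(-2.0 * math.log(1.0 - probability))


if __name__ == "__main__":
    for n_std in (1, 2, 3):
        print(f"{n_std} std ellipse covers {ellipse_coverage(n_std):.4f}")
    print(f"99.7% coverage needs n_std = {n_std_for_coverage(0.997):.4f}")
```

Under these assumptions, the 3-standard-deviation ellipse covers less than 99.7% of the distribution, and an ellipse meant to cover 99.7% has to be scaled somewhat beyond 3.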

@CarstenSchelp
Contributor Author

Hi @jklymak,
I was wary of putting in too much text in the first place. If it needs more explanation, I am definitely willing to elaborate. I am still thinking, though, about how much theory the reader of the example might actually want to see. I think I will shortly come up with something along the lines of:

  • the standard deviation has a very clear definition.
  • the 68-95-99.7 rule is a mere consequence of it.
  • the rule does not hold in 2+ dimensions
  • instead of manipulating the ellipse that naturally emerges, suggest to apply the same rule with an extension that makes the rule still hold.
  • how to correct: single_dimensional_probability**n_dimensions
  • why this correction works.

When in the end it turns out to be too much text I am still fine with trimming it down to size, again.
But after all, it's all at the end of the page "just for the curious". So why not explain it neatly.
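The correction listed above, raising the one-dimensional probability to the power of the number of dimensions, is simple arithmetic and can be sketched as follows. This is an illustration only, not code from the PR: the helper name `adjusted_probability` is hypothetical, and the inputs are the rounded values 0.68, 0.95 and 0.997 of the rule. Multiplying the per-axis probabilities assumes that the axes are statistically independent; the result is the probability that every coordinate falls within its own +/- k standard deviation interval, that is, a box-shaped region in the decorrelated coordinates.

```python
# Rounded one-dimensional probabilities of the 68-95-99.7 rule,
# keyed by the number of standard deviations.
ONE_DIM_PROBABILITIES = {1: 0.68, 2: 0.95, 3: 0.997}


def adjusted_probability(single_dimensional_probability: float,
                         n_dimensions: int) -> float:
    """Probability that n independent coordinates all fall inside
    their own one-dimensional interval (hypothetical helper)."""
    return single_dimensional_probability ** n_dimensions


if __name__ == "__main__":
    for n_std, p in ONE_DIM_PROBABILITIES.items():
        print(f"{n_std} std: {p} -> {adjusted_probability(p, 2):.4f}")
```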

@jklymak
Member

jklymak commented Dec 7, 2020

This whole example would be better if it referred to a standard text, or definitive online reference.

@CarstenSchelp
Contributor Author

CarstenSchelp commented Dec 8, 2020

This whole example would be better if it referred to a standard text, or definitive online reference.

An authoritative online reference is a good idea. I will pick one and add it. Maybe it will need an explanation that builds the bridge to this case, but that should be concise.
The generation of the ellipse itself uses a new method that, to the best of my knowledge, is explained only in my own GitHub Pages article. So we'll have to live with that.

I will have some time around the holidays to address the 68-95-99.7 thing.

Should I close this PR for now? Or should we at least remove the incorrect statement about the 68-95-99.7 percent?

Member

@jklymak jklymak left a comment

Agree we should change the number and include a brief description, but not with a long digression.

Comment on lines 217 to 234

#############################################################################
#
# No 68-95-99.7 Rule
# """"""""""""""""""
#
# .. no_rule:
# The well-known 68-95-99.7 rule states that in a normal distribution 68%
# of the observations lie within +/- 1 standard deviation from the mean, 95%
# within 2 standard deviations and 99.7% within 3 standard deviations.
#
# However this rule only holds for one-dimensional data.
# To adjust this rule to two-dimensional data,
# it suffices to square the probabilities
# of 0.68, 0.95 and 0.997 to 0.462, 0.903 and 0.994, respectively.
# If you wish, you can think of it as a new "46-90-99 Rule" for
# two-dimensional data.

Member

Suggested change
#############################################################################
#
# No 68-95-99.7 Rule
# """"""""""""""""""
#
# .. no_rule:
# The well-known 68-95-99.7 rule states that in a normal distribution 68%
# of the observations lie within +/- 1 standard deviation from the mean, 95%
# within 2 standard deviations and 99.7% within 3 standard deviations.
#
# However this rule only holds for one-dimensional data.
# To adjust this rule to two-dimensional data,
# it suffices to square the probabilities
# of 0.68, 0.95 and 0.997 to 0.462, 0.903 and 0.994, respectively.
# If you wish, you can think of it as a new "46-90-99 Rule" for
# two-dimensional data.

Contributor Author

Hi @jklymak,
I have integrated your suggestion into the file. Looks like the linters are happy, too.
As far as I am concerned, it can stay like this until I have found links that cover the issue well.

@jklymak jklymak merged commit 5926c2a into matplotlib:master Dec 11, 2020
@jklymak
Member

jklymak commented Dec 11, 2020

Thanks for correcting this @CarstenSchelp

@QuLogic QuLogic added this to the v3.4.0 milestone Jun 29, 2021
@liamtoney liamtoney mentioned this pull request Sep 29, 2021