correct reference to 68-95-99.7% rule #19074
That explanation could be expanded, or a reference could be added. In my understanding, a confidence ellipse is made by applying an SVD to the data and fitting normal distributions to the new independent axes. I agree with you that if you make the major axis of the confidence ellipse 3 standard deviations in the "x" of the SVD, and the minor axis 3 standard deviations in the "y" of the SVD, then you will indeed have less than 99.7% of the data. But that is just by definition of what you mean by "3". You could have meant "3" to mean 99.7% of the data and made your ellipses a bit larger to compensate. If one or the other is standard, it would be good to specify and cite.
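The "less than 99.7%" point can be checked empirically. The following is a minimal sketch, not the example's own code: it assumes that the "3 standard deviation" ellipse means the contour at Mahalanobis distance 3 from the sample mean, and the mean, covariance, and sample size are made-up illustration values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical correlated 2-D normal data (values chosen only for illustration).
mean = np.array([1.0, -2.0])
cov = np.array([[2.0, 1.2],
                [1.2, 1.5]])
points = rng.multivariate_normal(mean, cov, size=200_000)

# Squared Mahalanobis distance of every point, using the sample estimates.
centered = points - points.mean(axis=0)
inv_cov = np.linalg.inv(np.cov(points, rowvar=False))
d2 = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)

n_std = 3
inside = float(np.mean(d2 <= n_std**2))  # fraction of points inside the ellipse

print(f"fraction inside the {n_std}-std ellipse: {inside:.4f} (1-D rule: 0.9973)")
```

With normally distributed input, the printed fraction should come out noticeably below the one-dimensional 99.7% figure.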
Hi @jklymak,
If it turns out to be too much text in the end, I am still fine with trimming it down to size again.
This whole example would be better if it referred to a standard text, or definitive online reference.
An authoritative online reference is a good idea. I will pick one and add it. Maybe it will need an explanation that builds the bridge to this case, but that should be concise. I will have some time around the holidays to address the 68-95-99.7 thing. Should I close this PR for now? Or should we at least remove the incorrect statement about the 68-95-99.7 percent?
Agree we should change the number and include a brief description, but not with a long digression.
#############################################################################
#
# No 68-95-99.7 Rule
# """"""""""""""""""
#
# .. no_rule:
# The well-known 68-95-99.7 rule states that in a normal distribution 68%
# of the observations lie within +/- 1 standard deviation from the mean, 95%
# within 2 standard deviations and 99.7% within 3 standard deviations.
#
# However, this rule only holds for one-dimensional data.
# To adjust this rule to two-dimensional data,
# it suffices to square the probabilities
# of 0.68, 0.95 and 0.997 to 0.462, 0.903 and 0.994, respectively.
# If you wish, you can think of it as a new "46-90-99 Rule" for
# two-dimensional data.
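The arithmetic in this hunk can be compared with the closed-form values. The sketch below is an illustration under stated assumptions, not part of the PR: `coverage` is a hypothetical helper, and the two axes are assumed to be independent and standardized. Under those assumptions, squaring the one-dimensional probability gives the mass of the square region where both axes are within `n_std`, while the circular region `x**2 + y**2 <= n_std**2` (the ellipse in standardized coordinates) has mass `1 - exp(-n_std**2 / 2)`.

```python
import math


def coverage(n_std):
    """Return (1-D interval, 2-D box, 2-D ellipse) coverage for n_std."""
    p_1d = math.erf(n_std / math.sqrt(2.0))          # P(|x| <= n_std) on one axis
    p_box = p_1d ** 2                                # both independent axes within n_std
    p_ellipse = 1.0 - math.exp(-n_std ** 2 / 2.0)    # x**2 + y**2 <= n_std**2
    return p_1d, p_box, p_ellipse


for n in (1, 2, 3):
    p_1d, p_box, p_ellipse = coverage(n)
    print(f"{n} std: 1-D {p_1d:.3f}, box {p_box:.3f}, ellipse {p_ellipse:.3f}")
```

The ellipse lies inside the box, so its coverage is always the smaller of the two; which figure the documentation should quote depends on which region "n standard deviations" is taken to mean.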
Hi @jklymak,
I have integrated your suggestion into the file. Looks like the linters are happy, too.
As far as I am concerned, it can stay like this until I have found links that cover the issue well.
Co-authored-by: Jody Klymak <jklymak@gmail.com>
Thanks for correcting this @CarstenSchelp
PR Summary
When I first created this example, I stated that a plotted ellipse of e.g. 3 standard deviations would contain 99.7% of the total observations if the data was normally distributed.
However, this reference to the 68-95-99.7 rule does not hold with more than one dimension. I have corrected this statement and added a very brief explanation.
If you want to read more on this, please look at the comments on my GitHub Gist on the same subject. The comments there pointed me to the issue in the first place.
PR Checklist
- `pytest` passes.
- `flake8` on changed files to check.
- `flake8-docstrings` and run `flake8 --docstring-convention=all`.
- `doc/users/next_whats_new/` (follow instructions in README.rst there).
- `doc/api/next_api_changes/` (follow instructions in README.rst there).