correct reference to 68-95-99.7% rule #19074

CarstenSchelp · 2020-12-06T13:34:00Z

PR Summary

When I first created this example, I stated that a plotted ellipse of, e.g., 3 standard deviations would contain 99.7% of the total observations if the data were normally distributed.
However, this reference to the 68-95-99.7 rule does not hold in more than one dimension. I have corrected this statement and added a very brief explanation.
If you want to read more on this, please look at the comments on my GitHub Gist on the same subject. Those comments pointed me to the issue in the first place.

PR Checklist

Has pytest style unit tests (and pytest passes).
Is Flake 8 compliant (run flake8 on changed files to check).
[N/A] New features are documented, with examples if plot related.
Documentation is sphinx and numpydoc compliant (the docs should build without error).
Conforms to Matplotlib style conventions (install flake8-docstrings and run flake8 --docstring-convention=all).
[N/A] New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
[N/A] API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).

jklymak · 2020-12-06T17:36:22Z

That explanation could be expanded, or a reference could be added.

In my understanding, a confidence ellipse is made by applying an SVD to the data and fitting normal distributions to the new independent axes. I agree with you that if you make the major axis of the confidence ellipse 3 standard deviations in the "x" of the SVD, and the minor axis 3 standard deviations in the "y" of the SVD, then you will indeed have less than 99.7% of the data. But that is just by definition of what you mean by "3". You could have meant "3" to mean 99.7% of the data and made your ellipses a bit larger to compensate. If one or the other is standard, it would be good to specify and cite.
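
To make the point about "3" concrete, here is a minimal simulation sketch. It assumes NumPy, a made-up covariance matrix, and a helper name (`fraction_inside`) that are not part of this PR. It rotates the data onto its SVD axes, scales each axis by its standard deviation, and counts the share of points that fall inside the ellipse of n standard deviations. For comparison it also evaluates 1 - exp(-n**2 / 2), the probability mass of a bivariate normal within Mahalanobis distance n.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical covariance, chosen only for illustration.
cov = np.array([[2.0, 0.8],
                [0.8, 1.0]])
data = rng.multivariate_normal(mean=[0.0, 0.0], cov=cov, size=200_000)

# Center the data and rotate it onto its principal (SVD) axes.
centered = data - data.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
scores = centered @ vt.T

# Express each rotated coordinate in units of its own standard deviation.
z = scores / scores.std(axis=0)


def fraction_inside(n_std):
    """Share of samples inside the ellipse with semi-axes of n_std sigma."""
    return np.mean((z ** 2).sum(axis=1) <= n_std ** 2)


for n_std in (1, 2, 3):
    simulated = fraction_inside(n_std)
    closed_form = 1.0 - np.exp(-n_std ** 2 / 2.0)
    print(f"{n_std} std: simulated {simulated:.4f}, closed form {closed_form:.4f}")
```

With this construction the 3-standard-deviation ellipse holds noticeably less than 99.7% of the samples, which is the gap discussed above.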

CarstenSchelp · 2020-12-06T19:29:03Z

Hi @jklymak,
I was wary of adding too much text in the first place. If it needs more explanation, I am definitely willing to elaborate. I am still thinking, though, about how much theory the reader of the example might actually want to see. I think I will come up shortly with something along the lines of:

the standard deviation has a very clear definition.
the 68-95-99.7 rule is a mere consequence of it.
the rule does not hold in 2+ dimensions.
instead of manipulating the ellipse that naturally emerges, suggest applying the same rule with an extension that makes it still hold.
how to correct: single_dimensional_probability**n_dimensions
why this correction works.
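
The correction in the list above can be checked with a few lines of arithmetic. This is only a sketch: the function name `corrected_probability` and the `ONE_DIM` dictionary are made up for illustration. Raising the one-dimensional probability to the number of dimensions gives the chance that every one of n independent coordinates lies within its own range of +/- k standard deviations, which is the reasoning behind the product.

```python
# One-dimensional probabilities from the 68-95-99.7 rule,
# keyed by the number of standard deviations.
ONE_DIM = {1: 0.68, 2: 0.95, 3: 0.997}


def corrected_probability(single_dimensional_probability, n_dimensions):
    """Probability that n independent coordinates are all within range."""
    return single_dimensional_probability ** n_dimensions


for n_std, p in ONE_DIM.items():
    two_dim = corrected_probability(p, n_dimensions=2)
    print(f"{n_std} std: 1-D {p:.4f} -> 2-D {two_dim:.4f}")
```

Note that this product describes a box in the decorrelated coordinates. The mass inside the ellipse itself is a separate calculation.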

If in the end it turns out to be too much text, I am fine with trimming it down to size again.
But after all, it is all at the end of the page, "just for the curious". So why not explain it neatly?

jklymak · 2020-12-07T00:53:38Z

This whole example would be better if it referred to a standard text, or definitive online reference.

CarstenSchelp · 2020-12-08T08:34:14Z

"This whole example would be better if it referred to a standard text, or definitive online reference."

An authoritative online reference is a good idea. I will pick one and add it. It may need an explanation that builds the bridge to this case, but that should be concise.
The generation of the ellipse itself uses a new method that, to the best of my knowledge, is explained only in my own GitHub Pages article. So we'll have to live with that.

I will have some time around the holidays to address the 68-95-99.7 thing.

Should I close this PR for now? Or should we at least remove the incorrect statement about the 68-95-99.7 percent?

jklymak

Agree we should change the number and include a brief description, but without a long digression.

jklymak · 2020-12-08T16:33:18Z

examples/statistics/confidence_ellipse.py

+
+#############################################################################
+#
+# No 68-95-99.7 Rule
+# """"""""""""""""""
+#
+# .. no_rule:
+# The well-known 68-95-99.7 rule states that in a normal distribution 68%
+# of the observations lie within +/- 1 standard deviation from the mean, 95%
+# within 2 standard deviations and 99.7% within 3 standard deviations.
+#
+# However this rule only holds for one-dimensional data.
+# To adjust this rule to two-dimensional data,
+# it suffices to square the probabilities
+# of 0.68, 0.95 and 99.7 to 0.462, 0.903 and 0.994, respectively.
+# If you wish, you can think of it as a new "46-90-99 Rule" for
+# two-dimensional data.
+


Suggested change

#############################################################################
#
# No 68-95-99.7 Rule
# """"""""""""""""""
#
# .. no_rule:
# The well-known 68-95-99.7 rule states that in a normal distribution 68%
# of the observations lie within +/- 1 standard deviation from the mean, 95%
# within 2 standard deviations and 99.7% within 3 standard deviations.
#
# However this rule only holds for one-dimensional data.
# To adjust this rule to two-dimensional data,
# it suffices to square the probabilities
# of 0.68, 0.95 and 99.7 to 0.462, 0.903 and 0.994, respectively.
# If you wish, you can think of it as a new "46-90-99 Rule" for
# two-dimensional data.

Hi @jklymak,
I have integrated your suggestion into the file. It looks like the linters are happy, too.
As far as I am concerned, it can stay like this until I have found links that cover the issue well.

Co-authored-by: Jody Klymak <jklymak@gmail.com>

jklymak · 2020-12-11T17:01:23Z

Thanks for correcting this @CarstenSchelp

carstesc added 2 commits December 6, 2020 13:10

correct reference to 68-95-99.7% rule

1afc571

flake8 corrections

7bbaf91

jklymak added the Documentation label Dec 6, 2020

add blank line after trailing documentation block.

cf05e74

jklymak reviewed Dec 8, 2020

CarstenSchelp and others added 3 commits December 11, 2020 14:13

commit @jklymak's suggestion

c5aca82

Co-authored-by: Jody Klymak <jklymak@gmail.com>

integrate suggested change.

5fb61a4

fix line-too-long

c9dfbed

jklymak approved these changes Dec 11, 2020

jklymak merged commit 5926c2a into matplotlib:master Dec 11, 2020

QuLogic added this to the v3.4.0 milestone Jun 29, 2021

liamtoney mentioned this pull request Sep 29, 2021

Use correct confidence interval #21216

Merged

3 tasks
