Lec 5 - Normality Testing
[Figure: theoretical normal distribution calculated from a mean of 66.51 and a standard deviation of 18.265.]
Graphical methods are typically not very useful when the sample size is small. This is a histogram of the last example: the data do not look normal, but they are not statistically different from normal.
Tests of Normality: Age
  Kolmogorov-Smirnov(a):  Statistic = .110   df = 1048   Sig. = .000
  Shapiro-Wilk:           Statistic = .931   df = 1048   Sig. = .000
Tests of Normality: TOTAL_VALU
  Kolmogorov-Smirnov(a):  Statistic = .283   df = 149   Sig. = .000
  Shapiro-Wilk:           Statistic = .463   df = 149   Sig. = .000
Tests of Normality: Z100
  Kolmogorov-Smirnov(a):  Statistic = .071   df = 100   Sig. = .200*
  Shapiro-Wilk:           Statistic = .985   df = 100   Sig. = .333
Tests of Normality: Asthma Cases
  Kolmogorov-Smirnov:  Statistic = .069   df = 72   Sig. = .200*
  Shapiro-Wilk:        Statistic = .988   df = 72   Sig. = .721
In the SPSS output above, both probabilities (Sig. = .200 for Kolmogorov-Smirnov and .721 for Shapiro-Wilk) are greater than 0.05 (the typical alpha level), so we accept Ho: these data are not significantly different from normal.
Tests of Normality: Average PM10
  Kolmogorov-Smirnov:  Statistic = .142   df = 72   Sig. = .001
  Shapiro-Wilk:        Statistic = .841   df = 72   Sig. = .000
In the SPSS output above, both probabilities (Sig. = .001 for Kolmogorov-Smirnov and .000 for Shapiro-Wilk) are less than 0.05 (the typical alpha level), so we reject Ho: these data are significantly different from normal.
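The decision rule applied to these SPSS tables can be written as a small function. This is a minimal sketch: the function name `normality_decision` and the dictionary layout are hypothetical, and the p-values are simply the Sig. column copied from the two tables above. A p-value exactly equal to alpha is treated as a rejection here, which the lecture does not specify.

```python
ALPHA = 0.05  # the "typical alpha level" used in this lecture


def normality_decision(p_value: float, alpha: float = ALPHA) -> str:
    """Hypothetical helper: return the lecture's conclusion for one normality test.

    p > alpha  -> accept Ho (not significantly different from normal)
    p <= alpha -> reject Ho (significantly different from normal);
    treating p == alpha as a rejection is an assumption of this sketch.
    """
    if p_value > alpha:
        return "accept Ho: not significantly different from normal"
    return "reject Ho: significantly different from normal"


# Sig. values copied from the SPSS output above.
# SPSS prints .200* for K-S when .200 is only a lower bound on the true significance.
SPSS_SIG = {
    "Asthma Cases": {"Kolmogorov-Smirnov": 0.200, "Shapiro-Wilk": 0.721},
    "Average PM10": {"Kolmogorov-Smirnov": 0.001, "Shapiro-Wilk": 0.000},
}

for variable, tests in SPSS_SIG.items():
    for test, p in tests.items():
        print(f"{variable:<13} {test:<19} p = {p:.3f} -> {normality_decision(p)}")
```

Both Asthma Cases tests lead to "accept Ho" and both Average PM10 tests lead to "reject Ho", matching the two conclusions above.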
Important: As the sample size increases, normality tests become MORE restrictive: smaller and smaller departures from normality become statistically significant, so it becomes harder to declare that the data are normally distributed. For very large data sets, normality testing therefore becomes less important.
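One way to see this effect is through the Jarque-Bera statistic given later in this lecture, JB = n[(k3)²/6 + (k4)²/24]. The sketch below holds the sample's shape fixed (k3 = 1.13 and k4 = 0.843, the values from the village example) and changes only n. It assumes the usual large-sample reference distribution for JB, a χ² with 2 degrees of freedom, whose upper-tail probability is exactly exp(-x/2); that approximation is rough for small n, so treat the p-values as illustrative. The helper names are hypothetical.

```python
import math

# Shape held fixed (assumption): skewness k3 and excess kurtosis k4 from the village example.
K3, K4 = 1.13, 0.843


def jb_from_shape(n: int, k3: float, k4: float) -> float:
    """Hypothetical helper: Jarque-Bera statistic JB = n * (k3**2 / 6 + k4**2 / 24)."""
    return n * (k3 ** 2 / 6 + k4 ** 2 / 24)


def chi2_df2_upper_tail(x: float) -> float:
    """P(X > x) for a chi-square variable with 2 degrees of freedom."""
    return math.exp(-x / 2)


for n in (17, 25, 50, 100):
    jb = jb_from_shape(n, K3, K4)
    p = chi2_df2_upper_tail(jb)
    verdict = "accept Ho" if p > 0.05 else "reject Ho"
    print(f"n = {n:3d}   JB = {jb:6.2f}   p = {p:.4f}   {verdict}")
```

With exactly the same skewness and kurtosis, the n = 17 sample is accepted as normal (JB ≈ 4.12, p ≈ 0.13), but from n = 25 onward the identical shape is rejected, because JB grows in proportion to n.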
The W/S test uses the statistic

$$q = \frac{w}{s}$$

where q is the test statistic, w is the range of the data (largest minus smallest value) and s is the standard deviation.
[Figure: two panels, "Range constant, SD changes" and "Range changes, SD constant".]
Village            Population Density
Aranza             4.13
Corupo             4.53
San Lorenzo        4.69
Cheranatzicurin    4.76
Nahuatzen          4.77
Pomacuaran         4.96
Sevina             4.97
Arantepacua        5.00
Cocucho            5.04
Charapan           5.10
Comachuen          5.25
Pichataro          5.36
Quinceo            5.94
Nurio              6.06
Turicuaro          6.19
Urapicho           6.30
Capacuaro          7.73
$$q = \frac{w}{s} = \frac{3.6}{0.866} = 4.16$$

where w = 7.73 - 4.13 = 3.6 and s = 0.866.
The W/S test uses a critical range. If the calculated value falls WITHIN the range, accept Ho; if it falls outside the range, reject Ho.
For n = 17 the critical range is 3.06 to 4.31. Since 3.06 < q = 4.16 < 4.31, we accept Ho.
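The W/S calculation above can be reproduced with Python's standard `statistics` module. This is a sketch, not a library routine: `ws_statistic` and `ws_decision` are hypothetical helper names, s is the sample standard deviation (n - 1 denominator, which reproduces the slide's 0.866), and the critical range 3.06 to 4.31 is copied from the lecture for n = 17; other sample sizes need their own values from a W/S table.

```python
import statistics

# Population densities of the 17 villages in the table above.
DENSITY = [4.13, 4.53, 4.69, 4.76, 4.77, 4.96, 4.97, 5.00, 5.04,
           5.10, 5.25, 5.36, 5.94, 6.06, 6.19, 6.30, 7.73]


def ws_statistic(data):
    """Hypothetical helper: q = w / s (range over sample standard deviation)."""
    w = max(data) - min(data)    # range of the data
    s = statistics.stdev(data)   # sample SD, n - 1 in the denominator
    return w / s


def ws_decision(q, lower, upper):
    """Accept Ho when q falls strictly inside the critical range."""
    return "accept Ho" if lower < q < upper else "reject Ho"


q = ws_statistic(DENSITY)
# Critical range for n = 17 as quoted in the lecture.
print(round(q, 2), ws_decision(q, 3.06, 4.31))
```

This prints `4.16 accept Ho`, matching the hand calculation.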
Jarque-Bera Test
A goodness-of-fit test of whether sample data have skewness and kurtosis matching those of a normal distribution.
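$$k_3 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^3}{n s^3}$$

$$k_4 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^4}{n s^4} - 3$$

$$JB = n\left[\frac{(k_3)^2}{6} + \frac{(k_4)^2}{24}\right]$$

where n is the sample size, $\bar{x}$ is the sample mean and s is the standard deviation; k3 measures skewness and k4 excess kurtosis (both equal 0 for a normal distribution).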
$\bar{x}$ = 5.34, s = 0.87
Village            Density   Mean Deviate   Mean Deviate^3   Mean Deviate^4
Aranza             4.13      -1.21          -1.771561        2.14358881
Corupo             4.53      -0.81          -0.531441        0.43046721
San Lorenzo        4.69      -0.65          -0.274625        0.17850625
Cheranatzicurin    4.76      -0.58          -0.195112        0.11316496
Nahuatzen          4.77      -0.57          -0.185193        0.10556001
Pomacuaran         4.96      -0.38          -0.054872        0.02085136
Sevina             4.97      -0.37          -0.050653        0.01874161
Arantepacua        5.00      -0.34          -0.039304        0.01336336
Cocucho            5.04      -0.30          -0.027000        0.00810000
Charapan           5.10      -0.24          -0.013824        0.00331776
Comachuen          5.25      -0.09          -0.000729        0.00006561
Pichataro          5.36       0.02           0.000008        0.00000016
Quinceo            5.94       0.60           0.216000        0.12960000
Nurio              6.06       0.72           0.373248        0.26873856
Turicuaro          6.19       0.85           0.614125        0.52200625
Urapicho           6.30       0.96           0.884736        0.84934656
Capacuaro          7.73       2.39          13.651919       32.62808641
Sum                                         12.595722       37.433505
$$k_3 = \frac{12.6}{(17)(0.87)^3} = 1.13$$

$$k_4 = \frac{37.43}{(17)(0.87)^4} - 3 = 0.843$$
$$JB = 17\left[\frac{(1.13)^2}{6} + \frac{(0.843)^2}{24}\right] = 17\left[\frac{1.2769}{6} + \frac{0.711}{24}\right]$$

$$JB = 17(0.2128 + 0.0296) = 4.12$$
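The Jarque-Bera steps can be checked in code. In this sketch the helper name `jarque_bera` is hypothetical (not a library function); by default it uses the unrounded sample standard deviation, and passing s = 0.87 mimics the worked example. The slide's JB = 4.12 also depends on rounding k3 to 1.13 before squaring: with s = 0.87 but no other rounding the result is about 4.09, and with the unrounded s ≈ 0.866 it is about 4.29 (k3 ≈ 1.14, k4 ≈ 0.92).

```python
import statistics

# Population densities of the 17 villages in the table above.
DENSITY = [4.13, 4.53, 4.69, 4.76, 4.77, 4.96, 4.97, 5.00, 5.04,
           5.10, 5.25, 5.36, 5.94, 6.06, 6.19, 6.30, 7.73]


def jarque_bera(data, s=None):
    """Hypothetical helper returning (k3, k4, JB) with the lecture's formulas.

    k3 = sum((x - mean)**3) / (n * s**3)
    k4 = sum((x - mean)**4) / (n * s**4) - 3
    JB = n * (k3**2 / 6 + k4**2 / 24)
    s defaults to the sample standard deviation (n - 1 denominator).
    """
    n = len(data)
    mean = statistics.fmean(data)
    if s is None:
        s = statistics.stdev(data)
    m3 = sum((x - mean) ** 3 for x in data)  # sum of cubed mean deviates
    m4 = sum((x - mean) ** 4 for x in data)  # sum of fourth-power mean deviates
    k3 = m3 / (n * s ** 3)
    k4 = m4 / (n * s ** 4) - 3
    jb = n * (k3 ** 2 / 6 + k4 ** 2 / 24)
    return k3, k4, jb


for label, s in (("s = 0.87 (rounded)", 0.87), ("s unrounded", None)):
    k3, k4, jb = jarque_bera(DENSITY, s=s)
    print(f"{label:<19} k3 = {k3:.3f}  k4 = {k4:.3f}  JB = {jb:.2f}")
```

Under Ho, JB is approximately χ² with 2 degrees of freedom (5% critical value 5.99), so all of these versions lead to the same conclusion for the village data: accept Ho.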
D'Agostino Test
A very powerful test for departures from normality. It is based on the D statistic, which is compared with a lower and an upper critical value.
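$$D = \frac{T}{\sqrt{n^3 \, SS}}$$

where

$$T = \sum_{i=1}^{n} \left(i - \frac{n+1}{2}\right) X_i$$

X_i is the i-th observation after ranking the data in ascending order, and SS is the sum of squared deviations from the mean.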
i    Village            Density   Mean Deviate^2
1    Aranza             4.13      1.46410
2    Corupo             4.53      0.65610
3    San Lorenzo        4.69      0.42250
4    Cheranatzicurin    4.76      0.33640
5    Nahuatzen          4.77      0.32490
6    Pomacuaran         4.96      0.14440
7    Sevina             4.97      0.13690
8    Arantepacua        5.00      0.11560
9    Cocucho            5.04      0.09000
10   Charapan           5.10      0.05760
11   Comachuen          5.25      0.00810
12   Pichataro          5.36      0.00040
13   Quinceo            5.94      0.36000
14   Nurio              6.06      0.51840
15   Turicuaro          6.19      0.72250
16   Urapicho           6.30      0.92160
17   Capacuaro          7.73      5.71210
Mean = 5.34   SS = 11.9916
$$\frac{n+1}{2} = \frac{17+1}{2} = 9$$

$$T = \sum_{i=1}^{17} (i - 9) X_i$$

$$T = (1-9)4.13 + (2-9)4.53 + (3-9)4.69 + \dots + (17-9)7.73$$

$$T = 63.23$$
$$D = \frac{63.23}{\sqrt{(17^3)(11.9916)}} = 0.26050$$
$D_{critical}$ = 0.2587, 0.2860 (df = n = 17)
If the calculated value falls within the critical range, accept Ho.
Since 0.2587 < D = 0.26050 < 0.2860, we accept Ho.
The sample data set is not significantly different from normal (D = 0.26050, p > 0.05).
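The D'Agostino calculation can be sketched the same way. The helper name `dagostino_d` is hypothetical; it sorts the data into ascending order (as in the worked example, where i = 1 is the smallest density), and the critical range 0.2587 to 0.2860 is copied from the lecture for n = 17.

```python
import math
import statistics

# Population densities of the 17 villages in the table above.
DENSITY = [4.13, 4.53, 4.69, 4.76, 4.77, 4.96, 4.97, 5.00, 5.04,
           5.10, 5.25, 5.36, 5.94, 6.06, 6.19, 6.30, 7.73]


def dagostino_d(data):
    """Hypothetical helper: D = T / sqrt(n**3 * SS), returned with T and SS."""
    x = sorted(data)                           # T needs the observations in ascending order
    n = len(x)
    mean = statistics.fmean(x)
    ss = sum((xi - mean) ** 2 for xi in x)     # sum of squared deviations from the mean
    middle = (n + 1) / 2                       # 9 when n = 17
    t = sum((i - middle) * xi for i, xi in enumerate(x, start=1))
    return t / math.sqrt(n ** 3 * ss), t, ss


d, t, ss = dagostino_d(DENSITY)
lower, upper = 0.2587, 0.2860  # critical range for n = 17 quoted in the lecture
print(f"T = {t:.2f}, SS = {ss:.4f}, D = {d:.5f}")
print("accept Ho" if lower < d < upper else "reject Ho")
```

This reproduces T = 63.23, SS = 11.9916 and D = 0.26050, and prints `accept Ho`.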
Statistic   Calculated Value   Probability     Results
q           4.16               > 0.05          Normal
χ²          4.15               0.5 > p > 0.1   Normal
D           0.2605             > 0.05          Normal
W           0.8827             0.035           Non-normal
D           0.2007             0.067           Normal
K-S Prob: 1.000, 0.893, 0.871, 0.611, 0.969, 0.904, 0.510, 0.106, 0.007, 0.005, 0.007, 0.002