Lecture Notes 1
DEPARTMENT OF STATISTICS
STA 403:
STATISTICAL METHODS II
delivered by
HYPOTHESES AND TEST PROCEDURES
What is Hypothesis?
A statistical hypothesis is a statement or assertion which may or may not be
true about the value of a population parameter.
Example
The mean age of Master of Public Health students in the Department
of Physician Assistantship equals 24 years. That is, $\mu = 24$.
Types of Hypotheses
Simple and Composite Tests
When both null and alternative hypotheses are composite and represent
one side of the parameter space around some value $\theta_0$, then the test is
said to be a one-sided test. One-sided tests are also called one-tailed
tests.
Example
$H_0: \theta \le \theta_0$ against $H_1: \theta > \theta_0$
When the null is simple and the alternative hypothesis represents the rest
of the parameter space, then the test is said to be a two-sided test.
Two-sided tests are also called two-tailed tests.
Example
$H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$
Errors in Hypothesis Testing
The purpose of hypothesis testing is to determine whether the evidence on
the basis of available data tends to refute $H_0$. Since $H_0$ can either be true
or false and at the end of the experiment we can reject or fail to reject $H_0$,
there are four possible decisions that we can make.
Hypothesis Testing Decisions
1. Reject $H_0$ when it is true (wrong decision – Type I error).
2. Reject $H_0$ when it is false (correct decision).
3. Fail to reject $H_0$ when it is true (correct decision).
4. Fail to reject $H_0$ when it is false (wrong decision – Type II error).
Decision table for hypothesis testing
                        $H_0$ is true       $H_0$ is false
Reject $H_0$            Type I error        correct decision
Fail to reject $H_0$    correct decision    Type II error
Hypothesis testing involves the use of sample data to decide whether the
null hypothesis should be rejected or not. The decision to reject the null
hypothesis or not is based on the value of a test statistic. A test statistic is
a function of the sample data whose value is calculated from the sample. Its
distribution is known under the assumption that the null hypothesis is true.
If there is a large difference between what is expected under the null
hypothesis and what is observed in a sample, then the null hypothesis is
rejected; and the result is said to be statistically significant. If, on the
other hand, the difference between what is expected and what is observed
is small, then there is not enough evidence to reject the null hypothesis;
and the result is said to be not statistically significant.
There are two approaches to determining whether to reject the null
hypothesis or not. One involves the determination of the rejection or
critical region of the test. The rejection or critical region is the set of values
of the test statistic that lead us to reject $H_0$. It is obtained by
using a pre-determined level of significance (or size of the test). The level
of significance, denoted by $\alpha$, is the probability of committing a Type I
error. The levels of significance often used in the literature include
1% (or 0.01), 5% (or 0.05) and 10% (or 0.10).
The second approach involves calculation of the p-value of the test. The
p-value of the test is the probability, computed under the null hypothesis,
of observing a value of the test statistic at least as extreme as the one
actually observed. The null hypothesis is rejected for "small" p-values
(usually for $p < 0.05$). Generally, the null hypothesis is rejected at the
$\alpha$ level of significance if $p \le \alpha$. For values of $p > \alpha$,
there is not enough evidence to reject the null hypothesis.
We shall limit ourselves to the first approach in this module.
In general, hypothesis testing in statistics involves the following steps:
Step 1: State the hypothesis that is to be questioned ($H_0$).
Step 2: State an alternative hypothesis which will be accepted if the null
hypothesis is rejected ($H_1$).
Step 3: Select the decision rule about when to reject $H_0$ and when to fail to
reject it.
Step 4: Evaluate the appropriate test statistic using sample data from the
population of interest.
Step 5: Carry out your decision.
TESTS CONCERNING A SINGLE POPULATION MEAN
Suppose that we wish to test the null hypothesis that the mean $\mu$ of a
normal population with variance $\sigma^2$ equals a specific value, $\mu_0$.
That is, if we wish to test the null hypothesis $H_0: \mu = \mu_0$ against any of
the three alternatives $H_1: \mu < \mu_0$ or $H_1: \mu > \mu_0$ or
$H_1: \mu \ne \mu_0$; then we need to perform one of the tests in the table
below, based on a random sample of size n from this population.
Null hypothesis $H_0: \mu = \mu_0$ against various alternatives
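Critical regions for testing $H_0: \mu = \mu_0$ are shown below.
$H_1$               Reject $H_0$ if
$\mu < \mu_0$       $z < -z_\alpha$ (lower-tailed test)
$\mu > \mu_0$       $z > z_\alpha$ (upper-tailed test)
$\mu \ne \mu_0$     $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$ (two-tailed test)
$-z_\alpha$ is the value of z that leaves an area of $\alpha$ to its left;
$z_\alpha$ is the value of z that leaves an area of $\alpha$ to its right.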
sample size.
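The critical values in the table of critical regions can be read from z-tables or computed. The following is a minimal sketch using Python's standard library; the function name `z_critical` and the tail labels are illustrative assumptions, not part of the notes.

```python
from statistics import NormalDist


def z_critical(alpha, tail):
    """Critical value(s) of a z-test at significance level alpha.

    tail is "lower", "upper" or "two" (hypothetical labels for the
    three alternatives in the table of critical regions).
    """
    std_normal = NormalDist()  # standard normal: mean 0, sd 1
    if tail == "lower":
        return (std_normal.inv_cdf(alpha),)        # -z_alpha
    if tail == "upper":
        return (std_normal.inv_cdf(1 - alpha),)    # z_alpha
    if tail == "two":
        z = std_normal.inv_cdf(1 - alpha / 2)      # z_{alpha/2}
        return (-z, z)
    raise ValueError("tail must be 'lower', 'upper' or 'two'")


print(z_critical(0.05, "upper"))
print(z_critical(0.05, "two"))
```

For a two-tailed test the function returns both cut-offs, so $H_0$ is rejected when the statistic falls below the first or above the second.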
Test for means from a single normal population with a known $\sigma$
If the population we are sampling is normal and $\sigma$ is known, then the test
statistic is given by
$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
Example
Solution
(a) The null and alternative hypotheses are:
$H_0: \mu = 200$
$H_1: \mu \ne 200$
Since $z_{cal} = 4.8$ is greater than $z_{0.025} = 1.96$, we reject $H_0$ and agree
with the lecturers that the performance of the students has changed.
Test for means from a single population with unknown $\sigma$ but
large sample size
Example
A random sample of size $n = 100$ observations taken from a population
with mean $\mu$ yielded the sample mean $\bar{x} = 18.9$ and sample standard
deviation $s = 12.6$. If the hypotheses are
$H_0: \mu = 16$ and $H_1: \mu \ne 16$,
(a) Calculate the value of the appropriate test statistic for this test;
(b) Hence determine whether $H_0$ should be rejected at the 1% level of
significance.
Solution
(b) Now we have $z_{\alpha/2} = z_{0.01/2} = z_{0.005}$. From the z-tables,
$z_{0.005} = 2.575$. Now since $z_{cal} = 2.30$ is neither greater
than 2.575 nor less than $-2.575$, we cannot reject the null
hypothesis at the 1% level of significance.
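The arithmetic of this example can be checked with a short sketch. It assumes the large-sample statistic $z = (\bar{x} - \mu_0)/(s/\sqrt{n})$, with s standing in for the unknown $\sigma$; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values taken from the example above
n, x_bar, s, mu_0, alpha = 100, 18.9, 12.6, 16, 0.01

# Large-sample z statistic, with s replacing the unknown sigma
z_cal = (x_bar - mu_0) / (s / sqrt(n))

# Two-tailed critical value z_{alpha/2}
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Reject H0 only if |z_cal| exceeds the critical value
reject = abs(z_cal) > z_crit
print(round(z_cal, 2), round(z_crit, 3), reject)
```

The computed critical value differs from the tabulated 2.575 only in the third decimal place, which does not change the decision.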
Test for means from a single population with unknown $\sigma$ and
small sample size
The critical regions for such tests are shown below.
$H_1$               Reject $H_0$ if
$\mu < \mu_0$       $t < -t_\alpha$
$\mu > \mu_0$       $t > t_\alpha$
$\mu \ne \mu_0$     $t < -t_{\alpha/2}$ or $t > t_{\alpha/2}$
Example
The manufacturer of a new fiberglass tire claims that its average life will
be at least 40,000 miles. To verify this claim, a sample of 12 tires is
selected, with their lifetimes (in 1000s of miles) as follows:
Tire   1     2     3     4     5     6     7     8     9     10    11    12
Life   36.1  40.2  33.8  38.5  42.0  35.8  37.0  41.0  36.8  37.2  33.0  36.0
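The test statistic for the tire data can be evaluated as in the sketch below. It assumes the usual one-sample statistic $t = (\bar{x} - \mu_0)/(s/\sqrt{n})$ with $n - 1$ degrees of freedom and takes $\mu_0 = 40$ (in 1000s of miles) from the manufacturer's claim. The significance level is not stated in the example, so no decision is made here.

```python
from math import sqrt
from statistics import mean, stdev

# Tire lifetimes in 1000s of miles, from the table above
lifetimes = [36.1, 40.2, 33.8, 38.5, 42.0, 35.8,
             37.0, 41.0, 36.8, 37.2, 33.0, 36.0]
mu_0 = 40.0  # claimed average life (assumed hypothesised value)

n = len(lifetimes)
x_bar = mean(lifetimes)   # sample mean
s = stdev(lifetimes)      # sample standard deviation (divisor n - 1)

# One-sample t statistic with n - 1 degrees of freedom
t_cal = (x_bar - mu_0) / (s / sqrt(n))
df = n - 1
print(round(x_bar, 3), round(s, 3), round(t_cal, 3), df)
```

The resulting value of t would then be compared with $-t_\alpha$ at 11 degrees of freedom from the t-tables, since the alternative of interest is that the mean life is below 40.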
TEST CONCERNING A POPULATION PROPORTION
(LARGE SAMPLE)
$H_1$              Reject $H_0$ if
$H_1: p < p_0$     $z < -z_\alpha$
$H_1: p > p_0$     $z > z_\alpha$
$H_1: p \ne p_0$   $z < -z_{\alpha/2}$ or $z > z_{\alpha/2}$
Example
An oil company claims that less than 20% of all car owners have not tried its
gasoline. Test the claim at the 0.01 level of significance, if a random check
reveals that 22 out of 200 car owners have not tried the company’s gasoline.
Solution
We wish to test $H_0: p = 0.20$ against $H_1: p < 0.20$.
Here, $p_0 = 0.20$, the number of successes, $x = 22$, and the
sample size is $n = 200$.
Thus, $\hat{p} = \frac{22}{200} = 0.11$.
From the z-table, $z_{0.01} = 2.33$. Therefore the rejection region is
$z < -2.33$. Since $z = -3.1820 < -2.33$, we reject $H_0$.
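The value of z in this example can be reproduced with the sketch below. It assumes the standard large-sample statistic $z = (\hat{p} - p_0)/\sqrt{p_0(1 - p_0)/n}$; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Values from the oil company example
x, n, p_0, alpha = 22, 200, 0.20, 0.01

p_hat = x / n  # sample proportion

# Large-sample z statistic; the standard error uses p_0 because it is
# evaluated under the null hypothesis
z_cal = (p_hat - p_0) / sqrt(p_0 * (1 - p_0) / n)

# Lower-tailed test: reject H0 when z_cal < -z_alpha
z_crit = NormalDist().inv_cdf(alpha)
reject = z_cal < z_crit
print(round(p_hat, 2), round(z_cal, 4), round(z_crit, 2), reject)
```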
TEST CONCERNING A SINGLE POPULATION
PROPORTION (SMALL SAMPLES)
where,
x is the number of observed successes;
$k_\alpha$ is the largest integer for which
$$\sum_{k=0}^{k_\alpha} B(k; n, p_0) \le \alpha;$$
and $B(k; n, p_0)$ is the probability of observing k successes in n binomial
trials when $p = p_0$.
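Similarly, if the alternative hypothesis was $H_1: p \ne p_0$, the
critical region would be $x \le k_{\alpha/2}$ or $x \ge k^*_{\alpha/2}$,
where $k_{\alpha/2}$ is the largest integer for which
$$\sum_{k=0}^{k_{\alpha/2}} B(k; n, p_0) \le \frac{\alpha}{2};$$
and $k^*_{\alpha/2}$ is the smallest integer for which
$$\sum_{k=k^*_{\alpha/2}}^{n} B(k; n, p_0) \le \frac{\alpha}{2}.$$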
Example
It is claimed that 40% of patients who attend a certain clinic on
any day are smokers. Suppose that on a particular day, 3 out of
a sample of 13 patients attending the clinic were found to be smokers.
Test the hypothesis $H_0: p = 0.40$ against $H_1: p \ne 0.40$
at the 5% significance level.
Solution
In this problem, $x = 3$ and $n = 13$. Since $\alpha = 0.05$,
$k_{\alpha/2} = k_{0.025}$.
Now, from binomial tables,
$$\sum_{k=0}^{1} B(k; 13, 0.40) = B(0; 13, 0.40) + B(1; 13, 0.40) = 0.0013 + 0.0113 = 0.0126$$
Thus, $\sum_{k=0}^{1} B(k; 13, 0.40) = 0.0126 \le 0.025$, implying that the largest
integer $k_{\alpha/2}$ for which $\sum_{k=0}^{k_{\alpha/2}} B(k; 13, 0.40) \le 0.025$ is 1.
Similarly, the smallest integer $k^*_{\alpha/2}$ for which
$\sum_{k=k^*_{\alpha/2}}^{13} B(k; 13, 0.40) \le 0.025$ is 10.
That is,
$$\sum_{k=10}^{13} B(k; 13, 0.40) = B(10; 13, 0.40) + \cdots + B(13; 13, 0.40) = 0.0065 + 0.0012 + 0.0001 + 0.0000 = 0.0078$$
To be able to reject the null hypothesis, the number of
successes, x, must be either less than or equal to 1, or greater than
or equal to 10.
Since $x = 3$ is neither less than or equal to 1 nor greater than or equal to 10,
we cannot reject the null hypothesis
that 40% of patients that attend clinic on any day are smokers.
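The two cut-off values found from the binomial tables can also be computed directly. The following is a minimal sketch using exact binomial probabilities; the helper names `binom_pmf` and `two_sided_cutoffs` are illustrative and not part of the notes.

```python
from math import comb


def binom_pmf(k, n, p):
    """B(k; n, p): probability of k successes in n binomial trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sided_cutoffs(n, p_0, alpha):
    """Return (k_low, k_high) for the two-sided small-sample test."""
    half = alpha / 2
    # Largest k_low with P(X <= k_low) <= alpha/2 (-1 if none exists)
    k_low, cum = -1, 0.0
    for k in range(n + 1):
        cum += binom_pmf(k, n, p_0)
        if cum > half:
            break
        k_low = k
    # Smallest k_high with P(X >= k_high) <= alpha/2 (n + 1 if none exists)
    k_high, cum = n + 1, 0.0
    for k in range(n, -1, -1):
        cum += binom_pmf(k, n, p_0)
        if cum > half:
            break
        k_high = k
    return k_low, k_high


k_low, k_high = two_sided_cutoffs(13, 0.40, 0.05)
x = 3
reject = x <= k_low or x >= k_high
print(k_low, k_high, reject)
```

Working with the exact probabilities avoids the rounding in printed tables, although for this example both routes give the same cut-offs.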
TEST CONCERNING A SINGLE
POPULATION VARIANCE
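Suppose that we wish to test the null hypothesis $H_0: \sigma^2 = \sigma_0^2$ against
a suitable alternative. The test statistic is
$$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2},$$
where $\chi^2$ has $(n - 1)$ degrees of freedom.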
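The critical regions for testing $H_0: \sigma^2 = \sigma_0^2$ are shown below
$H_1$                         Reject $H_0$ if
$\sigma^2 < \sigma_0^2$       $\chi^2 < \chi^2_{1-\alpha}$
$\sigma^2 > \sigma_0^2$       $\chi^2 > \chi^2_{\alpha}$
$\sigma^2 \ne \sigma_0^2$     $\chi^2 < \chi^2_{1-\alpha/2}$ or $\chi^2 > \chi^2_{\alpha/2}$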
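Given that $n = 25$, $s^2 = 9$ and $\sigma_0^2 = 10$, test $H_0: \sigma^2 = 10$ against
the $H_1: \sigma^2 \ne 10$
Solution
Substituting $n = 25$ and $s^2 = 9$ gives:
$$\chi^2 = \frac{(n - 1)s^2}{\sigma_0^2} = \frac{(25 - 1)(9)}{10} = 21.6$$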
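The critical region is $\chi^2$ less than $\chi^2_{0.995}$, or greater than
$\chi^2_{0.005}$. From the chi-square tables, $\chi^2_{0.995}$ (with 24 df) is 9.886
and $\chi^2_{0.005}$ (with 24 df) is 45.558.
Since $\chi^2 = 21.6$ is neither less than 9.886 nor greater than 45.558, we
cannot reject $H_0$.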
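A short sketch of the same calculation follows. The two critical values are copied from the chi-square table values quoted in the solution rather than computed; the variable names are illustrative.

```python
# Values from the example
n, s_sq, sigma0_sq = 25, 9, 10

# Test statistic: (n - 1) s^2 / sigma_0^2, with n - 1 degrees of freedom
chi_sq = (n - 1) * s_sq / sigma0_sq
df = n - 1

# Table values quoted in the solution (24 df)
lower_crit, upper_crit = 9.886, 45.558

# Two-sided test: reject H0 if the statistic lies in either tail
reject = chi_sq < lower_crit or chi_sq > upper_crit
print(chi_sq, df, reject)
```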
TEST CONCERNING TWO POPULATION MEANS
(INDEPENDENT SAMPLES)
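We can compare $\mu_1$ and $\mu_2$ by testing $H_0: \mu_1 - \mu_2 = \delta$, where
$\delta$ is a given constant, against any of the alternatives
$H_1: \mu_1 - \mu_2 < \delta$ or $H_1: \mu_1 - \mu_2 > \delta$ or
$H_1: \mu_1 - \mu_2 \ne \delta$
under the following three conditions:
1. Large independent samples with known $\sigma_1^2$ and $\sigma_2^2$.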
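Large independent samples with $\sigma_1^2$ and $\sigma_2^2$ known
The test statistic for testing two population means from independent samples
with $\sigma_1^2$ and $\sigma_2^2$ known is given by
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - \delta}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}},$$
where $\delta = \mu_1 - \mu_2$ under $H_0$ and z is the usual standard normal random variable.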
Example
A random sample of 100 observations is drawn from a normal population
with variance 16 and the sample mean was found to be 10.8. Another
sample of 64 observations is drawn from a second and independent
normal population with variance 25 and the sample mean was found to
be 9.6. Test the hypotheses:
$H_0$: the population means are equal
against
$H_1$: the population means are not equal.
Solution
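The hypotheses above are equivalent to
$H_0: \mu_1 - \mu_2 = 0$
$H_1: \mu_1 - \mu_2 \ne 0$
We now evaluate the test statistic by substituting
$n_1 = 100$, $\bar{x}_1 = 10.8$, $\sigma_1^2 = 16$, $n_2 = 64$, $\bar{x}_2 = 9.6$, $\sigma_2^2 = 25$
and $\delta = 0$ into the formula.
This gives
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - \delta}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \frac{(10.8 - 9.6) - 0}{\sqrt{\dfrac{16}{100} + \dfrac{25}{64}}} = 1.6172$$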
equal.
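The substitution in this example can be checked with the sketch below. The function name `two_sample_z` is an illustrative assumption; it evaluates the known-variance statistic with a hypothesised difference of 0.

```python
from math import sqrt


def two_sample_z(x1_bar, x2_bar, var1, var2, n1, n2, delta=0.0):
    """z statistic for two means with known population variances."""
    return (x1_bar - x2_bar - delta) / sqrt(var1 / n1 + var2 / n2)


# Values from the example: variances 16 and 25, sample sizes 100 and 64
z_cal = two_sample_z(10.8, 9.6, 16, 25, 100, 64)
print(round(z_cal, 4))
```

For the two-sided alternative, the value is compared with $\pm z_{\alpha/2}$ at whatever significance level is chosen.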
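Large independent samples with $\sigma_1^2$ and $\sigma_2^2$ unknown
The test statistic for testing two population means from independent
samples with $\sigma_1^2$ and $\sigma_2^2$ unknown is given by
$$z = \frac{(\bar{x}_1 - \bar{x}_2) - \delta}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},$$
where $s_1^2$ and $s_2^2$ are the respective sample estimates for $\sigma_1^2$
and $\sigma_2^2$, and $\delta$ is the hypothesised value of $\mu_1 - \mu_2$.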
Example
Solution
region as $z > 1.645$.
Since $z = 10$ is greater than 1.645, we reject the null
hypothesis and conclude that $\mu_1$ is greater than $\mu_2$.
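Small independent samples with $\sigma_1^2$ and $\sigma_2^2$ unknown
Assumption 1: common variance ($\sigma_1^2 = \sigma_2^2$)
Under Assumption 1, the appropriate test statistic for such a test is given by
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - \delta}{s_p \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},$$
where $s_p^2$, called the pooled sample variance, is given by
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2},$$
with $s_1^2$ and $s_2^2$ the respective variances for samples 1 and 2, $\delta$ the
hypothesised value of $\mu_1 - \mu_2$, and t has the t-distribution with
$n_1 + n_2 - 2$ degrees of freedom.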
Example
Two independent random samples of sizes $n_1 = 16$ and $n_2 = 10$ from
normal populations with unknown standard deviations have means
$\bar{x}_1 = 23.4$ and $\bar{x}_2 = 18.2$,
with corresponding standard deviations $s_1 = 3.5$ and $s_2 = 4.8$.
Test $H_0: \mu_1 - \mu_2 = 0$ against $H_1: \mu_1 - \mu_2 > 0$ at the 10%
significance level, assuming that the population variances are equal.
Solution
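We first evaluate $s_p$ by substituting $n_1 = 16$, $s_1 = 3.5$, $n_2 = 10$ and
$s_2 = 4.8$ into the formula to give
$$s_p = \sqrt{\frac{(16 - 1)(3.5)^2 + (10 - 1)(4.8)^2}{16 + 10 - 2}} = 4.04$$
Now substituting
$s_p = 4.04$, $n_1 = 16$, $\bar{x}_1 = 23.4$, $n_2 = 10$, $\bar{x}_2 = 18.2$ and a
hypothesised difference of 0 gives
$$t = \frac{(23.4 - 18.2) - 0}{4.04 \sqrt{\dfrac{1}{16} + \dfrac{1}{10}}} = 3.193$$
From the t-tables, $t_{0.10}$ with 24 degrees of freedom is 1.318. Thus the
critical region is $t > 1.318$. Since $t = 3.193 > 1.318$, we reject $H_0$ and
conclude that $\mu_1 > \mu_2$.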
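The pooled-variance calculation can be sketched as follows. The critical value is the t-table value quoted in the solution, and the variable names are illustrative. Because the code keeps full precision for $s_p$ instead of rounding it to 4.04, the statistic comes out very slightly larger than 3.193.

```python
from math import sqrt

# Values from the example
n1, n2 = 16, 10
x1_bar, x2_bar = 23.4, 18.2
s1, s2 = 3.5, 4.8
delta = 0.0  # hypothesised difference under H0

# Pooled standard deviation
sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))

# Pooled two-sample t statistic and its degrees of freedom
t_cal = (x1_bar - x2_bar - delta) / (sp * sqrt(1 / n1 + 1 / n2))
df = n1 + n2 - 2

t_crit = 1.318  # t_{0.10} with 24 df, as quoted from the t-tables
reject = t_cal > t_crit
print(round(sp, 2), round(t_cal, 3), df, reject)
```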
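Under Assumption 2, the appropriate test statistic for such a test is given by
$$t^* = \frac{(\bar{x}_1 - \bar{x}_2) - \delta}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$$
where $t^*$ has approximately a t-distribution with df, v, given by
$$v = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}},$$
rounded to the nearest whole number.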
Example
Two independent random samples of sizes $n_1 = 16$ and $n_2 = 10$ from
normal populations with unknown standard deviations have means
$\bar{x}_1 = 23.4$ and $\bar{x}_2 = 18.2$,
with corresponding standard deviations $s_1 = 3.5$ and $s_2 = 4.8$.
Test $H_0: \mu_1 - \mu_2 = 0$ against $H_1: \mu_1 - \mu_2 > 0$ at the 10%
significance level, assuming that the population variances are different.
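Evaluate $t^*$ by substituting $n_1 = 16$, $s_1 = 3.5$, $\bar{x}_1 = 23.4$, $n_2 = 10$,
$s_2 = 4.8$, $\bar{x}_2 = 18.2$ and a hypothesised difference of 0 into the test
statistic to give
$$t^* = \frac{(\bar{x}_1 - \bar{x}_2) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \frac{(23.4 - 18.2) - 0}{\sqrt{\dfrac{3.5^2}{16} + \dfrac{4.8^2}{10}}} = 2.9680$$
We evaluate v by substituting $n_1 = 16$, $s_1 = 3.5$, $n_2 = 10$, $s_2 = 4.8$ to obtain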
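$$v = \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} = \frac{\left(\dfrac{3.5^2}{16} + \dfrac{4.8^2}{10}\right)^2}{\dfrac{(3.5^2/16)^2}{16 - 1} + \dfrac{(4.8^2/10)^2}{10 - 1}} = 14.98 \approx 15$$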
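The unequal-variance statistic and its approximate degrees of freedom can be evaluated together, as in this sketch. The function name `welch` is an illustrative assumption and the inputs are those of the example.

```python
from math import sqrt


def welch(x1_bar, s1, n1, x2_bar, s2, n2, delta=0.0):
    """Return (t_star, v) for two samples with unequal variances."""
    a = s1**2 / n1  # estimated variance of the first sample mean
    b = s2**2 / n2  # estimated variance of the second sample mean
    t_star = (x1_bar - x2_bar - delta) / sqrt(a + b)
    # Approximate degrees of freedom
    v = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))
    return t_star, v


t_star, v = welch(23.4, 3.5, 16, 18.2, 4.8, 10)
print(round(t_star, 4), round(v, 2), round(v))
```

The value of v is generally not an integer, so it is rounded before the t-tables are consulted.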
From the t-tables, $t_{0.10}$ with 15 degrees of freedom is 1.341. Thus the
critical region is $t^* > 1.341$. Since $t^* = 2.9680 > 1.341$, we reject $H_0$
TEST CONCERNING TWO POPULATION MEANS
(PAIRED DATA)
Consider the test of $H_0: \mu_d = \delta$, where $\mu_d = \mu_1 - \mu_2$, against
the various alternatives
$H_1: \mu_d < \delta$, $H_1: \mu_d > \delta$, $H_1: \mu_d \ne \delta$.
Let $\mu_d$ be the mean of the normally distributed population of paired
differences, and let $\bar{d}$ and $s_d$ be the mean and standard deviation of a
sample of n paired differences that have been selected.
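Then the appropriate test statistic for conducting any of the tests in the
table above is given by
$$t = \frac{\bar{d} - \delta}{s_d / \sqrt{n}}$$
where $\delta$ is the hypothesised value of the mean difference and t has the
t-distribution with $n - 1$ degrees of freedom.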
Example
The data below are the weights before and after ten boxers were fed with
a weight reducing diet:
i              1    2    3    4    5    6    7    8    9    10
Before, $x_i$  69   50   61   72   78   66   75   89   86   54
After, $y_i$   66   49   63   70   71   65   75   88   87   51
Solution
i              1    2    3    4    5    6    7    8    9    10
$x_i$          69   50   61   72   78   66   75   89   86   54
$y_i$          66   49   63   70   71   65   75   88   87   51
$y_i - x_i$    -3   -1   2    -2   -7   -1   0    -1   1    -3
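The differences in the table lead to the paired t statistic as sketched below. A hypothesised mean difference of 0 is assumed here, since the hypotheses for this example are not shown; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

# Weights of the ten boxers before and after the diet
before = [69, 50, 61, 72, 78, 66, 75, 89, 86, 54]
after = [66, 49, 63, 70, 71, 65, 75, 88, 87, 51]

# Paired differences d_i = y_i - x_i (after minus before)
d = [y - x for x, y in zip(before, after)]

n = len(d)
d_bar = mean(d)   # mean of the differences
s_d = stdev(d)    # standard deviation of the differences (divisor n - 1)

delta = 0  # hypothesised mean difference (assumption)
t_cal = (d_bar - delta) / (s_d / sqrt(n))
df = n - 1
print(d, d_bar, round(s_d, 4), round(t_cal, 4), df)
```

A negative value of t points towards weight loss. Whether it is significant depends on the chosen level and on $t_\alpha$ with 9 degrees of freedom.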
TEST CONCERNING TWO POPULATION PROPORTIONS
(INDEPENDENT SAMPLES)
Condition 2
Example (a)
Solution
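Since $\delta = 0$, we first find the combined sample proportion as follows:
$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2} = \frac{18 + 15}{35 + 42} = 0.4286$$
Therefore substituting
$\hat{p}_1 = \frac{18}{35} = 0.5143$, $\hat{p}_2 = \frac{15}{42} = 0.3571$, $\hat{p} = 0.4286$ and $\delta = 0$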
Solution
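Since $\delta \ne 0$, substituting
$\hat{p}_1 = \frac{18}{35} = 0.5143$, $\hat{p}_2 = \frac{15}{42} = 0.3571$,
$n_1 = 35$, $n_2 = 42$ and $\delta = -0.15$ into the
test statistic gives:
$$z = \frac{(\hat{p}_1 - \hat{p}_2) - \delta}{\sqrt{\dfrac{\hat{p}_1(1 - \hat{p}_1)}{n_1 - 1} + \dfrac{\hat{p}_2(1 - \hat{p}_2)}{n_2 - 1}}} = \frac{(0.5143 - 0.3571) - (-0.15)}{\sqrt{\dfrac{(0.5143)(0.4857)}{35 - 1} + \dfrac{(0.3571)(0.6429)}{42 - 1}}} = 2.6995$$
From the z-tables $z_{0.05} = 1.645$, resulting in a critical region of $z > 1.645$.
Since $z = 2.6995 > 1.645$, we reject $H_0$ and conclude that
$p_1 - p_2 > -0.15$.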
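The arithmetic of this solution can be checked with the sketch below. It follows the form used in the worked solution, with divisors $n_1 - 1$ and $n_2 - 1$ in the standard error, and takes the hypothesised difference as $-0.15$, the value that reproduces the reported 2.6995. The variable names are illustrative.

```python
from math import sqrt

# Values from the example
x1, n1 = 18, 35
x2, n2 = 15, 42
delta = -0.15  # hypothesised difference p1 - p2 (inferred, see lead-in)

p1_hat = x1 / n1
p2_hat = x2 / n2

# Standard error with divisors n - 1, as in the worked solution
se = sqrt(p1_hat * (1 - p1_hat) / (n1 - 1) + p2_hat * (1 - p2_hat) / (n2 - 1))

z_cal = (p1_hat - p2_hat - delta) / se
z_crit = 1.645  # z_{0.05} from the z-tables
reject = z_cal > z_crit
print(round(p1_hat, 4), round(p2_hat, 4), round(z_cal, 4), reject)
```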
TEST CONCERNING TWO POPULATION VARIANCES
(INDEPENDENT SAMPLES)
The appropriate test statistic is given by
$$F = \frac{s_1^2}{s_2^2},$$
where $s_1^2$ and $s_2^2$ are the sample variances.
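The critical regions for testing $H_0: \sigma_1^2 = \sigma_2^2$ are as shown:
$H_1$                                Reject $H_0$ if
$H_1: \sigma_1^2 < \sigma_2^2$       $F < F_{1-\alpha}(n_1 - 1, n_2 - 1)$
$H_1: \sigma_1^2 > \sigma_2^2$       $F > F_{\alpha}(n_1 - 1, n_2 - 1)$
$H_1: \sigma_1^2 \ne \sigma_2^2$     $F < F_{1-\alpha/2}(n_1 - 1, n_2 - 1)$ or
                                     $F > F_{\alpha/2}(n_1 - 1, n_2 - 1)$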
You may find the following identity useful.
$$F_{1-\alpha}(n_1 - 1, n_2 - 1) = \frac{1}{F_{\alpha}(n_2 - 1, n_1 - 1)}$$
Example
Suppose that observations from two independent random samples from
two normal populations yielded the following result:
$n_1 = 11$, $s_1^2 = 18.4$, $n_2 = 16$ and $s_2^2 = 13.5$
Test the null hypothesis $H_0: \sigma_1^2 = \sigma_2^2$ against
$H_1: \sigma_1^2 \ne \sigma_2^2$ at the 10%
significance level.
Solution
Substituting $s_1^2 = 18.4$ and $s_2^2 = 13.5$ into the test statistic gives:
$$F = \frac{s_1^2}{s_2^2} = \frac{18.4}{13.5} = 1.363.$$
TEST ON CATEGORICAL DATA
THE MULTINOMIAL DISTRIBUTION
The multinomial distribution is an extension of the binomial
distribution. Its properties are as follows:
1. The experiment consists of n identical trials.
2. There are k possible outcomes associated with
each trial.
3. The probabilities of the k outcomes, denoted by $p_1, p_2, \ldots, p_k$,
remain constant from trial to trial; and $p_1 + p_2 + \cdots + p_k = 1$.
4. The n trials are independent of each other.
5. The random variables of interest are the counts
in each of the k cells.
Example
The table below shows the market share for different brands of
television.
Brand of TV     Market share
LG              20%
Samsung         30%
Panasonic       35%
Sony            15%
Solution
It is clear that the brands of television are independent of each
other. The TV brands LG, Samsung, Panasonic and Sony have
distribution probabilities $p_1 = 0.20$, $p_2 = 0.30$, $p_3 = 0.35$
and $p_4 = 0.15$, respectively. So we have
$0.20 + 0.30 + 0.35 + 0.15 = 1$
and therefore, the distribution of brand of television sets follows a
multinomial distribution.
GOODNESS OF FIT TESTS (WHEN CATEGORICAL
PROBABILITIES ARE COMPLETELY DEFINED)
Then the test statistic is given by
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i},$$
where
k denotes the number of classes
$o_i$ denotes the number of observations in class i
$e_i$ denotes the number of expected observations in class i, $e_i = n p_i$
$p_i$ denotes the probability of observing an observation in class i
n denotes the sample size
The test statistic has an approximate chi-square distribution with
$k - 1$ degrees of freedom.
NOTE:
The approximation is good if the sample size is large enough so
that, for every cell, the expected cell frequency is 5 or more.
Example
The head teacher of a primary school is interested in knowing whether
colour preferences exist among the pupils in his school. A sample of
100 pupils was drawn from the school and shown identically shaped
objects, coloured red, blue, yellow, green or pink. When each child was
asked to pick the most preferred colour, 30 picked red, 18 blue, 12 yellow,
20 green and 20 pink. Test, at the 5% significance level, the hypotheses:
$H_0$: colour preferences do not exist
against
$H_1$: colour preferences do exist
Solution
Colour             Red    Blue   Yellow   Green   Pink
Observed, $o_i$    30     18     12       20      20
Expected, $e_i$    20     20     20       20      20
$o_i - e_i$        10     -2     -8       0       0
Thus
$$\chi^2 = \sum_{i=1}^{5} \frac{(o_i - e_i)^2}{e_i} = \frac{10^2}{20} + \frac{(-2)^2}{20} + \frac{(-8)^2}{20} + \frac{0^2}{20} + \frac{0^2}{20} = 8.4$$
At the 5% significance level, from the chi-square tables,
$\chi^2_{0.05}$ at df $= 5 - 1 = 4$ is 9.49. Therefore, the critical region
is $\chi^2 > 9.49$.
Since $\chi^2 = 8.4 < \chi^2_{0.05}(4) = 9.49$, we cannot
reject $H_0$. That is, we do not have enough evidence against the
null hypothesis. Therefore we conclude that there is no evidence of
colour preferences among the pupils.
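The goodness-of-fit statistic for the colour example can be computed as in this sketch. The helper name `chi_square_stat` is illustrative; the expected counts assume equal probabilities for the five colours, as in the table, and the critical value is the tabulated 9.49.

```python
def chi_square_stat(observed, expected):
    """Sum of (o_i - e_i)^2 / e_i over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Observed counts: red, blue, yellow, green, pink
observed = [30, 18, 12, 20, 20]
n = sum(observed)
k = len(observed)

# Under H0 every colour is equally likely, so e_i = n / k
expected = [n / k] * k

chi_sq = chi_square_stat(observed, expected)
df = k - 1
chi_crit = 9.49  # chi-square table value at the 5% level with 4 df
reject = chi_sq > chi_crit
print(round(chi_sq, 2), df, reject)
```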
GOODNESS-OF-FIT TESTS FOR THE POISSON,
BINOMIAL AND NORMAL DISTRIBUTION
The goodness-of-fit tests can be applied to test whether a sample data set
comes from a population having a Poisson, binomial or
normal distribution. The test statistic is given by
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i},$$
where
k denotes the number of classes
$o_i$ denotes the number of observations in class i
$e_i$ denotes the number of expected observations in class i, $e_i = n p_i$
$p_i$ denotes the probability of observing an observation in class i
n denotes the sample size
The test statistic has an approximate chi-square distribution with
$k - m - 1$ degrees of freedom, where m is the number of
independent parameters that have to be estimated from the
sample.
NB: The approximation is good if the sample size is large
enough so that, for every cell, the expected cell frequency is 5 or
more.
Example 1
The weekly number of power failures reported in a certain district
in 50 weeks is recorded as follows
Number of failures    Number of weeks
0                     6
1                     8
2                     13
3                     11
4                     7
5                     4
6                     1
Determine whether the weekly number of power failures in the district
follows a Poisson distribution at the 5% significance level.
Solution
We wish to test the hypothesis
$H_0$: the weekly number of power failures follows a Poisson distribution
against
$H_1$: the weekly number of power failures does not follow a Poisson
distribution.
We first calculate the expected frequencies using Poisson
probabilities given by
$$p_i = \frac{e^{-\lambda} \lambda^i}{i!}, \quad i = 0, 1, 2, \ldots, 6,$$
where $\lambda$ is the mean of the distribution.
x        f     fx
0        6     0
1        8     8
2        13    26
3        11    33
4        7     28
5        4     20
6        1     6
Total    50    121
Therefore, the mean $\bar{x}$ is given by
$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{121}{50} = 2.42 \approx 2.4$$
Number of     Number of    Poisson          Expected
failures      weeks        probabilities    frequencies
i             $n_i$        $p_i$            $e_i = n p_i$
0             6            0.091            4.55
1             8            0.218            10.90
2             13           0.261            13.05
3             11           0.209            10.45
4             7            0.125            6.25
5             4            0.060            3.00
6             1            0.024            1.20
From the table above, three of the expected frequencies (43%) are less than
5. To satisfy the condition, we merge the last three classes as shown
in the table below
Number of     Number of    Poisson          Expected
failures      weeks        probabilities    frequencies
i             $n_i$        $p_i$            $e_i = n p_i$
0             6            0.091            4.55
1             8            0.218            10.90
2             13           0.261            13.05
3             11           0.209            10.45
4-6           12           0.209            10.45
The test statistic becomes
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i} = \frac{(6 - 4.55)^2}{4.55} + \frac{(8 - 10.90)^2}{10.90} + \cdots + \frac{(12 - 10.45)^2}{10.45} = 0.4621 + 0.7716 + \cdots + 0.2299 = 1.49$$
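With $k = 5$ classes and $m = 1$ estimated parameter, the degrees of freedom
are $k - m - 1 = 3$, and $\chi^2_{0.05}(3) = 7.815$.
Since 1.49 is less than 7.815, we fail to reject the null
hypothesis and conclude that the data are consistent with the weekly
number of power failures following a Poisson distribution.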
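The whole Poisson goodness-of-fit calculation can be sketched as follows. As in the solution, the mean is rounded to 2.4 and the last three classes are merged, with the merged probability taken as $p_4 + p_5 + p_6$. The code keeps unrounded probabilities, so the statistic agrees with 1.49 only after rounding. The variable names are illustrative.

```python
from math import exp, factorial

# Number of weeks with 0, 1, ..., 6 power failures
weeks = [6, 8, 13, 11, 7, 4, 1]
n = sum(weeks)

# Estimate the Poisson mean from the data, rounded to 2.4 as in the notes
lam = round(sum(i * f for i, f in enumerate(weeks)) / n, 1)

# Poisson probabilities and expected frequencies for i = 0, ..., 6
probs = [exp(-lam) * lam**i / factorial(i) for i in range(len(weeks))]
expected = [n * p for p in probs]

# Merge classes 4, 5 and 6 so that small expected frequencies are pooled
obs_merged = weeks[:4] + [sum(weeks[4:])]
exp_merged = expected[:4] + [sum(expected[4:])]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(obs_merged, exp_merged))

m = 1                         # one parameter (the mean) was estimated
df = len(obs_merged) - m - 1  # k - m - 1
chi_crit = 7.815              # chi-square table value at 5% with 3 df
reject = chi_sq > chi_crit
print(lam, [round(e, 2) for e in exp_merged], round(chi_sq, 2), df, reject)
```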
Example 2
Four identical six-sided dice, each with faces marked 1 to 6, are rolled
200 times. At each rolling, a record is made of the number of dice
whose score on the uppermost face is even. The result is shown below.
For $X \sim B(n, p)$, $p(x) = \binom{n}{x} p^x (1 - p)^{n - x}$.
We have
Thus,
for $B(4, 0.5)$, $p(0) = \binom{4}{0} (0.5)^0 (0.5)^4 = 0.0625$
$$\chi^2 = \sum_{i=1}^{k} \frac{(o_i - e_i)^2}{e_i} = \frac{(10 - 12.50)^2}{12.50} + \cdots + \frac{(22 - 12.50)^2}{12.50} = 0.500 + 1.620 + \cdots + 7.220 = 10.653$$
Since 10.653 is greater than 9.488, we reject the null hypothesis and conclude
that the number of even scores is not approximately $B(4, 0.5)$.
Example 3
Three hundred marbled ducks in Quack town are weighed and the results are shown
in the following table.
Mass (g)     Frequency
m < 470      10
Set up the hypotheses and test, at the 10% significance level, whether the mass of
marbled ducks can be modelled by a normal distribution with mean 520g and
standard deviation 30g.
Solution
H 0: Mass of the marbled ducks can be modelled by the normal distribution with
mean 520 and standard deviation 30.
against
H 1: Mass of the marbled ducks cannot be modelled by the normal distribution
with mean 520 and standard deviation 30.
$$\Pr(M > 570) = \Pr\left(\frac{M - 520}{30} > \frac{570 - 520}{30}\right) = \Pr(z > 1.67) = 0.5000 - 0.4525 = 0.0475$$
We summarize the rest of the calculations as follows:
$o_i$    $e_i$     $o_i - e_i$    $(o_i - e_i)^2$    $(o_i - e_i)^2 / e_i$
10       14.25     -4.25          18.06              1.268
158      135.80    22.20          492.84             3.629
123      135.80    -12.80         163.84             1.206
9        14.25     -5.25          27.56              1.934
Total                                                8.037
Thus,
$\chi^2 = 8.037$
From tables, $\chi^2_{0.10}(k - m - 1) = \chi^2_{0.10}(3) = 6.251$. Since
$\chi^2_{cal} = 8.037$ is greater
than $\chi^2_{0.10}(3) = 6.251$, we reject $H_0$ and conclude that the mass of the marbled
ducks cannot be modelled by the normal distribution with mean 520 and standard
deviation 30.
GOODNESS-OF-FIT TESTS FOR HOMOGENEITY
This approach is considered appropriate when the following
conditions hold.
The test statistic is given by
$$\chi^2 = \sum_{i=1}^{p} \sum_{j=1}^{l} \frac{\left(n_{ij} - E(n_{ij})\right)^2}{E(n_{ij})},$$
where
$E(n_{ij})$ is the expected cell frequency for the ij-th cell.
$n_{ij}$ is the number of observations that fall into each cell,
called the observed cell frequency.
The test statistic has a chi-square distribution with
$(p - 1)(l - 1)$ degrees of freedom.
Example
In a study of television viewing habits of children, a
developmental psychologist selects a random sample of 300
primary school pupils, 100 boys and 200 girls. Each child is
asked which of the following television programmes they like
best: The Talented Kids, or The Pulpit, or Maths and Science
Quiz. The results are shown below
                   Viewing Preferences
         The Talented Kids    The Pulpit    Maths and Science Quiz
Boys     50                   30            20
Girls    50                   80            70
Solution
Viewing Preferences
The Talented Kids The Pulpit Math & Science Quiz Totals
$H_{01}$: Proportion of boys who prefer Talented Kids equals proportion of girls who
prefer Talented Kids
$H_{02}$: Proportion of boys who prefer The Pulpit equals proportion of girls who
prefer The Pulpit
$H_{03}$: Proportion of boys who prefer Maths & Science Quiz equals proportion of
girls who prefer Maths & Science Quiz
against
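$$\chi^2 = \sum_{i=1}^{p} \sum_{j=1}^{l} \frac{\left(n_{ij} - E(n_{ij})\right)^2}{E(n_{ij})} = \frac{(50 - 33.33)^2}{33.33} + \cdots + \frac{(70 - 60)^2}{60} = 8.3375 + \cdots + 1.6667 = 19.3255$$
The degrees of freedom is given by
$$df = (p - 1)(l - 1) = (2 - 1)(3 - 1) = 2$$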
From the chi-square tables, the value of the chi-square at the
0.05 level of significance, with 2 degrees of freedom is 5.99.
Since 19.3255 is greater than 5.99, we reject the null
hypothesis and conclude that at least one of the null
hypotheses is false.
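The statistic for the viewing-preference table can be recomputed as in this sketch, which assumes expected cell frequencies of the form (row total × column total) / grand total. Because it does not round the expected frequencies, it returns about 19.32 rather than the 19.3255 obtained with rounded intermediate values; the decision is unaffected.

```python
# Observed counts: rows are boys and girls, columns are the three programmes
table = [
    [50, 30, 20],  # boys
    [50, 80, 70],  # girls
]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(table):
    for j, observed in enumerate(row):
        # Expected frequency of cell (i, j) under the null hypothesis
        expected = row_totals[i] * col_totals[j] / n
        chi_sq += (observed - expected) ** 2 / expected

df = (len(table) - 1) * (len(table[0]) - 1)
chi_crit = 5.99  # chi-square table value at the 5% level with 2 df
reject = chi_sq > chi_crit
print(round(chi_sq, 3), df, reject)
```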
GOODNESS-OF-FIT TESTS FOR INDEPENDENCE
In these tests, the null hypothesis is that the variables are
independent, against the alternative that the variables are not
independent.
Data for the goodness-of-fit test of independence is usually presented in a
contingency table. The following table is an $r \times c$ contingency table.
                          Variable 2
Variable 1    1           2           ...    c           Totals
1             $n_{11}$    $n_{12}$    ...    $n_{1c}$    $R_1$
2             $n_{21}$    $n_{22}$    ...    $n_{2c}$    $R_2$
...
r             $n_{r1}$    $n_{r2}$    ...    $n_{rc}$    $R_r$
Totals        $C_1$       $C_2$       ...    $C_c$       n
The test statistic for the goodness-of-fit test is given by
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}},$$
where
c denotes the number of columns
r denotes the number of rows
$o_{ij}$ denotes the observed cell frequency
$e_{ij}$ denotes the expected cell frequency, $e_{ij} = \dfrac{R_i C_j}{n}$
n denotes the grand total
The test statistic follows a chi-square distribution with $(c - 1)(r - 1)$
degrees of freedom. We reject the null hypothesis if
$\chi^2_{cal} > \chi^2_{\alpha}\left((c - 1)(r - 1)\right)$
Example
Test the hypothesis that size and colour are independent at the 5% significance
level.
Solution
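$e_{21} = \frac{37 \times 40}{120} = 12.33$, $e_{22} = \frac{37 \times 44}{120} = 13.57$, $e_{23} = \frac{37 \times 36}{120} = 11.10$
$e_{31} = \frac{48 \times 40}{120} = 16.00$, $e_{32} = \frac{48 \times 44}{120} = 17.60$, $e_{33} = \frac{48 \times 36}{120} = 14.40$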
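The test statistic is given by
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(o_{ij} - e_{ij})^2}{e_{ij}} = \frac{(10 - 11.67)^2}{11.67} + \frac{(13 - 12.83)^2}{12.83} + \cdots + \frac{(10 - 14.40)^2}{14.40} = 3.63$$
The degrees of freedom is $df = (3 - 1)(3 - 1) = 4$
Therefore from chi-square tables, $\chi^2_{0.05}(4) = 9.49$.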