Biostatistics of HKU MMEDSC Session5handoutprint3
Hypothesis tests
CMED6100 – Session 5
ST Ali
16 October 2021
sli.do/#hkubiostat21
Outline
Announcements
Outline
Mid-term exam
Objectives
After the lecture, students should be able to:
Critical Value
H0: μ = μ0    H1: μ = μ1 (> μ0)
Pr (Type I Error) = 𝛼 (Level of significance)
Pr (Type II Error) = 𝛽 (1 − 𝛽 is Power of the test)
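To make the roles of α, β and the critical value concrete, here is a minimal sketch for a one-sided z-test of H0: μ = μ0 against H1: μ = μ1 (> μ0). The known standard deviation `sigma`, the sample size `n` and every numeric value are hypothetical and are not taken from the lecture.

```python
from statistics import NormalDist

def critical_value_and_power(mu0, mu1, sigma, n, alpha=0.05):
    """One-sided z-test of H0: mu = mu0 against H1: mu = mu1 (> mu0)."""
    se = sigma / n ** 0.5                          # standard error of the sample mean
    crit = NormalDist(mu0, se).inv_cdf(1 - alpha)  # reject H0 if the sample mean exceeds this
    beta = NormalDist(mu1, se).cdf(crit)           # Pr(Type II error): H1 true, H0 not rejected
    return crit, 1 - beta                          # critical value and power (1 - beta)

# Hypothetical values, for illustration only.
crit, power = critical_value_and_power(mu0=100, mu1=105, sigma=10, n=25)
print(f"critical value = {crit:.2f}, power = {power:.3f}")
```

Increasing `n` or the gap between `mu1` and `mu0` raises the power, while lowering `alpha` moves the critical value further from `mu0` and reduces it.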
• We can compare our data to the kinds of data that we might expect if the
null hypothesis were true
… difference between means, it would be very
unusual to observe large differences in …
Outline
Part I
Accidents example
In a study of occupational health in a local factory, research was
conducted to investigate whether all employees faced similar risk of
various types of accident. A total of 117 accidents were classified
by the age of the employee and the type of accident:
             Accident type
Age          Sprain   Burn    Cut
Under 25          9     17      5
25 or over       61     13     12
Accidents example
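Observed counts, with marginal totals:

             Accident type
Age          Sprain   Burn    Cut   Total
Under 25          9     17      5      31
25 or over       61     13     12      86
Total            70     30     17     117

Expected counts if age and accident type were independent:

             Accident type
Age          Sprain   Burn    Cut   Total
Under 25      18.55   7.95   4.50      31
25 or over    51.45  22.05  12.50      86
Total            70     30     17     117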
For example, the expected number of sprains in employees aged under 25
is 31 × 70/117 = 18.55.
ST Ali CMED6100 – Session 5 Slide 13
Pearson’s χ2 test Fisher’s test McNemar’s test
χ2 test – example
We can calculate a test statistic based on the differences between the observed
and expected values, as follows:
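$$
\begin{aligned}
T &= \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \\
  &= \frac{(9 - 18.55)^2}{18.55} + \frac{(17 - 7.95)^2}{7.95} + \frac{(5 - 4.50)^2}{4.50} \\
  &\quad + \frac{(61 - 51.45)^2}{51.45} + \frac{(13 - 22.05)^2}{22.05} + \frac{(12 - 12.50)^2}{12.50} \\
  &= 4.92 + 10.30 + 0.06 + 1.77 + 3.71 + 0.02 \\
  &= 20.78
\end{aligned}
$$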
χ2 test – example
[Figure: density of the χ2 distribution with 2 degrees of freedom (x-axis 0 to 25), with a point marked on the x-axis.]
This test statistic can be compared with a χ2 (chi-squared) distribution with 2
degrees of freedom, and we find that it corresponds to a p-value < 0.001.
There is strong evidence against the null hypothesis.
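The calculation for the accidents table can be checked with a short script. This is a sketch using only the Python standard library; the closed-form tail probability exp(−x/2) is valid only because this table has exactly 2 degrees of freedom, and the variable names are illustrative.

```python
import math

# Observed counts from the accidents table
# (rows: under 25, 25 or over; columns: sprain, burn, cut).
observed = [[9, 17, 5],
            [61, 13, 12]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: row total x column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Sum of (O - E)^2 / E over all six cells.
statistic = sum((o - e) ** 2 / e
                for obs_row, exp_row in zip(observed, expected)
                for o, e in zip(obs_row, exp_row))
df = (len(observed) - 1) * (len(observed[0]) - 1)

# With exactly 2 degrees of freedom the chi-squared upper tail is exp(-x / 2).
# This shortcut does not hold for other degrees of freedom.
p_value = math.exp(-statistic / 2)

print(f"T = {statistic:.2f}, df = {df}, p = {p_value:.1e}")
```

Because the script keeps the expected counts unrounded, individual terms can differ in the second decimal place from the hand calculation, but the total agrees to two decimals.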
Degrees of freedom
In this case the expected counts in some cells are below 5 (e.g.
9 × 9/19 = 4.3).
Fisher’s test examines all the possible tables with the same marginal
totals and calculates the probability of the current table (or of more
extreme tables), assuming independence.
     0  9             1  8             2  7
    10  0             9  1             8  2
p1 = 0.00001     p2 = 0.00097     p3 = 0.01754

     3  6             4  5             5  4
     7  3             6  4             5  5
p4 = 0.10912     p5 = 0.28643     p6 = 0.34372

     6  3             7  2             8  1
     4  6             3  7             2  8
p7 = 0.19095     p8 = 0.04676     p9 = 0.00438

     9  0
     1  9
p10 = 0.00011

p1 + p2 + p3 + · · · + p10 = 1.
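The ten probabilities listed above are hypergeometric probabilities and can be reproduced with a few lines of code. The sketch below enumerates every table with row totals 9 and 10 and first column total 10. The two-sided rule used in `fisher_two_sided` (sum the tables that are no more probable than the observed one) is one common convention and an assumption here, and the observed table passed to it is hypothetical.

```python
from math import comb

def table_probabilities(row1, row2, col1):
    """Probability of each 2x2 table with the given margins, under independence.

    row1, row2: row totals; col1: first column total.
    Returns {a: probability}, where a is the top-left cell.
    """
    n = row1 + row2
    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    return {a: comb(row1, a) * comb(row2, col1 - a) / comb(n, col1)
            for a in range(lowest, highest + 1)}

def fisher_two_sided(a_observed, probs):
    """Sum the probabilities of all tables no more probable than the observed one."""
    p_obs = probs[a_observed]
    return sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))

# Margins of the ten tables: row totals 9 and 10, first column total 10.
probs = table_probabilities(9, 10, 10)
for a, p in probs.items():
    print(f"top-left cell = {a}: p = {p:.5f}")

# Hypothetical observed table with top-left cell 2 (the third table listed).
print(f"two-sided p = {fisher_two_sided(2, probs):.4f}")
```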
Matched samples
                            Treatment Y
                         Cured   Not cured   Total
Treatment X  Cured         212         144     356
             Not cured     256         707     963
Total                      468         851    1319
Our null hypothesis is that the proportion cured by treatment X is the same as
the proportion cured by treatment Y.
McNemar’s test
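Label the four cells of the 2×2 table A (cured by both treatments), B (cured
by X only), C (cured by Y only) and D (cured by neither).
The null hypothesis is that A+B=A+C. Or we could flip the problem and
consider the null hypothesis that C+D=B+D. In fact both of these null
hypotheses can be rephrased more simply as B=C.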
McNemar’s test
The test statistic is very easy to calculate, and depends only on the
two off-diagonal cells.
The test statistic is $\frac{(B - C)^2}{B + C}$. It follows a chi-squared distribution
with 1 degree of freedom.
The test statistic is $\frac{(144 - 256)^2}{144 + 256} = 31.4$, which is highly significant (the 95th
percentile is 3.84 for a chi-squared distribution with 1 degree of freedom under
the null hypothesis).
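As a check on the arithmetic, McNemar's statistic and its p-value can be computed from the two off-diagonal cells alone. This sketch uses the version without a continuity correction, matching the formula in the text, and obtains the tail probability of a chi-squared distribution with 1 degree of freedom from the complementary error function.

```python
import math

def mcnemar(b, c):
    """McNemar's test (no continuity correction) from the two discordant cells."""
    statistic = (b - c) ** 2 / (b + c)
    # Upper tail of a chi-squared distribution with 1 degree of freedom:
    # P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Discordant pairs from the table: cured by X only (144), cured by Y only (256).
statistic, p_value = mcnemar(144, 256)
print(f"statistic = {statistic:.1f}, p = {p_value:.1e}")
```

The 212 and 707 concordant pairs do not enter the calculation, which is why the test depends only on the two off-diagonal cells.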
Summary
Part II
Example
Randomized trial of dietary intervention on 32 patients. The
weight in kg of all patients was measured at baseline and then
after 2 weeks. Is the intervention effective?
[Figure: weight (kg) and weight difference (kg) in the Control and Intervention groups.]
Diet data
Student’s t-test
• The z-test illustrated in the previous session is valid in large samples,
when the Central Limit Theorem ensures that the (sampling distribution
of the) sample mean will follow a Normal distribution.
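• The t-test relies on a theoretical result: when sampling from a Normal
distribution, the sample mean, standardised using the sample standard
deviation, follows a t distribution with n − 1 degrees of freedom.
• The test was originally derived and published by William Gosset in 1908
under the pseudonym ‘Student’.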
The T distribution
Figure: The t-distribution has ‘fatter’ tails than the Normal distribution,
but converges to the Normal as the degrees of freedom increase.
[Figure: density of the weight difference (x-axis −8 to 2).]
Paired t-test
We could look at the weight loss in the intervention group to test
whether there was any change from baseline. This is equivalent to
testing H0 : ∆t = 0 in the intervention group. This is a type of
“one sample t-test” since we are evaluating whether our sample
mean is the same as or different from a null value.
• We obtain a p-value < 0.001 and reject the null hypothesis.
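The one-sample t-test on the within-patient differences can be sketched as follows. The weight changes in `diffs` are hypothetical and are not the lecture's data; only the group size of 16 is chosen to match a t distribution with 15 degrees of freedom. The p-value is obtained by numerically integrating the t density with Simpson's rule, which is an implementation choice made here to avoid external libraries.

```python
import math
from statistics import mean, stdev

def t_density(x, df):
    """Density of the t distribution with df degrees of freedom."""
    const = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return const * (1 + x * x / df) ** (-(df + 1) / 2)

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value, integrating the density from 0 to |t| by Simpson's rule."""
    upper = abs(t)
    h = upper / steps                      # steps must be even for Simpson's rule
    area = t_density(0, df) + t_density(upper, df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_density(i * h, df)
    area *= h / 3
    return max(0.0, 1 - 2 * area)

def one_sample_t(differences, null_value=0.0):
    """t statistic, degrees of freedom and two-sided p-value for H0: mean = null_value."""
    n = len(differences)
    se = stdev(differences) / math.sqrt(n)   # estimated standard error of the mean
    t = (mean(differences) - null_value) / se
    return t, n - 1, t_two_sided_p(t, n - 1)

# Hypothetical weight changes (kg) for 16 patients; not the lecture's data.
diffs = [-3.1, -4.0, -2.2, -5.6, -1.8, -3.9, -4.4, -2.7,
         -0.9, -6.1, -3.3, -2.5, -4.8, -1.2, -3.6, -2.9]
t, df, p = one_sample_t(diffs)
print(f"t = {t:.2f}, df = {df}, p = {p:.2g}")
```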
[Figure: density of the t distribution with 15 degrees of freedom (~t15), with the critical value tc marked on the T axis.]
• The paired t-test is used on paired data, to test the null hypothesis that
Diet data
[Figure: density plots by group. Control: mean=82.6, SD=10.7. Intervention: mean=78.3, SD=14.0.]
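A two-sample t statistic can be computed directly from group summaries like the ones shown for the diet data. In the sketch below, the group size of 16 per arm is an assumption (32 patients split evenly; the handout does not state the split), and the equal-variance (pooled) form of the test is used.

```python
import math

def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic (equal-variance form) from group summaries."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))   # standard error of the difference
    return (mean1 - mean2) / se, df

# Means and SDs as shown for the diet data; n = 16 per group is an assumption.
t, df = pooled_t_from_summary(82.6, 10.7, 16, 78.3, 14.0, 16)

# 2.042 is the tabulated two-sided 5% critical value of t with 30 degrees of freedom.
print(f"t = {t:.2f}, df = {df}, significant at 5%: {abs(t) > 2.042}")
```

Under these assumptions the difference in group means is small relative to its standard error, so a comparison of the two groups on these summaries alone would not reach significance at the 5% level.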
Diet data
Figure: High correlation between baseline and post-test weights, for the two
groups respectively. (x-axis: baseline weight (kg).)
[Figure: weight difference (kg) against baseline weight (kg).]
[Figure: density plots for the Control and Intervention groups (x-axis −8 to 2).]
• If the two sample distributions are quite skewed, we might consider a log
transformation before doing a t-test.
• If the two samples have quite different variances then the t-test may not
give an appropriate p-value.
• If the two samples have the same (or similar) variance but do not follow a
Normal distribution, we can consider an alternative non-parametric test,
such as the Mann-Whitney U test (also called the Wilcoxon rank-sum test).
A A B A A B B A A B B B
135 141 142 143 149 158 170 171 172 189 254 289
Example – reaction times (ms) for patients without (A) and with (B)
anaesthetic. Subjects had to react to a simple visual stimulus.
Step 2: Inspect each ‘B’ sample in turn and count the number of ‘A’s
which come before it. Add up the total to get a U value.
Number of ‘A’s before each ‘B’:

Group     A    A    B    A    A    B    B    A    A    B    B    B
Time    135  141  142  143  149  158  170  171  172  189  254  289
Count               2              4    4              6    6    6

Total, UA = 28.

Number of ‘B’s before each ‘A’:

Group     A    A    B    A    A    B    B    A    A    B    B    B
Time    135  141  142  143  149  158  170  171  172  189  254  289
Count     0    0         1    1              3    3

Total, UB = 8.
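The counting procedure can be written out directly. This sketch follows the handout's labelling (UA is the total number of ‘A’s preceding each ‘B’) and assumes there are no tied values, as in this example; the function name is illustrative.

```python
# Reaction times (ms) without (A) and with (B) anaesthetic, from the example.
a_times = [135, 141, 143, 149, 171, 172]
b_times = [142, 158, 170, 189, 254, 289]

def count_preceding(sample, other):
    """For each value in `sample`, count the smaller values in `other`; return the total.

    Assumes no tied values, as in this example.
    """
    return sum(sum(1 for y in other if y < x) for x in sample)

u_a = count_preceding(b_times, a_times)   # number of A's before each B, summed
u_b = count_preceding(a_times, b_times)   # number of B's before each A, summed

print(u_a, u_b)  # 28 8

# Every (A, B) pair is counted exactly once, so the two totals sum to n_A x n_B.
assert u_a + u_b == len(a_times) * len(b_times)
```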
The t-test Paired t-test Signed-Rank test Two-sample t-test Mann-Whitney U
• The situation is a bit more complicated if there are tied values, and
there is a quicker way to calculate U in larger datasets based on the
ranks of the values.
Summary
• Comparing post-test data did not show any significant
intervention effect.
• Comparing differences between post-test and baseline allowed
us to detect the intervention effect.
– This is a common occurrence when baseline and post-test
measurements are highly correlated.
• In an example where the normal distribution assumption was
appropriate, the (parametric) t-test led to smaller p-values
than the (non-parametric) Mann-Whitney U test.
– This is a common occurrence when the assumptions of
parametric tests are met.
Summary of t-tests
Part III
Background Example
Background
We use ANOVA:
Assumptions of ANOVA
Illustration
Small within-group variation → significant difference?
Large within-group variation → non-significant difference?
• We break down the total variation (ST ) in the data into two
parts:
– Between-group variation (SA );
– Remaining ‘error’ (within-group variation) (SE ).
Example data
            Drug
Block       1         2         3
1           12  10    14  15    23  21
2           17  14    17  13    26  24
3           26  21    28  29    31  35
4           13  16    13  12    16  18
mean        16.13     17.63     24.25
[Figure: reaction time for each observation, by drug (1, 2, 3).]
Box-and-whisker plot
[Figure: box-and-whisker plot of reaction time by drug (1, 2, 3).]
Format of data/notation
            Drug
Block       1           2           3
1           ···         ···
2                       Xi
3
4
sum (Ti)    T1 = 129    T2 = 141    T3 = 194
Calculations
First, calculate the total sum of squares.
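$$
\begin{aligned}
S_T &= \sum X_i^2 - \frac{T^2}{N} \\
    &= 12^2 + 10^2 + \cdots + 18^2 - \frac{464^2}{24} \\
    &= 10076 - 8970.667 \\
    &= 1105.333
\end{aligned}
$$

Next, calculate the between-group sum of squares, using the drug totals $T_i$ (8 observations per drug).

$$
\begin{aligned}
S_A &= \sum \frac{T_i^2}{n_i} - \frac{T^2}{N} \\
    &= \frac{129^2 + 141^2 + 194^2}{8} - \frac{464^2}{24} \\
    &= 9269.750 - 8970.667 \\
    &= 299.083
\end{aligned}
$$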
ANOVA table
You can see the 299.083 and the 1105.333 that we calculated.
What do other values represent? ...
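The two sums of squares quoted above, and the remaining entries of a one-way ANOVA table, can be reproduced from the example data. This is a sketch that treats the data as a one-way layout with drug as the only factor, in line with the two-part decomposition into between-group and error variation; whether the lecture's ANOVA table also includes a block term is not shown in the handout, so that is an assumption. The closed-form p-value holds only because there are 2 numerator degrees of freedom.

```python
# Reaction times by drug (8 observations each), from the example data.
groups = {
    1: [12, 10, 17, 14, 26, 21, 13, 16],
    2: [14, 15, 17, 13, 28, 29, 13, 12],
    3: [23, 21, 26, 24, 31, 35, 16, 18],
}

values = [x for g in groups.values() for x in g]
n_total = len(values)
correction = sum(values) ** 2 / n_total                    # T^2 / N

s_total = sum(x ** 2 for x in values) - correction         # S_T, total variation
s_between = sum(sum(g) ** 2 / len(g)
                for g in groups.values()) - correction     # S_A, between-group variation
s_error = s_total - s_between                              # S_E, within-group variation

df_between = len(groups) - 1
df_error = n_total - len(groups)
f_stat = (s_between / df_between) / (s_error / df_error)   # ratio of mean squares

# With 2 numerator degrees of freedom the F upper tail has the closed form
# (1 + 2F / df_error) ** (-df_error / 2); other cases need a statistics library.
p_value = (1 + 2 * f_stat / df_error) ** (-df_error / 2)

print(f"S_T = {s_total:.3f}, S_A = {s_between:.3f}, S_E = {s_error:.3f}")
print(f"F({df_between}, {df_error}) = {f_stat:.2f}, p = {p_value:.3f}")
```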
Explanation of table
Calculation of p-value
Part IV
Review
[Figure: panels a to d, with legends Baseline / Photoinhibition, Fluoxetine group / Vehicle group, and Pre-operative / Post-operative.]
Review