Non Parametric Tests Unit 5


GCE

GCSE

FURTHER MATHEMATICS
A2 UNIT 5: FURTHER STATISTICS B

Non-Parametric Tests
Further Mathematics A2 Unit 5: Further Statistics B

Non-Parametric Tests
Wilcoxon Signed-Rank Test (One-Sample)
Summary of the Key Steps for the Wilcoxon Signed Rank Test (One-Sample)
Worked Example
Wilcoxon Signed-Rank Test (Paired Samples)
Summary of the Key Steps for the Wilcoxon Signed Rank Test (Paired Samples)
Worked Example
Mann-Whitney Test
Summary of the Key Steps for the Mann-Whitney Test
Worked Example
Questions
Question 1 Worked Solution
Question 2 Worked Solution
Question 3 Worked Solution
Question 4 Worked Solution
Question 5 Worked Solution
Question 6 Worked Solution

Non-Parametric Tests
Many of the statistical hypothesis tests you have previously seen have made use of a
specific model such as the binomial or normal distributions to describe the data
probabilistically. Assumptions about the underlying distribution of the population were made.

In this section, a more cautious approach is considered, one that makes as few assumptions as possible about the nature of the underlying population. When an underlying distribution is assumed, the parameters of that distribution need to be considered. For example, when a normal distribution is assumed, its mean and variance need to be assumed or estimated.

However, if no underlying distribution is imposed, there are no distribution parameters to specify or estimate. Tests of this kind are called non-parametric tests. In this section, two non-parametric tests are introduced: the Mann-Whitney test and the Wilcoxon signed-rank test.

Specification Content
• Non-parametric tests: understand and use Mann-Whitney and Wilcoxon signed-rank
tests, understanding appropriate test selection and interpreting the results in context.
− Alternative tests for when a distributional model cannot be assumed.
− Excludes tied ranks.

Ranks
Non-parametric tests are based on the so-called rank order of the data. You would have
seen this previously when computing the Spearman's rank correlation coefficient, which is
itself the non-parametric version of the Pearson's product moment correlation coefficient.

To rank a dataset, the convention is to assign the smallest value in the dataset a rank of 1. The second smallest value is assigned a rank of 2, and so on up to the largest value in the dataset.

Once the data is ranked, non-parametric tests use the ranks of the data rather than the data
itself. In using ranks instead of the actual data, we lose some information about the
magnitude of differences between observations.

For example, consider the following dataset: 4, 2, 8, 6, 7, 15, 3.

The smallest observation is 2, which is assigned a rank of 1. Then 3 is assigned a rank of 2. This process continues until the largest observation, 15, is given a rank of 7 (there are seven values in the dataset). The dataset is therefore ranked as follows:

Data 4 2 8 6 7 15 3
Rank 3 1 6 4 5 7 2
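The ranking convention above can be sketched in Python as follows. This is only an illustration: the helper name `rank` is our own, and it assumes there are no tied values, in line with the specification's exclusion of tied ranks.

```python
def rank(data):
    """Return the rank of each value in data, with rank 1 for the smallest value.

    Assumes there are no tied values (tied ranks are excluded here).
    """
    ordered = sorted(data)
    if len(set(ordered)) != len(ordered):
        raise ValueError("tied values are not covered by this sketch")
    position = {value: i + 1 for i, value in enumerate(ordered)}
    return [position[value] for value in data]


print(rank([4, 2, 8, 6, 7, 15, 3]))   # [3, 1, 6, 4, 5, 7, 2]
print(rank([4, 2, 8, 6, 7, 150, 3]))  # the same ranks: the magnitude of 150 is lost
```

Replacing 15 by 150 leaves every rank unchanged, which illustrates the loss of information about magnitude described above.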

In the original dataset, the observation 15 is much larger than the other observations. However, this difference is not emphasised in the ranks: 15 receives rank 7, just one more than the rank given to 8. Because ranks are unaffected by how extreme the largest (or smallest) values are, non-parametric tests are often useful in the analysis of highly skewed data.

Two non-parametric tests are introduced in this chapter: the Wilcoxon signed-rank test and the Mann-Whitney test. Each is analogous to a parametric test, as described in the relevant section. It is assumed that the reader has access to the ‘GCE Mathematics: Elementary Statistical Tables’ provided by the WJEC (available on the WJEC website).


Wilcoxon Signed-Rank Test (One-Sample)


The Wilcoxon Signed-Rank Test is used to test hypotheses concerning the median 𝜂 of a continuous random variable 𝑋. This test requires the underlying distribution to be symmetric. Note that the test can only be used for numerical data.

Some authors use the letter 𝑚 to represent the population median. However, by convention,
the letter 𝑚 is used to denote an observed value of the random variable 𝑀 where 𝑀 is the
sample median. Instead, the Greek letter ‘eta’ 𝜂 shall be used to denote the population
median.

The null hypothesis is that the population median is equal to some value, denoted by 𝜂0 . We can write this mathematically as 𝐻0 : 𝜂 = 𝜂0 . Assuming the null hypothesis is true, we expect 50% of the population to lie above 𝜂0 and 50% to lie below it.

There are three possible alternative hypotheses:

(1) 𝐻1 : 𝜂 < 𝜂0 , (2) 𝐻1 : 𝜂 > 𝜂0 , (3) 𝐻1 : 𝜂 ≠ 𝜂0

Note the similarity between this test and the test for a specified mean of a normal distribution: the Wilcoxon Signed-Rank Test is, in essence, a non-parametric version of that test.

The first of the alternative hypotheses suggests that the specified median 𝜂0 is larger than the true median. If this alternative hypothesis is true, the probability of an observation being less than 𝜂0 is greater than 0.5. This would be supported by significantly more observations below 𝜂0 than above it.

The second alternative hypothesis suggests that the specified median 𝜂0 is less than the true median. If this alternative hypothesis is true, the probability of an observation being greater than 𝜂0 is greater than 0.5. This would be supported by significantly more observations above 𝜂0 than below it.

The third alternative hypothesis is two-tailed, and suggests that the specified median 𝜂0 is
incorrect. This would be supported by significantly more observations on one side of 𝜂0 than
the other.

Suppose we have a sample 𝑋1 , 𝑋2 , … , 𝑋𝑛 . Let 𝐷𝑖 = 𝑋𝑖 − 𝜂0 be the difference between an observation and the specified median under the null hypothesis. Let 𝑅1 , 𝑅2 , … , 𝑅𝑛 be the ranks of the absolute values of the differences, i.e. the ranks of |𝐷1 |, |𝐷2 |, … , |𝐷𝑛 |. In other words, we rank the differences without considering their sign.

Table 13 on page 19 of the statistical tables gives critical values of the Wilcoxon signed rank statistic. The following information is given:

The table gives the upper tail critical values 𝑤𝑐 of the statistic

$$W = \sum_{i=1}^{n} U_i R_i$$

where 𝑅𝑖 denotes the rank of the magnitude of the 𝑖th observation in a sample of size 𝑛 and 𝑈𝑖 = 1 or 0 according as to whether this observation is positive or negative. The lower tail critical values are given by $\tfrac{1}{2}n(n+1) - w_c$. Since 𝑊 is discrete, exact significance levels in general cannot be achieved. The critical values given are those whose significance levels are nearest to those stated.


Here 𝑈𝑖 is a so-called indicator variable that takes the value 1 when the difference 𝐷𝑖 is positive and 0 when it is negative. Informally, 𝑊 can be thought of as the sum of the ranks for the positive differences.

Since there are 𝑛 members of the sample, each given a different rank between 1 and 𝑛, the sum of all the ranks is the sum of the first 𝑛 integers, $\tfrac{1}{2}n(n+1)$. Under 𝐻0, each difference is equally likely to be positive or negative, so the sum of the ranks for the negative differences, $\tfrac{1}{2}n(n+1) - W$, has the same distribution as 𝑊. This is why the tables only need to give the upper tail critical values of the Wilcoxon signed rank statistic, referred to as 𝑤𝑐 ; lower tail critical values are calculated using $\tfrac{1}{2}n(n+1) - w_c$.

Note that the test statistic 𝑊 is discrete: it can only take integer values between 0 and $\tfrac{1}{2}n(n+1)$. This means the critical values given in the statistical table do not necessarily correspond to critical regions with the exact significance levels specified. Instead, the value given is the integer that gives a critical region with significance level as close as possible to the significance level specified in the table.

For example, consider the case 𝑛 = 8. Possible values of the Wilcoxon signed rank statistic are all the integers from 0 to $\tfrac{1}{2}n(n+1)$, which for 𝑛 = 8 means all the integers from 0 to 36. Considering values of 27 and above, the corresponding upper tail probabilities of the Wilcoxon signed rank statistic, assuming 𝐻0 is true, are given in the table below.

𝑤𝑐 𝑃(𝑊 ≥ 𝑤𝑐 )
27 0.12500
28 0.09766
29 0.07422
30 0.05469
31 0.03906
32 0.02734
33 0.01953
34 0.01172
35 0.00781
36 0.00391

In the statistical tables, the critical value that gives a one-tail critical region with a
significance level of 10% is stated as 28. Referring to the table above, we look for the
probability that is closest to 10%. This occurs when 𝑤𝑐 = 28 with a corresponding probability
of 9.766%. Therefore, the critical region 𝑊 ≥ 28 has a significance level of 9.766% (just
below 10%).
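The probabilities in the table above can be reproduced exactly. Assuming 𝐻0 is true, each rank 1, 2, …, 𝑛 is equally likely to belong to a positive or a negative difference, independently, so 𝑊 is the sum of a random subset of {1, …, 𝑛} and each of the 2ⁿ subsets is equally likely. The sketch below counts subsets by their sum; the function names, and the "nearest probability" selection, are our own illustration of the rule described in the table notes, not the procedure used to produce the WJEC tables.

```python
from fractions import Fraction


def signed_rank_counts(n):
    """counts[s] = number of the 2**n equally likely sign patterns with W = s.

    Under H0 each rank 1..n belongs to a positive difference with
    probability 1/2, independently, so W is the sum of a random subset
    of {1, ..., n}.
    """
    counts = [1]  # no ranks yet: the only possible sum is 0
    for k in range(1, n + 1):
        new = counts + [0] * k  # room for sums up to 1 + 2 + ... + k
        for s, c in enumerate(counts):
            new[s + k] += c  # case where rank k belongs to a positive difference
        counts = new
    return counts


def upper_tail(n, w):
    """Exact P(W >= w) under H0."""
    counts = signed_rank_counts(n)
    return Fraction(sum(counts[w:]), 2 ** n)


def nearest_critical_value(n, alpha):
    """Upper tail critical value whose P(W >= w_c) is closest to alpha."""
    top = n * (n + 1) // 2
    return min(range(top + 1), key=lambda w: abs(upper_tail(n, w) - Fraction(alpha)))


for w in range(27, 37):
    print(w, f"{float(upper_tail(8, w)):.5f}")

print(nearest_critical_value(8, 0.10))  # 28
```

For 𝑛 = 8 and a 10% one-tail test, this nearest-probability rule selects 28, agreeing with the value quoted from the tables above.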

Exercise: Use the table above to derive the other critical values given in the Wilcoxon
signed rank table for 𝑛 = 8.

Since the distribution of the Wilcoxon signed rank statistic is symmetrical (under 𝐻0), a critical value of 28 for a one-tail critical region with 10% significance corresponds to one of the boundaries for a two-tail critical region with 20% significance. The lower tail critical value is:

$$\tfrac{1}{2}n(n+1) - w_c = \tfrac{1}{2}(8)(9) - 28 = 36 - 28 = 8.$$

The critical region for the two-tail test is all values of 𝑊 that are greater than or equal to 28, or less than or equal to 8. Its actual significance level is 2 × 9.766% = 19.53%, just below 20%.


Let us revisit the various forms of the alternative hypotheses. The first alternative hypothesis, 𝐻1 : 𝜂 < 𝜂0 , is supported by a significantly greater proportion of observations below 𝜂0 than above it. In this case, values of the test statistic 𝑊 are likely to be small if 𝐻1 is true, since we expect most of the differences to be negative, leaving few ranks to add to 𝑊. The test statistic 𝑊 is compared with the lower tail critical value $\tfrac{1}{2}n(n+1) - w_c$. The critical region is $W \le \tfrac{1}{2}n(n+1) - w_c$.

The second alternative hypothesis, 𝐻1 : 𝜂 > 𝜂0 , is supported by a significantly greater proportion of observations above 𝜂0 than below it. In this case, values of the test statistic 𝑊 are likely to be large if 𝐻1 is true, since we expect most of the differences to be positive. The test statistic 𝑊 is compared with the upper tail critical value 𝑤𝑐 . The critical region is 𝑊 ≥ 𝑤𝑐 .

The two-tailed alternative hypothesis 𝐻1 : 𝜂 ≠ 𝜂0 is supported when there is a significantly greater proportion of observations either above or below 𝜂0 . In this case, values of the test statistic could be large or small if 𝐻1 is true, since there could be many observations with either positive or negative differences. The upper tail critical value 𝑤𝑐 is obtained from tables, and the lower tail critical value is $\tfrac{1}{2}n(n+1) - w_c$. Therefore, the critical region is $\left(W \le \tfrac{1}{2}n(n+1) - w_c\right) \cup \left(W \ge w_c\right)$.
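The three forms of critical region can be collected into one small helper. This is a hypothetical sketch: the function name and the labels "less", "greater" and "two-sided" are our own, and 𝑤𝑐 must still be read from Table 13 rather than computed.

```python
def wilcoxon_in_critical_region(w, n, w_c, alternative):
    """True if the observed w lies in the critical region.

    w_c is the upper tail critical value read from the tables.
    alternative: "less" (H1: eta < eta0), "greater" (H1: eta > eta0)
    or "two-sided" (H1: eta != eta0).
    """
    lower = n * (n + 1) // 2 - w_c  # lower tail critical value
    if alternative == "less":
        return w <= lower
    if alternative == "greater":
        return w >= w_c
    if alternative == "two-sided":
        return w <= lower or w >= w_c
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# n = 8 with w_c = 28: the two-tail region is W <= 8 or W >= 28
print([w for w in range(37) if wilcoxon_in_critical_region(w, 8, 28, "two-sided")])
```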

Note: The random variable 𝑋 from which the sample is drawn is required to be continuous. Assuming 𝐻0 is true, 𝑃(𝑋 > 𝜂0 ) = 𝑃(𝑋 < 𝜂0 ) = 0.5. Only values of 𝑋 that are greater than or less than 𝜂0 need to be considered, since theoretically 𝑃(𝑋 = 𝜂0 ) = 0. Any values in the sample that are the same as the specified value 𝜂0 should be discarded, and the sample size 𝑛 reduced accordingly.

Throughout this document, 𝑊 shall be used to refer to the random variable that is the
Wilcoxon signed rank statistic. Observed values of 𝑊 shall be denoted by 𝑤.

Summary of the Key Steps for the Wilcoxon Signed Rank Test (One-Sample):
1. Define 𝜂, the population median, and assume that the population distribution is
symmetrical.
2. State the null and alternative hypotheses in terms of 𝜂.
3. Calculate the differences 𝑋 − 𝜂0 , where 𝜂0 is the value of 𝜂 under 𝐻0 .
4. Remove any observations from the sample whose differences are 0, and reduce 𝑛 accordingly.
5. Rank the absolute differences.
6. Calculate the test statistic, 𝑊, which is the sum of the ranks for the positive differences.
7. Obtain the upper tail critical value 𝑤𝑐 from tables, which depends on the sample size, the significance level and the type of test used.
8. State the critical region, which depends on the form of the alternative hypothesis:
a. If 𝐻1 : 𝜂 < 𝜂0 , the critical region is $W \le \tfrac{1}{2}n(n+1) - w_c$.
b. If 𝐻1 : 𝜂 > 𝜂0 , the critical region is 𝑊 ≥ 𝑤𝑐 .
c. If 𝐻1 : 𝜂 ≠ 𝜂0 , the critical region is $\left(W \le \tfrac{1}{2}n(n+1) - w_c\right) \cup \left(W \ge w_c\right)$.
9. Check whether the test statistic lies in the critical region.
10. State the conclusion in context.
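Steps 3 to 6 above can be sketched as a short Python function. It is illustrative only: the name `wilcoxon_w` is hypothetical, and the function assumes there are no tied absolute differences (tied ranks are excluded from the specification). The example call uses the light bulb lifetimes from the worked example that follows.

```python
def wilcoxon_w(sample, eta0):
    """Observed one-sample Wilcoxon signed-rank statistic for H0: eta = eta0.

    Returns (w, n), where n counts only the non-zero differences.
    Assumes no tied absolute differences.
    """
    diffs = [x - eta0 for x in sample if x != eta0]  # steps 3 and 4
    if len({abs(d) for d in diffs}) != len(diffs):
        raise ValueError("tied absolute differences are not covered")
    by_size = sorted(diffs, key=abs)  # step 5: rank 1 = smallest |d|
    w = sum(r for r, d in enumerate(by_size, start=1) if d > 0)  # step 6
    return w, len(diffs)


lifetimes = [642, 693, 557, 639, 606, 578, 565, 516, 702, 490,
             602, 589, 612, 549, 478]
print(wilcoxon_w(lifetimes, 600))  # (47, 15)
```

Steps 7 to 10 still rely on the critical value read from the tables.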


Worked Example
The median lifetime of a certain brand of light bulb is claimed to be 600 hours. A manager
believes a batch of light bulbs is substandard. A random sample of 15 of these light bulbs is
taken and their lifetimes recorded as follows:

642 693 557 639 606 578 565 516 702 490 602 589 612 549 478

Use a Wilcoxon signed-rank test, at the 5% significance level, to test whether the manager's
claim is justified.

Solution:
Let 𝜂 denote the population median lifetime of the brand of light bulbs and let 𝑋 denote the
lifetime of a light bulb. We assume that the distribution of 𝑋 is symmetrical. The null and
alternative hypotheses are:
𝐻0 : 𝜂 = 600
𝐻1 : 𝜂 < 600

We now create a table that contains the sample values, the differences between the sample
values and the hypothesised median under 𝐻0 , and the ranks of the absolute differences.

Lifetime, 𝑥 Difference, 𝑑 = 𝑥 − 𝜂0 Rank of |𝑑|


642 42 8
693 93 12
557 -43 9
639 39 7
606 6 2
578 -22 5
565 -35 6
516 -84 11
702 102 13
490 -110 14
602 2 1
589 -11 3
612 12 4
549 -51 10
478 -122 15

The observed value of the test statistic is 𝑤 = 8 + 12 + 7 + 2 + 13 + 1 + 4 = 47. This test statistic is the sum of the positive ranks. If 𝐻1 is true, we expect this sum to be small. The upper tail critical value 𝑤𝑐 from tables for 𝑛 = 15 using a one-tail test at the 5% significance level is 𝑤𝑐 = 90. The lower tail critical value is:

$$\tfrac{1}{2}n(n+1) - w_c = \tfrac{1}{2}(15)(16) - 90 = 120 - 90 = 30.$$

Hence, the critical region is 𝑊 ≤ 30.

Since 𝑤 = 47 > 30, the test statistic does not lie in the critical region. Therefore, there is insufficient evidence to reject 𝐻0 . This means the manager's claim is not supported at the 5% level of significance: there is insufficient evidence to conclude that the batch is substandard.
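As an optional cross-check, not part of the required method, the exact null distribution of 𝑊 for 𝑛 = 15 can be computed by counting subsets of {1, …, 15} by their sum, exactly as for 𝑛 = 8. The sketch below (function name hypothetical) computes the actual size of the critical region 𝑊 ≤ 30 and the probability, assuming 𝐻0 is true, of a value of 𝑊 at most the observed 𝑤 = 47.

```python
from fractions import Fraction


def lower_tail(n, w):
    """Exact P(W <= w) under H0, counting subsets of {1, ..., n} by their sum."""
    counts = [1]
    for k in range(1, n + 1):
        new = counts + [0] * k
        for s, c in enumerate(counts):
            new[s + k] += c
        counts = new
    return Fraction(sum(counts[: w + 1]), 2 ** n)


print(f"P(W <= 30) = {float(lower_tail(15, 30)):.4f}")  # size of the critical region
print(f"P(W <= 47) = {float(lower_tail(15, 47)):.4f}")  # observed w = 47
```

The first probability should be close to 5%, consistent with the table value 𝑤𝑐 = 90, while the second is well above 5%, consistent with not rejecting 𝐻0.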


Wilcoxon Signed-Rank Test (Paired Samples)


The Wilcoxon Signed-Rank Test can also be used to test hypotheses concerning the median difference 𝜂𝐷 between two paired samples. Two measurements are made for each member of a random sample of size 𝑛. Let $X^{(1)}$ denote the first measurement and $X^{(2)}$ the second measurement, and let 𝜂𝐷 denote the population median of the difference $X^{(1)} - X^{(2)}$.

Assume that the differences between the measurements follow a symmetric distribution. The null hypothesis is that the median difference is zero, i.e. 𝐻0 : 𝜂𝐷 = 0. Assuming the null hypothesis is true, we expect 50% of the differences to be less than 0 and 50% to be more than 0. The three possible alternative hypotheses are 𝐻1 : 𝜂𝐷 < 0, 𝐻1 : 𝜂𝐷 > 0 and 𝐻1 : 𝜂𝐷 ≠ 0.

Suppose we have the paired samples $X_1^{(1)}, X_2^{(1)}, \dots, X_n^{(1)}$ and $X_1^{(2)}, X_2^{(2)}, \dots, X_n^{(2)}$. Consider the differences $D_i = X_i^{(1)} - X_i^{(2)}$ between each pair of observations. Let 𝑅1 , 𝑅2 , … , 𝑅𝑛 be the ranks of the absolute values of the differences, i.e. the ranks of |𝐷1 |, |𝐷2 |, … , |𝐷𝑛 |. In other words, we rank the differences without considering their sign. As for the one-sample test, the test statistic is the sum of the ranks of the positive differences, denoted by 𝑊.

The alternative hypothesis 𝐻1 : 𝜂𝐷 < 0 suggests that observations are smaller on average on the first measurement compared to the second. In this case, values of the test statistic 𝑊 are likely to be small if 𝐻1 is true. The test statistic 𝑊 is compared with the lower tail critical value $\tfrac{1}{2}n(n+1) - w_c$. The critical region is $W \le \tfrac{1}{2}n(n+1) - w_c$.

The second alternative hypothesis 𝐻1 : 𝜂𝐷 > 0 suggests that observations are larger on
average on the first measurement compared to the second. In this case, values of the test
statistic 𝑊 are likely to be large if 𝐻1 is true. The test statistic 𝑊 is compared with the upper
tail critical value 𝑤𝑐 . The critical region is 𝑊 ≥ 𝑤𝑐 .

The two-tailed alternative hypothesis 𝐻1 : 𝜂𝐷 ≠ 0 suggests that observations on one of the measurements are larger on average than on the other measurement. In this case, values of the test statistic could be large or small if 𝐻1 is true. The upper tail critical value 𝑤𝑐 is obtained from tables, and the lower tail critical value is $\tfrac{1}{2}n(n+1) - w_c$. Hence, the critical region is given by $\left(W \le \tfrac{1}{2}n(n+1) - w_c\right) \cup \left(W \ge w_c\right)$.

Note: Any sample members whose differences are 0 should be discarded.

Summary of the Key Steps for the Wilcoxon Signed Rank Test (Paired Samples):
1. Define 𝜂𝐷 , the population median difference, and assume that the distribution of the differences is symmetrical.
2. State the null and alternative hypotheses in terms of 𝜂𝐷 .
3. Calculate the differences $X^{(1)} - X^{(2)}$.
4. Remove any observations from the sample whose differences are 0, and reduce 𝑛 accordingly.
5. Rank the absolute differences.
6. Calculate the test statistic 𝑊, which is the sum of the ranks for the positive differences.
7. Obtain the upper tail critical value 𝑤𝑐 from tables, which depends on the sample size, the significance level and the type of test used.
8. State the critical region, which depends on the form of the alternative hypothesis:
a. If 𝐻1 : 𝜂𝐷 < 0, the critical region is $W \le \tfrac{1}{2}n(n+1) - w_c$.
b. If 𝐻1 : 𝜂𝐷 > 0, the critical region is 𝑊 ≥ 𝑤𝑐 .
c. If 𝐻1 : 𝜂𝐷 ≠ 0, the critical region is $\left(W \le \tfrac{1}{2}n(n+1) - w_c\right) \cup \left(W \ge w_c\right)$.
9. Check whether the test statistic lies in the critical region.
10. State the conclusion in context.
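For paired samples, steps 3 to 6 differ from the one-sample case only in how the differences are formed. The sketch below is illustrative (the function name is hypothetical) and assumes no tied absolute differences; the example call uses the migraine data from the worked example that follows.

```python
def paired_wilcoxon_w(first, second):
    """Observed W for paired samples, with D_i = first_i - second_i.

    Returns (w, n), where n counts only the non-zero differences.
    Assumes no tied absolute differences.
    """
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(first, second) if a != b]  # steps 3 and 4
    if len({abs(d) for d in diffs}) != len(diffs):
        raise ValueError("tied absolute differences are not covered")
    by_size = sorted(diffs, key=abs)  # step 5: rank 1 = smallest |d|
    w = sum(r for r, d in enumerate(by_size, start=1) if d > 0)  # step 6
    return w, len(diffs)


drug = [28, 21, 17, 18, 31, 12, 39, 24, 18, 39, 19, 14, 20, 23, 16]
placebo = [32, 33, 23, 26, 34, 17, 30, 24, 29, 23, 21, 24, 21, 30, 31]
print(paired_wilcoxon_w(drug, placebo))  # (23, 14)
```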


Worked Example
To measure the effectiveness of a drug for migraine relief, 15 patients, all susceptible to
migraines, were each administered the drug at the onset of one migraine and the placebo at
the onset of another migraine. One hour after the onset of the migraine, a migraine index
was obtained for each patient with the following results:

Patient 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Index after Drug 28 21 17 18 31 12 39 24 18 39 19 14 20 23 16
Index after Placebo 32 33 23 26 34 17 30 24 29 23 21 24 21 30 31

Carry out an appropriate Wilcoxon signed rank test on this data set, using a 5% significance
level, to test whether the drug reduces the migraine index.

Solution:
Let $X^{(1)}$ denote the migraine index after the drug is administered and $X^{(2)}$ the migraine index after taking the placebo. Let 𝜂𝐷 denote the population median of the differences $D_i = X_i^{(1)} - X_i^{(2)}$, and assume that the distribution of these differences is symmetrical. The null and alternative hypotheses are:
𝐻0 : 𝜂𝐷 = 0
𝐻1 : 𝜂𝐷 < 0

We now create a table that contains the paired sample values, the differences between the
paired sample values and the ranks of the absolute differences.

Patient 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Index after Drug 28 21 17 18 31 12 39 24 18 39 19 14 20 23 16
Index after Placebo 32 33 23 26 34 17 30 24 29 23 21 24 21 30 31
Difference, 𝐷𝑖 -4 -12 -6 -8 -3 -5 9 0 -11 16 -2 -10 -1 -7 -15
Rank of |𝐷𝑖 | 4 12 6 8 3 5 9 - 11 14 2 10 1 7 13

There is one difference that is exactly 0 (patient 8). This observation is discarded, leaving fourteen patients, so the sample size that we are now working with is 14. The observed value of the test statistic is 𝑤 = 9 + 14 = 23. This test statistic is the sum of the positive ranks. If 𝐻1 is true, we expect this to be small. The upper tail critical value 𝑤𝑐 from tables for 𝑛 = 14 using a one-tail test at the 5% significance level is 𝑤𝑐 = 79. The lower tail critical value is calculated as

$$\tfrac{1}{2}n(n+1) - w_c = \tfrac{1}{2}(14)(15) - 79 = 105 - 79 = 26.$$

Hence, the critical region is 𝑊 ≤ 26.

Since 𝑤 = 23 ≤ 26, the test statistic lies in the critical region. Therefore, there is sufficient evidence to reject 𝐻0 , and so sufficient evidence at the 5% significance level to conclude that, on average, the drug reduces the migraine index compared to the placebo.


Mann-Whitney Test
The Mann-Whitney test is a non-parametric test that can be used to test for differences
between two independent data sets. It is analogous to the test for the difference in means of
two independent normal distributions. The test is used when it is not possible to make the
assumption that the underlying populations are normally distributed. As it is a non-parametric
test, the Mann-Whitney test considers the difference in medians between two independent
datasets.

The null hypothesis is that the medians are equal. Let 𝜂1 denote the median of the first
population, and let 𝜂2 denote the median of the second population. Then the null hypothesis
can be written mathematically as 𝐻0 : 𝜂1 = 𝜂2 .

Assuming the null hypothesis to be true, the probability that an observation from the second
population exceeds an observation from the first population is 0.5. There are three possible
alternative hypotheses: 𝐻1 : 𝜂1 < 𝜂2 , 𝐻1 : 𝜂1 > 𝜂2 , and 𝐻1 : 𝜂1 ≠ 𝜂2 .

The first of these alternative hypotheses suggests that the probability of an observation from
the second population exceeding an observation from the first population is greater than 0.5
and therefore values in the second sample are likely to be higher than the values in the first
sample.

The converse is true for the second alternative hypothesis, which suggests that the
probability of an observation from the second population exceeding an observation from the
first population is less than 0.5 and therefore the values in the second sample are likely to be
lower than the first sample.

The third alternative hypothesis is two-tailed, suggesting that the probability of an observation from the second population exceeding an observation from the first population could be greater or less than 0.5.

To test these hypotheses, we introduce the Mann-Whitney statistic 𝑈. Suppose we have two
independent samples, 𝑋1 , 𝑋2 , … , 𝑋𝑚 and 𝑌1 , 𝑌2 , … , 𝑌𝑛 . The size of the first sample is 𝑚 and the
size of the second sample is 𝑛.

Table 14 on pages 20 and 21 of the statistical tables gives critical values of the Mann-Whitney statistic. The following information is given:

The tables give the upper critical values 𝑢𝑐 of the statistic

$$U = \sum_{i=1}^{m} \sum_{j=1}^{n} Z_{ij}$$

where 𝑍𝑖𝑗 = 1 if 𝑋𝑖 < 𝑌𝑗 and 𝑍𝑖𝑗 = 0 if 𝑋𝑖 > 𝑌𝑗 given the independent samples 𝑋1 , 𝑋2 , … , 𝑋𝑚 and 𝑌1 , 𝑌2 , … , 𝑌𝑛 . The lower critical values are given by 𝑚𝑛 − 𝑢𝑐 .

Here, 𝑍𝑖𝑗 is a so-called indicator variable. It takes the value 1 if an observation from the first
sample is smaller than an observation in the second sample and takes a value of 0 if an
observation from the first sample is larger than an observation in the second sample. This
comparison is made for all possible pairs of observations. The case when an observation
from the first sample is equal to an observation from the second sample is not considered
here.


Practically, this statistic is calculated by considering each observation from the first sample
in turn and counting the number of observations from the second sample that are larger than
it. The total of these gives the value of 𝑈.
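The counting procedure just described can be sketched in Python. The function name is hypothetical, and the sketch assumes no observation in the first sample equals one in the second, since ties are not considered here. The example call uses the data from the Mann-Whitney worked example later in this section.

```python
def mann_whitney_u(first, second):
    """U = number of pairs (x, y), x from the first sample and y from the
    second, with x < y. Assumes no value appears in both samples."""
    if set(first) & set(second):
        raise ValueError("ties between the samples are not covered")
    return sum(1 for x in first for y in second if x < y)


method_a = [5.3, 2.1, 3.9, 10.2, 1.8, 7.3]
method_b = [7.4, 7.2, 4.4, 11.2, 10.8, 3.7, 6.2]
print(mann_whitney_u(method_a, method_b))  # 30
```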

Since there are 𝑚 members of the first sample and 𝑛 members of the second sample, there are 𝑚𝑛 possible comparisons of an observation from the first sample with an observation from the second sample: 𝑈 counts the pairs with 𝑋𝑖 < 𝑌𝑗 , and 𝑚𝑛 − 𝑈 counts the pairs with 𝑋𝑖 > 𝑌𝑗 . Under 𝐻0 these two counts have the same distribution, so the tables only need to give upper tail critical values of the Mann-Whitney statistic, referred to as 𝑢𝑐 ; lower tail critical values are calculated using 𝑚𝑛 − 𝑢𝑐 .

There are four tables that give critical values for the Mann-Whitney test statistic. Each of the
four tables has a title corresponding to the type of test (one-tailed or two-tailed) and the
respective significance level that the table can be used for. The rows and columns
correspond to different values of 𝑚 and 𝑛, respectively.

Note that the test statistic 𝑈 is discrete. It can only take integer values from 0 to 𝑚𝑛. This
means the critical values given in the Mann-Whitney test statistic tables do not necessarily
give critical regions with the exact significance levels stated in the titles of the tables.
Instead, the value given is the integer that gives a critical region with significance level as
close as possible to the significance level specified in the table.

For example, consider the case 𝑚 = 4 and 𝑛 = 5. Possible values of the Mann-Whitney
statistic are all the integers from 0 to 20. Considering values of 15 and above, corresponding
probabilities of the Mann-Whitney statistic are given in the table below.

𝑢𝑐 𝑃(𝑈 ≥ 𝑢𝑐)
15 0.14286
16 0.09524
17 0.05556
18 0.03175
19 0.01587
20 0.00794

In the statistical tables, the critical value that gives a one-tail critical region with a
significance level of 5% is stated as 17. Referring to the table above, we look for the
probability that is closest to 5%. This occurs when 𝑢𝑐 = 17, with a corresponding probability
of 5.556%. Therefore, the critical region 𝑈 ≥ 17 has a significance level of 5.556% (just
above 5%).
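The probabilities in the table above can be reproduced by brute force. Assuming 𝐻0 is true, every choice of which 𝑚 of the 𝑚 + 𝑛 combined ranks belong to the first sample is equally likely. This sketch (function names hypothetical) enumerates all $\binom{m+n}{m}$ choices, so it is only practical for small samples such as 𝑚 = 4, 𝑛 = 5, which has 126 arrangements.

```python
from fractions import Fraction
from itertools import combinations


def mw_counts(m, n):
    """counts[u] = number of the C(m+n, m) equally likely arrangements with U = u."""
    counts = [0] * (m * n + 1)
    all_ranks = range(1, m + n + 1)
    for x_ranks in combinations(all_ranks, m):  # ranks held by the first sample
        y_ranks = [r for r in all_ranks if r not in x_ranks]
        counts[sum(1 for x in x_ranks for y in y_ranks if x < y)] += 1
    return counts


def mw_upper_tail(m, n, u):
    """Exact P(U >= u) under H0."""
    counts = mw_counts(m, n)
    return Fraction(sum(counts[u:]), sum(counts))


for u in range(15, 21):
    print(u, f"{float(mw_upper_tail(4, 5, u)):.5f}")
```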

Exercise: Use the table above to derive the critical values in the other three Mann-Whitney
tables for 𝑚 = 4 and 𝑛 = 5.

Since the distribution of the Mann-Whitney statistic is symmetrical (under 𝐻0), a critical value of 17 for a one-tail critical region with 5% significance corresponds to one of the boundaries for a two-tail critical region with 10% significance. The lower tail critical value is 𝑚𝑛 − 𝑢𝑐 = (4)(5) − 17 = 3. The critical region for the two-tail test is all values of 𝑈 that are greater than or equal to 17, or less than or equal to 3, and its actual significance level is 2 × 5.556% = 11.11%.

Let us revisit the various forms of the alternative hypotheses. The first alternative hypothesis
𝐻1 : 𝜂1 < 𝜂2 is supported by a significantly greater proportion of observations in the second
sample being larger than observations in the first sample. In this case, values of the test
statistic 𝑈 are likely to be large if 𝐻1 is true. The test statistic 𝑈 is compared with the upper
tail critical value 𝑢𝑐 given in tables. The critical region is 𝑈 ≥ 𝑢𝑐 .


The second alternative hypothesis 𝐻1 : 𝜂1 > 𝜂2 is supported by a significantly greater proportion of observations in the first sample being larger than observations in the second sample. In this case, values of the test statistic 𝑈 are likely to be small if 𝐻1 is true. The test statistic 𝑈 is compared with the lower tail critical value 𝑚𝑛 − 𝑢𝑐 . The critical region is given by 𝑈 ≤ 𝑚𝑛 − 𝑢𝑐 .

The two-tailed alternative hypothesis 𝐻1 : 𝜂1 ≠ 𝜂2 is supported when there is a significantly greater proportion of observations that are larger in one sample compared to the other. In this case, values of the test statistic 𝑈 could be large or small if 𝐻1 is true. The upper tail critical value 𝑢𝑐 is obtained from tables, and the lower tail critical value is 𝑚𝑛 − 𝑢𝑐 . Therefore, the critical region is (𝑈 ≤ 𝑚𝑛 − 𝑢𝑐 ) ∪ (𝑈 ≥ 𝑢𝑐 ).

Note: Throughout this document, 𝑈 shall be used to refer to the random variable that is the
Mann-Whitney statistic. Observed values of 𝑈 shall be denoted by 𝑢.

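An alternative method for calculating 𝑈 is to rank the observations from both samples collectively and then compute the sum of the ranks of the second sample. Denote the sum of the ranks of the second sample by 𝑇. Then the value of 𝑈 is computed as

$$U = T - \frac{n(n+1)}{2}$$

(recall that 𝑛 is the size of the second sample). This works because the 𝑗th smallest observation in the second sample has rank 𝑗 plus the number of observations from the first sample that are smaller than it; summing over 𝑗 = 1, … , 𝑛 gives $T = \tfrac{1}{2}n(n+1) + U$.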
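The rank-sum method can be sketched in the same way. Again the function name is hypothetical and the sketch assumes no tied values across the combined samples; the example uses the data from the worked example later in this section, where 𝑡 = 58.

```python
def mann_whitney_u_from_ranks(first, second):
    """U = T - n(n + 1)/2, where T is the sum of the combined ranks of the
    second sample and n is its size. Assumes no tied values."""
    combined = sorted(list(first) + list(second))
    if len(set(combined)) != len(combined):
        raise ValueError("tied values are not covered")
    rank = {value: i + 1 for i, value in enumerate(combined)}  # rank 1 = smallest
    n = len(second)
    t = sum(rank[y] for y in second)
    return t - n * (n + 1) // 2


method_a = [5.3, 2.1, 3.9, 10.2, 1.8, 7.3]
method_b = [7.4, 7.2, 4.4, 11.2, 10.8, 3.7, 6.2]
print(mann_whitney_u_from_ranks(method_a, method_b))  # 30, from t = 58 and 58 - 28
```

Both methods count the same pairs, so they always agree when there are no ties.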

Even if the above method is not used, ranking the observations can make it easier to count
the number of observations in the second sample that are larger than each observation in
the first sample.

Summary of the Key Steps for the Mann-Whitney Test:


1. Define 𝜂1 and 𝜂2 , the population medians for the two variables 𝑋 and 𝑌, and assume that the distributions of 𝑋 and 𝑌 have the same shape (in particular, the same variance), differing at most in location.
2. State the null and alternative hypotheses in terms of 𝜂1 and 𝜂2 .
3. Put each sample in ascending order (and rank both samples collectively if using the sum
of ranks approach).
4. Calculate the test statistic, 𝑈, which is the sum of the number of observations in the second sample that are greater than each of the observations in the first sample. Alternatively, let 𝑇 denote the sum of the ranks for the second sample; then the test statistic is given by $U = T - \frac{n(n+1)}{2}$.
5. Obtain the upper-tail critical value 𝑢𝑐 from tables, which depends on the values of 𝑚 and
𝑛, the significance level and the type of test.
6. State the critical region, which depends on the form of the alternative hypothesis:
a. If 𝐻1 : 𝜂1 < 𝜂2 , the critical region is 𝑈 ≥ 𝑢𝑐 .
b. If 𝐻1 : 𝜂1 > 𝜂2 , the critical region is 𝑈 ≤ 𝑚𝑛 − 𝑢𝑐 .
c. If 𝐻1 : 𝜂1 ≠ 𝜂2 , the critical region is (𝑈 ≤ 𝑚𝑛 − 𝑢𝑐 ) ∪ (𝑈 ≥ 𝑢𝑐 ).
7. Check whether the test statistic lies in the critical region.
8. State the conclusion in context.
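Steps 4 to 7 can be combined into one illustrative function. This is a sketch under stated assumptions: the function name and the labels "less", "greater" and "two-sided" are our own, 𝑢𝑐 is supplied from Table 14, and there are no ties between the samples. The example call uses the data and critical value from the worked example that follows.

```python
def mann_whitney_test(first, second, u_c, alternative):
    """Return (u, reject_h0) for the Mann-Whitney test.

    u_c is the upper tail critical value read from Table 14.
    alternative: "less" (H1: eta1 < eta2), "greater" (H1: eta1 > eta2)
    or "two-sided" (H1: eta1 != eta2). Assumes no ties between samples.
    """
    m, n = len(first), len(second)
    u = sum(1 for x in first for y in second if x < y)  # step 4
    lower = m * n - u_c  # lower tail critical value
    if alternative == "less":  # U tends to be large under H1
        reject = u >= u_c
    elif alternative == "greater":  # U tends to be small under H1
        reject = u <= lower
    elif alternative == "two-sided":
        reject = u <= lower or u >= u_c
    else:
        raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")
    return u, reject


method_a = [5.3, 2.1, 3.9, 10.2, 1.8, 7.3]
method_b = [7.4, 7.2, 4.4, 11.2, 10.8, 3.7, 6.2]
print(mann_whitney_test(method_a, method_b, 39, "two-sided"))  # (30, False)
```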


Worked Example
In a factory, two methods, A and B, are used to complete a certain task. The managing director believes that, on average, Method A takes the same time as Method B. A trial was designed to investigate this belief. Method A was used by each of 6 operatives and Method B was used by each of a different set of 7 operatives. The times taken in minutes to complete the task are shown below.

Method A: 5.3, 2.1, 3.9, 10.2, 1.8, 7.3

Method B: 7.4, 7.2, 4.4, 11.2, 10.8, 3.7, 6.2

Using the Mann-Whitney test, test the managing director's belief using a 1% significance level.

Solution:
Let 𝜂1 denote the population median time for method A, and let 𝜂2 denote the population median time for method B. Let 𝑋 denote the time taken to complete the task using method A and 𝑌 denote the time taken using method B. Assume that the distributions of times taken using the two methods have the same shape, and in particular equal variances. The null and alternative hypotheses are:
𝐻0 : 𝜂1 = 𝜂2
𝐻1 : 𝜂1 ≠ 𝜂2

We now create a table that contains the sample values (in ascending order) and their overall
ranks.

Time for Method A 1.8 2.1 3.9 5.3 7.3 10.2


Rank 1 2 4 6 9 11
Time for Method B 3.7 4.4 6.2 7.2 7.4 10.8 11.2
Rank 3 5 7 8 10 12 13

The observed value of the test statistic 𝑈 is obtained by counting the number of observations
in the second sample that are larger than each observation in the first sample:
𝑢 = 7 + 7 + 6 + 5 + 3 + 2 = 30.

Alternatively, calculate the sum of the ranks for method B, which we denote by 𝑡, as follows:
𝑡 = 3 + 5 + 7 + 8 + 10 + 12 + 13 = 58. Then compute the observed value of the test
statistic using:
𝑢 = 𝑡 − 𝑛(𝑛 + 1)/2 = 58 − 28 = 30.
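As an illustrative check, both routes to 𝑢 can be sketched in Python. This minimal sketch assumes, as is the case here, that there are no tied values in the pooled sample, so each value can be mapped directly to its overall rank; the variable names are illustrative.

```python
method_a = [5.3, 2.1, 3.9, 10.2, 1.8, 7.3]        # first sample, m = 6
method_b = [7.4, 7.2, 4.4, 11.2, 10.8, 3.7, 6.2]  # second sample, n = 7

# Definition: for each x, count the y values that are larger, then add up.
u_count = sum(1 for x in method_a for y in method_b if y > x)

# Rank-sum route: rank the pooled values (assumes no ties), sum the ranks of method B.
pooled = sorted(method_a + method_b)
rank = {value: i + 1 for i, value in enumerate(pooled)}
t = sum(rank[y] for y in method_b)
n = len(method_b)
u_rank = t - n * (n + 1) // 2

print(u_count, t, u_rank)  # 30 58 30
```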

The upper tail critical value 𝑢𝑐 from tables for 𝑚 = 6 and 𝑛 = 7 using a two-tail test at the 1%
significance level is 𝑢𝑐 = 39. The lower tail critical value is 𝑚𝑛 − 𝑢𝑐 = 42 − 39 = 3.
Hence, the critical region is (𝑈 ≤ 3) ∪ (𝑈 ≥ 39).
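The tabulated value can be cross-checked by enumerating every way of assigning the 13 overall ranks to the two samples, all of which are equally likely under 𝐻0. This sketch assumes the lower critical value is the largest 𝑘 with 𝑃(𝑈 ≤ 𝑘) no more than the tail probability (0.5% per tail here); under that convention it gives 3, and the upper value 𝑚𝑛 − 3 = 39. Other tables may use a slightly different convention, which can shift the value by one for some sample sizes. The helper names `u_null_counts` and `lower_critical_value` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def u_null_counts(m, n):
    """counts[u] = number of ways to give m of the ranks 1..m+n to sample 1 with U = u."""
    ranks = range(1, m + n + 1)
    counts = {}
    for first in combinations(ranks, m):          # ranks held by the first sample
        first_set = set(first)
        second = [r for r in ranks if r not in first_set]
        # U counts pairs in which the second-sample observation is the larger.
        u = sum(1 for x in first for y in second if y > x)
        counts[u] = counts.get(u, 0) + 1
    return counts

def lower_critical_value(m, n, tail_prob):
    """Largest k with P(U <= k) <= tail_prob under H0, or None if there is none."""
    counts = u_null_counts(m, n)
    total = sum(counts.values())
    cumulative, best = 0, None
    for k in range(m * n + 1):
        cumulative += counts.get(k, 0)
        if Fraction(cumulative, total) > tail_prob:
            break
        best = k
    return best

lower = lower_critical_value(6, 7, Fraction(1, 200))  # 0.5% in each tail
print(lower, 6 * 7 - lower)  # 3 39
```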

Since 3 < 𝑢 < 39, the test statistic does not lie in the critical region. Therefore, there is
insufficient evidence to reject 𝐻0 . At the 1% level of significance, there is insufficient
evidence to conclude that the average time differs between the two methods, so the data do
not contradict the managing director's belief.


Questions

1. The manufacturer of a racing car claims that the time taken for the car to go from 0 mph
to 60 mph is 2.97 seconds. A driver, who doubts the manufacturer’s claim, measures the
time taken to go from 0 mph to 60 mph on 11 separate days. The times were recorded
and are listed below.

3.21 2.90 2.63 3.54 3.43 2.87 3.67 2.98 2.64 3.22 2.91

Carry out a Wilcoxon signed-rank test, with a 10% significance level, commenting on the
conclusion that the driver should reach.

2. The times (in milliseconds) taken by a computer to perform a particular task on 14
randomly chosen occasions were as follows:

6.6 3.6 2.1 13.2 5.4 11.6 1.6 7.2 7.8 3.8 6.0 10.3 3.0 5.2.

From past experience it is thought that the median time for this task is 7.8 milliseconds.
Use the Wilcoxon signed-rank test to test this at the 1% significance level.

3. A puzzle solving workshop claims to help participants improve the time it takes them to
solve puzzles. The time taken for participants to complete a puzzle is recorded before
the workshop. The same participants then complete a puzzle of equivalent difficulty after
the workshop. The times to complete the puzzle are recorded in the table below.

Participant A B C D E F G H I J
Before 31.0 29.9 29.5 30.1 31.8 28.9 26.5 30.4 33.2 31.6
After 30.6 30.4 28.9 29.0 31.5 27.6 26.6 29.7 31.1 30.4

Carry out an appropriate Wilcoxon signed rank test on this data set, using a 2.5%
significance level, to test whether the workshop improves the time participants take to
solve puzzles.

4. A supermarket is deciding which of two cereals to introduce to its stores. A random sample
of 10 customers was asked to rate both cereals and, based on their responses, each cereal
was awarded a mark out of 80. The results are shown below:

Customer 1 2 3 4 5 6 7 8 9 10
Rating for Cereal 1 53 46 60 53 66 59 52 67 74 47
Rating for Cereal 2 75 67 69 51 76 59 65 68 63 59

Carry out an appropriate Wilcoxon signed rank test on this data set, using a 1%
significance level, to test whether there is any difference in the average marks awarded
to the two cereals by the customers. State clearly your conclusion in context.


5. A teacher believes that the order of questions on an exam paper affects a student’s
performance. To investigate this claim, she takes a class of 15 students and randomly
divides them into two groups: one group with 7 students and another with 8 students.
The same set of exam questions is prepared in two different orders. For the first group,
the questions are ordered from hardest to easiest. For the second group, the questions
are ordered from easiest to hardest. The marks obtained by the pupils in both groups are
shown below.

Group 1   99    103   108   114   123   135   137
Group 2   109   119   121   128   130   134   136   138

Carry out a Mann-Whitney test at the 5% significance level to investigate whether the
order of questions on an exam paper affects students’ performance.

6. A shopper believes that the grapefruit sold by supermarket A are heavier than those sold
by supermarket B. To investigate this, the shopper weighed ten randomly selected
grapefruit from supermarket A, and nine randomly selected grapefruit from supermarket
B. The weights, in grams, for each grapefruit are shown in the table below.

Weights from Supermarket A 208 246 197 153 118 169 124 144 192 172
Weights from Supermarket B 164 194 132 110 116 104 167 137 122

Using the Mann-Whitney test, test the shopper's belief using a 2.5% significance level.


Question 1 Worked Solution


The manufacturer of a racing car claims that the time taken for the car to go from 0 mph to
60 mph is 2.97 seconds. A driver, who doubts the manufacturer’s claim, measures the time
taken to go from 0 mph to 60 mph on 11 separate days. The times were recorded and are
listed below.

3.21 2.90 2.63 3.54 3.43 2.87 3.67 2.98 2.64 3.22 2.91

Carry out a Wilcoxon signed-rank test, with a 10% significance level, commenting on the
conclusion that the driver should reach.

Solution:
Let 𝑋 denote the time taken to go from 0 mph to 60 mph, and let 𝜂 denote the median time.
Assume that the distribution of 𝑋 is symmetrical. The null and alternative hypotheses are:
𝐻0 : 𝜂 = 2.97
𝐻1 : 𝜂 > 2.97

We now create a table that contains the sample values, the differences between the sample
values and the hypothesised median under 𝐻0 , and the ranks of the absolute differences.

Time, 𝑥 (s)   Difference, 𝑑 = 𝑥 − 𝜂0   Rank of |𝑑|
3.21          0.24                     5
2.90          -0.07                    3
2.63          -0.34                    8
3.54          0.57                     10
3.43          0.46                     9
2.87          -0.10                    4
3.67          0.70                     11
2.98          0.01                     1
2.64          -0.33                    7
3.22          0.25                     6
2.91          -0.06                    2

The observed value of the test statistic is 𝑤 = 5 + 10 + 9 + 11 + 1 + 6 = 42. This test
statistic is the sum of the positive ranks. If 𝐻1 is true, we expect this sum to be large. The
upper tail critical value, 𝑤𝑐 , from tables for 𝑛 = 11 using a one-tail test at the 10%
significance level is 𝑤𝑐 = 48. Hence, the critical region is 𝑊 ≥ 48.
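The ranking in the table can be reproduced with a short Python sketch. It assumes no ties among the |𝑑| values (true for this data set); differences are rounded to 2 decimal places to remove floating-point noise, and all names are illustrative.

```python
# Question 1 data and the hypothesised median under H0.
times = [3.21, 2.90, 2.63, 3.54, 3.43, 2.87, 3.67, 2.98, 2.64, 3.22, 2.91]
eta0 = 2.97

# Differences d = x - eta0, rounded to 2 d.p. to remove floating-point noise.
d = [round(x - eta0, 2) for x in times]
d = [v for v in d if v != 0]  # any zero difference would be discarded

# Rank |d| from smallest (rank 1) to largest; this data set has no tied |d|.
order = sorted(range(len(d)), key=lambda i: abs(d[i]))
ranks = [0] * len(d)
for r, i in enumerate(order, start=1):
    ranks[i] = r

# Test statistic: sum of the ranks attached to positive differences.
w = sum(r for r, v in zip(ranks, d) if v > 0)
print(ranks)  # [5, 3, 8, 10, 9, 4, 11, 1, 7, 6, 2]
print(w)      # 42
```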

Since 𝑤 < 48, the test statistic does not lie in the critical region. Therefore, there is
insufficient evidence to reject 𝐻0 . This means there is insufficient evidence at the 10%
significance level for the driver to conclude that the time to go from 0 mph to 60 mph
specified by the manufacturer is incorrect.
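As a cross-check on the table lookup (an addition, not part of the method described in this unit), the exact null distribution of 𝑊 can be computed: under 𝐻0 every subset of the ranks 1 to 𝑛 is equally likely to be the set of positive ranks. The sketch below counts subset sums with a standard dynamic-programming loop; the function names are hypothetical. For 𝑛 = 11 and 𝑤 = 42 the one-tail probability 𝑃(𝑊 ≥ 42) comes out above 0.10, consistent with not rejecting 𝐻0.

```python
from fractions import Fraction

def signed_rank_counts(n):
    """counts[s] = number of subsets of the ranks {1, ..., n} whose sum is s.

    Under H0 each rank is equally likely to carry a + or - sign, so each of
    the 2**n subsets is equally likely to be the set of positive ranks.
    """
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum  # the empty subset has sum 0
    for r in range(1, n + 1):
        # Iterate downwards so that rank r is used at most once per subset.
        for s in range(max_sum, r - 1, -1):
            counts[s] += counts[s - r]
    return counts

def p_upper(w_obs, n):
    """Exact P(W >= w_obs) under H0."""
    counts = signed_rank_counts(n)
    return Fraction(sum(counts[w_obs:]), 2 ** n)

p = p_upper(42, 11)
print(round(float(p), 3))
```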


Question 2 Worked Solution


The times (in milliseconds) taken by a computer to perform a particular task on 14 randomly
chosen occasions were as follows:

6.6 3.6 2.1 13.2 5.4 11.6 1.6 7.2 7.8 3.8 6.0 10.3 3.0 5.2.

From past experience it is thought that the median time for this task is 7.8 milliseconds. Use
the Wilcoxon signed-rank test to test this at the 1% significance level.

Solution:
Let 𝜂 denote the population median time for the task and let 𝑋 denote the time taken to
complete the task. We assume that the distribution of 𝑋 is symmetrical. The null and
alternative hypotheses are:
𝐻0 : 𝜂 = 7.8
𝐻1 : 𝜂 ≠ 7.8

We now create a table that contains the sample values, the differences between the sample
values and the hypothesised median 𝜂0 under 𝐻0 , and the ranks of the absolute differences.

Task time, 𝑥   Difference, 𝑑 = 𝑥 − 𝜂0   Rank of |𝑑|
6.6            -1.2                     2
3.6            -4.2                     9
2.1            -5.7                     12
13.2           5.4                      11
5.4            -2.4                     4
11.6           3.8                      7
1.6            -6.2                     13
7.2            -0.6                     1
7.8            0                        -
3.8            -4.0                     8
6.0            -1.8                     3
10.3           2.5                      5
3.0            -4.8                     10
5.2            -2.6                     6

One task time is exactly 7.8, so its difference is zero. This observation is discarded, leaving
a sample of size 𝑛 = 13. The observed value of the test statistic is 𝑤 = 11 + 7 + 5 = 23. This
test statistic is the sum of the positive ranks. If 𝐻1 is true, we expect this sum to be either
large or small. The upper tail critical value, 𝑤𝑐 , from tables for 𝑛 = 13 using a two-tail test at
the 1% significance level is 𝑤𝑐 = 81. The lower tail critical value is
½𝑛(𝑛 + 1) − 𝑤𝑐 = ½(13)(14) − 81 = 91 − 81 = 10. Hence, the critical region is
(𝑊 ≤ 10) ∪ (𝑊 ≥ 81).
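The handling of the zero difference and the lower-tail calculation can be sketched as follows. This assumes differences rounded to 1 decimal place and no ties among the remaining |𝑑| values; 𝑤𝑐 = 81 is copied from the text rather than computed, and the names are illustrative.

```python
times = [6.6, 3.6, 2.1, 13.2, 5.4, 11.6, 1.6, 7.2, 7.8, 3.8, 6.0, 10.3, 3.0, 5.2]
eta0 = 7.8

# Differences rounded to 1 d.p.; the observation equal to eta0 gives 0 and is dropped.
d = [round(x - eta0, 1) for x in times]
d = [v for v in d if v != 0]
n = len(d)  # effective sample size after discarding zeros

order = sorted(range(n), key=lambda i: abs(d[i]))  # no tied |d| here
ranks = [0] * n
for r, i in enumerate(order, start=1):
    ranks[i] = r
w = sum(r for r, v in zip(ranks, d) if v > 0)

w_c = 81                        # upper-tail value quoted from tables in the text
lower = n * (n + 1) // 2 - w_c  # lower-tail value, by symmetry of W
print(n, w, lower)              # 13 23 10
print(w <= lower or w >= w_c)   # False: not in the critical region
```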

Since 10 < 𝑤 < 81, the test statistic does not lie in the critical region. Therefore, there
is insufficient evidence to reject 𝐻0 , and so insufficient evidence at the 1% significance level
to conclude that the median task completion time differs from 7.8 milliseconds.


Question 3 Worked Solution


A puzzle solving workshop claims to help participants improve the time it takes them to solve
puzzles. The time taken for participants to complete a puzzle is recorded before the
workshop. The same participants then complete a puzzle of equivalent difficulty after the
workshop. The times to complete the puzzle are recorded in the table below.

Participant A B C D E F G H I J
Before 31.0 29.9 29.5 30.1 31.8 28.9 26.5 30.4 33.2 31.6
After 30.6 30.4 28.9 29.0 31.5 27.6 26.6 29.7 31.1 30.4

Carry out an appropriate Wilcoxon signed rank test on this data set, using a 2.5%
significance level, to test whether the workshop improves the time participants take to solve
puzzles.

Solution:
Let 𝑋⁽¹⁾ denote the time to complete the puzzle before the workshop, with population median
𝜂1 . Let 𝑋⁽²⁾ denote the time to complete the puzzle after the workshop, with population
median 𝜂2 . Let 𝜂𝐷 = 𝜂1 − 𝜂2 . Assume that the distribution of differences 𝐷𝑖 = 𝑋𝑖⁽¹⁾ − 𝑋𝑖⁽²⁾ is
symmetrical. The null and alternative hypotheses are:
𝐻0 : 𝜂𝐷 = 0
𝐻1 : 𝜂𝐷 > 0

We now create a table that contains the paired sample values, the differences between the
paired sample values and the ranks of the absolute differences.

Participant A B C D E F G H I J
Before 31.0 29.9 29.5 30.1 31.8 28.9 26.5 30.4 33.2 31.6
After 30.6 30.4 28.9 29.0 31.5 27.6 26.6 29.7 31.1 30.4
Difference, 𝐷𝑖 0.4 -0.5 0.6 1.1 0.3 1.3 -0.1 0.7 2.1 1.2
Rank of |𝐷𝑖 | 3 4 5 7 2 9 1 6 10 8

The observed value of the test statistic is:
𝑤 = 3 + 5 + 7 + 2 + 9 + 6 + 10 + 8 = 50.

This test statistic is the sum of the positive ranks. If 𝐻1 is true, we expect this to be large.
The upper tail critical value, 𝑤𝑐 , from tables for 𝑛 = 10 using a one-tail test at the 2.5%
significance level is 𝑤𝑐 = 47. Hence, the critical region is 𝑊 ≥ 47.
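A sketch of the paired calculation in Python, taking 𝐷𝑖 as the time before minus the time after, so that a positive difference corresponds to a faster time after the workshop. The critical value 47 is copied from the text; the variable names are illustrative, and the code assumes no tied |𝐷𝑖| values.

```python
before = [31.0, 29.9, 29.5, 30.1, 31.8, 28.9, 26.5, 30.4, 33.2, 31.6]
after = [30.6, 30.4, 28.9, 29.0, 31.5, 27.6, 26.6, 29.7, 31.1, 30.4]

# D_i = before - after, so a positive difference means the participant got faster.
d = [round(b - a, 1) for b, a in zip(before, after)]
d = [v for v in d if v != 0]
n = len(d)

order = sorted(range(n), key=lambda i: abs(d[i]))  # no tied |D_i| here
ranks = [0] * n
for r, i in enumerate(order, start=1):
    ranks[i] = r
w = sum(r for r, v in zip(ranks, d) if v > 0)

w_c = 47  # one-tail 2.5% value quoted from tables in the text
print(d)            # [0.4, -0.5, 0.6, 1.1, 0.3, 1.3, -0.1, 0.7, 2.1, 1.2]
print(w, w >= w_c)  # 50 True
```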

Since 𝑤 ≥ 47, the test statistic lies in the critical region. Therefore, there is sufficient
evidence to reject 𝐻0 and so sufficient evidence at the 2.5% significance level to conclude
that the workshop does improve the time participants take to solve puzzles.


Question 4 Worked Solution


A supermarket is deciding which of two cereals to introduce to its stores. A random sample of
10 customers was asked to rate both cereals and, based on their responses, each cereal was
awarded a mark out of 80. The results are shown below:

Customer 1 2 3 4 5 6 7 8 9 10
Rating for Cereal 1 53 46 60 53 66 59 52 67 74 47
Rating for Cereal 2 75 67 69 51 76 59 65 68 63 59

Carry out an appropriate Wilcoxon signed rank test on this data set, using a 1% significance
level, to test whether there is any difference in the average marks awarded to the two
cereals by the customers. State clearly your conclusion in context.

Solution:
Let 𝑋⁽¹⁾ denote the rating for cereal 1, with population median 𝜂1 . Let 𝑋⁽²⁾ denote the rating
for cereal 2, with population median 𝜂2 . Let 𝜂𝐷 = 𝜂1 − 𝜂2 . Assume that the distribution of
differences 𝐷𝑖 = 𝑋𝑖⁽¹⁾ − 𝑋𝑖⁽²⁾ is symmetrical. The null and alternative hypotheses are:
𝐻0 : 𝜂𝐷 = 0
𝐻1 : 𝜂𝐷 ≠ 0

We now create a table that contains the paired sample values, the differences between the
paired sample values and the ranks of the absolute differences.

Customer 1 2 3 4 5 6 7 8 9 10
Rating for Cereal 1 53 46 60 53 66 59 52 67 74 47
Rating for Cereal 2 75 67 69 51 76 59 65 68 63 59
Difference, 𝐷𝑖 -22 -21 -9 2 -10 0 -13 -1 11 -12
Rank of |𝐷𝑖 | 9 8 3 2 4 - 7 1 5 6

One difference is exactly 0. This observation is discarded, leaving a sample of size 𝑛 = 9.
The observed value of the test statistic is 𝑤 = 2 + 5 = 7. This test statistic is the sum of the
positive ranks. If 𝐻1 is true, we expect this to be either small or large. The upper tail critical
value 𝑤𝑐 from tables with 𝑛 = 9 using a two-tail test at the 1% significance level is 𝑤𝑐 = 43.
The lower tail critical value is ½𝑛(𝑛 + 1) − 𝑤𝑐 = ½(9)(10) − 43 = 45 − 43 = 2. Hence, the
critical region is (𝑊 ≤ 2) ∪ (𝑊 ≥ 43).
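A useful arithmetic check is that the positive and negative rank sums add up to ½𝑛(𝑛 + 1). The Python sketch below, which assumes no tied |𝐷𝑖| values once the zero difference is dropped, computes both sums for the cereal data; the names are illustrative.

```python
cereal_1 = [53, 46, 60, 53, 66, 59, 52, 67, 74, 47]
cereal_2 = [75, 67, 69, 51, 76, 59, 65, 68, 63, 59]

# D_i = cereal 1 - cereal 2, dropping the zero difference (customer 6).
d = [a - b for a, b in zip(cereal_1, cereal_2) if a != b]
n = len(d)

order = sorted(range(n), key=lambda i: abs(d[i]))  # no tied |D_i| here
ranks = [0] * n
for r, i in enumerate(order, start=1):
    ranks[i] = r

w_plus = sum(r for r, v in zip(ranks, d) if v > 0)
w_minus = sum(r for r, v in zip(ranks, d) if v < 0)
print(n, w_plus, w_minus)                    # 9 7 38
print(w_plus + w_minus == n * (n + 1) // 2)  # True
```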

Since 2 < 𝑤 < 43, the test statistic does not lie in the critical region. Therefore, there is
insufficient evidence to reject 𝐻0 , and so insufficient evidence to conclude that there is a
difference in the average marks awarded for the two cereals. At the 1% significance level,
the evidence does not indicate which cereal the supermarket should introduce to its stores.


Question 5 Worked Solution


A teacher believes that the order of questions on an exam paper affects a student’s
performance. To investigate this claim, she takes a class of 15 students and randomly
divides them into two groups: one group with 7 students and another with 8 students. The
same set of exam questions is prepared in two different orders. For the first group, the
questions are ordered from hardest to easiest. For the second group, the questions are
ordered from easiest to hardest. The marks, out of 150, obtained by the pupils in both groups
are shown below.

Group 1   99    103   108   114   123   135   137
Group 2   109   119   121   128   130   134   136   138

Carry out a Mann-Whitney test at the 5% significance level to investigate whether the order
of questions on an exam paper affects students’ performance.

Solution:
Let 𝜂1 denote the population median mark for the first group and let 𝜂2 denote the population
median mark for the second group. Let 𝑋 denote the mark obtained by a student in group 1
and 𝑌 denote the mark obtained by a student in group 2. We assume that the population
variances are equal for both groups. The null and alternative hypotheses are:
𝐻0 : 𝜂1 = 𝜂2
𝐻1 : 𝜂1 < 𝜂2

We now create a table that contains the sample values and their overall ranks.

Mark for Group 1   99    103   108   114   123   135   137
Rank               1     2     3     5     8     12    14
Mark for Group 2   109   119   121   128   130   134   136   138
Rank               4     6     7     9     10    11    13    15

The observed value of the test statistic, 𝑈, is obtained by counting the number of
observations in the second sample that are larger than each observation in the first sample:
𝑢 = 8 + 8 + 8 + 7 + 5 + 2 + 1 = 39.

Alternatively, calculate the sum of the ranks for group 2, which we denote by 𝑡, as follows:
𝑡 = 4 + 6 + 7 + 9 + 10 + 11 + 13 + 15 = 75. Then compute the observed value of the test
statistic using:
𝑢 = 𝑡 − 𝑛(𝑛 + 1)/2 = 75 − (8)(9)/2 = 75 − 36 = 39.
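The value of 𝑢 can also be recovered from the rank sum of group 1, 𝑠: with no ties, 𝑠 − ½𝑚(𝑚 + 1) counts the pairs in which the group 1 mark is the larger, so 𝑢 is 𝑚𝑛 minus that count. The Python sketch below checks both routes; the names are illustrative and no tied marks are assumed.

```python
group_1 = [99, 103, 108, 114, 123, 135, 137]        # m = 7
group_2 = [109, 119, 121, 128, 130, 134, 136, 138]  # n = 8
m, n = len(group_1), len(group_2)

pooled = sorted(group_1 + group_2)
rank = {mark: i + 1 for i, mark in enumerate(pooled)}  # assumes no tied marks

s = sum(rank[x] for x in group_1)  # rank sum of group 1
t = sum(rank[y] for y in group_2)  # rank sum of group 2

u_from_t = t - n * (n + 1) // 2            # route used in the text
u_from_s = m * n - (s - m * (m + 1) // 2)  # via pairs where group 1 is larger
print(s, t, u_from_t, u_from_s)            # 45 75 39 39
```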

The upper tail critical value, 𝑢𝑐 , from tables for 𝑚 = 7 and 𝑛 = 8 using a one-tail test at the
5% level of significance is 𝑢𝑐 = 43. Hence, the critical region is 𝑈 ≥ 43.

Since 𝑢 = 39 < 43, the test statistic does not lie in the critical region. Therefore, there is
insufficient evidence to reject 𝐻0 . There is insufficient evidence at the 5% significance level
to conclude that the order of the questions on the exam paper affects students’ performance.
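As a cross-check on the table lookup (an addition, not part of the method in this unit), the exact one-tail probability 𝑃(𝑈 ≥ 39) can be computed. The recursion below conditions on whether the largest observation belongs to the first or the second sample; under 𝐻0 all C(𝑚 + 𝑛, 𝑚) orderings are assumed equally likely. The function names `arrangements` and `p_upper` are hypothetical.

```python
from fractions import Fraction
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def arrangements(m, n, u):
    """Orderings of m x's and n y's with exactly u pairs (x, y) where y > x."""
    if u < 0:
        return 0
    if m == 0 or n == 0:
        return 1 if u == 0 else 0
    # The largest observation is either an x (adding no pairs)
    # or a y (larger than all m x's, adding m pairs).
    return arrangements(m - 1, n, u) + arrangements(m, n - 1, u - m)

def p_upper(u_obs, m, n):
    """Exact P(U >= u_obs) under H0."""
    favourable = sum(arrangements(m, n, u) for u in range(u_obs, m * n + 1))
    return Fraction(favourable, comb(m + n, m))

p = p_upper(39, 7, 8)
print(round(float(p), 3))
```

The probability exceeds 0.05, which agrees with not rejecting 𝐻0 at the 5% level.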


Question 6 Worked Solution


A shopper believes that the grapefruit sold by supermarket A are heavier than those sold by
supermarket B. To investigate this, the shopper weighed ten randomly selected grapefruit
from supermarket A, and nine randomly selected grapefruit from supermarket B. The
weights, in grams, for each grapefruit are shown in the table below.

Weights from Supermarket A 208 246 197 153 118 169 124 144 192 172
Weights from Supermarket B 164 194 132 110 116 104 167 137 122

Using the Mann-Whitney test, test the shopper's belief using a 2.5% significance level.

Solution:
Let 𝜂1 denote the population median weight for grapefruit from supermarket A, and let 𝜂2
denote the population median weight for grapefruit from supermarket B. Let 𝑋 denote the weight of grapefruit
from supermarket A and 𝑌 denote the weight of grapefruit from supermarket B. Assume that
the variances of the distribution of the grapefruit weights from both supermarkets are equal.
The null and alternative hypotheses are:
𝐻0 : 𝜂1 = 𝜂2
𝐻1 : 𝜂1 > 𝜂2

We now create a table that contains the sample values (in ascending order) and their overall
ranks.

Weights from Supermarket A 118 124 144 153 169 172 192 197 208 246
Rank 4 6 9 10 13 14 15 17 18 19
Weights from Supermarket B 104 110 116 122 132 137 164 167 194
Rank 1 2 3 5 7 8 11 12 16

The observed value of the test statistic 𝑈 is obtained by counting the number of observations
in the second sample that are larger than each observation in the first sample:
𝑢 = 6 + 5 + 3 + 3 + 1 + 1 + 1 + 0 + 0 + 0 = 20.

Alternatively, calculate the sum of the ranks for supermarket B, which we denote by 𝑡, as
follows: 𝑡 = 1 + 2 + 3 + 5 + 7 + 8 + 11 + 12 + 16 = 65. The observed value of the test
statistic is:
𝑢 = 𝑡 − 𝑛(𝑛 + 1)/2 = 65 − (9)(10)/2 = 65 − 45 = 20.

The upper tail critical value 𝑢𝑐 from tables for 𝑚 = 10 and 𝑛 = 9 using a one-tail test at the
2.5% significance level is 𝑢𝑐 = 69. The lower tail critical value is 𝑚𝑛 − 𝑢𝑐 = 90 − 69 = 21.
Hence, the critical region is 𝑈 ≤ 21.
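The same exact calculation can be organised as a bottom-up table, shown here for the grapefruit data as a cross-check rather than as part of the prescribed method. For 𝐻1 : 𝜂1 > 𝜂2, small values of 𝑈 count against 𝐻0, so the relevant probability is 𝑃(𝑈 ≤ 20); under the assumption that all orderings are equally likely under 𝐻0, it comes out below 0.025, consistent with rejecting 𝐻0. The function name `u_distribution` is hypothetical.

```python
from fractions import Fraction
from math import comb

def u_distribution(m, n):
    """dist[u] = number of orderings of m x's and n y's with exactly u pairs where y > x."""
    # table[i][j] is the distribution for i x's and j y's.
    table = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0 or j == 0:
                table[i][j] = [1]  # with an empty sample, only U = 0 is possible
                continue
            dist = [0] * (i * j + 1)
            # Largest observation is an x: no y exceeds it, adding 0 pairs.
            for u, count in enumerate(table[i - 1][j]):
                dist[u] += count
            # Largest observation is a y: it exceeds all i x's, adding i pairs.
            for u, count in enumerate(table[i][j - 1]):
                dist[u + i] += count
            table[i][j] = dist
    return table[m][n]

a = [208, 246, 197, 153, 118, 169, 124, 144, 192, 172]  # supermarket A, m = 10
b = [164, 194, 132, 110, 116, 104, 167, 137, 122]       # supermarket B, n = 9
u = sum(1 for x in a for y in b if y > x)               # 20, as in the text

dist = u_distribution(len(a), len(b))
p = Fraction(sum(dist[: u + 1]), comb(len(a) + len(b), len(a)))  # P(U <= u)
print(u, round(float(p), 4))
```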

Since 𝑢 ≤ 21, the test statistic lies in the critical region and therefore there is sufficient
evidence to reject 𝐻0 . The shopper's claim is supported and there is sufficient evidence at
the 2.5% level of significance to conclude that the grapefruit from supermarket A are heavier,
on average, than those from supermarket B.
