Chapter 3 - Data Description (for student)
ELEMENTARY STATISTICS
CHAPTER 3
Data Description
In Chapter 3, the procedure for finding a percentile (page 81) is
VERY IMPORTANT. The calculations of quartiles (page 85), the five-number
summary (page 86), boxplots (page 87), and outliers (page 95)
all rely on the percentile calculation.
Data Description
3.1 Measures of Central Tendency
3.2 Measures of Variation
3.3 Measures of Position
Population: All working adults in Hong Kong
Sample: A random sample of 1000 working adults in Hong Kong
The sample is used to estimate the population.
Since the raw data of the sample are not available, we cannot use the
sample mean formula shown earlier. We need another method to
calculate the sample mean from a grouped frequency distribution.
2. Find the class midpoint of each class and place them in column C.
3. Multiply the frequency by the class midpoint for each class and place the
product in column D.
4. Find the sum of column D.
5. Divide the sum obtained in D by the sum of the frequencies obtained in
column B.
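The formula for the sample mean is $\bar{X} = \frac{\sum f \cdot X_m}{n}$, where $f$ is the class frequency, $X_m$ is the class midpoint, and $n = \sum f$.
$\bar{X} = \frac{\sum f \cdot X_m}{n} = \frac{(2 \times 6) + (3 \times 9) + \cdots + (1 \times 21)}{20} = \frac{261}{20} = 13.05 \approx 13.1$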
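The grouped-mean steps above can be sketched in Python. The class midpoints and frequencies below are hypothetical illustration values, not the data of the worked example, whose full table is not reproduced here.

```python
# Hypothetical grouped data (illustration only, not the example's table).
midpoints = [8, 13, 18, 23]    # column C: class midpoints X_m
frequencies = [1, 2, 3, 4]     # column B: class frequencies f

# Column D: frequency times class midpoint for each class.
products = [f * xm for f, xm in zip(frequencies, midpoints)]

n = sum(frequencies)                 # sum of column B
grouped_mean = sum(products) / n     # sum of column D divided by n

print(grouped_mean)
```

Each list plays the role of one column of the working table, so the code mirrors the hand calculation step by step.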
Find Mary’s grade point average in the last semester. Correct your
answer to 2 decimal places.
Solution:
$\bar{X} = \frac{\sum wX}{\sum w} = \frac{3 \times 4.5 + 4 \times 3.0 + 5 \times 1.0}{3 + 4 + 5} = 2.5417 \approx 2.54$
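As a quick check of the weighted mean, here is a minimal Python sketch using the numbers from the solution. Reading the weights 3, 4, and 5 as course credits is an assumption; the variable names are illustrative.

```python
# Values taken from the worked solution.
grade_points = [4.5, 3.0, 1.0]   # X: grade point for each course
weights = [3, 4, 5]              # w: weight for each course (assumed to be credits)

# Weighted mean = sum(w * X) / sum(w)
weighted_sum = sum(w * x for w, x in zip(weights, grade_points))
gpa = weighted_sum / sum(weights)

print(round(gpa, 2))
```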
Solution:
Solution:
It is helpful to arrange the data in order, although it is not necessary.
22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 24, 24,
24, 24, 24, 25, 25, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27
Since the temperature of 23 degrees occurs 7 times, a frequency
larger than that of any other value, the mode for the data set is 23 °C.
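The frequency count behind the mode can be reproduced with a short Python sketch. It uses the 30 temperatures listed above and the standard library's `collections.Counter`; the variable names are illustrative.

```python
from collections import Counter

# The 30 temperature readings from the example.
temperatures = [
    22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23, 23, 24, 24,
    24, 24, 24, 25, 25, 25, 25, 25, 25, 26, 26, 26, 27, 27, 27,
]

# Count how often each value occurs and take the most frequent one.
counts = Counter(temperatures)
mode_value, mode_count = counts.most_common(1)[0]

print(mode_value, mode_count)
```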
Solution:
Solution:
Solution: $MR = \frac{\text{lowest value} + \text{highest value}}{2} = 23\ ^\circ\text{C}$
Positively skewed or right-skewed distribution: Mode < Median < Mean
Symmetric distribution
[Figure: histograms of boys.score and girls.score; x-axis: Score (0 to 100), y-axis: Frequency]
Solution:
The range is $R = 305 - 206 = 99$.
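Population variance: $\sigma^2 = \frac{\sum (X - \mu)^2}{N}$
Population standard deviation: $\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$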
Step 2: Subtract the population mean from each value and place
the result in column B of the table shown below.
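Sample variance: $s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$
Sample standard deviation: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$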
2. Multiply the frequency by the class midpoint for each class and place the
products in column D.
3. Multiply the frequency by the square of the class midpoint for each class
and place the products in column E.
4. Find the sums of columns B, D, and E.
5. Compute the sample variance $s^2$ using the formula given below.
$s^2 = \frac{\sum (f \cdot X_m^2) - \left[ (\sum f \cdot X_m)^2 / n \right]}{n - 1}$
6. Take the square root to get the sample standard deviation $s$.
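The shortcut formula for grouped data can be sketched in Python as follows. The midpoints and frequencies are hypothetical illustration values, not taken from any table in this chapter.

```python
import math

# Hypothetical grouped data (illustration only).
midpoints = [8, 13, 18, 23]    # class midpoints X_m
frequencies = [1, 2, 3, 4]     # column B: class frequencies f

n = sum(frequencies)                                                   # sum of column B
sum_f_xm = sum(f * xm for f, xm in zip(frequencies, midpoints))        # sum of column D
sum_f_xm2 = sum(f * xm ** 2 for f, xm in zip(frequencies, midpoints))  # sum of column E

# s^2 = [sum(f * X_m^2) - (sum(f * X_m))^2 / n] / (n - 1)
sample_variance = (sum_f_xm2 - sum_f_xm ** 2 / n) / (n - 1)
sample_std = math.sqrt(sample_variance)

print(round(sample_variance, 2), round(sample_std, 2))
```

Only the three column sums are needed, which is why the hand method asks for columns B, D, and E.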
Step 2: Multiply the frequency by the class midpoint for each class
and place the products in column D.
Step 3: Multiply the frequency by the square of the class midpoint
for each class and place the products in column E.
Solution
(a) Sample mean =
(b) Modal class =
(c) Sample variance =
(d) Sample standard deviation =
Hint:
Calculate and compare the coefficients of variation of the above
two variables.
Solution:
For age, $CVar = \frac{s}{\bar{X}} \times 100\% = \frac{5}{32} \times 100\% = 15.625\% \approx 15.63\%$
For salary, $CVar = \frac{s}{\bar{X}} \times 100\% = \frac{500}{15000} \times 100\% = 3.3333\% \approx 3.33\%$
Since the coefficient of variation is larger for ages, the ages of the workers are
more variable than their salaries.
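The comparison can be sketched in Python. The inputs (standard deviation 5 and mean 32 for age; standard deviation 500 and mean 15000 for salary) are assumptions inferred from the stated results of 15.625% and 3.3333%; the function name is illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the coefficient of variation as a percentage."""
    return std_dev / mean * 100

# Assumed inputs, chosen to match the stated results.
cvar_age = coefficient_of_variation(5, 32)
cvar_salary = coefficient_of_variation(500, 15000)

# The variable with the larger coefficient of variation is more variable.
more_variable = "age" if cvar_age > cvar_salary else "salary"

print(round(cvar_age, 2), round(cvar_salary, 2), more_variable)
```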
Solution:
Rounding rule: Standard scores have no unit. You can round your
answer to 2 decimal places.
Solution:
For Susan, $z = \frac{X - \mu}{\sigma} = \frac{61 - 65}{3} = -1.3333 \approx -1.33$
For Mary, $z = \frac{X - \mu}{\sigma} = \frac{71 - 65}{3} = 2.00$
Interpretation:
Susan’s retirement age is 1.33 standard deviations below the mean. (negative z score)
Mary’s retirement age is 2.00 standard deviations above the mean. (positive z score)
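A standard score is a single subtraction and division, as this Python sketch shows. The inputs (mean 65, standard deviation 3, retirement ages 61 and 71) are assumptions chosen to be consistent with the stated z scores of -1.33 and 2.00.

```python
def z_score(value, mean, std_dev):
    """Return how many standard deviations `value` lies from `mean`."""
    return (value - mean) / std_dev

# Assumed inputs, consistent with the stated results.
z_susan = z_score(61, 65, 3)
z_mary = z_score(71, 65, 3)

print(round(z_susan, 2), round(z_mary, 2))
```

A negative result means the value lies below the mean, and a positive result means it lies above.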
Score            Frequency
196.5 – 217.5    5
217.5 – 238.5    17
238.5 – 259.5    22
259.5 – 280.5    48
280.5 – 301.5    22
301.5 – 322.5    6

Score            Frequency    Cumulative frequency
196.5 – 217.5    5            5
217.5 – 238.5    17           5+17 = 22
238.5 – 259.5    22           5+17+22 = 44
259.5 – 280.5    48           5+17+...+48 = 92
280.5 – 301.5    22           5+17+...+22 = 114
301.5 – 322.5    6            5+17+...+6 = 120
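The running totals in the cumulative frequency table, and the cumulative percentages plotted on an ogive, can be computed with a short Python sketch. It uses the frequencies from the table and `itertools.accumulate` from the standard library; the variable names are illustrative.

```python
from itertools import accumulate

# Frequencies and upper class boundaries from the table.
frequencies = [5, 17, 22, 48, 22, 6]
upper_boundaries = [217.5, 238.5, 259.5, 280.5, 301.5, 322.5]

# Running total of the frequencies.
cumulative = list(accumulate(frequencies))
total = cumulative[-1]

# Cumulative percentage at each upper class boundary.
cumulative_pct = [100 * c / total for c in cumulative]

for boundary, c, pct in zip(upper_boundaries, cumulative, cumulative_pct):
    print(f"{boundary:6.1f}  {c:4d}  {pct:6.2f}%")
```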
[Figure: ogive of the score distribution; x-axis: Score (class boundaries from 175.5 to 322.5), y-axis: Cumulative Percentage]
[Figure: ogive of the score distribution with two percentiles marked; x-axis: Score (class boundaries from 175.5 to 322.5), y-axis: Cumulative Percentage (0 to 100)]
a) The 30th percentile is approximately 251.
b) The 77th percentile corresponds to the score 280.5.
[Figure: ogive; x-axis: Hourly Wages in $ (class boundaries from 45.5 to 108.5), y-axis: Cumulative Percentage]
Step 1 – Sort the data in ascending order: 12, 28, 35, 42, 47, 49, 50
Step 2 – Compute $c = \frac{n \cdot p}{100} = \frac{7 \times 60}{100} = 4.2$
Step 3a – Since $c$ is not a whole number, we need to round it up to the next
integer (i.e. $c = 5$).
The 60th percentile is equal to 47 marks (i.e. the 5th value in the sorted data set).
Step 1 – Sort the data in ascending order: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Step 2 – Compute $c = \frac{n \cdot p}{100} = \frac{10 \times 60}{100} = 6$
Step 3b – Since $c$ is a whole number, the 60th percentile is the average of the
6th and the 7th data values in the sorted data.
Therefore the 60th percentile $= \frac{10 + 12}{2} = 11$ marks.
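The two cases of the percentile procedure (Step 3a and Step 3b) can be sketched as one Python function. The function name is illustrative, and the sketch assumes `p` is a whole number strictly between 0 and 100. Integer arithmetic is used to test whether $c$ is a whole number, which avoids floating-point rounding issues.

```python
def percentile(data, p):
    """Return the p-th percentile using the procedure described above."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    values = sorted(data)                   # Step 1: sort ascending
    n = len(values)
    whole, remainder = divmod(n * p, 100)   # Step 2: c = n * p / 100
    if remainder != 0:
        # Step 3a: c is not whole, so round up to the (whole + 1)-th value
        # (index `whole` in 0-based indexing).
        return values[whole]
    # Step 3b: c is whole, so average the c-th and (c + 1)-th values.
    return (values[whole - 1] + values[whole]) / 2

print(percentile([12, 28, 35, 42, 47, 49, 50], 60))
print(percentile([2, 3, 5, 6, 8, 10, 12, 15, 18, 20], 60))
```

The two calls reproduce the worked examples: the first takes the 5th sorted value and the second averages the 6th and 7th.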
[Figure: boxplot of reaction time in ms, with the axis marked at 30 and 296]
$Q_1 = 47$ ms, $MD = 83.5$ ms, $Q_3 = 164$ ms, High $= 296$ ms
7 18 17 29 18 4 27 30 2 4 10 21 5 8
For $Q_1$ (i.e. 25th percentile), $c = \frac{14 \times 25}{100} = 3.5$
Therefore $Q_1$ = 4th data value = 5 g
For $Q_2$ (i.e. 50th percentile), $c = \frac{14 \times 50}{100} = 7$
Therefore $Q_2$ = average of 7th and 8th data values = (10+17)/2 = 13.5 g
For $Q_3$ (i.e. 75th percentile), $c = \frac{14 \times 75}{100} = 10.5$
Therefore $Q_3$ = 11th data value = 21 g
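Because the quartiles are the 25th, 50th, and 75th percentiles, a five-number summary follows directly from the percentile procedure. The Python sketch below applies it to the 14 data values listed above; the function and variable names are illustrative.

```python
def percentile(data, p):
    """p-th percentile (0 < p < 100) via c = n * p / 100."""
    values = sorted(data)
    whole, remainder = divmod(len(values) * p, 100)
    if remainder != 0:
        return values[whole]                         # c not whole: round up
    return (values[whole - 1] + values[whole]) / 2   # c whole: average two values

data = [7, 18, 17, 29, 18, 4, 27, 30, 2, 4, 10, 21, 5, 8]

five_number_summary = {
    "low": min(data),
    "Q1": percentile(data, 25),
    "Q2": percentile(data, 50),
    "Q3": percentile(data, 75),
    "high": max(data),
}

print(five_number_summary)
```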
Solution:
Step 1 – Sort the data in ascending order: 3, 16, 17, 18, 19, 20, 21, 22. $Q_1 = 16.5$ and $Q_3 = 20.5$
Step 2 – $IQR = Q_3 - Q_1 = 20.5 - 16.5 = 4$
Step 3 – Lower bound $= Q_1 - (1.5 \times IQR) = 16.5 - (1.5 \times 4) = 10.5$
Step 4 – Upper bound $= Q_3 + (1.5 \times IQR) = 20.5 + (1.5 \times 4) = 26.5$
Step 5 – The data value 3 is less than the lower bound (10.5); therefore, 3 can be
considered a potential outlier.
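The outlier check in this solution can be sketched in Python. The quartiles are taken as stated in Step 1 rather than recomputed, and the variable names are illustrative.

```python
data = [3, 16, 17, 18, 19, 20, 21, 22]
q1, q3 = 16.5, 20.5              # quartiles as stated in Step 1

iqr = q3 - q1                    # Step 2: interquartile range
lower_bound = q1 - 1.5 * iqr     # Step 3
upper_bound = q3 + 1.5 * iqr     # Step 4

# Step 5: values outside the two bounds are potential outliers.
outliers = [x for x in data if x < lower_bound or x > upper_bound]

print(lower_bound, upper_bound, outliers)
```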
Solution:
Step 1 – Sort the data in ascending order: 5, 13, 14, 18, 19, 25, 26, 27.
$Q_1$ = ______ and $Q_3$ = ______
Step 2 – $IQR = Q_3 - Q_1$ =
Step 3 – Lower bound $= Q_1 - (1.5 \times IQR)$ =
Step 4 – Upper bound $= Q_3 + (1.5 \times IQR)$ =
Step 5 –