Chapter Four: Measures of Variation


Chapter Goals
After completing this chapter, you will be able to:
• Compute and interpret the absolute and relative measures of
variation for a set of data.
• Find the range, variance, standard deviation, coefficient of
variation, and standard score, and know what these values mean.
• Apply the empirical rule to describe the variation of population
values around the mean.

Introduction
• Dispersion refers to the lack of uniformity in the sizes or
qualities of the items in a dataset.
• A measure of central tendency shows only the middle or
average of a dataset; it says nothing about how variable
the data are.
• Example: Consider the following datasets:
A: 1, 2, 5, 6, 6
B: −40, 0, 5, 20, 35
Mean(A) = Mean(B) = 4
• However, the data in A are more consistent (less
variable) than the data in B.
• If observations are close to the center, we say that
dispersion is small.

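The point of the example can be checked numerically. The sketch below is only an illustration: the helper name `spread` is an assumption introduced here, and it simply measures the distance between the largest and smallest value.

```python
from statistics import mean

A = [1, 2, 5, 6, 6]
B = [-40, 0, 5, 20, 35]


def spread(values):
    """Distance between the largest and the smallest value (hypothetical helper)."""
    return max(values) - min(values)


# Both datasets share the same mean ...
print(mean(A), mean(B))
# ... but B is scattered far more widely around it than A.
print(spread(A), spread(B))
```

The two means coincide, so the average alone cannot distinguish the datasets; a measure of variation is needed.
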
4.1 Objectives of Measuring Variation

• To study the extent to which observations are scattered
around the central value.
• To assess the reliability of the average being used: if
the dispersion in the values of a dataset is large, the
average may be unrepresentative of the dataset.
• To compare two or more sets of data with regard to
their variability.
• To identify the causes of variability with a view to
controlling it.
• To serve as a basis for further statistical analysis.

4.2 Absolute and Relative Measures
Measures of dispersion are either absolute or relative.
• Absolute measures of variation are expressed in the same units of
measurement as the original data.
• They are recommended for comparing variation in distributions
whose units of measurement are the same.
• A relative measure of variation is the ratio of an absolute
measure of variation to a measure of central tendency.
• Relative measures are used to compare the variation of datasets
measured in different units.

Absolute and Relative Measures . . .

Absolute Variation        Relative Variation
Range                     Coefficient of range
Interquartile range       Coefficient of quartile deviation
Quartile deviation        Coefficient of mean deviation
Mean deviation            Coefficient of variation
Variance and sd           Standard scores

4.3 Types of Measures of Variation
4.3.1 Range and Relative Range
Range: the difference between the largest and the smallest
values.
• Range = UCB of the last class − LCB of the first class
(for a grouped frequency distribution), where UCB and LCB are the
upper and lower class boundaries.
• Example: Find the range of the following distributions.
1) 23, 42, 20, 30, 35, 21, 45, 33, 23, 23, 20, 42, 29, 20.
Range: 45 − 20 = 25
2) Class:      2.5 – 10.5   10.5 – 18.5   18.5 – 26.5   26.5 – 34.5
   Frequency:      4             7             6            15
Range: 34.5 − 2.5 = 32

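Both range calculations can be sketched in a few lines. The function names `data_range` and `grouped_range` are assumptions made for this illustration, and the grouped version expects the class boundaries to be listed in increasing order.

```python
def data_range(values):
    """Range of raw data: largest value minus smallest value."""
    return max(values) - min(values)


def grouped_range(class_boundaries):
    """Range of a grouped distribution: UCB of the last class minus
    LCB of the first class. Expects (lower, upper) pairs in order."""
    first_lower = class_boundaries[0][0]
    last_upper = class_boundaries[-1][1]
    return last_upper - first_lower


raw = [23, 42, 20, 30, 35, 21, 45, 33, 23, 23, 20, 42, 29, 20]
classes = [(2.5, 10.5), (10.5, 18.5), (18.5, 26.5), (26.5, 34.5)]

print(data_range(raw))         # 25
print(grouped_range(classes))  # 32.0
```
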
Range and Relative Range . . .
• Range cannot be calculated for open-end distributions.
• Relative Range: the ratio of the range to the sum of the
maximum and the minimum values in a dataset.

$$\text{Relative Range} = \frac{L - S}{L + S}$$

where L is the largest and S the smallest value.
• Example: Find the relative range for:
23, 42, 20, 30, 35, 21, 45, 33, 23, 23, 20, 42, 29, 20.

$$\text{Relative Range} = \frac{45 - 20}{45 + 20} = \frac{25}{65} \approx 0.385$$

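A minimal sketch of the relative range, using the same fourteen values. The function name `relative_range` is an assumption for illustration; note that the ratio is undefined when the largest and smallest values sum to zero.

```python
def relative_range(values):
    """(L - S) / (L + S), where L is the largest and S the smallest value."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


data = [23, 42, 20, 30, 35, 21, 45, 33, 23, 23, 20, 42, 29, 20]
print(round(relative_range(data), 3))  # 0.385
```
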
Disadvantages of the Range

• Ignores the way in which data are distributed
[Figure: two differently shaped distributions plotted over the values 7 to 12; both have Range = 12 − 7 = 5]

• Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 − 1 = 4

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 − 1 = 119

• Cannot be calculated for open-end distributions.

4.3.2 Variance, Standard Deviation and Coefficient of Variation

• A deviation is obtained by subtracting the mean from an
observation. Squaring each deviation and adding the results
gives the sum of squares.
• Variance is the average of the squared deviations.
• If x1, x2, ..., xN are N observations, then the population
variance, denoted σ², is given by:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$

where
μ = population mean
N = population size
xi = ith value of the variable x

Sample Variance
• Average (approximately) of squared deviations of values
from the mean.
– Sample variance:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

where
x̄ = arithmetic mean
n = sample size
xi = ith value of the variable x
• The sum of squares in this case is divided by n − 1 in order
to get an unbiased estimator of the population variance.

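The two definitions differ only in the divisor. The sketch below spells both out and compares them with `pvariance` and `variance` from Python's standard `statistics` module. The function names and the choice of the values 1, 2, 5, 6, 6 as test data are assumptions made for illustration.

```python
from statistics import pvariance, variance


def population_variance(values):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = sum(values) / len(values)
    return sum((x - x_bar) ** 2 for x in values) / (len(values) - 1)


data = [1, 2, 5, 6, 6]  # mean 4, sum of squares 22
print(population_variance(data), pvariance(data))  # both about 4.4
print(sample_variance(data), variance(data))       # both about 5.5
```

For the same data the sample variance is always the larger of the two, because the same sum of squares is divided by a smaller number.
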
Standard Deviation
• Standard deviation is the positive square root of variance.
• To put the variance on the same scale as the original data, we
prefer to work with the standard deviation.
• Standard deviation shows variation about the mean.
• Population standard deviation:

$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$

• Sample standard deviation:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

Example: Sample Standard Deviation

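Sample data (xi): 10 12 14 15 17 18 18 24

n = 8, mean = x̄ = 16

$$s = \sqrt{\frac{(10 - \bar{x})^2 + (12 - \bar{x})^2 + (14 - \bar{x})^2 + \cdots + (24 - \bar{x})^2}{n - 1}}$$

$$= \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}}$$

$$= \sqrt{\frac{130}{7}} \approx 4.3095$$

The squared deviations are 36, 16, 4, 1, 1, 4, 4 and 64, which sum to 130.
The result is a measure of the "average" scatter around the mean.
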
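A minimal sketch that recomputes this example step by step and cross-checks it against `statistics.stdev`. The variable names are chosen here for illustration; for these eight values the squared deviations sum to 130.

```python
from math import sqrt
from statistics import stdev

data = [10, 12, 14, 15, 17, 18, 18, 24]

n = len(data)
x_bar = sum(data) / n                                # 16.0
squared_deviations = [(x - x_bar) ** 2 for x in data]
sum_of_squares = sum(squared_deviations)             # 130.0
s = sqrt(sum_of_squares / (n - 1))

print(round(s, 4))            # 4.3095
print(round(stdev(data), 4))  # same value from the standard library
```
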
Alternative Formulae
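1. $$s = \sqrt{\frac{1}{n - 1}\left[\sum x_i^2 - n\bar{x}^2\right]}$$

2. $$s = \sqrt{\frac{1}{n - 1}\left[\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right]}$$

• For data from a frequency distribution, standard deviation is
given by:

$$s = \sqrt{\frac{1}{n - 1}\sum f_i (x_i - \bar{x})^2}, \qquad \text{where } \bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$

$$s = \sqrt{\frac{1}{n(n - 1)}\left[n \sum f_i x_i^2 - \left(\sum f_i x_i\right)^2\right]}$$

where xi is the class mark of the ith class, fi is its frequency, and n = Σ fi.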
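The shortcut formulae are algebraically equivalent to the definition, which a quick numerical check can illustrate. In this sketch the function names are assumptions, and the sample 10, 12, 14, 15, 17, 18, 18, 24 is reused as test data; for the grouped formula each distinct value is treated as a "class mark" with its count as the frequency.

```python
from math import isclose, sqrt


def sd_definition(xs):
    n = len(xs)
    x_bar = sum(xs) / n
    return sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))


def sd_shortcut_mean(xs):
    # Formula 1: [sum(x^2) - n * x_bar^2] / (n - 1)
    n = len(xs)
    x_bar = sum(xs) / n
    return sqrt((sum(x * x for x in xs) - n * x_bar ** 2) / (n - 1))


def sd_shortcut_sum(xs):
    # Formula 2: [sum(x^2) - (sum(x))^2 / n] / (n - 1)
    n = len(xs)
    return sqrt((sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1))


def sd_grouped(marks, freqs):
    # [n * sum(f * x^2) - (sum(f * x))^2] / (n * (n - 1)), with n = sum(f)
    n = sum(freqs)
    sum_fx = sum(f * x for f, x in zip(freqs, marks))
    sum_fx2 = sum(f * x * x for f, x in zip(freqs, marks))
    return sqrt((n * sum_fx2 - sum_fx ** 2) / (n * (n - 1)))


data = [10, 12, 14, 15, 17, 18, 18, 24]
marks = [10, 12, 14, 15, 17, 18, 24]
freqs = [1, 1, 1, 1, 1, 2, 1]

reference = sd_definition(data)
print(isclose(reference, sd_shortcut_mean(data)))
print(isclose(reference, sd_shortcut_sum(data)))
print(isclose(reference, sd_grouped(marks, freqs)))
```

The shortcut forms avoid computing each deviation separately, which is convenient by hand, although they can lose precision in floating-point arithmetic when the mean is large relative to the spread.
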
Examples: 1. Find the sample mean, variance & sd for A and B.
A: 10 60 50 30 40 20
B: 40 30 45 35 40 20
Example . . .

           A     B   A − x̄_A   (A − x̄_A)²   B − x̄_B   (B − x̄_B)²
          10    40     −25         625           5          25
          60    30      25         625          −5          25
          50    45      15         225          10         100
          30    35      −5          25           0           0
          40    40       5          25           5          25
          20    20     −15         225         −15         225
Total    210   210       0        1750           0         400
Mean      35    35       0                       0

The two datasets have the same mean, but their variances differ.

• Var(A) = 1750/5 = 350    Var(B) = 400/5 = 80
• Sd(A) = 18.71            Sd(B) = 8.94
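The figures for A and B can be reproduced with the standard `statistics` module. This is a sketch for checking the arithmetic, not part of the original example.

```python
from statistics import mean, stdev, variance

A = [10, 60, 50, 30, 40, 20]
B = [40, 30, 45, 35, 40, 20]

for name, data in (("A", A), ("B", B)):
    # variance() and stdev() use the n - 1 divisor (sample formulas).
    print(name, mean(data), variance(data), round(stdev(data), 2))
```
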


2. Class:      0 – 10   10 – 20   20 – 30   30 – 40   40 – 50
   Frequency:    7         6        15        12        10

Example . . .

Class       f    Xm    fXm    Xm − x̄    (Xm − x̄)²    f(Xm − x̄)²
0 – 10      7     5     35    −22.4       501.76       3512.32
10 – 20     6    15     90    −12.4       153.76        922.56
20 – 30    15    25    375     −2.4         5.76         86.40
30 – 40    12    35    420      7.6        57.76        693.12
40 – 50    10    45    450     17.6       309.76       3097.60
Total      50         1370                             8312.00
Mean                  27.4

• Var(X) = 8312/49 = 169.63
• Sd(X) = √169.63 ≈ 13.02
• Since the variance is just the square of the standard deviation,
these quantities contain essentially the same information on
different scales.

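The grouped calculation follows the same pattern as the table: class marks, weighted mean, weighted sum of squares. A minimal sketch, with variable names chosen here for illustration:

```python
from math import sqrt

classes = [(0, 10), (10, 20), (20, 30), (30, 40), (40, 50)]
freqs = [7, 6, 15, 12, 10]

# Class mark = midpoint of each class.
marks = [(lower + upper) / 2 for lower, upper in classes]

n = sum(freqs)                                           # 50
x_bar = sum(f * x for f, x in zip(freqs, marks)) / n     # about 27.4
sum_of_squares = sum(f * (x - x_bar) ** 2 for f, x in zip(freqs, marks))
var = sum_of_squares / (n - 1)
sd = sqrt(var)

print(round(x_bar, 1), round(var, 2), round(sd, 2))
```
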
Measuring Variation

[Figure: two distributions, one with a small standard deviation and one with a large standard deviation]

Comparing Standard Deviations
Which of the following datasets is the most variable?
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.926
Data C: Mean = 15.5, s = 4.570
[Figure: three dot plots, one per dataset, each drawn on an axis running from 11 to 21]

Algebraic Properties of Variance and Standard Deviation
• Sd(X) ≥ 0
• If K is added to or subtracted from each observation, the
variance and sd remain the same.
• If each observation is multiplied by K, the new variance and
sd will be K² × ("previous" variance) and |K| × ("previous" sd),
respectively.
• If each observation is divided by K (K ≠ 0), the new variance and
sd will be ("previous" variance)/K² and ("previous" sd)/|K|,
respectively.
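These properties are easy to confirm numerically. The sketch below uses an arbitrary sample and an arbitrary negative constant K = −3, both chosen here for illustration, to show that only the magnitude of K matters for the standard deviation.

```python
from math import isclose
from statistics import stdev, variance

data = [10, 12, 14, 15, 17, 18, 18, 24]
K = -3

shifted = [x + K for x in data]   # add K to every observation
scaled = [x * K for x in data]    # multiply every observation by K
divided = [x / K for x in data]   # divide every observation by K

# Shifting leaves the variance unchanged.
print(isclose(variance(shifted), variance(data)))
# Multiplying scales the variance by K^2 and the sd by |K|.
print(isclose(variance(scaled), K ** 2 * variance(data)))
print(isclose(stdev(scaled), abs(K) * stdev(data)))
# Dividing scales the variance by 1/K^2 and the sd by 1/|K|.
print(isclose(variance(divided), variance(data) / K ** 2))
print(isclose(stdev(divided), stdev(data) / abs(K)))
```
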
• If the data are from a sample and approximately normally
distributed, then
– x̄ ± 1s will include approximately 68% of the data.
– x̄ ± 2s will include approximately 95% of the data.
– x̄ ± 3s will include approximately 99.7% of the data.

The Empirical Rule

[Figure: bell-shaped curve centered at μ, with 68% of values within x̄ ± 1s, 95% within x̄ ± 2s, and 99.7% within x̄ ± 3s]

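The 68%, 95% and 99.7% figures are rounded values for an exactly normal distribution. As a sketch, they can be recovered from the normal cumulative distribution function using `statistics.NormalDist` (Python 3.8 or later); the helper name `share_within` is an assumption made here.

```python
from statistics import NormalDist


def share_within(k, mu=0.0, sigma=1.0):
    """Proportion of a normal distribution lying within k standard
    deviations of its mean (hypothetical helper)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


for k in (1, 2, 3):
    print(k, round(100 * share_within(k), 2))
```

The result does not depend on the particular mean and standard deviation, only on k. Real samples that are only approximately normal will match these proportions only approximately.
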
The Empirical Rule . . .

• The empirical rule implies that, for data sets with an
approximate bell shape, the range from the minimum to the
maximum data value equals about 4 to 6 standard deviations.
• For a large data set, you can get a rough idea of the
value of the standard deviation by dividing the range by 6:

$$s \approx \frac{\text{Range}}{6}$$

Advantages and Disadvantages of Standard Deviation

• Each value in the data set is used in the calculation.
• It is rigidly defined and its value is always definite.
• Values far from the mean are given extra weight
(because deviations from the mean are squared).
• Least affected by sampling fluctuations.
• It is comparatively difficult to calculate.

Coefficient of Variation (CV)
• Measures relative variation
• Always expressed as a percentage (%)
• Useful to compare the amount of variation among
groups with different means.
• CV is a unitless quantity.
• Can be used to compare two or more sets of data
measured in different units

$$CV = \frac{s}{\bar{x}} \times 100\%$$

Comparing Coefficient of Variation
• Example: Which stock is more variable relative to price?
• Stock A:
– Average price last year = Birr 50
– Standard deviation = Birr 5

$$CV_A = \frac{s}{\bar{x}} \times 100\% = \frac{\text{Birr } 5}{\text{Birr } 50} \times 100\% = 10\%$$

• Stock B:
– Average price last year = Birr 100
– Standard deviation = Birr 5

$$CV_B = \frac{s}{\bar{x}} \times 100\% = \frac{\text{Birr } 5}{\text{Birr } 100} \times 100\% = 5\%$$

Both stocks have the same standard deviation, but stock B is less
variable relative to its price.

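The stock comparison reduces to one division per stock. A minimal sketch, where the function name `coefficient_of_variation` is an assumption for illustration:

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * sd / mean


cv_a = coefficient_of_variation(sd=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(sd=5, mean=100)   # Stock B

print(cv_a, cv_b)  # 10.0 5.0
print("More variable relative to price:", "A" if cv_a > cv_b else "B")
```

Because the units cancel in the ratio, the same function can compare quantities measured in different units.
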
4.4 Standard Score (Z – score)
• A standard score shows how far an individual value lies above or
below the population mean, in units of standard deviation.
– "How far above or below" = (data value − mean)
– "In units of standard deviation" = divided by the standard deviation
• Standardized data value:

$$Z = \frac{\text{value of the variable} - \text{mean}}{\text{standard deviation}}$$

– A negative z means the data value falls below the mean.
– A zero z means the data value is the same as the mean.
– A positive z means the data value is above the mean.

Example
• A student obtained 80 on a civics exam that had a mean of 70
and a standard deviation of 10. The same student obtained 60
on a calculus exam, which had a mean of 51 and a variance of
64. On which exam did the student perform better relative to
other students? Why?

                      Civics               Calculus
Mean                  70                   51
Standard deviation    10                   √64 = 8
Score                 80                   60
Z                     (80 − 70)/10 = 1.00  (60 − 51)/8 = 1.125

• Since Z for calculus (1.125) is greater than Z for civics (1.00),
the student performed better on the calculus exam relative to
other students.
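The comparison can be sketched as follows. The function name `z_score` is an assumption; note that the calculus exam is described by its variance, so the standard deviation is taken as its square root first.

```python
from math import sqrt


def z_score(value, mean, sd):
    """Number of standard deviations by which value lies above the mean."""
    return (value - mean) / sd


z_civics = z_score(80, mean=70, sd=10)
z_calculus = z_score(60, mean=51, sd=sqrt(64))  # variance 64, so sd = 8

print(z_civics, z_calculus)  # 1.0 1.125
```
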
