Measure of Variation

MEASURE OF VARIATION

Range, IQR, Quartile, Decile and Percentile, Quartile Deviation, Standard Deviation.

Measure of Variation or Dispersion
Dispersion measures the extent to which the items vary from a central value.

The various measures of central tendency give one single value that represents the
entire data, but that value alone can’t adequately describe a set of observations,
unless all the observations are the same. It is necessary to describe the variability
or dispersion of the observations.
Dispersion
The word dispersion has a technical meaning in statistics. The average measures
the center of the data, and it is one aspect of observation. Another feature of the
observation is how the observations are spread about the center. The observations
may be close to the center or they may be spread away from the center. If the
observations are close to the center (usually the arithmetic mean or median), we
say that dispersion, scatter or variation is small. If the observations are spread away
from the center, we say dispersion is large.

Case:

Suppose we have three groups of students who have obtained the following marks
on a test. The arithmetic means of the three groups are also given below:

Group A: 46, 48, 50, 52, 54 (mean = 50)
Group B: 30, 40, 50, 60, 70 (mean = 50)
Group C: 40, 50, 60, 70, 80 (mean = 60)

In groups A and B the arithmetic means are equal, i.e. both means are 50. But in group A
the observations are concentrated around the center. All students in group A have
almost the same level of performance. We say that there is consistency in the
observations in group A. In group B the mean is 50 but the observations are not
close to the center. One observation is as small as 30 and one observation is as
large as 70. Thus there is greater dispersion in group B. In group C the mean is 60
but the spread of the observations with respect to the center 60 is the same as the
spread of the observations in group B with respect to their own center, which is 50.
Thus in groups B and C the means are different but their dispersion is the same. In
groups A and C the means are different and their dispersions are also different.
Dispersion is an important feature of observation and it is measured with the help
of the measures of dispersion, scatter or variation. The word variability is also used
for the idea of dispersion.
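
The comparison of groups A, B and C can be checked numerically. The following is a minimal Python sketch, not part of the original notes. It uses the range (largest mark minus smallest mark) as a simple yardstick of spread, and the names `groups` and `summarize` are illustrative assumptions.

```python
from statistics import mean

# Marks of the three groups from the case above.
groups = {
    "A": [46, 48, 50, 52, 54],
    "B": [30, 40, 50, 60, 70],
    "C": [40, 50, 60, 70, 80],
}


def summarize(marks):
    """Return (arithmetic mean, range) for one group of marks."""
    return mean(marks), max(marks) - min(marks)


summary = {name: summarize(marks) for name, marks in groups.items()}

for name, (centre, spread) in summary.items():
    print(f"Group {name}: mean = {centre}, range = {spread}")
```

Groups A and B share a mean but differ in spread, while groups B and C share a spread but differ in mean, which is the point the case makes.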

The study of dispersion is very important in statistical data. If in a certain factory
there is consistency in the wages of workers, the workers will be satisfied. But if
some workers have high wages and some have low wages, there will be unrest
among the low paid workers and they might go on strike and arrange
demonstrations. If in a certain country some people are very poor and some are
very rich, we say there is economic disparity. This means that dispersion is large.

The idea of dispersion is important in the study of workers’ wages, prices of
commodities, standards of living of different people, distribution of wealth,
distribution of land among farmers, and many other fields of life. Some brief
definitions of dispersion are:

1. The degree to which numerical data tend to spread about an average value is
called the dispersion or variation of the data.
2. Dispersion or variation may be defined as a statistic signifying the extent of the
scatteredness of items around a measure of central tendency.
3. Dispersion or variation is the measurement of the size of the scatter of items in
a series about the average.

Need for Measures of Variation

1. To gauge the reliability of an average.
2. To serve as a basis for the control of variability.
3. To compare two or more distributions with regard to their variability.
4. To facilitate the use of other statistical measures.

Measures of Dispersion
For the study of dispersion, we need some measures which show whether the
dispersion is small or large. There are two types of measures of dispersion:

(a) Absolute Measures of Dispersion
(b) Relative Measures of Dispersion

Absolute Measures of Dispersion
These measures give us an idea about the amount of dispersion in a set of
observations. They give the answers in the same units as the units of the original
observations. When the observations are in kilograms, the absolute measure is also
in kilograms. If we have two sets of observations, we cannot always use the
absolute measures to compare their dispersions. We shall explain later when
the absolute measures can be used for comparison of dispersion in two or more sets
of data. The absolute measures which are commonly used are:

1. The Range
2. The Quartile Deviation
3. The Mean Deviation
4. The Standard Deviation and Variance

Relative Measures of Dispersion


These measures are calculated for the comparison of dispersion in two or more sets
of observations. These measures are free of the units in which the original data is
measured. If the original data is in dollars or kilometers, we do not use these units
with relative measures of dispersion. These measures are a sort of ratio and are
called coefficients. Each absolute measure of dispersion can be converted into its
relative measure. Thus the relative measures of dispersion are:

1. Coefficient of Range or Coefficient of Dispersion
2. Coefficient of Quartile Deviation or Quartile Coefficient of Dispersion
3. Coefficient of Mean Deviation or Mean Coefficient of Dispersion
4. Coefficient of Standard Deviation or Standard Coefficient of Dispersion
5. Coefficient of Variation (a special case of Standard Coefficient of Dispersion)

Range
Range, as the word suggests, represents the difference between the largest and the
smallest value of data. This helps us determine the range over which the data is
spread.
L = Largest Value
S = Smallest Value
Range = L – S

Example
There are ten students in the class, and they recently took a test out of 100 marks.
There are two scenarios here.
First: 50, 53, 50, 51, 48, 93, 90, 92, 91, 90
Second: 71, 72, 70, 75, 73, 74, 75, 70, 74, 72
The range in the first scenario is represented by the difference between the largest
value, 93 and the smallest value, 48. The range therefore is,
Range in First set = 93 – 48 = 45
Whereas in the second scenario, the range is represented by the difference between
the highest value, 75 and the smallest value, 70.
Range in the second set = 75 – 70 = 5
The difference in the value of the range between the two scenarios shows how widely
the values are spread in each case. The larger the range, the farther apart the
values are spread.

Coefficient of Range
In order to compare the variability of two or more distributions which are given in
different units of measurement, we need a relative measure which is
independent of the units of measurement. This measure is known as the coefficient of range.
Coefficient of Range = (L – S) / (L + S)

Example
Let us take two sets of observations. Set A contains the marks of five students in
mathematics out of 25 marks and set B contains the marks of the same students in
English out of 100 marks.

Set A: 10, 15, 18, 20, 20
Set B: 30, 35, 40, 45, 50
The values of the ranges and coefficients of range are calculated as:

                        Range           Coefficient of Range
Set A (Mathematics)     20 – 10 = 10    (20 – 10) / (20 + 10) = 0.33
Set B (English)         50 – 30 = 20    (50 – 30) / (50 + 30) = 0.25
In set A the range is 10 and in set B the range is 20. It appears that there is
greater dispersion in set B, but this is not true. The range of 20 in set B comes from
larger values and the range of 10 in set A comes from smaller values. Thus 20 and 10
cannot be compared directly. Their base is not the same. The marks in mathematics
are out of 25 and the marks in English are out of 100. Thus, it makes no sense to
compare 10 with 20. When we convert these two values into coefficients of range,
we see that the coefficient of range for set A is greater than that of set B. Thus
there is greater dispersion or variation in set A. The marks of students in English
are more stable than their marks in mathematics.
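
The comparison of sets A and B can be reproduced with a few lines of Python. This is an illustrative sketch, not part of the original notes; the function names `value_range` and `coefficient_of_range` are assumptions chosen for readability.

```python
def value_range(values):
    """Range = L - S, the largest value minus the smallest value."""
    return max(values) - min(values)


def coefficient_of_range(values):
    """Coefficient of Range = (L - S) / (L + S), a unit-free ratio."""
    largest, smallest = max(values), min(values)
    return (largest - smallest) / (largest + smallest)


set_a = [10, 15, 18, 20, 20]   # mathematics, marks out of 25
set_b = [30, 35, 40, 45, 50]   # English, marks out of 100

for name, marks in (("A", set_a), ("B", set_b)):
    print(f"Set {name}: range = {value_range(marks)}, "
          f"coefficient of range = {coefficient_of_range(marks):.2f}")
```

The absolute range is larger for set B, but the coefficient is larger for set A, which is why the relative measure is the one to use when the scales differ.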

Ungrouped Data
Example:
The following are the wages of 8 workers in a factory. Find the range and
coefficient of range. Wages are in dollars: 1400, 1450, 1520, 1380, 1485, 1495,
1575, 1440.

Here the largest value L = 1575 and the smallest value S = 1380.

Range = L – S = 1575 – 1380 = 195
Coefficient of Range = (L – S) / (L + S) = (1575 – 1380) / (1575 + 1380) = 195 / 2955 = 0.066

Grouped Data
i) Discrete Frequency Distribution
Example:
The following distribution gives the number of houses and the number of persons
per house.

Number of Persons     1    2    3    4    5    6    7    8    9   10
Number of Houses     26  113  120   95   60   42   21   14   55   44

Calculate the range and coefficient of range.

Solution:
Here the largest value L = 10 and the smallest value S = 1.
Range = L – S = 10 – 1 = 9
Coefficient of Range = (L – S) / (L + S) = (10 – 1) / (10 + 1) = 9 / 11 = 0.818

ii) Continuous Frequency Distribution


The following table gives the weights of the students of a university.

Weight (Kg)            60–62   63–65   66–68   69–71   72–74
Number of Students         5      18      42      27       8

Calculate the range and coefficient of range.

Solution:

Weight (Kg)    Class Boundaries    Mid Value    No. of Students
60–62          59.5–62.5           61           5
63–65          62.5–65.5           64           18
66–68          65.5–68.5           67           42
69–71          68.5–71.5           70           27
72–74          71.5–74.5           73           8

Method 1:
Here L = the upper class boundary of the highest class = 74.5
and S = the lower class boundary of the lowest class = 59.5
Range = L – S = 74.5 – 59.5 = 15
Coefficient of Range = (L – S) / (L + S) = (74.5 – 59.5) / (74.5 + 59.5) = 15 / 134 = 0.1119

Method 2:
Here L = the mid value of the highest class = 73
and S = the mid value of the lowest class = 61
Range = L – S = 73 – 61 = 12
Coefficient of Range = (L – S) / (L + S) = (73 – 61) / (73 + 61) = 12 / 134 = 0.0896
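
Both methods can be sketched in Python from the class limits alone. This is a hedged illustration rather than a prescribed procedure: it assumes equally spaced classes, so that each boundary lies halfway across the gap between consecutive classes, and the helper names are hypothetical.

```python
# Class limits (kg) from the weight table above.
classes = [(60, 62), (63, 65), (66, 68), (69, 71), (72, 74)]


def class_boundaries(class_limits):
    """Widen every class by half the gap between consecutive classes."""
    gap = class_limits[1][0] - class_limits[0][1]   # 63 - 62 = 1
    half = gap / 2
    return [(low - half, high + half) for low, high in class_limits]


def mid_values(class_limits):
    """Mid value of each class = (lower limit + upper limit) / 2."""
    return [(low + high) / 2 for low, high in class_limits]


boundaries = class_boundaries(classes)
mids = mid_values(classes)

# Method 1: extreme class boundaries.
L1, S1 = boundaries[-1][1], boundaries[0][0]
range_boundaries = L1 - S1
coef_boundaries = (L1 - S1) / (L1 + S1)

# Method 2: mid values of the extreme classes.
L2, S2 = mids[-1], mids[0]
range_mids = L2 - S2
coef_mids = (L2 - S2) / (L2 + S2)

print(range_boundaries, round(coef_boundaries, 4))
print(range_mids, round(coef_mids, 4))
```

The two methods give different answers because they place the extreme observations at different points of the end classes, so it is worth stating which method was used.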
Note:
The range does not enjoy any prominent place in statistical theory, but it has its application
and utility in quality control methods which are used to maintain the quality of
products produced in factories. The quality of products is to be kept within a
certain range of values.
Range is based on two extreme observations. It gives no weight to the central
values of the data. It is a poor measure of dispersion and does not give a good
picture of the overall spread of the observations with respect to the center of the
observations. Let us consider three groups of data which have the same range:

Group A: 30, 40, 40, 40, 40, 40, 50
Group B: 30, 30, 30, 40, 50, 50, 50
Group C: 30, 35, 40, 40, 40, 45, 50
In all the three groups the range is 50 – 30 = 20. In group A there is a concentration
of observations in the center. In group B the observations are concentrated in the
extreme corners, and in group C the observations are almost equally distributed in
the interval from 30 to 50. The range fails to explain differences in the three groups
of data.
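
A short Python sketch makes this limitation concrete. It compares the range of the three groups with their standard deviation, which the notes list among the absolute measures but do not work out, so its use here is an assumption made purely for illustration. `statistics.pstdev` computes the population standard deviation.

```python
from statistics import pstdev

groups = {
    "A": [30, 40, 40, 40, 40, 40, 50],
    "B": [30, 30, 30, 40, 50, 50, 50],
    "C": [30, 35, 40, 40, 40, 45, 50],
}

# Range uses only the two extreme values of each group.
ranges = {name: max(values) - min(values) for name, values in groups.items()}

# Standard deviation uses every observation.
deviations = {name: pstdev(values) for name, values in groups.items()}

for name in groups:
    print(f"Group {name}: range = {ranges[name]}, "
          f"standard deviation = {deviations[name]:.2f}")
```

The range is identical for all three groups, while a measure that uses every observation ranks them A, C, B from least to most spread.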

INTER QUARTILE RANGE (IQR)

The interquartile range represents the difference between the upper quartile (Q3) and the lower quartile (Q1).

IQR = Q3 - Q1

Example

25, 55, 5, 45, 15, 35 are marks obtained by students. Find IQR.

Arranging the marks in ascending order: 5, 15, 25, 35, 45, 55

N = 6

Q1 = Size of ((N + 1)/4)th item = Size of ((6 + 1)/4)th item
= Size of (7/4)th item = Size of 1.75th item
= Size of 1st item + 0.75[Size of 2nd item – Size of 1st item]
= 5 + 0.75[15 – 5]
= 12.5
Q3 = Size of (3(N + 1)/4)th item = Size of (3(6 + 1)/4)th item
= Size of (21/4)th item = Size of 5.25th item
= Size of 5th item + 0.25[Size of 6th item – Size of 5th item]
= 45 + 0.25[55 – 45]
= 47.5

IQR = Q3 - Q1 = 47.5 – 12.5 = 35
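
The positional rule used in this example (take the item at position (N + 1)/4 or 3(N + 1)/4 and interpolate between neighbours when the position is fractional) can be sketched in Python as follows. The function names `item_at` and `quartiles` are illustrative assumptions, and other textbooks and software packages use slightly different quartile conventions.

```python
def item_at(ordered, position):
    """Size of the item at a 1-based, possibly fractional, position."""
    whole = int(position)            # e.g. 1 for the 1.75th item
    fraction = position - whole      # e.g. 0.75
    if whole < 1:
        return ordered[0]
    if whole >= len(ordered):
        return ordered[-1]
    lower = ordered[whole - 1]
    upper = ordered[whole]
    return lower + fraction * (upper - lower)


def quartiles(values):
    """Return (Q1, Q3) using positions (N + 1)/4 and 3(N + 1)/4."""
    ordered = sorted(values)
    n = len(ordered)
    return item_at(ordered, (n + 1) / 4), item_at(ordered, 3 * (n + 1) / 4)


marks = [25, 55, 5, 45, 15, 35]
q1, q3 = quartiles(marks)
iqr = q3 - q1
print(q1, q3, iqr)
```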

Example
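Let's calculate IQR for the following discrete data:

Items        14   36   45   70   105   145
Frequency     2    5    1    6     4     2

Here N = 20 and the cumulative frequencies are 2, 7, 8, 14, 18, 20.
Q1 = Size of ((N + 1)/4)th item = Size of 5.25th item = 36
Q3 = Size of (3(N + 1)/4)th item = Size of 15.75th item = 105
IQR = Q3 – Q1 = 105 – 36 = 69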
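
For a discrete frequency distribution the quartiles can be read off the cumulative frequencies. The sketch below assumes the common textbook convention that the quartile is the first item whose cumulative frequency reaches the position k(N + 1)/4. The function name `discrete_quartile` is hypothetical.

```python
from itertools import accumulate

items = [14, 36, 45, 70, 105, 145]
frequencies = [2, 5, 1, 6, 4, 2]


def discrete_quartile(items, frequencies, k):
    """k-th quartile: first item whose cumulative frequency reaches k(N + 1)/4."""
    cumulative = list(accumulate(frequencies))   # 2, 7, 8, 14, 18, 20
    n = cumulative[-1]
    position = k * (n + 1) / 4
    for item, cf in zip(items, cumulative):
        if cf >= position:
            return item
    return items[-1]


q1 = discrete_quartile(items, frequencies, 1)
q3 = discrete_quartile(items, frequencies, 3)
iqr = q3 - q1
print(q1, q3, iqr)
```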

Example
Find IQR
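Weekly Income     Number of Families
2000-4000         20
4000-6000         40
6000-8000         50
8000-10000        32
10000-12000       16
12000-14000       2

Here N = 160. Q1 is the size of the (N/4)th = 40th item, which lies in the class
4000-6000, and Q3 is the size of the (3N/4)th = 120th item, which lies in the class
8000-10000.
Q1 = 4000 + ((40 – 20)/40) × 2000 = 5000
Q3 = 8000 + ((120 – 110)/32) × 2000 = 8625
IQR = Q3 – Q1 = 8625 – 5000 = 3625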
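
The quartiles of a continuous frequency distribution are usually found by linear interpolation inside the quartile class: Qk = l + ((kN/4 – cf)/f) × h, where l is the lower limit of the quartile class, cf the cumulative frequency before it, f its frequency and h its width. The Python sketch below implements that formula under the assumption that observations are spread evenly within each class. The function name `grouped_quartile` is illustrative.

```python
def grouped_quartile(classes, frequencies, k):
    """k-th quartile (k = 1, 2, 3) of a continuous frequency distribution."""
    n = sum(frequencies)
    target = k * n / 4                # position of the quartile item
    cumulative_before = 0             # cf of the classes already passed
    for (lower, upper), f in zip(classes, frequencies):
        if f > 0 and cumulative_before + f >= target:
            width = upper - lower
            return lower + (target - cumulative_before) / f * width
        cumulative_before += f
    raise ValueError("k must be 1, 2 or 3")


income_classes = [(2000, 4000), (4000, 6000), (6000, 8000),
                  (8000, 10000), (10000, 12000), (12000, 14000)]
families = [20, 40, 50, 32, 16, 2]

q1 = grouped_quartile(income_classes, families, 1)
q3 = grouped_quartile(income_classes, families, 3)
iqr = q3 - q1
print(q1, q3, iqr)
```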

Quartile Deviation
Quartile deviation is based on the lower quartile Q1 and the upper quartile Q3. The
difference Q3 – Q1 is called the interquartile range. The difference Q3 – Q1 divided
by 2 is called the semi-interquartile range or the quartile deviation. Thus
Q.D. = (Q3 – Q1) / 2

The quartile deviation is a slightly better measure of absolute dispersion than the
range. It indicates the average amount by which the two quartiles differ from the
median. It is an absolute measure of dispersion.
A small quartile deviation implies small variation among the central 50% of the items of
the distribution.
A high quartile deviation implies high variation among the central 50% of the items of
the distribution.

Coefficient of Quartile Deviation


A relative measure of dispersion based on the quartile deviation is called the
coefficient of quartile deviation. It is defined as:

Coefficient of Quartile Deviation = (Q3 – Q1) / (Q3 + Q1)


It is a pure number free of any units of measurement. It can be used for comparing
the dispersion of two or more sets of data.

Example
The wheat production (in Kg) of 20 acres is given as: 1120, 1240, 1320, 1040,
1080, 1200, 1440, 1360, 1680, 1730, 1785, 1342, 1960, 1880, 1755, 1720, 1600,
1470, 1750, and 1885. Find the quartile deviation and coefficient of quartile
deviation.
Solution:
After arranging the observations in ascending order, we get

1040, 1080, 1120, 1200, 1240, 1320, 1342, 1360, 1440, 1470, 1600, 1680, 1720,
1730, 1750, 1755, 1785, 1880, 1885, 1960.

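Q1 = Size of ((N + 1)/4)th item = Size of 5.25th item = 1240 + 0.25[1320 – 1240] = 1260

Q3 = Size of (3(N + 1)/4)th item = Size of 15.75th item = 1750 + 0.75[1755 – 1750] = 1753.75

Q.D. = (Q3 – Q1) / 2
= (1753.75 – 1260) / 2
= 246.875

Coefficient of Quartile Deviation = (Q3 – Q1) / (Q3 + Q1) = 493.75 / 3013.75 = 0.164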
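
The wheat example can be verified with a short Python sketch. It assumes the quartiles are located at positions (N + 1)/4 and 3(N + 1)/4 with linear interpolation, and that there are at least three observations. The helper name `quartile` is hypothetical.

```python
def quartile(ordered, k):
    """k-th quartile as the size of the k(N + 1)/4-th item (1-based)."""
    if len(ordered) < 3:
        raise ValueError("need at least three observations")
    whole, remainder = divmod(k * (len(ordered) + 1), 4)
    lower = ordered[whole - 1]
    upper = ordered[min(whole, len(ordered) - 1)]
    return lower + (remainder / 4) * (upper - lower)


production = [1120, 1240, 1320, 1040, 1080, 1200, 1440, 1360, 1680, 1730,
              1785, 1342, 1960, 1880, 1755, 1720, 1600, 1470, 1750, 1885]
ordered = sorted(production)

q1 = quartile(ordered, 1)
q3 = quartile(ordered, 3)

quartile_deviation = (q3 - q1) / 2        # absolute measure, in kg
coefficient_qd = (q3 - q1) / (q3 + q1)    # relative measure, unit-free

print(q1, q3, quartile_deviation, round(coefficient_qd, 3))
```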


Example
Find QD and Coefficient of Quartile Deviation.
Weekly Income Number of Families
2000-4000 20
4000-6000 40
6000-8000 50
8000-10000 32
10000-12000 16
12000-14000 2

Q1 = 5000
Q3 = 8625
IQR = Q3 – Q1 = 8625 – 5000 = 3625
QD = (Q3 – Q1) / 2 = 3625 / 2 = 1812.5
Coefficient of Quartile Deviation = (Q3 – Q1) / (Q3 + Q1) = 3625 / 13625 = 0.2661

Question
The marks obtained by 9 students in a test are
25, 20, 15, 45, 18, 7, 10, 38, 12
Find the value of Q1, Q3, IQR, QD.

Solution
Arrange in increasing order
7, 10, 12, 15, 18, 20, 25, 38, 45
N = 9
Q1 = Size of ((N + 1)/4)th item = Size of ((9 + 1)/4)th item
= Size of (10/4)th item = Size of 2.5th item
= Size of 2nd item + 0.5[Size of 3rd item – Size of 2nd item]
= 10 + 0.5[12 – 10] = 10 + 0.5 × 2 = 10 + 1 = 11
Q3 = Size of (3(N + 1)/4)th item = Size of (3(9 + 1)/4)th item
= Size of (30/4)th item = Size of 7.5th item
= Size of 7th item + 0.5[Size of 8th item – Size of 7th item]
= 25 + 0.5[38 – 25] = 25 + 0.5 × 13 = 25 + 6.5 = 31.5

IQR = Q3 – Q1 = 31.5 – 11 = 20.5

QD = (Q3 – Q1) / 2 = 20.5 / 2 = 10.25

Question

Find the quartile deviation for the following frequency distribution.

Class    F
10-20    12
20-30    19
30-40    5
40-50    10
50-60    9
60-70    6
70-80    6

Solution

Class    F     Cf
10-20    12    12
20-30    19    31
30-40    5     36
40-50    10    46
50-60    9     55
60-70    6     61
70-80    6     67

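N = 67

Q1 = Size of (N/4)th item
= Size of (67/4)th item = Size of 16.75th item

This falls under cf = 31 and therefore in the class 20-30.

Q1 = l + ((N/4 – cf)/f) × h = 20 + ((16.75 – 12)/19) × 10 = 22.5

Here l is the lower limit of the quartile class, cf is the cumulative frequency of
the preceding class, f is the frequency of the quartile class and h is the class width.

Q3 = Size of (3N/4)th item
= Size of (3 × 67/4)th item
= Size of 50.25th item

This falls under cf = 55 and therefore in the class 50-60.

Q3 = l + ((3N/4 – cf)/f) × h = 50 + ((50.25 – 46)/9) × 10 = 54.72

IQR = Q3 – Q1 = 54.72 – 22.5 = 32.22

QD = (Q3 – Q1) / 2 = 32.22 / 2 = 16.11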

Decile Range

Arrange the data in ascending order.
Find D1 and D9 for the data.
Decile Range = D9 – D1

Decile Deviation

Arrange the data in ascending order.
Find D1 and D9 for the data.
Decile Range = D9 – D1
Decile Deviation = (D9 – D1) / 2

Coefficient of Decile Deviation

Arrange the data in ascending order.
Find D1 and D9 for the data.
Coefficient of Decile Deviation = (D9 – D1) / (D9 + D1)

Example
Calculate Decile Range, Decile Deviation and Coefficient of Decile Deviation
from the following data
85, 96, 76, 108, 85, 80, 100, 85, 70, 95

Solution
Arranging the data in ascending order
70, 76, 80, 85, 85, 85, 95, 96, 100, 108

N = 10

D9 = Size of {9((N + 1)/10)}th item = Size of {9((10 + 1)/10)}th item
= Size of {9(11/10)}th item = Size of 9.9th item
= Size of 9th item + 0.9[Size of 10th item – Size of 9th item]
= 100 + 0.9[108 – 100] = 100 + 0.9 × 8 = 100 + 7.2 = 107.2

D1 = Size of ((N + 1)/10)th item = Size of ((10 + 1)/10)th item
= Size of (11/10)th item = Size of 1.1th item
= Size of 1st item + 0.1[Size of 2nd item – Size of 1st item]
= 70 + 0.1[76 – 70] = 70 + 0.1 × 6 = 70 + 0.6 = 70.6

Decile Range = D9 – D1 = 107.2 – 70.6 = 36.6

Decile Deviation = (D9 – D1) / 2 = 36.6 / 2 = 18.3

Coefficient of Decile Deviation = (D9 – D1) / (D9 + D1) = 36.6 / 177.8 = 0.206

Percentile Range

Arrange the data in ascending order.
Find P10 and P90 for the data.
Percentile Range = P90 – P10

Percentile Deviation

Arrange the data in ascending order.
Find P10 and P90 for the data.
Percentile Range = P90 – P10
Percentile Deviation = (P90 – P10) / 2

Coefficient of Percentile Deviation

Arrange the data in ascending order.
Find P10 and P90 for the data.
Coefficient of Percentile Deviation = (P90 – P10) / (P90 + P10)

Example
Calculate Percentile Range, Percentile Deviation and Coefficient of Percentile
Deviation from the following data
85, 96, 76, 108, 85, 80, 100, 85, 70, 95

Solution
Arranging the data in ascending order
70, 76, 80, 85, 85, 85, 95, 96, 100, 108

N = 10

P90 = Size of {90((N + 1)/100)}th item = Size of {90((10 + 1)/100)}th item
= Size of {90(11/100)}th item = Size of 9.9th item
= Size of 9th item + 0.9[Size of 10th item – Size of 9th item]
= 100 + 0.9[108 – 100] = 100 + 0.9 × 8 = 100 + 7.2 = 107.2

P10 = Size of {10((N + 1)/100)}th item = Size of {10((10 + 1)/100)}th item
= Size of {10(11/100)}th item = Size of 1.1th item
= Size of 1st item + 0.1[Size of 2nd item – Size of 1st item]
= 70 + 0.1[76 – 70] = 70 + 0.1 × 6 = 70 + 0.6 = 70.6

Percentile Range = P90 – P10 = 107.2 – 70.6 = 36.6

Percentile Deviation = (P90 – P10) / 2 = 36.6 / 2 = 18.3

Coefficient of Percentile Deviation = (P90 – P10) / (P90 + P10) = 36.6 / 177.8 = 0.206
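
Quartiles, deciles and percentiles all follow the same positional rule: the k-th of `parts` partition values is the item at position k(N + 1)/parts, interpolated when the position is fractional. The Python sketch below is one possible way to write that rule once and reuse it. The name `partition_value` is an assumption for illustration, and statistical software often uses other interpolation conventions.

```python
def partition_value(values, k, parts):
    """Item at 1-based position k(N + 1)/parts, with linear interpolation."""
    ordered = sorted(values)
    n = len(ordered)
    whole, remainder = divmod(k * (n + 1), parts)
    if whole < 1:
        return ordered[0]
    if whole >= n:
        return ordered[-1]
    lower, upper = ordered[whole - 1], ordered[whole]
    return lower + (remainder / parts) * (upper - lower)


data = [85, 96, 76, 108, 85, 80, 100, 85, 70, 95]

p90 = partition_value(data, 90, 100)   # 90th percentile
p10 = partition_value(data, 10, 100)   # 10th percentile
d9 = partition_value(data, 9, 10)      # 9th decile
d1 = partition_value(data, 1, 10)      # 1st decile

percentile_range = p90 - p10
percentile_deviation = percentile_range / 2
coefficient = (p90 - p10) / (p90 + p10)

print(round(p90, 1), round(p10, 1), round(percentile_range, 1),
      round(percentile_deviation, 2), round(coefficient, 3))
```

Because P90 and D9 sit at the same position (and likewise P10 and D1), the percentile measures for this data coincide with the decile measures.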

MEAN DEVIATION

It is the arithmetic mean of the absolute deviations of all items of the distribution
from a measure of central tendency.
(i) Ungrouped Data
Thus for ungrouped data of N observations in which the suitable average is X̄,
the mean deviation M.D. is given by the relation:

M.D. = Σ|X – X̄| / N
     = (|X1 – X̄| + |X2 – X̄| + ... + |XN – X̄|) / N

Mean deviation about the median:

M.D. = Σ|X – Median| / N

Mean deviation about the mode:

M.D. = Σ|X – Mode| / N

Example:

Calculate the mean deviation from the (1) arithmetic mean (2) median (3) mode in
respect to the marks obtained by nine students given below and show that the mean
deviation from the median is the minimum.

Marks (out of 25): 7, 4, 10, 9, 15, 12, 7, 9, 7

Solution:

After arranging the observations in ascending order, we get


Marks: 4, 7, 7, 7, 9, 9, 10, 12, 15

Mean = ΣX / N = 80 / 9 = 8.89

Median = Size of ((N + 1)/2)th item = Size of ((9 + 1)/2)th item = Size of (10/2)th item
= Size of 5th item = 9

Mode = 7 (since 7 is repeated the maximum number of times)

Marks (X)    |X – Mean|    |X – Median|    |X – Mode|
4            4.89          5               3
7            1.89          2               0
7            1.89          2               0
7            1.89          2               0
9            0.11          0               2
9            0.11          0               2
10           1.11          1               3
12           3.11          3               5
15           6.11          6               8
Total        21.11         21              23

M.D. from mean = Σ|X – Mean| / N = 21.11 / 9 = 2.35

M.D. from median = Σ|X – Median| / N = 21 / 9 = 2.33

M.D. from mode = Σ|X – Mode| / N = 23 / 9 = 2.56

From the above calculations, it is clear that the mean deviation from the median
has the least value.
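
The three mean deviations can be recomputed with Python's standard `statistics` module. This is a minimal sketch for checking the arithmetic. The helper name `mean_deviation` is an assumption and not a library function.

```python
from statistics import mean, median, mode


def mean_deviation(values, centre):
    """Arithmetic mean of the absolute deviations from the given centre."""
    return sum(abs(x - centre) for x in values) / len(values)


marks = [7, 4, 10, 9, 15, 12, 7, 9, 7]

md_mean = mean_deviation(marks, mean(marks))
md_median = mean_deviation(marks, median(marks))
md_mode = mean_deviation(marks, mode(marks))

print(round(md_mean, 2), round(md_median, 2), round(md_mode, 2))
```

The deviation about the median comes out smallest, in line with the general result that the sum of absolute deviations is minimised at the median.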
