Dispersion


Chapter- Five

Measures of Dispersion

Introduction:
We have studied methods of finding the average or a measure of
central tendency of a frequency distribution and of locating certain
values, such as median, decile and percentiles. However, it is not
enough to compare averages. The extent and nature of the spread of
the values around the measure of central tendency is to be
ascertained. Thus it may be said that an average becomes meaningful
in the real sense only when it is accompanied by a measure of
dispersion. The study of dispersion is useful when:
1. it is necessary to know the structure of frequency distributions whose averages are similar;
2. it is necessary to know the structure of frequency distributions whose formations are alike but which have different means;
3. a comprehensive idea of the formation of a series is wanted.

Dispersion means the difference or deviation of the observations from a central value. It is important to know how the observations of a variate are spread around, or dispersed away from, the central value of the distribution. In short, dispersion describes how far the observations differ from the central point and from one another.

Definition: The measurement of the scatter of the values of a data set among themselves is called a measure of dispersion or variation.

Objectives of Measuring Dispersion


The study of dispersion is important for the following reasons:
1. For determining the reliability of an average.
2. For controlling the variability.
3. For comparing two or more series with regard to their
variability.

4. For facilitating the use of other statistical measures.

Characteristics of an Ideal Measure of Dispersion:

The following are the characteristics of an ideal measure of dispersion:
1. It should be rigidly defined.
2. It should be easy to calculate and easy to understand.
3. It should be based on all the observations.
4. It should be amenable to further algebraic treatment.
5. It should be little affected by sampling fluctuations.
6. It should have sampling stability.

Applications of Measures of Dispersion:
1. To compare the characteristics of two or more sets of observations.
2. To determine the reliability of an average.
3. To serve as a basis for the control of variation.
4. To compare two or more series with regard to their variation.
5. To facilitate the use of other statistical measures.

Measures of Dispersion
The following are the measures of dispersion:
a) Absolute measures
(i) Range
(ii) Quartile deviation
(iii) Standard deviation
(iv) Mean deviation
b) Relative measures
(i) Coefficient of range
(ii) Coefficient of quartile deviation
(iii) Coefficient of variation
(iv) Coefficient of mean deviation

Discussion:
RANGE
Range is the difference between the highest and the lowest observations in a set. If x1, x2, ..., xn are the values of n observations in a sample, the range, generally denoted by R, is
\[ R = \max(x_1, x_2, \ldots, x_n) - \min(x_1, x_2, \ldots, x_n). \]
Equivalently, R = L - S, where L = largest value and S = smallest value.
A relative measure known as the coefficient of range is given by
\[ \text{C.R} = \frac{L - S}{L + S} \times 100. \]
The smaller the range or the coefficient of range, the less scattered the observations.
Properties:
(i) It is the simplest measure and can easily be understood.
(ii) Apart from this merit, it satisfies hardly any property of a good measure of dispersion: it is based on the two extreme values only, ignoring all the others, and it is not amenable to further algebraic treatment.
Example-1: The following are the prices of shares of a company from Monday to Saturday:
Day          Price (Taka)
Monday       200
Tuesday      210
Wednesday    208
Thursday     160
Friday       220
Saturday     250
Calculate the range and the coefficient of range.
Solution-1: We know that
Range, R = L - S,
where L = largest value = 250 taka and S = smallest value = 160 taka.
R = 250 - 160 = 90 taka.
Coefficient of range,
\[ \text{C.R} = \frac{L - S}{L + S} \times 100 = \frac{250 - 160}{250 + 160} \times 100 = \frac{90}{410} \times 100 \approx 0.2195 \times 100 = 21.95. \]
This is the required result.

Note: In a frequency distribution, the range is calculated by taking the difference between the lower limit of the lowest class and the upper limit of the highest class.

Example 2: The following table gives a frequency distribution of profits:
Profits (taka, lakhs)    No. of observations
10-20                    8
20-30                    10
30-40                    12
40-50                    8
50-60                    4
60-70                    2
Calculate the coefficient of range.
Solution: Here L = 70 (the upper limit of the highest class) and S = 10 (the lower limit of the lowest class), so
\[ \text{C.R} = \frac{L - S}{L + S} \times 100 = \frac{70 - 10}{70 + 10} \times 100 = \frac{60}{80} \times 100 = 0.75 \times 100 = 75. \]
This is the required result.
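The two range calculations above can be checked with a short Python sketch. The helper name `coefficient_of_range` is an illustrative assumption and does not come from the text; for the grouped example the extremes are taken from the class limits, as the note above describes.

```python
def coefficient_of_range(largest, smallest):
    """Return (range, coefficient of range in percent) for the given extremes."""
    value_range = largest - smallest
    coefficient = value_range / (largest + smallest) * 100
    return value_range, coefficient

# Example 1: raw share prices, extremes taken from the data itself.
prices = [200, 210, 208, 160, 220, 250]
r1, cr1 = coefficient_of_range(max(prices), min(prices))

# Example 2: grouped data, extremes taken from the class limits 10 and 70.
r2, cr2 = coefficient_of_range(70, 10)

print(r1, round(cr1, 2))
print(r2, round(cr2, 2))
```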
Advantages of Range
1. Among all the methods of studying variation, the range is the easiest to understand and the easiest to calculate.
2. It takes the least time to compute.
3. If one is interested in getting a quick rather than a very accurate picture of variation, one may determine the range.

Limitations of Range
1. The range is not based on each and every observation of a distribution.
2. It is subject to fluctuations of considerable magnitude from sample to sample.
3. The range cannot be determined for open-end distributions.
4. The range tells us nothing about the character of the distribution between the two extreme observations.

Applications of the Range
The range has several applications:
1. It is used in quality control.
2. It is used in describing the fluctuation of share prices.
3. It is used in weather forecasts, etc.
Interpretation of the range, R
The R is no more than a rough measure of dispersion. It gives a
comprehensive value for the data in the sense that it includes the
limits within which all of the items occurred. The range can be
interpreted as an intensive measure of variability except in very
small samples.

Quartile deviation (QD) or Semi-inter-Quartile Range:

Definition: Half of the difference between the third quartile and the first quartile is called the quartile deviation. If Q1 and Q3 denote the first and third quartiles respectively, the distance between Q3 and Q1 (the inter-quartile range) can be viewed as the width of the middle half of the data, because the quartiles divide the whole data set into four parts, each containing about one quarter of the data values. The quartile deviation is denoted by Q.D and defined by
\[ \text{Q.D} = \frac{Q_3 - Q_1}{2}. \]
The coefficient of quartile deviation is given by the formula
\[ \text{Coeff. of Q.D} = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100. \]

Advantages of QD
1. The QD is based on the middle 50% of a distribution and is
complementary to the median.
2. The QD is easy to compute and easy to understand.
3. QD is not affected by the extreme values.
4. QD is superior to the range as a rough measure of dispersion.

Limitations of QD
1. Quartile deviation ignores 50% of the items, i.e., the first 25% and the last 25%.
2. Since the value of QD does not depend upon every observation, it cannot be regarded as a good method of measuring variation.
3. QD is not capable of mathematical manipulation.
4. Its value is very much affected by sampling fluctuations.
5. It is in fact not a measure of dispersion, as it does not show the scatter around an average but rather a distance on a scale.
Example 3. The values of Q1, Q2 and Q3 worked out for a company are Q1 = 174.90, Q2 = 190.23 and Q3 = 203.83. Find the quartile deviation and the coefficient of quartile deviation.

Solution: Given that Q1 = 174.90, Q2 = 190.23 and Q3 = 203.83.
We know that the quartile deviation is
\[ \text{Q.D} = \frac{Q_3 - Q_1}{2} = \frac{203.83 - 174.90}{2} = \frac{28.93}{2} = 14.465. \]
Coefficient of Q.D
\[ = \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100 = \frac{203.83 - 174.90}{203.83 + 174.90} \times 100 = \frac{28.93}{378.73} \times 100 \approx 0.0764 \times 100 = 7.64. \]
This is the required result.
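As a quick check of the quartile deviation formulas, here is a minimal Python sketch. The function name `quartile_deviation` is hypothetical, and the quartiles are passed in directly rather than computed from raw data, since the example supplies them.

```python
def quartile_deviation(q1, q3):
    """Return (quartile deviation, coefficient of quartile deviation in percent)."""
    qd = (q3 - q1) / 2
    coefficient = (q3 - q1) / (q3 + q1) * 100
    return qd, coefficient

# Quartiles given in Example 3 (Q2 is not needed for either measure).
qd, coeff = quartile_deviation(174.90, 203.83)
print(round(qd, 3), round(coeff, 2))
```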

Mean deviation:
Mean deviation, or average deviation, is obtained by calculating the absolute deviations of each observation from the median (or mean) and then averaging these deviations by taking their arithmetic mean.
Definition:
Mean deviation is the average of the absolute deviations taken from a central value, generally the mean or the median. It is denoted by M.D.
MD for raw or ungrouped data:
Let x1, x2, ..., xn be n observations with arithmetic mean \(\bar{x}\) and median Me. Then the mean deviation is defined by
\[ \text{M.D}(\bar{x}) = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|, \quad \text{when taken from the mean,} \]
\[ \text{M.D}(Me) = \frac{1}{n}\sum_{i=1}^{n} |x_i - Me|, \quad \text{when taken from the median.} \]
Computation of Mean Deviation (ungrouped data):
In the deviation method of measuring scatter, the following steps are taken:
1. Compute the mean or median of the observations, i.e., \(\bar{x}\) or Me.
2. Find the absolute deviation of each observation from the mean or median and add them, i.e., \(\sum |x_i - \bar{x}|\) or \(\sum |x_i - Me|\).
3. Divide the sum by the number of observations: MD = \(\frac{1}{n}\sum |x_i - \bar{x}|\) or \(\frac{1}{n}\sum |x_i - Me|\).


MD for grouped data:
Let x1, x2, ..., xn denote the mid-values of n classes and f1, f2, ..., fn the corresponding frequencies. Then the MD is defined by
\[ \text{M.D}(\bar{x}) = \frac{1}{N}\sum_{i=1}^{n} f_i |x_i - \bar{x}|, \quad \text{when taken from the mean,} \]
\[ \text{M.D}(Me) = \frac{1}{N}\sum_{i=1}^{n} f_i |x_i - Me|, \quad \text{when taken from the median,} \]
where fi is the class frequency, xi is the mid-value of the i-th class and \(N = \sum f_i\) is the total frequency.
The relative measure corresponding to the MD, called the coefficient of MD, is obtained by dividing the MD by the particular average used in computing it.
Coefficient of mean deviation,
\[ \text{C.MD} = \frac{\text{MD}}{\bar{x}} \times 100, \quad \text{when MD is taken from the mean.} \]
If the median has been used in calculating the mean deviation, the coefficient of mean deviation is obtained by dividing the mean deviation by the median:
\[ \text{C.MD} = \frac{\text{MD}}{Me} \times 100, \quad \text{when MD is taken from the median.} \]

Computation of Mean Deviation (for grouped data)

The following steps are involved in calculating the mean deviation from the mean or median:
1. Calculate the mean or median of the series, i.e., \(\bar{x}\) or Me.
2. Find the deviations of the mid-values from the mean or median, ignoring plus and minus signs.
3. Multiply the deviations by the respective frequencies and add, i.e., compute \(\sum_{i=1}^{n} f_i |x_i - \bar{x}|\) or \(\sum_{i=1}^{n} f_i |x_i - Me|\).
4. Divide this sum by the total number of items N.
The quotient obtained is the value of the mean deviation.

Properties:
1. Mean deviation removes one main objection to the earlier measures, in that it involves every value of the set.
2. It is not affected much by extreme values.
3. It has no relationship with any of the other measures of dispersion.
4. Its main drawback is that the algebraic negative signs of the deviations are ignored, which is mathematically unsound.
5. Mean deviation is minimum when the deviations are taken from the median.
6. Mean deviation is independent of origin but depends on scale.
Interpretation of the mean deviation:
1. The mean deviation may help us to find the percentage of observations falling in the range average ± mean deviation.
2. For a normal distribution, which is symmetric and for which AM = Me = Mo, the percentage of values falling in the range (AM ± MD) is the same as that falling in the range (Me ± MD).
3. If the MD is comparatively small, then more than half of the items in the data fall within a small range around the average. This concentration indicates compactness of the distribution.
Application of the MD:
1. The application of the MD is overshadowed to a large extent by the use of the standard deviation (SD), but the computation of the MD is less difficult.
2. It is most valuable for interpreting the significance of a series of ratios of an item. Because of its simplicity in meaning and computation, it is especially effective in reports presented to the general public or to groups not familiar with statistical methods.
Advantages of mean deviation:
1. The outstanding advantage of the MD is its relative simplicity.
2. It is simple to understand and easy to compute.
3. It is based on each and every observation of the data. Consequently, a change in the value of any observation changes the value of the mean deviation.
4. Mean deviation is less affected by the values of extreme observations.
5. Since the deviations are taken from a central value, the formation of different distributions can easily be compared.
Limitations of mean deviation:
1. The greatest drawback of this method is that algebraic signs are ignored while taking the deviations of the items.
2. This method may not give accurate results. The reason is that the mean deviation gives the best results when deviations are taken from the median, but the median is not a satisfactory measure when the degree of variability in a series is very high.
3. It is not capable of further algebraic treatment.
4. It is rarely used in sociological and business studies.

Coefficient of mean deviation:

The coefficient of mean deviation is a relative measure of dispersion. It is the ratio of the mean deviation to the arithmetic mean, multiplied by 100:
\[ \text{C.MD} = \frac{\text{MD}}{\bar{x}} \times 100. \]
Example 4.
Calculate the mean deviation from the mean and the coefficient of mean deviation from the following data:
Sales (in thousand $)   15-19   19-23   23-27   27-31   31-35   35-39
No. of days             8       59      47      23      6       4

Solution: We want to calculate the mean deviation from the mean,
\[ \text{M.D}(\bar{x}) = \frac{1}{N}\sum_{i=1}^{n} f_i |x_i - \bar{x}|, \]
where fi = class frequency, xi = mid-value, \(\bar{x}\) = mean of x and N = total number of observations.
The calculations are shown in the table below.
Calculation table of mean deviation
Class interval   Mid-value (xi)   Frequency (fi)   fi xi   |xi - x̄|   fi |xi - x̄|
15-19            17               8                136     7.24       57.92
19-23            21               59               1239    3.24       191.16
23-27            25               47               1175    0.76       35.72
27-31            29               23               667     4.76       109.48
31-35            33               6                198     8.76       52.56
35-39            37               4                148     12.76      51.04
Total                             N = 147          3563               497.88
Here
\[ \bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i = \frac{3563}{147} \approx 24.24. \]
The mean deviation about the mean is
\[ \text{M.D}(\bar{x}) = \frac{1}{N}\sum_{i=1}^{n} f_i |x_i - \bar{x}| = \frac{497.88}{147} \approx 3.387 \approx 3.39 \text{ (thousand dollars)}. \]
Coefficient of mean deviation:
\[ \text{C.M.D} = \frac{\text{M.D}}{\bar{x}} \times 100 = \frac{3.387}{24.238} \times 100 \approx 0.1397 \times 100 = 13.97. \]
Thus the mean sales are $24.24 thousand per day, the mean deviation of sales is $3.39 thousand, and the coefficient of mean deviation is 13.97%.
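The grouped-data steps (mid-values, weighted mean, weighted absolute deviations) can be sketched in Python as follows. This is a minimal illustration under the assumption that each class is represented by its mid-value; the function name `grouped_mean_deviation` is hypothetical.

```python
def grouped_mean_deviation(class_limits, frequencies):
    """Return (mean, mean deviation about the mean, coefficient of MD in percent)."""
    # Each class is represented by its mid-value.
    mids = [(lower + upper) / 2 for lower, upper in class_limits]
    total = sum(frequencies)
    mean = sum(f * x for f, x in zip(frequencies, mids)) / total
    md = sum(f * abs(x - mean) for f, x in zip(frequencies, mids)) / total
    return mean, md, md / mean * 100

# Sales data of Example 4 (class limits in thousand dollars, frequencies in days).
classes = [(15, 19), (19, 23), (23, 27), (27, 31), (31, 35), (35, 39)]
days = [8, 59, 47, 23, 6, 4]
mean, md, cmd = grouped_mean_deviation(classes, days)
print(round(mean, 2), round(md, 2), round(cmd, 2))
```

Because the sketch keeps the mean unrounded, its intermediate deviations differ slightly from a hand calculation that rounds the mean to two decimals first.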

Problem 5.
Calculate the mean deviation from the mean from the following data:
Sales (in thousand $)   10-20   20-30   30-40   40-50   50-60
No. of days             3       6       11      3       2
Also calculate the coefficient of mean deviation.
Solution: We want to calculate the mean deviation from the mean,
\[ \text{M.D}(\bar{x}) = \frac{1}{N}\sum_{i=1}^{n} f_i |x_i - \bar{x}|, \]
where fi = class frequency, xi = mid-value, \(\bar{x}\) = mean of x and N = total number of observations.
Calculation table of mean deviation
Sales (thousand $)   Mid-value (xi)   Frequency (fi)   fi xi   |xi - x̄|   fi |xi - x̄|
10-20                15               3                45      18         54
20-30                25               6                150     8          48
30-40                35               11               385     2          22
40-50                45               3                135     12         36
50-60                55               2                110     22         44
Total                                 N = 25           825                204
Here
\[ \bar{x} = \frac{825}{25} = 33. \]
The mean deviation about the mean is
\[ \text{M.D}(\bar{x}) = \frac{\sum f_i |x_i - \bar{x}|}{N} = \frac{204}{25} = 8.16 \text{ (thousand dollars)}. \]
Coefficient of mean deviation:
\[ \text{C.M.D} = \frac{\text{M.D}}{\bar{x}} \times 100 = \frac{8.16}{33} \times 100 \approx 0.2473 \times 100 = 24.73. \]
Thus the mean sales are $33 thousand per day, the mean deviation of sales is $8.16 thousand, and the coefficient of mean deviation is 24.73%.

VARIANCE:
Definition: The variance is the average of the squares of the deviations taken from the mean, i.e., the sum of the squared deviations divided by the number of observations.

Variance for ungrouped data

Let X1, X2, ..., Xn be the measurements on n population units. The population variance is denoted by σ² and defined by
\[ \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2 = \frac{1}{n}\left\{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}\right\}, \]
where \(\bar{X}\) is the population mean.
The sample variance of a set x1, x2, ..., xn of n observations is given by the formula
\[ S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1}\left\{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right\}, \]
where \(\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\).

Variance for grouped data:

Let x1, x2, ..., xn denote the mid-values of n classes and f1, f2, ..., fn the corresponding frequencies. The variance is defined by
\[ \sigma^2 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2 = \frac{1}{N}\left\{\sum_{i=1}^{n} f_i x_i^2 - \frac{1}{N}\left(\sum_{i=1}^{n} f_i x_i\right)^2\right\}, \]
where fi is the class frequency, xi is the mid-value of the i-th class, \(N = \sum_{i=1}^{n} f_i\) is the total frequency and \(\bar{x} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i\).

If the observation xi occurs fi times for i = 1, 2, ..., k, then the sample variance is
\[ S^2 = \frac{1}{n-1}\sum_{i=1}^{k} f_i (x_i - \bar{x})^2 = \frac{1}{n-1}\left\{\sum_{i=1}^{k} f_i x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{k} f_i x_i\right)^2\right\}, \]
where \(\bar{x} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i\) and \(n = \sum_{i=1}^{k} f_i\).
Properties of variance:
1. The variance largely removes the shortcomings present in the measures of dispersion given before it.
2. The main disadvantage of the variance is that its unit is the square of the unit of measurement of the variate values. For example, if the variable X is measured in metres (m), the unit of the variance is m². Generally this value is large, which makes it difficult to judge the magnitude of variation.
3. The variance gives more weight to the extreme values than to those near the mean, because the differences are squared.

Theorem: Prove that the variance is independent of origin but depends on scale.

Proof: Let x1, x2, ..., xn be n observations with mean \(\bar{x}\). The variance is denoted by \(S_x^2\) and defined by
\[ S_x^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \qquad \text{(i)} \]
Let \(u_i = \frac{x_i - a}{c}\) be a new variate, where a is the origin and c is the scale. Then
\[ x_i - a = c\,u_i \;\Rightarrow\; x_i = a + c\,u_i, \quad \text{and hence} \quad \bar{x} = a + c\,\bar{u}. \]
Putting these values in (i), we have
\[ S_x^2 = \frac{1}{n}\sum_{i=1}^{n} (a + c\,u_i - a - c\,\bar{u})^2 = \frac{1}{n}\sum_{i=1}^{n} (c\,u_i - c\,\bar{u})^2 = c^2 \cdot \frac{1}{n}\sum_{i=1}^{n} (u_i - \bar{u})^2 = c^2 S_u^2. \]
The origin a does not appear in the result, while the scale c does. Therefore, the variance is independent of origin but depends on scale.
(Proved)
Example: Find the variance of the first n natural numbers.
Solution: The set of the first n natural numbers is
x: {1, 2, 3, ..., n}.
So the variance of x is
\[ S_x^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{\sum_{i=1}^{n} x_i^2}{n} - \left(\frac{\sum_{i=1}^{n} x_i}{n}\right)^2 \]
\[ = \frac{1^2 + 2^2 + \cdots + n^2}{n} - \left(\frac{1 + 2 + \cdots + n}{n}\right)^2 \]
\[ = \frac{n(n+1)(2n+1)}{6n} - \left\{\frac{n(n+1)}{2n}\right\}^2 \]
\[ = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} \]
\[ = \frac{n+1}{2}\left\{\frac{2n+1}{3} - \frac{n+1}{2}\right\} = \frac{n+1}{2} \times \frac{n-1}{6} = \frac{n^2 - 1}{12}. \]
So the variance of the first n natural numbers is \(\frac{n^2 - 1}{12}\).

Example: Find the variance of the first 19 natural numbers.

Solution: We know that the variance of the first n natural numbers is \(\frac{n^2 - 1}{12}\).
So the variance of the first 19 natural numbers is
\[ \frac{19^2 - 1}{12} = \frac{361 - 1}{12} = \frac{360}{12} = 30. \]
Example: Find the variance of the first 25 natural numbers.
Solution: We know that the variance of the first n natural numbers is \(\frac{n^2 - 1}{12}\).
So the variance of the first 25 natural numbers is
\[ \frac{25^2 - 1}{12} = \frac{625 - 1}{12} = \frac{624}{12} = 52 \text{ (Answer)}. \]
Example: Find the variance of the following values:
4001, 4002, 4003, ..., 4050.

Solution: Let xi = 4001, 4002, 4003, ..., 4050 and ui = xi - 4000.
Therefore, ui = 1, 2, 3, ..., 50.
Since the variance is independent of origin, the variance of x equals the variance of u, and the variance of the first n natural numbers is \(\frac{n^2 - 1}{12}\).
Therefore, the required variance is
\[ \frac{n^2 - 1}{12} = \frac{50^2 - 1}{12} = \frac{2499}{12} = 208.25 \text{ (Answer)}. \]
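The closed form for the variance of the first n natural numbers, and the change-of-origin argument used for 4001, ..., 4050, can both be verified numerically. The sketch below uses exact rational arithmetic from Python's standard `fractions` module; the function names are illustrative assumptions.

```python
from fractions import Fraction

def population_variance(values):
    """Population variance (divisor n), computed exactly with fractions."""
    n = len(values)
    mean = Fraction(sum(values), n)
    return sum((x - mean) ** 2 for x in values) / n

def first_n_formula(n):
    """Closed form (n^2 - 1) / 12 for the variance of 1, 2, ..., n."""
    return Fraction(n * n - 1, 12)

# The direct computation agrees with the closed form.
for n in (19, 25, 50):
    assert population_variance(range(1, n + 1)) == first_n_formula(n)

# Shifting the origin by 4000 leaves the variance unchanged.
shifted = population_variance(range(4001, 4051))
print(float(first_n_formula(19)), float(first_n_formula(25)), float(shifted))
```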

STANDARD DEVIATION:
There is another method of summing up deviations from a measure of central tendency and finding a measure of dispersion of the data: the standard deviation. The standard deviation is considered superior to other measures of dispersion because of its advantages in representing variability mathematically, which is very important for interpreting statistical data.

Standard deviation is a measure of absolute dispersion. A small standard deviation indicates a high degree of uniformity of the observations as well as homogeneity of the series. The standard deviation may be defined as the root of the mean of the squares of the deviations of individual items from the arithmetic mean.

Definition: The positive square root of the variance is called the standard deviation. It is denoted by
\[ \text{S.D.} = \sqrt{\text{Variance}} = \sqrt{\sigma^2} = \sigma. \]

Properties
1. Standard deviation is considered to be the best measure of dispersion and is used widely.
2. There is, however, one difficulty with it. If the units of measurement of the variates of two series are not the same, their variability cannot be compared by comparing the values of the standard deviation.
3. Standard deviation is never negative.
4. It is based on all the observations and is readily understood.
5. It is more difficult to compute than the other measures of dispersion but is easy to use mathematically.
6. It has a relatively small sampling error.
7. Its main advantage is that it is amenable to algebraic treatment and comparatively stable under sampling fluctuations.
8. Standard deviation is independent of change of origin but not of scale.
9. The mean squared deviation (MSD) is minimum when the deviations are taken from the arithmetic mean; this minimum is the variance, and the corresponding minimum root mean squared deviation (RMSD) is the SD.
10. If \(\bar{X}\) and S denote the mean and standard deviation, respectively, of n non-negative quantities x1, x2, ..., xn, then \(\bar{X}\sqrt{n-1} \ge S\).
11. For a symmetrical, bell-shaped (normal) distribution, the following area relationships hold good.
Mean ± 1σ covers 68.27% of the observations.
Mean ± 2σ covers 95.45% of the observations.
Mean ± 3σ covers 99.73% of the observations.
Advantages of Standard deviation
1. The standard deviation is the best measure of dispersion.
2. It is possible to calculate the combined standard deviation of two or more groups.
3. For comparing the variability of two or more distributions, the coefficient of variation is considered most appropriate, and this measure is based on the mean and the standard deviation.
4. The SD is the measure most prominently used in further statistical work.

Limitations of Standard deviation:

1. Compared with other measures, it is difficult to compute.
2. It gives more weight to extreme values and less to those near the mean.

Applications of Standard deviation

1. In sampling, the SD is the most useful measure.
2. It is also used widely in basic statistics and in many statistical operations.
3. The SD can be used to compare the variability or degree of uniformity of two or more data sets.
Theorem: The standard deviation is independent of origin but depends on scale.
Proof: Let x1, x2, ..., xn be n observations with mean \(\bar{x}\). The standard deviation is denoted by S.D, σ or \(S_x\) and defined by
\[ \text{SD}(x) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad \text{(i)} \]
Let \(u_i = \frac{x_i - a}{c}\) be a new variate, where a is the origin and c > 0 is the scale. Then
\[ x_i - a = c\,u_i \;\Rightarrow\; x_i = a + c\,u_i, \quad \text{and hence} \quad \bar{x} = a + c\,\bar{u}. \]
Putting these values in (i), we have
\[ \text{SD}(x) = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (a + c\,u_i - a - c\,\bar{u})^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n} (c\,u_i - c\,\bar{u})^2} = \sqrt{c^2 \cdot \frac{1}{n}\sum_{i=1}^{n} (u_i - \bar{u})^2} = c \sqrt{\frac{1}{n}\sum_{i=1}^{n} (u_i - \bar{u})^2} = c \cdot \text{SD}(u). \]
The origin a does not appear in the result, while the scale c does. Therefore, the standard deviation is independent of origin but depends on scale.
(Proved)
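Theorem: If x1, x2, ..., xn are n positive observations (n ≥ 2) with arithmetic mean \(\bar{X}\) and standard deviation S, show that \(\bar{X}\sqrt{n-1} > S\).
Proof: Let x1, x2, ..., xn be n positive observations with arithmetic mean \(\bar{X}\) and standard deviation S. Then we have
\[ \bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i \;\Rightarrow\; n\bar{X} = \sum_{i=1}^{n} x_i \]
and
\[ S = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n} - \left(\frac{\sum_{i=1}^{n} x_i}{n}\right)^2} \;\Rightarrow\; S^2 = \frac{\sum_{i=1}^{n} x_i^2}{n} - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n^2} \;\Rightarrow\; nS^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}. \]
Since every xi is positive, all the cross-product terms in \(\left(\sum x_i\right)^2 = \sum x_i^2 + 2\sum_{i<j} x_i x_j\) are positive, so \(\sum x_i^2 < \left(\sum x_i\right)^2 = n^2\bar{X}^2\). Hence
\[ nS^2 < n^2\bar{X}^2 - n\bar{X}^2 = n(n-1)\bar{X}^2 \;\Rightarrow\; S^2 < (n-1)\bar{X}^2 \;\Rightarrow\; S < \bar{X}\sqrt{n-1}. \]
(Proved)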

Problem 5: Find the variance and standard deviation of the weekly wages of ten workers working in a factory:
Worker   Weekly wages (Tk.)
A        1320
B        1310
C        1315
D        1322
E        1326
F        1340
G        1325
H        1321
I        1320
J        1331

Solution: We know that the variance is
\[ \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 \]
and the standard deviation is
\[ \text{S.D.} = \sqrt{\text{Variance}} = \sqrt{\sigma^2} = \sigma. \]
Calculation of the variance and standard deviation:
Worker   Weekly wages (xi)   (xi - x̄)   (xi - x̄)²
A        1320                -3         9
B        1310                -13        169
C        1315                -8         64
D        1322                -1         1
E        1326                +3         9
F        1340                +17        289
G        1325                +2         4
H        1321                -2         4
I        1320                -3         9
J        1331                +8         64
n = 10   ∑xi = 13230         ∑(xi - x̄) = 0   ∑(xi - x̄)² = 622

We have
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{13230}{10} = 1323 \text{ taka}. \]
From the table, the variance is
\[ \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{622}{10} = 62.20 \text{ taka}^2. \]
Hence the variance of the given data is 62.20 taka², and the standard deviation is
\[ \text{S.D.} = \sqrt{\sigma^2} = \sqrt{62.20} \approx 7.89 \text{ taka}. \]
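For raw data such as the wages above, Python's standard `statistics` module provides both the population form (divisor n, as used in this solution) and the sample form (divisor n - 1) defined earlier in the chapter. The following is a small sketch for comparison; the variable names are illustrative.

```python
import statistics

# Weekly wages (Tk.) of the ten workers A to J.
wages = [1320, 1310, 1315, 1322, 1326, 1340, 1325, 1321, 1320, 1331]

mean = statistics.mean(wages)
variance = statistics.pvariance(wages)         # population variance, divisor n
std_dev = statistics.pstdev(wages)             # population standard deviation
sample_variance = statistics.variance(wages)   # sample variance, divisor n - 1

print(mean, round(variance, 2), round(std_dev, 2), round(sample_variance, 2))
```

The sample variance is slightly larger than the population variance because the same sum of squared deviations is divided by 9 instead of 10.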


Example 6: An analysis of production rejects gave the following figures. Find the variance and standard deviation.
No. of rejects per operator   No. of operators
21-25                         5
26-30                         15
31-35                         28
36-40                         42
41-45                         15
46-50                         12
51-55                         3
Solution: We know that the variance is
\[ \sigma^2 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2 \]
and the standard deviation is
\[ \text{S.D.} = \sqrt{\text{Variance}} = \sqrt{\sigma^2} = \sigma. \]
Calculation table:
Rejects per operator   Mid-value (xi)   fi        fi xi   (xi - x̄)   (xi - x̄)²   fi (xi - x̄)²
21-25                  23               5         115     -13.96     194.88      974.40
26-30                  28               15        420     -8.96      80.28       1204.20
31-35                  33               28        924     -3.96      15.68       439.04
36-40                  38               42        1596    1.04       1.08        45.36
41-45                  43               15        645     6.04       36.48       547.20
46-50                  48               12        576     11.04      121.88      1462.56
51-55                  53               3         159     16.04      257.28      771.84
Total                                   N = 120   4435                           5444.60
We have
\[ \bar{x} = \frac{\sum f_i x_i}{N} = \frac{4435}{120} = 36.96. \]
From the table, the variance is
\[ \sigma^2 = \frac{1}{N}\sum_{i=1}^{n} f_i (x_i - \bar{x})^2 = \frac{5444.60}{120} = 45.37. \]
Hence the variance of the given data is 45.37, and the standard deviation is
\[ \text{S.D.} = \sqrt{\sigma^2} = \sqrt{45.37} \approx 6.74. \]
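The grouped-data variance can be reproduced with a few lines of Python. This is a sketch under the usual assumption that every observation in a class sits at the class mid-value; the function name `grouped_variance` is hypothetical.

```python
import math

def grouped_variance(mid_values, frequencies):
    """Return (mean, population variance, standard deviation) for grouped data."""
    total = sum(frequencies)
    mean = sum(f * x for f, x in zip(frequencies, mid_values)) / total
    variance = sum(f * (x - mean) ** 2
                   for f, x in zip(frequencies, mid_values)) / total
    return mean, variance, math.sqrt(variance)

# Mid-values of the classes 21-25, ..., 51-55 and the number of operators in each.
mids = [23, 28, 33, 38, 43, 48, 53]
operators = [5, 15, 28, 42, 15, 12, 3]
mean, variance, sd = grouped_variance(mids, operators)
print(round(mean, 2), round(variance, 2), round(sd, 2))
```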


COEFFICIENT OF VARIATION (C.V):
The coefficient of variation is most useful when the two distributions are entirely different or the units of measurement are different. When the relative variation is stated in terms of the arithmetic mean and the standard deviation, the resulting percentage is known as the coefficient of variation or coefficient of variability.
Definition: The coefficient of variation of a series of variate values is the ratio of the standard deviation to the mean, multiplied by 100.
If σ is the standard deviation and \(\bar{x}\) is the mean of the set of values, the coefficient of variation is
\[ \text{C.V} = \frac{\sigma}{\bar{x}} \times 100. \]
The coefficient of variation is of great practical significance and is the best measure for comparing the variability of two series or two groups. It shows whether the items included in a series are consistent or not, as will be evident from the examples solved hereafter. The series or group for which the coefficient of variation is greater is said to be more variable or less consistent. On the other hand, the series for which the coefficient of variation is smaller is said to be less variable or more consistent.
[This measure was given by Professor Karl Pearson.]
Using the results of Example 6, where σ = 6.74 and \(\bar{x}\) = 36.96, the coefficient of variation is
\[ \text{C.V} = \frac{\sigma}{\bar{x}} \times 100 = \frac{6.74}{36.96} \times 100 = 18.24 \text{ (Answer)}. \]
Example: A sample of six items is taken from the production of a company. The length and weight of the six items are given below:
Length (inches):   3   8    10   12   14   15
Weight (ounces):   9   11   12   14   16   17
Find the coefficient of variation for length and for weight.
Solution: We know that
\[ \text{C.V} = \frac{\sigma}{\bar{x}} \times 100, \quad \text{where} \quad \sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2, \quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \]
n is the number of observations, σ² is the variance and σ is the standard deviation.

Calculation table:
Length (xi), inches   (xi - x̄)   (xi - x̄)²   Weight (yi), ounces   (yi - ȳ)   (yi - ȳ)²
3                     -7.33      53.729      9                     -4.17      17.389
8                     -2.33      5.429       11                    -2.17      4.709
10                    -0.33      0.109       12                    -1.17      1.369
12                    1.67       2.789       14                    0.83       0.689
14                    3.67       13.469      16                    2.83       8.009
15                    4.67       21.809      17                    3.83       14.669
∑xi = 62                         ∑ = 97.334  ∑yi = 79                         ∑ = 46.834
x̄ = 10.33                                    ȳ = 13.17

For length:
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{62}{6} \approx 10.333, \quad n = 6. \]
From the calculation table, the variance of length is
\[ \sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{6} \times 97.334 = 16.222 \]
and the standard deviation is \(\sigma_x = \sqrt{16.222} = 4.028\).
Therefore the coefficient of variation for length is
\[ \text{C.V} = \frac{\sigma_x}{\bar{x}} \times 100 = \frac{4.028}{10.333} \times 100 \approx 38.98. \]
For weight:
\[ \bar{y} = \frac{\sum y_i}{n} = \frac{79}{6} \approx 13.167, \quad n = 6. \]
From the calculation table, the variance of weight is
\[ \sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{1}{6} \times 46.834 = 7.806 \]
and the standard deviation is \(\sigma_y = \sqrt{7.806} = 2.794\).
Therefore the coefficient of variation for weight is
\[ \text{C.V} = \frac{\sigma_y}{\bar{y}} \times 100 = \frac{2.794}{13.167} \times 100 \approx 21.22. \]

Comment: The coefficient of variation for length is greater than the coefficient of variation for weight. So the length of the items is more variable, i.e. less consistent, than their weight.
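A comparison of this kind is easy to automate. The sketch below assumes the population standard deviation (divisor n), as in the solution, and uses Python's standard `statistics` module; the helper name `coefficient_of_variation` is hypothetical.

```python
import statistics

def coefficient_of_variation(values):
    """Population standard deviation as a percentage of the mean."""
    return statistics.pstdev(values) / statistics.mean(values) * 100

length = [3, 8, 10, 12, 14, 15]    # inches
weight = [9, 11, 12, 14, 16, 17]   # ounces

cv_length = coefficient_of_variation(length)
cv_weight = coefficient_of_variation(weight)

# The series with the larger coefficient is the less consistent one.
more_variable = "length" if cv_length > cv_weight else "weight"
print(round(cv_length, 2), round(cv_weight, 2), more_variable)
```

Because the coefficient is unit-free, inches and ounces can be compared directly, which would not be meaningful for the two standard deviations themselves.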

Example: The following table gives the fluctuations in the prices (in taka) of shares of two companies X and Y. Find out which of them shows greater variability. Comment on the result.
Shares X:   318    322    325    320    324    315    318    329
Shares Y:   2545   2524   2544   2533   2565   2535   2567   2559

Solution: We know that
\[ \text{C.V} = \frac{\sigma}{\bar{x}} \times 100, \]
where σ = standard deviation and \(\bar{x}\) = mean.

Calculation table:
Shares X                               Shares Y
xi     (xi - x̄)   (xi - x̄)²            yi      (yi - ȳ)   (yi - ȳ)²
318    -3.38      11.424               2545    -1.5       2.25
322    0.62       0.384                2524    -22.5      506.25
325    3.62       13.104               2544    -2.5       6.25
320    -1.38      1.904                2533    -13.5      182.25
324    2.62       6.864                2565    18.5       342.25
315    -6.38      40.704               2535    -11.5      132.25
318    -3.38      11.424               2567    20.5       420.25
329    7.62       58.064               2559    12.5       156.25
∑xi = 2571        ∑ = 143.872          ∑yi = 20372        ∑ = 1748.00

Calculation: For Shares X:
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{2571}{8} = 321.375, \quad n = 8. \]
\[ \sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{143.872}{8} = 17.984 \]
Standard deviation, \(\sigma_x = \sqrt{17.984} = 4.241\).
So the coefficient of variation for shares X is
\[ \text{C.V}(X) = \frac{\sigma_x}{\bar{x}} \times 100 = \frac{4.241}{321.375} \times 100 = 1.32. \]

For Shares Y:
\[ \bar{y} = \frac{\sum y_i}{n} = \frac{20372}{8} = 2546.5, \quad n = 8. \]
\[ \sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2 = \frac{1748.00}{8} = 218.5 \]
Standard deviation, \(\sigma_y = \sqrt{218.5} = 14.782\).
So the coefficient of variation for shares Y is
\[ \text{C.V}(Y) = \frac{\sigma_y}{\bar{y}} \times 100 = \frac{14.782}{2546.5} \times 100 = 0.580. \]
Comment: Since the prices of shares of company Y are much larger than those of company X, the two standard deviations are not directly comparable. Shares X have the larger coefficient of variation, so they show greater variability; shares Y are more stable. Thus the coefficient of variation can be employed for comparing the relative consistency of the prices of shares of two or more companies. This helps a genuine investor to select a share whose price is relatively stable: shares whose prices fluctuate more consistently will be preferred.

POOLED OR COMBINED VARIANCE

Let us consider two groups consisting of N1 and N2 observations respectively. Suppose the means of the groups are \(\bar{X}_1\) and \(\bar{X}_2\) and the variances are \(\sigma_1^2\) and \(\sigma_2^2\) respectively. The combined mean of both groups is
\[ \bar{X} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}. \]
Then the combined variance of the two groups is given by the formula
\[ \sigma^2 = \frac{N_1\{\sigma_1^2 + (\bar{X}_1 - \bar{X})^2\} + N_2\{\sigma_2^2 + (\bar{X}_2 - \bar{X})^2\}}{N_1 + N_2}. \]

Example 7: The mean and variance of scores earned by two groups, one of boys and the other of girls, yielded the following results on computation:
N1 = 62   \(\bar{X}_1\) = 108.2   \(\sigma_1^2\) = 524.41
N2 = 45   \(\bar{X}_2\) = 105.4   \(\sigma_2^2\) = 355.32
Find the combined variance.

Solution: Given that
N1 = 62   \(\bar{X}_1\) = 108.2   \(\sigma_1^2\) = 524.41
N2 = 45   \(\bar{X}_2\) = 105.4   \(\sigma_2^2\) = 355.32

The combined mean is
\[ \bar{X} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2} = \frac{62 \times 108.2 + 45 \times 105.4}{62 + 45} = \frac{6708.40 + 4743.00}{107} = \frac{11451.4}{107} = 107.02. \]
So \(\bar{X}\) = 107.02.
Combined (pooled) variance:
\[ \sigma^2 = \frac{N_1\{\sigma_1^2 + (\bar{X}_1 - \bar{X})^2\} + N_2\{\sigma_2^2 + (\bar{X}_2 - \bar{X})^2\}}{N_1 + N_2} \]
\[ = \frac{62 \times \{524.41 + (108.2 - 107.02)^2\} + 45 \times \{355.32 + (105.4 - 107.02)^2\}}{62 + 45} \]
\[ = \frac{62 \times (524.41 + 1.3924) + 45 \times (355.32 + 2.6244)}{107} = \frac{48707.25}{107} = 455.21. \]
So the combined variance is 455.21 (Answer).
The combined standard deviation is
\[ \sigma = \sqrt{\text{Combined variance}} = \sqrt{455.21} = 21.34 \text{ (Answer)}. \]
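The combined mean and combined variance formulas translate directly into code. The sketch below is a minimal illustration for two groups; the function name `combine_groups` and its argument order are assumptions made for this example.

```python
import math

def combine_groups(n1, mean1, var1, n2, mean2, var2):
    """Return (combined mean, combined variance) of two groups."""
    total = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / total
    # Each group contributes its own variance plus the squared
    # distance of its mean from the combined mean.
    variance = (n1 * (var1 + (mean1 - mean) ** 2)
                + n2 * (var2 + (mean2 - mean) ** 2)) / total
    return mean, variance

# Boys and girls of Example 7.
mean, variance = combine_groups(62, 108.2, 524.41, 45, 105.4, 355.32)
sd = math.sqrt(variance)
print(round(mean, 2), round(variance, 2), round(sd, 2))
```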

Example: A sample of 45 values has mean 85 and standard deviation 8. A second sample of 65 values from the same population has mean 80 and standard deviation 13. Find the mean and standard deviation of the combined sample of 110 values.

Solution: Let \(\bar{X}_1\) and \(\bar{X}_2\) be the means of the first and second samples respectively. Here
\(\bar{X}_1\) = 85 and \(\bar{X}_2\) = 80
N1 = 45 and N2 = 65
σ1 = 8 and σ2 = 13
Then the combined mean is
\[ \bar{X} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2} = \frac{(85 \times 45) + (80 \times 65)}{45 + 65} = \frac{3825 + 5200}{110} = \frac{9025}{110} = 82.045. \]
Combined variance:
\[ \sigma^2 = \frac{N_1\{\sigma_1^2 + (\bar{X}_1 - \bar{X})^2\} + N_2\{\sigma_2^2 + (\bar{X}_2 - \bar{X})^2\}}{N_1 + N_2} \]
\[ = \frac{1}{45 + 65}\left[45\{8^2 + (85 - 82.045)^2\} + 65\{13^2 + (80 - 82.045)^2\}\right] \]
\[ = \frac{(45 \times 72.732) + (65 \times 173.182)}{110} = \frac{3272.940 + 11256.830}{110} = \frac{14529.770}{110} = 132.089. \]
Hence the combined standard deviation is
\[ \sigma = \sqrt{132.089} \approx 11.49. \]

Example 8: In two factories A and B engaged in the same industry, the average monthly wages and standard deviations are as follows:
Factory   Average monthly wage (Tk)   S.D of wages (Tk)   No. of earners
A         4600                        500                 100
B         4900                        400                 80
(i) Which factory, A or B, pays the larger amount as monthly wages?
(ii) Which factory shows greater variability in the distribution of wages?
(iii) What are the mean and standard deviation of the wages of all the workers in the two factories taken together?

Solution: (i) To find out which factory pays the larger amount as monthly wages, we have to compare the total wage bills.
Factory A: Total wage bill = 4600 × 100 = 4,60,000 Tk
Factory B: Total wage bill = 4900 × 80 = 3,92,000 Tk
Hence factory A pays the larger amount as monthly wages. (Answer)

(ii) To determine which factory shows greater variation in the distribution of wages, we have to compare the coefficients of variation.
\[ \text{C.V (factory A)} = \frac{\sigma}{\bar{X}} \times 100 = \frac{500}{4600} \times 100 = 10.87 \]
\[ \text{C.V (factory B)} = \frac{\sigma}{\bar{X}} \times 100 = \frac{400}{4900} \times 100 = 8.16 \]
Since the coefficient of variation is higher in factory A, factory A shows greater variability in the distribution of wages.
(iii) The combined mean and standard deviation of all the workers are obtained from the combined-mean and combined-variance formulas, exactly as in Example 7.
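Part (iii) is left to the reader in the text. One possible way to carry it out, applying the combined-mean and combined-variance formulas of this chapter to the figures in the table, is sketched below; the variable names are illustrative.

```python
import math

# Figures from the table: number of earners, mean wage (Tk) and S.D (Tk).
n_a, mean_a, sd_a = 100, 4600, 500
n_b, mean_b, sd_b = 80, 4900, 400

total = n_a + n_b
combined_mean = (n_a * mean_a + n_b * mean_b) / total
combined_variance = (
    n_a * (sd_a ** 2 + (mean_a - combined_mean) ** 2)
    + n_b * (sd_b ** 2 + (mean_b - combined_mean) ** 2)
) / total
combined_sd = math.sqrt(combined_variance)

print(round(combined_mean, 2), round(combined_sd, 2))
```

Run as written, this should give a combined mean of about Tk 4,733.33 and a combined standard deviation of about Tk 481.89.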

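Example: The first of two samples has 100 items with mean 15 and standard deviation 3. If the whole group has 250 items with mean 15.6 and standard deviation 3.666, find the standard deviation of the second group.
Solution: Let \(\bar{X}_1\) and \(\bar{X}_2\) be the means of the first and second samples respectively. Here
\(\bar{X}_1\) = 15 and \(\bar{X}_2\) = ?
N1 = 100 and N2 = 250 - 100 = 150
σ1 = 3 and σ2 = ?
\(\bar{X}\) = combined mean = 15.6
σ = combined standard deviation = 3.666
The combined arithmetic mean is given by the formula
\[ \bar{X} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2} \]
\[ \Rightarrow 15.6 = \frac{(100 \times 15) + (150 \times \bar{X}_2)}{100 + 150} \]
\[ \Rightarrow 3900 = 1500 + 150\bar{X}_2 \;\Rightarrow\; 150\bar{X}_2 = 3900 - 1500 \;\Rightarrow\; \bar{X}_2 = \frac{2400}{150} = 16. \]
Therefore the arithmetic mean of the second sample is 16.
The combined standard deviation of the whole group is given by the formula
\[ \sigma = \left[\frac{N_1\{\sigma_1^2 + (\bar{X}_1 - \bar{X})^2\} + N_2\{\sigma_2^2 + (\bar{X}_2 - \bar{X})^2\}}{N_1 + N_2}\right]^{1/2} \]
\[ \Rightarrow 3.666 = \left[\frac{(100 \times 9) + 100(15 - 15.6)^2 + 150\sigma_2^2 + 150(16 - 15.6)^2}{250}\right]^{1/2}. \]
Squaring both sides, we get
\[ (3.666)^2 = \frac{900 + 36 + 150\sigma_2^2 + 24}{250} \;\Rightarrow\; 13.44 = \frac{960 + 150\sigma_2^2}{250} \]
\[ \Rightarrow 13.44 \times 250 = 960 + 150\sigma_2^2 \]
\[ \Rightarrow 150\sigma_2^2 = 3359.889 - 960 = 2399.889 \quad \text{(using } (3.666)^2 \times 250 = 3359.889\text{)} \]
\[ \Rightarrow \sigma_2^2 = \frac{2399.889}{150} \approx 16 \;\Rightarrow\; \sigma_2 = 4. \]
Thus the standard deviation of the second sample is 4.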
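The same back-calculation can be written as a small function that rearranges the combined-mean and combined-variance formulas to solve for the unknown group. This is a sketch only; the name `second_group_stats` and its parameters are assumptions made for illustration.

```python
import math

def second_group_stats(n1, mean1, sd1, n_total, mean_total, sd_total):
    """Recover (size, mean, standard deviation) of the second of two groups."""
    n2 = n_total - n1
    # Rearranged combined-mean formula.
    mean2 = (n_total * mean_total - n1 * mean1) / n2
    # Rearranged combined-variance formula, solved for the second variance.
    var2 = (n_total * sd_total ** 2
            - n1 * (sd1 ** 2 + (mean1 - mean_total) ** 2)
            - n2 * (mean2 - mean_total) ** 2) / n2
    return n2, mean2, math.sqrt(var2)

n2, mean2, sd2 = second_group_stats(100, 15, 3, 250, 15.6, 3.666)
print(n2, round(mean2, 2), round(sd2, 2))
```

Since the combined standard deviation 3.666 is itself a rounded figure, the recovered value is very close to 4 rather than exactly 4.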
