Dispersion
Measures of Dispersion
Introduction:
We have studied methods of finding the average, or a measure of central tendency, of a frequency distribution and of locating certain values such as the median, deciles and percentiles. However, it is not enough to compare averages; the extent and nature of the spread of the values around the measure of central tendency must also be ascertained. Thus an average becomes meaningful in the real sense only when it is accompanied by a measure of dispersion. The study of dispersion is useful when:
1. it is necessary to know the structure of frequency distributions whose averages are similar;
2. it is necessary to know the structure of frequency distributions whose formations are alike but which have different means;
3. we want a comprehensive idea of the formation of a series, which requires knowing its dispersion;
4. it is needed to facilitate the use of other statistical measures.
Measures of Dispersion
Following are the measures of dispersion:
a) Absolute measures
(i) Range (ii) Quartile deviation
(iii) Standard deviation (iv) Mean deviation
b) Relative measures
(i) Coefficient of range (ii) Coefficient of quartile deviation
(iii) Coefficient of variation (iv) Coefficient of mean deviation
Discussion:
RANGE
Range is the difference between the highest and the lowest observations in a set and is generally denoted by R. If $x_1, x_2, \ldots, x_n$ are the values of n observations in a sample, then symbolically
$$R = \max(x_1, x_2, \ldots, x_n) - \min(x_1, x_2, \ldots, x_n).$$
Equivalently, the range is written as $R = L - S$, where L is the largest value and S is the smallest value.
A relative measure, known as the coefficient of range, is given by
$$\text{C.R} = \frac{L - S}{L + S} \times 100.$$
The smaller the range or coefficient of range, the less the observations are spread out.
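As a quick check of the two formulas above, here is a minimal Python sketch. The function name `range_stats` is our own illustrative choice, and the data are the share prices of Example-1 below.

```python
def range_stats(values):
    """Return (R, C.R) for raw observations; C.R is in percent."""
    largest, smallest = max(values), min(values)    # L and S
    r = largest - smallest                          # R = L - S
    cr = (largest - smallest) / (largest + smallest) * 100
    return r, cr


prices = [200, 210, 208, 160, 220, 250]  # Example-1, Monday to Saturday
r, cr = range_stats(prices)
print(r, round(cr, 2))  # 90 21.95
```

For grouped data, passing the two class limits used as S and L (for example `range_stats([10, 70])`) gives the coefficient of range in the same way.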
Properties:
(i) It is the simplest measure and can easily be understood.
(ii) Apart from this merit, it satisfies hardly any property of a good measure of dispersion: it is based on the two extreme values only, ignoring all the other observations, and it is not amenable to further algebraic treatment.
Example-1: The following are the prices of shares of a company
from Monday to Saturday:
Day Price (Taka)
Monday 200
Tuesday 210
Wednesday 208
Thursday 160
Friday 220
Saturday 250
Calculate range and coefficient of range.
Solution-1: We know that
Range, R = L - S,
where L = largest value = 250 taka and S = smallest value = 160 taka.
R = 250 - 160 = 90 taka.
Coefficient of range,
$$\text{C.R} = \frac{L - S}{L + S} \times 100 = \frac{250 - 160}{250 + 160} \times 100 = \frac{90}{410} \times 100 = 0.2195 \times 100 = 21.95.$$
This is the required result.
Example-2: The following table gives a frequency distribution of profits:
Profits (lakh taka) No. of observations
10-20 8
20-30 10
30-40 12
40-50 8
50-60 4
60-70 2
Calculate coefficient of range.
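Solution-2: For grouped data, L is taken as the upper limit of the highest class (70) and S as the lower limit of the lowest class (10), so
$$\text{C.R} = \frac{L - S}{L + S} \times 100 = \frac{70 - 10}{70 + 10} \times 100 = \frac{60}{80} \times 100 = 0.75 \times 100 = 75.$$
This is the required result.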
Advantages of Range
1. Among all the methods of studying variation, range is the easiest to understand and the easiest to calculate.
2. Range takes the least time to calculate.
3. If one is interested in getting a quick rather than a very
accurate picture of variation, one may determine range.
Limitations of range
1. Range is not based on every observation of a distribution.
2. It is subject to fluctuations of considerable magnitude from sample to sample.
3. Range cannot be determined in the case of open-end distributions.
4. Range cannot tell us anything about the character of the
distribution within two extreme observations.
Applications of the range
The range has several applications, including the following:
1. quality control;
2. the study of share prices;
3. weather forecasts.
Interpretation of the range, R
The range is no more than a rough measure of dispersion. It gives a comprehensive value for the data in the sense that it includes the limits within which all of the items occur. Except in very small samples, however, the range cannot be interpreted as a precise measure of variability.
Advantages of quartile deviation (QD)
1. The QD is based on the middle 50% of a distribution and is
complementary to the median.
2. The QD is easy to compute and easy to understand.
3. QD is not affected by the extreme values.
4. QD is superior to the range as a rough measure of dispersion.
Limitations of QD
1. Quartile deviation ignores 50% of the items, i.e., the first 25% and the last 25%.
2. Since the value of QD does not depend on every observation, it cannot be regarded as a good method of measuring variation.
3. QD is not capable of mathematical manipulation.
4. Its value is very much affected by sampling fluctuations.
5. It is, in fact, not really a measure of dispersion, as it does not show the scatter around an average but rather a distance on a scale.
Example 3. The quartiles worked out for a company are Q1 = 174.90, Q2 = 190.23 and Q3 = 203.83. Find the quartile deviation and the coefficient of quartile deviation.
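The formula for QD is not stated in this section. Assuming the usual definitions, QD = (Q3 - Q1)/2 and coefficient of QD = (Q3 - Q1)/(Q3 + Q1) × 100, Example 3 can be worked with a short Python sketch; the helper name `quartile_deviation` is illustrative only.

```python
def quartile_deviation(q1, q3):
    """Return (QD, coefficient of QD in percent) from the two quartiles."""
    qd = (q3 - q1) / 2                   # semi-interquartile range
    cqd = (q3 - q1) / (q3 + q1) * 100    # relative, unit-free version
    return qd, cqd


# Example 3: Q1 = 174.90, Q3 = 203.83 (Q2 is not needed for either measure)
qd, cqd = quartile_deviation(174.90, 203.83)
print(round(qd, 3), round(cqd, 2))  # 14.465 7.64
```

Under these assumed definitions, QD is about 14.47 and the coefficient of QD is about 7.64%.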
Mean deviation:
Mean deviation, or average deviation, is obtained by calculating the absolute deviations of each observation from the median (or mean) and then averaging these deviations by taking their arithmetic mean.
Definition:
Mean deviation is the average of the absolute deviations taken from a central value, generally the mean or the median. It is denoted by M.D.
MD for raw or ungrouped data:
Let $x_1, x_2, \ldots, x_n$ be n observations with arithmetic mean $\bar{x}$. Then the mean deviation is defined by
$$\text{M.D}(\bar{x}) = \frac{1}{n}\sum_{i=1}^{n} |x_i - \bar{x}|, \quad \text{when taken from the mean,}$$
$$\text{M.D}(M_e) = \frac{1}{n}\sum_{i=1}^{n} |x_i - M_e|, \quad \text{when taken from the median } M_e.$$
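A minimal Python sketch of the two definitions above. The data are hypothetical, chosen with one large value so that the mean and the median differ; the example also illustrates property 5 below, that the mean deviation is smallest when taken about the median.

```python
from statistics import median


def mean_deviation(values, centre):
    """Average of |x_i - centre| over raw observations."""
    return sum(abs(x - centre) for x in values) / len(values)


data = [2, 4, 6, 8, 30]          # hypothetical observations with one extreme value
xbar = sum(data) / len(data)     # mean = 10.0
me = median(data)                # median = 6
md_mean = mean_deviation(data, xbar)
md_median = mean_deviation(data, me)
print(md_mean, md_median)  # 8.0 6.4
```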
Computation of Mean Deviation (ungrouped data):
In the deviation method of measuring scatter, the following steps are taken:
1. Compute the mean or median of the observations, i.e., $\bar{x}$ or $M_e$.
2. Find the deviations of each observation from the mean or median, ignoring plus and minus signs.
3. Divide the sum of these absolute deviations by the number of observations, n.
Computation of Mean Deviation (grouped data):
1. The median or mean of the series is calculated, i.e., $\bar{x}$ or $M_e$ (for class intervals, $x_i$ is the mid-value of the i-th class).
2. The deviations of the items from the median or mean are ascertained, ignoring plus and minus signs.
3. The deviations computed above are multiplied by the respective frequencies, giving $\sum_{i=1}^{n} f_i|x_i - \bar{x}|$ or $\sum_{i=1}^{n} f_i|x_i - M_e|$.
4. This sum is divided by the total number of items, $N = \sum_{i=1}^{n} f_i$.
The quotient obtained is the value of the mean deviation.
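The grouped steps can be sketched in Python as follows, using class mid-values as $x_i$ and taking deviations from the mean. The function name `grouped_mean_deviation` is our own, and the data are those of Problem 5.

```python
def grouped_mean_deviation(mids, freqs):
    """Mean deviation about the mean for a frequency distribution."""
    n = sum(freqs)                                                 # N
    xbar = sum(f * x for x, f in zip(mids, freqs)) / n             # step 1
    total = sum(f * abs(x - xbar) for x, f in zip(mids, freqs))    # steps 2-3
    return xbar, total / n                                         # step 4


# Problem 5: sales classes 10-20, ..., 50-60 (thousand dollars), number of days
mids = [15, 25, 35, 45, 55]
days = [3, 6, 11, 3, 2]
xbar, md = grouped_mean_deviation(mids, days)
print(xbar, round(md, 2), round(md / xbar * 100, 2))  # 33.0 8.16 24.73
```

The last printed value is the coefficient of mean deviation, M.D divided by the mean, in percent.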
Properties:
1. Mean deviation removes one main objection to the earlier measures, since it involves every value of the set.
2. It is not affected much by extreme values.
3. It has no relationship with any of the other measures of dispersion.
4. Its main drawback is that the algebraic negative signs of the deviations are ignored, which is mathematically unsound.
5. Mean deviation is minimum when the deviations are taken from the median.
6. Mean deviation is independent of change of origin but depends on change of scale.
Interpretation of the mean deviation:
1. The mean deviation may help us find the percentage of observations falling in the range average ± mean deviation. A high concentration of observations within this range indicates compactness of the distribution.
Applications of the MD:
1. The use of the MD has been overshadowed to a large extent by the standard deviation (SD), although the MD is less difficult to compute.
2. It is most valuable for interpreting the significance of a series of ratios. Because of its simplicity in meaning and computation, it is especially effective in reports presented to the general public or to groups not familiar with statistical methods.
Advantage of mean deviation:
1. The outstanding advantage of the MD is its relative simplicity.
2. It is simple to understand and easy to compute.
3. It is based on each and every observation of the data. Consequently, a change in the value of any observation changes the value of the mean deviation.
4. Mean deviation is less affected by the values of extreme observations.
5. Since deviations are taken from a central value, comparisons of the formation of different distributions can easily be made.
Limitations of mean deviation:
1. The greatest drawback of this method is that algebraic signs
are ignored while taking the deviations of the items.
2. This method may not give us accurate results. The reason is
that mean deviation gives us best results when deviations are
taken from median. But median is not a satisfactory measure
when the degree of variability in a series is very high.
3. It is not capable of further algebraic treatment.
4. It is rarely used in sociological and business studies.
$$\text{Coefficient of mean deviation, C.M.D} = \frac{\text{M.D}}{\bar{X}} \times 100$$
Example 4.
Calculate the mean deviation from the mean and the coefficient of mean deviation for the following data:
Sales (thousand dollars)   15-19   19-23   23-27   27-31   31-35   35-39
No. of days                8       59      47      23      6       4
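Example 4 is posed without a worked solution. Applying the same grouped steps (class mid-values, deviations from the mean, then the coefficient of M.D) in a short Python sketch gives the values below; treat it as a check rather than the author's solution.

```python
# Example 4: class mid-values of 15-19, 19-23, ..., 35-39 (thousand dollars)
mids = [17, 21, 25, 29, 33, 37]
days = [8, 59, 47, 23, 6, 4]

n = sum(days)                                                # N = 147
xbar = sum(f * x for x, f in zip(mids, days)) / n            # 3563 / 147
md = sum(f * abs(x - xbar) for x, f in zip(mids, days)) / n  # M.D about the mean
cmd = md / xbar * 100                                        # coefficient of M.D (%)
print(n, round(xbar, 2), round(md, 2), round(cmd, 2))  # 147 24.24 3.39 13.97
```

This gives N = 147, a mean of about 24.24 thousand dollars, M.D of about 3.39 thousand dollars and C.M.D of about 13.97%.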
Problem 5.
Calculate the mean deviation from the mean for the following data, and also calculate the coefficient of mean deviation:
Sales (thousand dollars)   10-20   20-30   30-40   40-50   50-60
No. of days                3       6       11      3       2
Solution: We want to calculate the mean deviation from the mean.
Calculation table of mean deviation
Sales (thousand dollars)   Mid-value ($x_i$)   Frequency ($f_i$)   $f_i x_i$   $|x_i - \bar{x}|$   $f_i|x_i - \bar{x}|$
10-20                      15                  3                   45          18                  54
20-30                      25                  6                   150         8                   48
30-40                      35                  11                  385         2                   22
40-50                      45                  3                   135         12                  36
50-60                      55                  2                   110         22                  44
Total                                          N = 25              825                             204
The mean deviation from the mean is
$$\text{M.D}(\bar{x}) = \frac{1}{N}\sum_{i=1}^{n} f_i|x_i - \bar{x}|,$$
where $f_i$ = class frequency, $x_i$ = mid-value, $\bar{x}$ = mean of x and N = total number of observations. Here
$$\bar{x} = \frac{\sum f_i x_i}{N} = \frac{825}{25} = 33.$$
The mean deviation about the mean is therefore
$$\text{M.D}(\bar{x}) = \frac{\sum f_i|x_i - \bar{x}|}{N} = \frac{204}{25} = 8.16 \text{ thousand dollars}.$$
Coefficient of mean deviation:
$$\text{C.M.D} = \frac{\text{M.D}}{\bar{x}} \times 100 = \frac{8.16}{33} \times 100 = 0.2473 \times 100 = 24.73.$$
Thus the mean sales are 33 thousand dollars per day, the mean deviation of sales is 8.16 thousand dollars, and the coefficient of mean deviation is 24.73%.
VARIANCE:
Definition: The variance is the average of the squared deviations taken from the mean, i.e., the sum of the squared deviations divided by the number of observations.
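For a sample of n observations $x_1, x_2, \ldots, x_n$, the sample variance (divisor n - 1) is
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{1}{n-1}\left\{\sum_{i=1}^{n} x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right)^2\right\},$$
where $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.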
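For a frequency distribution, the variance is
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{n} f_i(x_i - \bar{X})^2 = \frac{1}{N}\left\{\sum_{i=1}^{n} f_i x_i^2 - \frac{1}{N}\left(\sum_{i=1}^{n} f_i x_i\right)^2\right\},$$
where $f_i$ is the class frequency, $x_i$ is the mid-value of the i-th class, $N = \sum_{i=1}^{n} f_i$ is the total frequency and $\bar{X} = \frac{1}{N}\sum_{i=1}^{n} f_i x_i$.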
If the observation $x_i$ occurs $f_i$ times for $i = 1, 2, \ldots, k$, then the sample variance is
$$S^2 = \frac{1}{n-1}\sum_{i=1}^{k} f_i(x_i - \bar{x})^2 = \frac{1}{n-1}\left\{\sum_{i=1}^{k} f_i x_i^2 - \frac{1}{n}\left(\sum_{i=1}^{k} f_i x_i\right)^2\right\},$$
where $\bar{x} = \frac{1}{n}\sum_{i=1}^{k} f_i x_i$ and $n = \sum_{i=1}^{k} f_i$.
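A small Python sketch comparing the definitional and computational (shortcut) forms of the variance above, with divisor n or n - 1. The function names are our own; the data are the share prices of Example-1.

```python
def variance(values, sample=False):
    """Definitional form: squared deviations from the mean, divisor n or n - 1."""
    n = len(values)
    xbar = sum(values) / n
    ss = sum((x - xbar) ** 2 for x in values)       # sum of squared deviations
    return ss / (n - 1) if sample else ss / n


def variance_shortcut(values, sample=False):
    """Computational form: {sum x^2 - (sum x)^2 / n} / divisor."""
    n = len(values)
    ss = sum(x * x for x in values) - sum(values) ** 2 / n
    return ss / (n - 1) if sample else ss / n


prices = [200, 210, 208, 160, 220, 250]
print(round(variance(prices), 3), round(variance_shortcut(prices), 3))  # 713.333 713.333
```

Both forms agree in exact arithmetic. With large values the shortcut form subtracts two nearly equal numbers, so in floating point the definitional form is usually the safer choice.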
Properties of variance:
1. The variance largely removes the shortcomings present in the measures of dispersion discussed before it.
2. The main disadvantage of the variance is that its unit is the square of the unit of measurement of the variate values. For example, if the variable X is measured in metres (m), the unit of the variance is m². This value is generally large, which makes it difficult to judge the magnitude of the variation.
3. The variance gives more weight to extreme values than to values near the mean, because the deviations are squared.
Proof that the variance is independent of change of origin but not of scale: Let $x_1, x_2, \ldots, x_n$ be n observations with mean $\bar{x}$. The variance is denoted by $S_x^2$ and defined by
$$S_x^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2. \qquad \text{(i)}$$
Let $u_i = \frac{x_i - a}{c}$ be a new variate, where a is the origin and c is the scale. Then $x_i - a = c u_i \Rightarrow x_i = a + c u_i$, and hence $\bar{x} = a + c\bar{u}$. Putting these values in (i), we have
$$S_x^2 = \frac{1}{n}\sum_{i=1}^{n}(a + c u_i - a - c\bar{u})^2 = \frac{1}{n}\sum_{i=1}^{n}(c u_i - c\bar{u})^2 = c^2 \cdot \frac{1}{n}\sum_{i=1}^{n}(u_i - \bar{u})^2 = c^2 S_u^2.$$
Thus the origin a drops out, while a change of scale multiplies the variance by $c^2$.
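A numerical check of $S_x^2 = c^2 S_u^2$ in Python, using the weekly wages from the worked example later in this section. The origin a = 1320 and scale c = 5 are arbitrary, hypothetical choices.

```python
def pvariance(values):
    """Variance with divisor n, as in (i)."""
    n = len(values)
    m = sum(values) / n
    return sum((v - m) ** 2 for v in values) / n


x = [1320, 1310, 1315, 1322, 1326, 1340, 1325, 1321, 1320, 1331]
a, c = 1320, 5                        # hypothetical origin and scale
u = [(xi - a) / c for xi in x]        # u_i = (x_i - a) / c
print(pvariance(x), c ** 2 * pvariance(u))
```

Shifting every value by a constant leaves the variance unchanged, while dividing by c divides it by $c^2$.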
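For the first n natural numbers $1, 2, \ldots, n$, using $\sigma^2 = \frac{\sum x_i^2}{n} - \left(\frac{\sum x_i}{n}\right)^2$,
$$\sigma^2 = \frac{1^2 + 2^2 + \cdots + n^2}{n} - \left(\frac{1 + 2 + \cdots + n}{n}\right)^2 = \frac{n(n+1)(2n+1)}{6n} - \left\{\frac{n(n+1)}{2n}\right\}^2$$
$$= \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{n+1}{2}\left\{\frac{2n+1}{3} - \frac{n+1}{2}\right\} = \frac{n+1}{2} \times \frac{n-1}{6} = \frac{n^2 - 1}{12}.$$
So, the variance of the first n natural numbers is $\frac{n^2 - 1}{12}$.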
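The closed form $(n^2 - 1)/12$ can be checked with exact rational arithmetic from Python's `fractions` module. This is a sanity check for small n, not a proof.

```python
from fractions import Fraction


def pvariance_exact(values):
    """Divisor-n variance computed exactly with rational arithmetic."""
    values = list(values)
    n = len(values)
    m = Fraction(sum(values), n)
    return sum((v - m) ** 2 for v in values) / n


for n in range(1, 11):
    assert pvariance_exact(range(1, n + 1)) == Fraction(n * n - 1, 12)
print("(n^2 - 1)/12 matches for n = 1..10")
```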
STANDARD DEVIATION:
There is another method of summing up deviations from a measure of central tendency to obtain a measure of dispersion of the data: the standard deviation. The standard deviation is considered superior to other measures of dispersion because it represents variability in a mathematically tractable way, which is very important for interpreting statistical data.
Properties
1. Standard deviation is considered to be the best measure of
dispersion and is used widely.
2. There is, however, one difficulty with it: if the units of measurement of two series are not the same, their variability cannot be compared by comparing their standard deviations.
3. Standard deviation is always a non-negative quantity.
4. It is based on all the observations and is readily understood.
5. It is more difficult to compute than the other measures of dispersion but is easy to use mathematically.
6. It has a relatively small sampling error.
7. Its main advantage is that it is amenable to algebraic
treatment and comparatively stable under sampling
fluctuations.
8. Standard deviation is independent of change of origin but not of scale.
9. The mean squared deviation (MSD) is smallest when deviations are taken from the mean, so the variance is the minimum of all mean squared deviations and the SD is the minimum of all root mean squared deviations (RMSD).
10. If $\bar{X}$ and S denote the mean and standard deviation, respectively, of n non-negative quantities $x_1, x_2, \ldots, x_n$, then $\bar{X}\sqrt{n-1} \ge S$.
11. For a normal distribution, the following area relationships hold:
Mean ± 1σ covers 68.27% of observations.
Mean ± 2σ covers 95.45% of observations.
Mean ± 3σ covers 99.73% of observations.
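For a normal distribution, the percentages in property 11 are the areas $P(|X - \mu| < k\sigma) = \text{erf}(k/\sqrt{2})$, which Python's standard `math.erf` reproduces:

```python
import math

# For a normal distribution, P(|X - mean| < k * sigma) = erf(k / sqrt(2)).
for k in (1, 2, 3):
    coverage = 100 * math.erf(k / math.sqrt(2))
    print(f"Mean +/- {k} sigma covers {coverage:.2f}% of observations")
```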
Advantage of Standard deviation
1. The standard deviation is the best measure of the dispersion.
2. It is possible to calculate the combined standard deviation of two or more groups.
3. For comparing the variability of two or more distributions, the coefficient of variation is considered the most appropriate measure, and it is based on the mean and standard deviation.
4. SD is the most widely used measure in further statistical work.
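Proof of property 8: The standard deviation of $x_1, x_2, \ldots, x_n$ is
$$\text{SD}(x) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2}. \qquad \text{(i)}$$
Let $u_i = \frac{x_i - a}{c}$ be a new variate, where a is the origin and c is the scale. Then $x_i - a = c u_i \Rightarrow x_i = a + c u_i$, so $\bar{x} = a + c\bar{u}$. Putting these values in (i), we have
$$\text{SD}(x) = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(a + c u_i - a - c\bar{u})^2} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(c u_i - c\bar{u})^2} = \sqrt{c^2\,\frac{1}{n}\sum_{i=1}^{n}(u_i - \bar{u})^2} = |c|\,\text{SD}(u).$$
Expanding the square in (i) gives the computational form
$$S = \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n} - \left(\frac{\sum_{i=1}^{n} x_i}{n}\right)^2} \;\Rightarrow\; S^2 = \frac{\sum_{i=1}^{n} x_i^2}{n} - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n^2} \;\Rightarrow\; nS^2 = \sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}.$$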
E 1326
F 1340
G 1325
H 1321
I 1320
J 1331
Solution: We know that the variance is
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
and the standard deviation is $\text{S.D.} = \sqrt{\text{Variance}} = \sqrt{\sigma^2} = \sigma$.
Calculation of the variance and standard deviation from the above information:
Workers   Weekly wages ($x_i$)   $(x_i - \bar{x})$   $(x_i - \bar{x})^2$
A         1320                   -3                  9
B         1310                   -13                 169
C         1315                   -8                  64
D         1322                   -1                  1
E         1326                   +3                  9
F         1340                   +17                 289
G         1325                   +2                  4
H         1321                   -2                  4
I         1320                   -3                  9
J         1331                   +8                  64
n = 10    $\sum x_i$ = 13230                         $\sum (x_i - \bar{x})^2$ = 622
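We have
$$\bar{x} = \frac{\sum x_i}{n} = \frac{13230}{10} = 1323 \text{ taka}.$$
From the table,
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2 = \frac{622}{10} = 62.20 \text{ taka}^2.$$
Hence, the variance for the given information is 62.20 taka². So, the standard deviation is $\sigma = \sqrt{62.20} \approx 7.89$ taka.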
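The wage calculation above can be reproduced in a few lines of Python (divisor n, matching the solution):

```python
import math

wages = {"A": 1320, "B": 1310, "C": 1315, "D": 1322, "E": 1326,
         "F": 1340, "G": 1325, "H": 1321, "I": 1320, "J": 1331}
x = list(wages.values())
n = len(x)
xbar = sum(x) / n                              # mean weekly wage (taka)
var = sum((xi - xbar) ** 2 for xi in x) / n    # divisor n, as in the solution
sd = math.sqrt(var)
print(xbar, var, round(sd, 2))  # 1323.0 62.2 7.89
```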
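Class   Mid-value ($x_i$)   $f_i$     $f_i x_i$   $(x_i - \bar{x})$   $(x_i - \bar{x})^2$   $f_i(x_i - \bar{x})^2$
41-45   43                  15        645         6.04                36.48                 547.20
46-50   48                  12        576         11.04               121.88                1462.56
51-55   53                  3         159         16.04               257.28                771.84
Total                       N = 120   4435                                                  5444.60
We have
$$\bar{x} = \frac{\sum f_i x_i}{N} = \frac{4435}{120} = 36.96.$$
From the table,
$$\sigma^2 = \frac{1}{N}\sum_{i=1}^{n} f_i(x_i - \bar{x})^2 = \frac{5444.60}{120} = 45.37.$$
Hence, the variance for the given information is 45.37. So, the standard deviation is $\sigma = \sqrt{45.37} \approx 6.74$.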
The coefficient of variation is of great practical significance and is the best measure for comparing the variability of two series or two groups. It shows whether or not the items in a series are consistent, as the examples solved below make evident. The series or group with the greater coefficient of variation is said to be more variable, or less consistent; the series with the smaller coefficient of variation is said to be less variable, or more consistent.
This measure was given by Professor Karl Pearson. Using the information of Example 6 above, we can calculate the coefficient of variation:
$$\text{C.V} = \frac{\sigma}{\bar{X}} \times 100.$$
Here σ = 6.74 and $\bar{X}$ = 36.96, so
$$\text{C.V} = \frac{6.74}{36.96} \times 100 = 18.24 \text{ (Answer).}$$
Example: A sample of six items is taken from the production of a
company. Length and weight of the six items are given below:
Length (inches): 3 8 10 12 14 15
Weight (ounces): 9 11 12 14 16 17
Find the coefficient of variation for length and weight.
Solution: We know that
$$\text{C.V} = \frac{\sigma}{\bar{X}} \times 100,$$
where $\sigma^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$, $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and n is the number of observations.
Length $x_i$ (inches)   $(x_i - \bar{x})$   $(x_i - \bar{x})^2$   Weight $y_i$ (ounces)   $(y_i - \bar{y})$   $(y_i - \bar{y})^2$
3                       -7.33               53.729                9                       -4.170              17.389
For length:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{62}{6} = 10.33, \quad n = 6.$$
The variance of length is $\sigma_x^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2$; from the calculation table, $\sigma_x^2 = \frac{1}{6} \times 97.334 = 16.222$, and the standard deviation is $\sigma_x = \sqrt{16.222} = 4.028$.
Therefore the coefficient of variation for length is
$$\text{C.V} = \frac{\sigma_x}{\bar{x}} \times 100 = \frac{4.028}{10.33} \times 100 = 38.99.$$
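For weight:
$$\bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{79}{6} = 13.17, \quad n = 6.$$
The variance of weight is $\sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2$; from the calculation table, $\sigma_y^2 = \frac{1}{6} \times 46.834 = 7.806$, and the standard deviation is $\sigma_y = \sqrt{7.806} \approx 2.794$.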
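A short Python sketch computing both coefficients of variation directly from the raw data (divisor n, as in the solution; the helper name is our own). It gives about 38.98 for length, the small difference from 38.99 coming from rounding the mean to 10.33, and about 21.22 for weight, so by the rule stated earlier weight is the more consistent of the two.

```python
import math


def coefficient_of_variation(values):
    """C.V = sigma / mean * 100, with sigma computed using divisor n."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sigma / mean * 100


length = [3, 8, 10, 12, 14, 15]   # inches
weight = [9, 11, 12, 14, 16, 17]  # ounces
print(round(coefficient_of_variation(length), 2),
      round(coefficient_of_variation(weight), 2))
```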
Shares X                                           Shares Y
$x_i$   $(x_i - \bar{x})$   $(x_i - \bar{x})^2$    $y_i$   $(y_i - \bar{y})$   $(y_i - \bar{y})^2$
318     -3.38               11.424                 2545    -1.5                2.25
322     0.62                0.384                  2524    -22.5               506.25
325     3.62                13.104                 2544    -2.5                6.25
320     -1.38               1.904                  2533    -13.5               182.25
324     2.62                6.864                  2565    18.5                342.25
315     -6.38               40.704                 2535    -11.5               132.25
318     -3.38               11.424                 2567    20.5                420.25
329     7.62                58.064                 2559    12.5                156.25
$\sum x_i$ = 2571   $\sum (x_i - \bar{x})^2$ = 143.872   $\sum y_i$ = 20372   $\sum (y_i - \bar{y})^2$ = 1748.00
For Shares Y:
$$\bar{Y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{20372}{8} = 2546.5, \quad n = 8.$$
Variance of Y: $\sigma_y^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2 = \frac{1748}{8} = 218.5$.
Standard deviation: $\sigma_y = \sqrt{218.5} = 14.782$.
So, the coefficient of variation for shares Y is
$$\text{C.V}(Y) = \frac{\sigma_y}{\bar{Y}} \times 100 = \frac{14.782}{2546.5} \times 100 = 0.580.$$
Comment: Since the share prices of company Y are much larger than those of company X, the two standard deviations are not directly comparable. Shares X have the larger coefficient of variation, so they show greater relative variability, while shares Y are more stable. Thus the coefficient of variation can be employed to compare the relative consistency of the share prices of two or more companies. This helps a genuine investor select shares whose prices are relatively stable; shares whose prices fluctuate less will be preferred.
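The same comparison for the two companies' share prices, as a Python sketch (divisor n; the helper name is our own). It gives C.V of about 1.32 for X and about 0.58 for Y, consistent with the comment above.

```python
import math


def mean_and_sd(values):
    """Mean and divisor-n standard deviation."""
    n = len(values)
    m = sum(values) / n
    return m, math.sqrt(sum((v - m) ** 2 for v in values) / n)


shares_x = [318, 322, 325, 320, 324, 315, 318, 329]
shares_y = [2545, 2524, 2544, 2533, 2565, 2535, 2567, 2559]
cv = {}
for name, prices in (("X", shares_x), ("Y", shares_y)):
    m, sd = mean_and_sd(prices)
    cv[name] = sd / m * 100          # coefficient of variation in percent
    print(name, m, round(sd, 3), round(cv[name], 3))
print("more stable:", min(cv, key=cv.get))
```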
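The combined mean of two groups is
$$\bar{X} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2},$$
and the combined variance is
$$\sigma^2 = \frac{N_1\{\sigma_1^2 + (\bar{X}_1 - \bar{X})^2\} + N_2\{\sigma_2^2 + (\bar{X}_2 - \bar{X})^2\}}{N_1 + N_2}$$
$$= \frac{62\{524.41 + (108.2 - 107.02)^2\} + 45\{355.32 + (105.4 - 107.02)^2\}}{62 + 45} = \frac{62(524.41 + 1.3924) + 45(355.32 + 2.6244)}{107} = \frac{48{,}707.24}{107} = 455.21.$$
So, the combined variance is 455.21 (Answer).
Again, the combined standard deviation is $\sigma = \sqrt{\text{combined variance}} = \sqrt{455.21} = 21.34$ (Answer).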
Example: A sample of 45 values has mean 85 and standard
deviation 8. A second sample of 65 values from the same population
has mean 80 and standard deviation 13. Find the mean and standard
deviation of the combined sample of 110 values.
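Solution: The combined mean is given by
$$\bar{X} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2},$$
where $\bar{X}_1 = 85$, $\bar{X}_2 = 80$, $N_1 = 45$, $N_2 = 65$, $\sigma_1 = 8$ and $\sigma_2 = 13$.
Then the combined mean is
$$\bar{X} = \frac{(85 \times 45) + (80 \times 65)}{45 + 65} = \frac{3825 + 5200}{110} = \frac{9025}{110} = 82.045.$$
Combined standard deviation:
$$\sigma^2 = \frac{N_1\{\sigma_1^2 + (\bar{X}_1 - \bar{X})^2\} + N_2\{\sigma_2^2 + (\bar{X}_2 - \bar{X})^2\}}{N_1 + N_2} = \frac{1}{45 + 65}\left[45\{8^2 + (85 - 82.045)^2\} + 65\{13^2 + (80 - 82.045)^2\}\right]$$
$$= \frac{(45 \times 72.732) + (65 \times 173.182)}{110} = \frac{3272.940 + 11256.830}{110} = \frac{14529.770}{110} = 132.089.$$
Hence the combined variance is 132.089 and the combined standard deviation is $\sigma = \sqrt{132.089} \approx 11.49$.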
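Both combined-variance calculations above follow one formula. A minimal Python sketch (the function name `combine` is our own, and the group variances use divisor n):

```python
import math


def combine(n1, mean1, var1, n2, mean2, var2):
    """Combined mean and variance of two groups (divisor-n variances)."""
    n = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / n
    var = (n1 * (var1 + (mean1 - mean) ** 2)
           + n2 * (var2 + (mean2 - mean) ** 2)) / n
    return mean, var


# 45 values (mean 85, sd 8) combined with 65 values (mean 80, sd 13)
mean, var = combine(45, 85, 8 ** 2, 65, 80, 13 ** 2)
print(round(mean, 3), round(var, 3), round(math.sqrt(var), 2))  # 82.045 132.089 11.49
```

Calling `combine(62, 108.2, 524.41, 45, 105.4, 355.32)` for the earlier pair of groups gives a combined variance of about 455.21, even though the function uses the unrounded combined mean rather than 107.02.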
Example: The first of two samples has 100 items with mean 15 and
standard deviation 3. If the whole group has 250 items with mean
15.6 and standard deviation 3.666, find the standard deviation of the
second group.
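Solution: Let $\bar{X}_1$ and $\bar{X}_2$ be the means of the first and second samples, respectively, where
$\bar{X}_1 = 15$, $\bar{X}_2$ = ?, $N_1 = 100$, $N_2 = 150$, $\sigma_1 = 3$, $\sigma_2$ = ?, the combined mean $\bar{X} = 15.6$ and the combined standard deviation σ = 3.666.
The combined arithmetic mean is given by the following formula:
$$\bar{X} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2} \;\Rightarrow\; 15.6 = \frac{(100 \times 15) + (150 \times \bar{X}_2)}{100 + 150}$$
$$\Rightarrow 3900 = 1500 + 150\bar{X}_2 \;\Rightarrow\; 150\bar{X}_2 = 3900 - 1500 = 2400 \;\Rightarrow\; \bar{X}_2 = \frac{2400}{150} = 16.$$
Therefore the arithmetic mean of the second sample is 16.
The combined standard deviation of the whole group is given by the following formula:
$$\sigma = \left[\frac{N_1\{\sigma_1^2 + (\bar{X}_1 - \bar{X})^2\} + N_2\{\sigma_2^2 + (\bar{X}_2 - \bar{X})^2\}}{N_1 + N_2}\right]^{1/2}$$
$$\Rightarrow 3.666 = \left[\frac{(100 \times 9) + 100(15 - 15.6)^2 + 150\sigma_2^2 + 150(16 - 15.6)^2}{250}\right]^{1/2}$$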
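The solution stops after substituting into the combined-SD formula. Solving that equation for $\sigma_2$ is routine algebra; a short Python sketch (the variable names are our own):

```python
import math

n1, mean1, sd1 = 100, 15, 3        # first group
n, mean, sd = 250, 15.6, 3.666     # whole group
n2 = n - n1                                    # 150 items in the second group
mean2 = (n * mean - n1 * mean1) / n2           # from the combined-mean formula
# n*sd^2 = n1*(sd1^2 + (mean1 - mean)^2) + n2*(sd2^2 + (mean2 - mean)^2), solved for sd2^2
var2 = (n * sd ** 2 - n1 * (sd1 ** 2 + (mean1 - mean) ** 2)) / n2 - (mean2 - mean) ** 2
sd2 = math.sqrt(var2)
print(n2, round(mean2, 2), round(sd2, 2))  # 150 16.0 4.0
```

This gives $\sigma_2 \approx 4.00$; the tiny gap from exactly 4 comes from 3.666 being a rounded value.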