Measures of Dispersion

11-Oct-22
MEASURES OF DISPERSION
What Is Dispersion?
▪ The meaning of dispersion is “scatteredness.”

▪ The degree to which numerical data tends to spread around an average value is
called variation or dispersion of data.
Figure 4.1: Three distributions A, B, and C with the same mean and different dispersion
Measures of Dispersion
▪ There are two types of measures of dispersion:
1. Absolute measures of dispersion: these are expressed in the same units as
the observations in the distribution.
2. Relative measures of dispersion: these are useful in comparing two sets of
data that have different units of measurement.
▪ Relative measures of dispersion are pure, unitless numbers and are
generally called coefficients of dispersion.
Methods of Measuring Dispersion
The following are some of the important and widely used methods of
measuring dispersion:
▪ Range
▪ Interquartile range and quartile deviation
▪ Mean deviation or average deviation
▪ Standard deviation
Range
▪ Range is defined as the difference between the greatest and the smallest
values in a distribution.
▪ Symbolically,
$$R = L - S$$
where L is the largest observation, S the smallest observation, and R the range.
Range is an absolute measure of dispersion. The relative measure of dispersion
for range is called the coefficient of range and is calculated by the
following formula:
$$\text{Coefficient of range} = \frac{L - S}{L + S}$$
Range for Individual Series
Example 4.1: Find out the range and its coefficient from the following series.
110, 117, 129, 300, 357, 100, 500, 630, 750
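Range can be computed by subtracting the lowest value of the series from
the highest value of the series as shown below:
$$R = L - S = 750 - 100 = 650$$
$$\text{Coefficient of range} = \frac{750 - 100}{750 + 100} = \frac{650}{850} \approx 0.765$$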
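The calculation for Example 4.1 can be sketched in Python. This is a minimal illustration; the function names `value_range` and `coefficient_of_range` are made up for this sketch and do not come from the slides.

```python
# Data from Example 4.1
values = [110, 117, 129, 300, 357, 100, 500, 630, 750]


def value_range(data):
    """Range R = L - S (largest minus smallest observation)."""
    return max(data) - min(data)


def coefficient_of_range(data):
    """Relative measure: (L - S) / (L + S)."""
    largest, smallest = max(data), min(data)
    return (largest - smallest) / (largest + smallest)


print(value_range(values))                      # 650
print(round(coefficient_of_range(values), 4))   # 0.7647
```

Because the coefficient divides one quantity in the data's units by another in the same units, the units cancel and the result is a pure number.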
Interquartile Range and Quartile Deviation
▪ Interquartile range is the difference between the third quartile and the first
quartile.
▪ Quartile deviation or semi-interquartile range can be obtained by dividing the
interquartile range by 2.
▪ Quartile deviation is an absolute measure of dispersion. Relative measure is
called the coefficient of quartile deviation. Coefficient of quartile deviation
can be used to measure the degree of variation in two different distributions
when both have different units of measurement.
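As a rough sketch, the interquartile range and quartile deviation can be computed for the Example 4.1 data with Python's standard library. Two assumptions are made here that the slides do not state: quartiles are located with the default "exclusive" method of `statistics.quantiles` (other quartile conventions give slightly different values), and the coefficient of quartile deviation is taken as (Q3 − Q1) / (Q3 + Q1), which is the usual textbook definition.

```python
import statistics

# Data from Example 4.1
values = [110, 117, 129, 300, 357, 100, 500, 630, 750]

# Assumption: default method="exclusive"; other quartile rules differ slightly.
q1, q2, q3 = statistics.quantiles(values, n=4)

iqr = q3 - q1                       # interquartile range
quartile_deviation = iqr / 2        # semi-interquartile range
coeff_qd = (q3 - q1) / (q3 + q1)    # assumed definition of the coefficient

print(q1, q3)                       # 113.5 565.0
print(iqr, quartile_deviation)      # 451.5 225.75
print(round(coeff_qd, 4))
```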
Figure 4.5: Range, first quartile, median, third quartile, and interquartile range
Mean Absolute Deviation
• An additional measure of dispersion is the mean absolute deviation
(MAD). This statistic reveals the average distance from the center.
Absolute values must be used; otherwise the deviations around the
mean would sum to zero.
• The MAD is appealing because of its simple, concrete interpretation.
Using the lever analogy, with the mean as the fulcrum, the MAD tells us
the average distance of an individual data point from the fulcrum.
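A small sketch of the MAD, using a hypothetical five-value data set (not from the slides), also shows why absolute values are needed: the signed deviations cancel out.

```python
def mean_absolute_deviation(data):
    """Average absolute distance of the observations from their mean."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)


sample = [1, 2, 3, 4, 5]            # hypothetical data, mean = 3
sample_mean = sum(sample) / len(sample)

# Signed deviations cancel, so their sum carries no information on spread.
signed_sum = sum(x - sample_mean for x in sample)

print(signed_sum)                        # 0.0
print(mean_absolute_deviation(sample))   # 1.2
```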
Variance and Standard Deviation
• The population variance (denoted σ², where σ is the lowercase Greek
letter “sigma”) is defined as the sum of squared deviations from the mean
divided by the population size:
$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$$
• The sample variance (denoted by s²) is defined as the sum of squared
deviations from the sample mean divided by (n – 1), where n is the
sample size:
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$
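• The standard deviation is defined as the square root of the variance.
The units of measurement for the standard deviation are the same as the
units of the variable.
Population standard deviation:
$$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$
Sample standard deviation:
$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$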
Standard Deviation
▪ The sample standard deviation is the square root of the quantity obtained
by dividing the sum of squared deviations of the values from their
arithmetic mean by the sample size minus one.
▪ Variance is the square of the standard deviation. Sample variance is the
sum of squared deviations of the values from their arithmetic mean divided
by the sample size minus one.
Coefficient of Variation
• To compare dispersion in data sets with dissimilar units of
measurement (e.g. kilograms and ounces) or dissimilar means
(e.g. home prices in two different cities), we define the
coefficient of variation (CV), which is a unit-free measure of
dispersion:
$$CV = \frac{s}{\bar{x}} \times 100\%$$
• The CV is the standard deviation expressed as a percent of the mean.
• In some data sets, the standard deviation can actually exceed
the mean, so the CV can exceed 100%.
• This can happen in skewed data sets, especially if there are
outliers.
• The CV is useful for comparing variables measured in different
units.
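The unit-free property can be checked with a short sketch. The weights below are hypothetical, and the kilogram-to-ounce factor of 35.274 is an approximate conversion used only for illustration; the sample standard deviation (divisor n − 1) is assumed.

```python
import statistics


def coefficient_of_variation(data):
    """CV = (s / x-bar) * 100, using the sample standard deviation."""
    return statistics.stdev(data) / statistics.mean(data) * 100


weights_kg = [10, 20, 30]                        # hypothetical data
weights_oz = [w * 35.274 for w in weights_kg]    # same weights in ounces

cv_kg = coefficient_of_variation(weights_kg)
cv_oz = coefficient_of_variation(weights_oz)

print(cv_kg)   # 50.0
# Rescaling the units multiplies s and the mean by the same factor,
# so the two CVs agree up to floating-point rounding.
print(abs(cv_kg - cv_oz) < 1e-9)
```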
Coefficient of Variation
▪ To compare the dispersion of two distributions, the relative measure of
standard deviation is used and is referred to as the coefficient of variation.
▪ A distribution with a smaller CV shows greater consistency, homogeneity, and
uniformity, whereas a distribution with a greater CV is considered more
variable.
Standard Deviation and Variance for an Individual Series
Example 4.7: Find the standard deviation and variance for the
data given in Example 4.1.
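The worked solution on the slide is not reproduced here, so the following is only a sketch of how the Example 4.1 data could be processed with Python's `statistics` module. It reports both the sample version (divisor n − 1, as in the definitions above) and the population version (divisor N), since the example does not say which one is intended.

```python
import statistics

# Data from Example 4.1
values = [110, 117, 129, 300, 357, 100, 500, 630, 750]

sample_variance = statistics.variance(values)        # divisor n - 1
sample_sd = statistics.stdev(values)
population_variance = statistics.pvariance(values)   # divisor N
population_sd = statistics.pstdev(values)

print(round(sample_variance, 2), round(sample_sd, 2))
print(round(population_variance, 2), round(population_sd, 2))
```

With the n − 1 divisor the sample variance comes to roughly 60,492.53 and the sample standard deviation to roughly 245.95; the population figures are slightly smaller because the same sum of squares is divided by 9 rather than 8.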
Example 4.7 (Contd.)
Combined Standard Deviation
Example
• First set: n1 = 100, standard deviation = 5, mean = 50
• Second set: n2 = 150, standard deviation = 6, mean = 40
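The slide's own formula for the combined standard deviation did not survive extraction, so the sketch below assumes the commonly used form: the combined variance is the size-weighted average of each group's variance plus the squared distance of its mean from the combined mean. The function name `combined_mean_sd` is invented for this illustration.

```python
import math


def combined_mean_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combine two groups' means and standard deviations (assumed formula)."""
    total = n1 + n2
    mean = (n1 * mean1 + n2 * mean2) / total
    d1, d2 = mean1 - mean, mean2 - mean          # shifts of the group means
    variance = (n1 * (sd1 ** 2 + d1 ** 2) + n2 * (sd2 ** 2 + d2 ** 2)) / total
    return mean, math.sqrt(variance)


# Figures from the example: first set (100, 50, 5), second set (150, 40, 6)
combined_mean, combined_sd = combined_mean_sd(100, 50, 5, 150, 40, 6)

print(combined_mean)           # 44.0
print(round(combined_sd, 2))   # 7.46
```

Under this assumed formula the combined variance is 13900 / 250 = 55.6, which is larger than either group's own variance because the two means differ by 10.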
Empirical Rule
Figure 4.11: Area under the normal curve
Empirical Relationship Between Measures of Dispersion
Chebyshev’s Theorem
• For any population with mean 𝜇 and standard deviation 𝜎, the
percentage of observations that lie within k standard deviations of the
mean must be at least 100[1 – 1/k²].
• For k = 2 standard deviations, 100[1 – 1/2²] = 75%.
• So, at least 75.0% will lie within 𝜇 ± 2𝜎.
• For k = 3 standard deviations, 100[1 – 1/3²] = 88.9%.
• So, at least 88.9% will lie within 𝜇 ± 3𝜎.
• Although applicable to any data set, these limits tend to be rather
wide.
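The bound is easy to tabulate. In this sketch the function name `chebyshev_lower_bound` is hypothetical, and the guard for k ≤ 1 is an added assumption, since the formula gives no useful information there.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of observations within k standard deviations."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 100 * (1 - 1 / k ** 2)


for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 1))
# 2 75.0
# 3 88.9
```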
The Empirical Rule
• The normal distribution is symmetric and is also known as the
bell-shaped curve.
• The Empirical Rule states that for data from a normal distribution, we
expect the interval μ ± kσ to contain a known percentage of data. For:
• k = 1, 68.26% will lie within μ ± 1σ.
• k = 2, 95.44% will lie within μ ± 2σ.
• k = 3, 99.73% will lie within μ ± 3σ.
The Empirical Rule
Note: No upper bound is given. Data values outside μ ± 3σ are rare.
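The Empirical Rule percentages can be reproduced from the normal distribution itself. This sketch uses the identity P(|Z| < k) = erf(k / √2) for a standard normal Z; the function name `normal_coverage` is hypothetical. The slide's 68.26% comes from rounded table values, and the directly computed figure is closer to 68.27%.

```python
import math


def normal_coverage(k):
    """Percentage of a normal population within k standard deviations."""
    return 100 * math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(k, round(normal_coverage(k), 2))
```

Compared with Chebyshev's bounds of 75% and 88.9% for k = 2 and k = 3, the normal figures are much tighter, because they rely on the shape of the distribution rather than holding for any data set.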
Measures of Shape
Measures of shape are the tools used for describing the shape of a
distribution of the data. There are two measures of shape: skewness and
kurtosis.
Figure 4.12: (a) Left skewed distribution, (b) right skewed distribution, and (c) symmetrical distribution
A distribution of data where the right half is the mirror image of the left
half is said to be symmetrical. If the distribution is not symmetrical, it is
said to be asymmetrical or skewed.
Shape
Symptoms of Skewness
Example of a Right Skewed Distribution
Example of a Left Skewed Distribution
Coefficient of Skewness
▪ Karl Pearson developed a method for measuring skewness, referred
to as the Pearsonian coefficient of skewness. This coefficient takes the
difference between the mean and the mode and divides it by the standard
deviation. The Pearsonian coefficient of skewness is given as:
$$S_k = \frac{\text{Mean} - \text{Mode}}{\sigma}$$
▪ For a positively skewed distribution, the coefficient of skewness will
have a plus sign and for a distribution that is negatively skewed, the
coefficient of skewness will have a minus sign. The actual degree of
skewness can be obtained from the numerical value of the coefficient
of skewness.
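The sign behaviour described above can be illustrated with two small hypothetical data sets. The sketch assumes the (mean − mode) / standard deviation form and uses the sample standard deviation; the function name `pearson_mode_skewness` is invented for this example.

```python
import statistics


def pearson_mode_skewness(data):
    """(mean - mode) / standard deviation, with the sample SD assumed."""
    mean = statistics.mean(data)
    mode = statistics.mode(data)
    return (mean - mode) / statistics.stdev(data)


right_skewed = [2, 3, 3, 3, 4, 5, 9]   # hypothetical: long tail to the right
left_skewed = [1, 5, 6, 7, 7, 7, 8]    # hypothetical: long tail to the left

print(pearson_mode_skewness(right_skewed) > 0)   # True
print(pearson_mode_skewness(left_skewed) < 0)    # True
```

In the first set the large value 9 pulls the mean above the mode of 3, giving a plus sign; in the second the small value 1 pulls the mean below the mode of 7, giving a minus sign.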
Kurtosis
• Kurtosis refers to the relative length of the tails and the degree of
concentration in the center.
• A normal bell-shaped population is called mesokurtic and serves as a
benchmark.
• A population that is flatter than a normal population (i.e., has thinner tails)
is called platykurtic, while one that is more sharply peaked than a normal
population (i.e., has heavier tails) is leptokurtic.
• Kurtosis is not the same thing as variability, although the two are easily
confused.
Kurtosis
Kurtosis
• A histogram is an unreliable guide to kurtosis because its scale and axis
proportions may vary, so a numerical statistic is needed.
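The specific statistic shown on the slide is not recoverable from the text, so the sketch below swaps in one common choice, the moment-based excess kurtosis m4 / m2² − 3, where m2 and m4 are the second and fourth central moments. This is an assumption for illustration and may differ from the slide's formula (spreadsheet packages, for instance, typically apply a small-sample adjustment). Values near 0 suggest a mesokurtic shape, negative values platykurtic, and positive values leptokurtic.

```python
def excess_kurtosis(data):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (assumed definition)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth central moment
    return m4 / m2 ** 2 - 3


flat = [1, 2, 3, 4, 5]                          # hypothetical, evenly spread
heavy_tailed = [-10, 0, 0, 0, 0, 0, 0, 0, 0, 10]  # hypothetical, extreme ends

print(round(excess_kurtosis(flat), 2))          # -1.3
print(round(excess_kurtosis(heavy_tailed), 2))  # 2.0
```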
The Five-Number Summary
▪ In the five-number summary, five numbers (the smallest value, the first
quartile, the median, the third quartile, and the largest value) are used
to summarize data.
▪ In the case of a symmetrical distribution, the relationship between the
various measures of the five-number summary is expressed as:
▪ the distance from the smallest value to the median and the
distance from the median to the largest value remain equal;
▪ the distance from the smallest value to the first quartile and
the distance from the third quartile to the largest value
remain equal.
In the case of an asymmetrical distribution, the relationship between the
various measures of the five-number summary is expressed as follows:
▪ In a right-skewed distribution, the distance from the median to the largest
value is greater than the distance from the smallest value to the median.
▪ In a right-skewed distribution, the distance from the third quartile to the
largest value is greater than the distance from the smallest value to the first
quartile.
▪ In a left-skewed distribution, the distance from the median to the largest value
is less than the distance from the smallest value to the median.
▪ In a left-skewed distribution, the distance from the third quartile to the largest
value is less than the distance from the smallest value to the first quartile.
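These distance comparisons can be sketched for the Example 4.1 data. Quartiles are again computed with the default "exclusive" method of `statistics.quantiles`, which is an assumption; a different quartile rule would shift Q1 and Q3 slightly. The function name `five_number_summary` is hypothetical.

```python
import statistics


def five_number_summary(data):
    """Return (smallest, Q1, median, Q3, largest)."""
    q1, median, q3 = statistics.quantiles(data, n=4)  # assumed quartile rule
    return min(data), q1, median, q3, max(data)


# Data from Example 4.1
values = [110, 117, 129, 300, 357, 100, 500, 630, 750]
smallest, q1, median, q3, largest = five_number_summary(values)

upper_half = largest - median    # median to largest value
lower_half = median - smallest   # smallest value to median
upper_tail = largest - q3        # third quartile to largest value
lower_tail = q1 - smallest       # smallest value to first quartile

print(upper_half, lower_half)    # 450.0 200.0
print(upper_tail, lower_tail)    # 185.0 13.5
```

Both upper distances exceed the corresponding lower ones, so by the criteria above this series looks right-skewed.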
Box-and-whisker plots: A box-and-whisker plot is a graphical representation of
the data based on the five-number summary.
Figure 4.15 : Elements of box-and-whisker plot
Measures of Association
▪ Measures of association are statistics for measuring the strength of
relationship between two variables.
▪ Correlation measures the degree of association between two variables.
▪ Karl Pearson’s coefficient of correlation is a quantitative measure of the
degree of relationship between two variables. Suppose these variables
are x and y, then Karl Pearson’s coefficient of correlation is defined as
$$r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$$
▪ The coefficient of correlation lies between –1 and +1.
4.6 Covariance and Correlation
Covariance
The covariance of two random variables X and Y (denoted σXY )
measures the degree to which the values of X and Y change
together.
Covariance
• The units of measurement for the covariance are unpredictable because
the magnitude and/or units of measurement of X and Y may differ. For this
reason, analysts generally work with the correlation coefficient, which is a
standardized value of the covariance that ensures a range between −1 and +1.
Correlation
• Conceptually, a correlation coefficient is the covariance divided by the
product of the standard deviations (denoted σX and σY for a population or
sX and sY for a sample). For a population, the correlation coefficient is
indicated by the lowercase Greek letter ρ (rho), while for a sample we use
the lowercase Roman letter r.
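The "covariance divided by the product of the standard deviations" idea can be sketched directly. The advertising and sales figures below are hypothetical (they are not the data of Table 4.7), the function names are invented for this illustration, and the sample versions with divisor n − 1 are assumed throughout.

```python
import math


def sample_covariance(xs, ys):
    """Sample covariance of paired observations (divisor n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def sample_correlation(xs, ys):
    """r = s_XY / (s_X * s_Y)."""
    s_x = math.sqrt(sample_covariance(xs, xs))   # covariance of X with itself
    s_y = math.sqrt(sample_covariance(ys, ys))
    return sample_covariance(xs, ys) / (s_x * s_y)


advertising = [1, 2, 3, 4, 5]        # hypothetical data
sales = [12, 14, 16, 18, 20]         # exactly linear in advertising

print(sample_covariance(advertising, sales))          # 5.0
print(round(sample_correlation(advertising, sales), 6))        # 1.0
print(round(sample_correlation(advertising, sales[::-1]), 6))  # -1.0
```

The covariance of 5.0 depends on the units of both variables, while the correlation is standardized: a perfectly increasing line gives +1 and a perfectly decreasing one gives −1.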
Sample Correlation Coefficient
• The sample correlation coefficient is a statistic that describes the degree
of linearity between paired observations on two quantitative variables X and Y.
Note: −1 ≤ r ≤ +1.
Correlation Coefficient
Figure 4.19 : Interpretation of correlation coefficient
Example 4.9: Table 4.7 shows the sales revenue and advertisement expenses of a company
for the past 10 months. Find the coefficient of correlation between sales and
advertisement.
Table 4.8 : Calculation of correlation coefficient between sales and advertisement
Figure 4.27 : Five examples of correlation coefficient
Using MS Excel, Minitab and SPSS for the Computation of range, interquartile range,
standard deviation, variance, and coefficient of variation (An explanation through
Example 4.16)
Table 4.17 shows the sales (in million rupees) of four leading cement companies: Ambuja,
L&T, Madras Cement, and ACC from 1994–1995 to 2006–2007. Find range, interquartile
range, standard deviation, variance, and coefficient of variation from the sales data of
different companies.