Measures of Dispersion


11-Oct-22

MEASURES OF DISPERSION

What Is Dispersion?

▪ The meaning of dispersion is “scatteredness.”


▪ The degree to which numerical data tends to spread around an average value is
called variation or dispersion of data.

Figure 4.1: Three distributions A, B, and C with the same mean and different dispersion


Measures of Dispersion

▪ There are two types of measures of dispersion:

1. Absolute measures of dispersion: Absolute measures of dispersion are
expressed in the same units as the data themselves.
2. Relative measures of dispersion: Relative measures of dispersion are
useful in comparing two sets of data which have different units of
measurement.

▪ Relative measures of dispersion are pure unitless numbers and are
generally called coefficients of dispersion.

Methods of Measuring Dispersion

The following are some of the important and widely used methods of
measuring dispersion:

▪ Range
▪ Interquartile range and quartile deviation
▪ Mean deviation or average deviation
▪ Standard deviation


Range

▪ Range is defined as the difference between the greatest and the
smallest values in a distribution.
▪ Symbolically, $R = L - S$,
where L is the largest observation, S the smallest observation, and R the range.

Range is an absolute measure of dispersion. The
relative measure of dispersion for range is called the
coefficient of range and is calculated by the
following formula:
$\text{Coefficient of range} = \dfrac{L - S}{L + S}$

Range for Individual Series

Example 4.1: Find out the range and its coefficient from the following series.
110, 117, 129, 300, 357, 100, 500, 630, 750

Range can be computed by subtracting the lowest value of the series from
the highest value of the series: $R = 750 - 100 = 650$. The coefficient of
range is $(750 - 100)/(750 + 100) = 650/850 \approx 0.765$.
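
The same calculation can be sketched in Python. This is only an illustration: the variable names are our own, and the coefficient of range is taken to be (L − S)/(L + S), the usual textbook definition.

```python
# Data from Example 4.1
data = [110, 117, 129, 300, 357, 100, 500, 630, 750]

largest = max(data)    # L, the largest observation
smallest = min(data)   # S, the smallest observation

# Absolute measure: R = L - S
data_range = largest - smallest

# Relative measure (assumed definition): (L - S) / (L + S)
coefficient_of_range = data_range / (largest + smallest)

print(f"Range = {data_range}")
print(f"Coefficient of range = {coefficient_of_range:.4f}")
```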


Interquartile Range and Quartile Deviation

▪ Interquartile range is the difference between the third quartile and the first
quartile.
▪ Quartile deviation or semi-interquartile range can be obtained by dividing the
interquartile range by 2.
▪ Quartile deviation is an absolute measure of dispersion. Relative measure is
called the coefficient of quartile deviation. Coefficient of quartile deviation
can be used to measure the degree of variation in two different distributions
when both have different units of measurement.
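
As a rough sketch, these quantities can be computed for the Example 4.1 data with Python's standard library. Two assumptions are worth flagging: quartile conventions differ between textbooks and software (here `statistics.quantiles` with its default "exclusive" method is used), and the coefficient of quartile deviation is taken to be (Q3 − Q1)/(Q3 + Q1), which the slides do not spell out.

```python
import statistics

# Data from Example 4.1
data = [110, 117, 129, 300, 357, 100, 500, 630, 750]

# With n=4 the function returns the three cut points [Q1, Q2, Q3].
# The default "exclusive" method interpolates at positions i * (n + 1) / 4.
q1, q2, q3 = statistics.quantiles(data, n=4)

iqr = q3 - q1                      # interquartile range
quartile_deviation = iqr / 2       # semi-interquartile range
coeff_qd = (q3 - q1) / (q3 + q1)   # assumed definition of the coefficient

print(f"Q1 = {q1}, Q3 = {q3}")
print(f"Interquartile range = {iqr}")
print(f"Quartile deviation = {quartile_deviation}")
print(f"Coefficient of quartile deviation = {coeff_qd:.4f}")
```

A different quartile convention (for example the "inclusive" method) would give slightly different values for the same data.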

Figure 4.5: Range, first quartile, median, third quartile, and interquartile range


Mean Absolute Deviation

• An additional measure of dispersion is the mean absolute deviation
(MAD). This statistic reveals the average distance from the center.
Absolute values must be used; otherwise the deviations around the
mean would sum to zero.
• The MAD is appealing because of its simple, concrete interpretation.
Using the lever analogy, with the mean as the fulcrum, the MAD tells us
the average distance from an individual data point to the fulcrum.
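
A minimal sketch of the idea, assuming the MAD is computed around the mean and divided by the number of observations n (the slide's own formula is not reproduced here). It also shows why absolute values are needed: the signed deviations cancel.

```python
# Data from Example 4.1
data = [110, 117, 129, 300, 357, 100, 500, 630, 750]

n = len(data)
mean = sum(data) / n

# Signed deviations from the mean always sum to (numerically) zero
deviations = [x - mean for x in data]
sum_of_deviations = sum(deviations)

# Mean absolute deviation: average of |x_i - mean|
mad = sum(abs(d) for d in deviations) / n

print(f"Sum of signed deviations = {sum_of_deviations:.10f}")
print(f"MAD = {mad:.4f}")
```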

Variance and Standard Deviation

• The population variance (denoted $\sigma^2$, where $\sigma$ is the
lowercase Greek letter “sigma”) is defined as the sum
of squared deviations from the mean divided by the
population size:
$\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$


Variance and Standard Deviation

• The sample variance (denoted by $s^2$) is defined
as the sum of squared deviations from the
sample mean divided by (n – 1), where n is
the sample size:
$s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$


Variance and Standard Deviation

• The standard deviation is defined as the square
root of the variance. The units of measurement
for the standard deviation are the same as the units of
the variable.

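Population standard deviation: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Sample standard deviation: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$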
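
The difference between the population and sample versions comes down to the divisor. The sketch below writes both out by hand; the function names and the small data set are illustrative only.

```python
import math

def variance(values, sample=True):
    """Sum of squared deviations from the mean, divided by n - 1 (sample) or n (population)."""
    n = len(values)
    mean = sum(values) / n
    sum_sq = sum((x - mean) ** 2 for x in values)
    return sum_sq / (n - 1) if sample else sum_sq / n

def std_dev(values, sample=True):
    """Standard deviation is the square root of the variance."""
    return math.sqrt(variance(values, sample))

# Hypothetical data chosen so the population results are round numbers
data = [2, 4, 4, 4, 5, 5, 7, 9]

print("Population variance:", variance(data, sample=False))
print("Population std dev: ", std_dev(data, sample=False))
print("Sample variance:    ", round(variance(data), 4))
print("Sample std dev:     ", round(std_dev(data), 4))
```

The sample figures are always slightly larger than the population figures for the same numbers, because the divisor n − 1 is smaller than n.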


Standard Deviation

▪ Standard deviation is the square root of the sum of squared deviations of
various values from their arithmetic mean divided by the sample size minus
one.

▪ Variance is the square of standard deviation. Sample variance is the sum of
squared deviations of various values from their arithmetic mean divided by
the sample size minus one.


Coefficient of Variation

• To compare dispersion in data sets with dissimilar units of
measurement (e.g. kilograms and ounces) or dissimilar means
(e.g. home prices in two different cities), we define the
coefficient of variation (CV), which is a unit-free measure of
dispersion:
$\text{CV} = \dfrac{s}{\bar{x}} \times 100\%$

• The CV is the standard deviation expressed as a percent of the
mean.
• In some data sets, the standard deviation can actually exceed
the mean, so the CV can exceed 100%.
• This can happen in skewed data sets, especially if there are
outliers.
• The CV is useful for comparing variables measured in different
units.
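
To illustrate these points, here is a small sketch using the sample standard deviation from Python's `statistics` module. The data sets and the kilogram-to-ounce factor (about 35.274) are made up for the example.

```python
import statistics

def coefficient_of_variation(values):
    """CV = (s / x-bar) * 100, with s the sample standard deviation."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# The same hypothetical weights in two different units
weights_kg = [60.0, 65.0, 70.0, 75.0, 80.0]
weights_oz = [w * 35.274 for w in weights_kg]

cv_kg = coefficient_of_variation(weights_kg)
cv_oz = coefficient_of_variation(weights_oz)

# A hypothetical skewed data set with one large outlier
skewed = [1, 1, 1, 1, 50]
cv_skewed = coefficient_of_variation(skewed)

print(f"CV in kg: {cv_kg:.2f}%")
print(f"CV in oz: {cv_oz:.2f}%")
print(f"CV of skewed data: {cv_skewed:.1f}%")
```

Changing the unit rescales both the standard deviation and the mean, so the CV is unchanged, while the outlier pushes the CV of the skewed set above 100%.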


Coefficient of Variation

▪ To compare the dispersion of two distributions, the relative measure of
standard deviation is used and is referred to as the coefficient of variation.
▪ A distribution with lesser CV shows greater consistency, homogeneity, and
uniformity, whereas a distribution with greater CV is considered more
variable than others.


Standard Deviation and Variance for an Individual Series

Example 4.7: Find the standard deviation and variance for the
data given in Example 4.1.
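
The worked solution from the slide is not reproduced here. As a cross-check, the sketch below computes the sample variance and sample standard deviation (divisor n − 1, as defined earlier) for the Example 4.1 data using Python's `statistics` module.

```python
import statistics

# Data from Example 4.1
data = [110, 117, 129, 300, 357, 100, 500, 630, 750]

mean = statistics.mean(data)
sample_variance = statistics.variance(data)   # divides by n - 1
sample_sd = statistics.stdev(data)            # square root of the sample variance

print(f"Mean = {mean:.4f}")
print(f"Sample variance = {sample_variance:.4f}")
print(f"Sample standard deviation = {sample_sd:.4f}")
```

If the data were treated as a complete population instead, `statistics.pvariance` and `statistics.pstdev` (divisor n) would be the corresponding calls.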


Example 4.7 (Contd.)


Combined Standard Deviation


Example
• First Set
• n1=100
• Sd=5
• Mean=50
• Second Set
• n2=150
• Sd=6
• Mean=40
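
The slide's formula for the combined standard deviation did not survive extraction. The sketch below assumes the commonly used form, in which each group's variance is augmented by the squared distance of its mean from the combined mean and the results are weighted by group size (standard deviations treated as population-style, divisor n).

```python
import math

# Values from the example
n1, mean1, sd1 = 100, 50.0, 5.0
n2, mean2, sd2 = 150, 40.0, 6.0

# Combined mean is the size-weighted average of the two means
combined_mean = (n1 * mean1 + n2 * mean2) / (n1 + n2)

# Distance of each group mean from the combined mean
d1 = mean1 - combined_mean
d2 = mean2 - combined_mean

# Assumed formula: weighted average of (variance + squared mean shift)
combined_variance = (n1 * (sd1**2 + d1**2) + n2 * (sd2**2 + d2**2)) / (n1 + n2)
combined_sd = math.sqrt(combined_variance)

print(f"Combined mean = {combined_mean}")
print(f"Combined variance = {combined_variance:.2f}")
print(f"Combined standard deviation = {combined_sd:.4f}")
```

Note that the combined standard deviation is larger than either group's own, because the two groups are centered at different means.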


Empirical Rule

Figure 4.11: Area under the normal curve


Empirical Relationship Between Measures of Dispersion


Chebyshev’s Theorem

• For any population with mean μ and standard deviation σ, the
percentage of observations that lie within k standard deviations of the
mean (for k > 1) must be at least $100\,[1 - 1/k^2]$.

• For k = 2 standard deviations, $100\,[1 - 1/2^2] = 75\%$.
• So, at least 75.0% will lie within $\mu \pm 2\sigma$.
• For k = 3 standard deviations, $100\,[1 - 1/3^2] = 88.9\%$.
• So, at least 88.9% will lie within $\mu \pm 3\sigma$.
• Although applicable to any data set, these limits tend to be rather
wide.
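
The bound is simple enough to tabulate directly. A minimal sketch (the function name is our own):

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of observations within k standard deviations of the mean."""
    if k <= 1:
        # For k <= 1 the expression is zero or negative, so it says nothing useful
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    return 100 * (1 - 1 / k**2)

for k in (2, 3, 4):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1f}%")
```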


The Empirical Rule

• The normal distribution is symmetric and is also known as the bell-shaped
curve.
• The Empirical Rule states that for data from a normal distribution, we
expect the interval $\mu \pm k\sigma$ to contain a known percentage of data. For:
• k = 1, 68.26% will lie within $\mu \pm 1\sigma$.
• k = 2, 95.44% will lie within $\mu \pm 2\sigma$.
• k = 3, 99.73% will lie within $\mu \pm 3\sigma$.
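
These percentages come from the normal distribution itself. As a sketch, they can be recovered from the error function in Python's `math` module, since the probability of falling within k standard deviations of the mean is erf(k/√2). The results may differ from the slide's figures in the second decimal place, which is consistent with the slide values having been read from a rounded table.

```python
import math

def normal_within(k):
    """Percentage of a normal distribution lying within k standard deviations of the mean."""
    return 100 * math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"k = {k}: {normal_within(k):.2f}%")
```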


The Empirical Rule

Note: No upper bound is given. Data values
outside $\mu \pm 3\sigma$ are rare.


Measures of Shape

Measures of shape are the tools used for describing the shape of a
distribution of the data. There are two measures of shape: skewness and
kurtosis.

Figure 4.12: (a) Left skewed distribution, (b) right skewed
distribution, and (c) symmetrical distribution

A distribution of data where the right half is the mirror
image of the left half is said to be symmetrical. If the
distribution is not symmetrical, it is said to be
asymmetrical or skewed.


Shape


Symptoms of Skewness


Example of a Right Skewed Distribution


Example of a Left Skewed Distribution


Coefficient of Skewness
▪ Karl Pearson developed a method for measuring skewness, referred
to as the Pearsonian coefficient of skewness. This coefficient takes the difference
between the mean and the mode and divides it by the standard deviation. The Pearsonian
coefficient of skewness is given as:
$S_k = \dfrac{\text{Mean} - \text{Mode}}{\text{Standard deviation}}$

▪ For a positively skewed distribution, the coefficient of skewness will
have a plus sign and for a distribution that is negatively skewed, the
coefficient of skewness will have a minus sign. The actual degree of
skewness can be obtained from the numerical value of the coefficient
of skewness.
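
A short sketch of the sign behaviour, taking the coefficient as (mean − mode) divided by the standard deviation. The data sets are hypothetical, and the sample standard deviation is used as an assumption.

```python
import statistics

def pearson_skewness(values):
    """(mean - mode) / standard deviation, using the sample standard deviation."""
    mean = statistics.mean(values)
    mode = statistics.mode(values)    # most frequent value
    sd = statistics.stdev(values)
    return (mean - mode) / sd

# Hypothetical data with a long right tail, and its mirror image
right_skewed = [2, 3, 3, 3, 4, 5, 9]
left_skewed = [-x for x in right_skewed]

sk_right = pearson_skewness(right_skewed)
sk_left = pearson_skewness(left_skewed)

print(f"Right-skewed data: {sk_right:+.3f}")
print(f"Left-skewed data:  {sk_left:+.3f}")
```

The measure depends on a well-defined mode, so it is less useful for data with several equally frequent values.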


Kurtosis

• Kurtosis refers to the relative length of the tails and the degree of
concentration in the center.
• A normal bell-shaped population is called mesokurtic and serves as a
benchmark.
• A population with lighter tails than a normal population (often appearing
flatter) is called platykurtic, while one with heavier tails than a normal
population (often appearing more sharply peaked) is leptokurtic.
• Kurtosis is not the same thing as variability, although the two are easily
confused.


Kurtosis


Kurtosis

• A histogram is an unreliable guide to kurtosis
because its scale and axis proportions may vary, so
a numerical statistic is needed:
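
The slide's own formula is not recoverable from the text. As one possible numerical statistic, the sketch below computes the moment-based excess kurtosis (fourth central moment divided by the squared variance, minus 3), which is an assumption on our part and may differ from the sample-adjusted formula used on the slide or in spreadsheet software. Values near 0 correspond to mesokurtic data, negative values to platykurtic data, and positive values to leptokurtic data.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (population-style moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2**2 - 3

# Hypothetical data: evenly spread values versus values with extreme tails
flat = [1, 2, 3, 4, 5]
heavy_tailed = [-10, -1, 0, 0, 0, 0, 1, 10]

k_flat = excess_kurtosis(flat)
k_heavy = excess_kurtosis(heavy_tailed)

print(f"Evenly spread data: {k_flat:+.3f}")
print(f"Heavy-tailed data:  {k_heavy:+.3f}")
```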


The Five-Number Summary

▪ In the five-number summary, five numbers (the smallest
value, the first quartile, the median, the third quartile, and the
largest value) are used to summarize data.

▪ In the case of a symmetrical distribution, the relationship
between the various measures of the five-number summary is
expressed as:
▪ the distance from the smallest value to the median and the
distance from the median to the largest value remain equal;
▪ the distance from the smallest value to the first quartile and
the distance from the third quartile to the largest value
remain equal.


In the case of an asymmetrical distribution, the relationship
between the various measures of the five-number summary is
expressed as follows:
▪ In a right-skewed distribution, the distance from the median to the largest
value is greater than the distance from the smallest value to the median.
▪ In a right-skewed distribution, the distance from the third quartile to the
largest value is greater than the distance from the smallest value to the first
quartile.
▪ In a left-skewed distribution, the distance from the median to the largest value
is less than the distance from the smallest value to the median.
▪ In a left-skewed distribution, the distance from the third quartile to the largest
value is less than the distance from the smallest value to the first quartile.
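
These comparisons are easy to automate. The sketch below computes a five-number summary and applies the median-distance comparison; the function names and data sets are hypothetical, and the quartiles follow the default convention of `statistics.quantiles`, which may differ from the textbook's.

```python
import math
import statistics

def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest)."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)

def skew_direction(values):
    """Compare the median-to-largest distance with the smallest-to-median distance."""
    smallest, q1, median, q3, largest = five_number_summary(values)
    left = median - smallest
    right = largest - median
    if math.isclose(left, right):
        return "symmetrical"
    return "right-skewed" if right > left else "left-skewed"

symmetric = [1, 2, 3, 4, 5, 6, 7]
right_tail = [1, 2, 3, 4, 5, 6, 20]
left_tail = [-20, 2, 3, 4, 5, 6, 7]

for name, values in [("symmetric", symmetric), ("right_tail", right_tail), ("left_tail", left_tail)]:
    print(name, five_number_summary(values), skew_direction(values))
```

The same pattern could be extended to the second comparison (smallest value to first quartile versus third quartile to largest value).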


Box-and-whisker plots: A box-and-whisker plot is a graphical representation of
the data based on the five-number summary.

Figure 4.15 : Elements of box-and-whisker plot


Measures of Association
▪ Measures of association are statistics for measuring the strength of
relationship between two variables.
▪ Correlation measures the degree of association between two variables.
▪ Karl Pearson’s coefficient of correlation is a quantitative measure of the
degree of relationship between two variables. Suppose these variables
are x and y, then Karl Pearson’s coefficient of correlation is defined as
$r = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \, \sum (y - \bar{y})^2}}$

▪ The coefficient of correlation lies between –1 and +1.


4.6 Covariance and Correlation

Covariance
The covariance of two random variables X and Y (denoted $\sigma_{XY}$)
measures the degree to which the values of X and Y change
together.


Covariance

• The units of measurement for the covariance are
unpredictable because the magnitude and/or units of
measurement of X and Y may differ. For this reason, analysts
generally work with the correlation coefficient, which is a
standardized value of the covariance that ensures a range
between −1 and +1.
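
The unit dependence is easy to demonstrate. In the sketch below the sample covariance (divisor n − 1, an assumption since the slide's formula is not shown) is computed for made-up advertising and sales figures, and then again after expressing advertising in different units.

```python
def sample_covariance(xs, ys):
    """Average product of paired deviations from the means, with divisor n - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

# Hypothetical paired observations
ad_spend_thousands = [1.0, 2.0, 3.0, 4.0, 5.0]
sales_millions = [2.0, 4.0, 5.0, 4.0, 5.0]

cov_original = sample_covariance(ad_spend_thousands, sales_millions)

# Same data, with advertising expressed in single currency units instead of thousands
ad_spend_units = [x * 1000 for x in ad_spend_thousands]
cov_rescaled = sample_covariance(ad_spend_units, sales_millions)

print(f"Covariance (thousands vs millions): {cov_original}")
print(f"Covariance (units vs millions):     {cov_rescaled}")
```

The relationship between the variables is unchanged, yet the covariance grows by a factor of 1000, which is why its magnitude is hard to interpret on its own.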


Correlation

• Conceptually, a correlation coefficient is the covariance
divided by the product of the standard deviations
(denoted $\sigma_X$ and $\sigma_Y$ for a population or $s_X$ and $s_Y$ for a
sample). For a population, the correlation coefficient is
indicated by the lowercase Greek letter ρ (rho), while for a
sample we use the lowercase Roman letter r.
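
A minimal sketch of this definition for a sample, r = s_XY / (s_X · s_Y), written out by hand. The data are hypothetical; they are not the figures from the textbook's examples.

```python
import math

def sample_correlation(xs, ys):
    """Sample covariance divided by the product of the sample standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    return cov_xy / (sd_x * sd_y)

# Hypothetical paired observations
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]

r = sample_correlation(x, y)

# Rescaling x changes the covariance but not the correlation
r_rescaled = sample_correlation([v * 1000 for v in x], y)

print(f"r = {r:.4f}")
print(f"r after rescaling x = {r_rescaled:.4f}")
```

Because the standard deviations carry the same units as the covariance's factors, the units cancel and r always falls between −1 and +1.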


Sample Correlation Coefficient

• The sample correlation coefficient is a statistic that
describes the degree of linearity between paired
observations on two quantitative variables X and Y.

Note: −1 ≤ r ≤ +1.


Correlation Coefficient


Figure 4.19 : Interpretation of correlation coefficient


Example 4.9: Table 4.7 shows the sales revenue and advertisement expenses of a company
for the past 10 months. Find the coefficient of correlation between sales and
advertisement.


Table 4.8 : Calculation of correlation coefficient between sales and advertisement


Figure 4.27 : Five examples of correlation coefficient


Using MS Excel, Minitab and SPSS for the Computation of range, interquartile range,
standard deviation, variance, and coefficient of variation (An explanation through
Example 4.16)

Table 4.17 shows the sales (in million rupees) of four leading cement companies: Ambuja,
L&T, Madras Cement, and ACC from 1994–1995 to 2006–2007. Find range, interquartile
range, standard deviation, variance, and coefficient of variation from the sales data of
different companies.
