Module 3 - Measures of Dispersion and Shape


Course – Descriptive Statistical Analysis

Module 3:
Measures of Dispersion and Shape

Learning Objectives:
• Learn about different measures of dispersion, with their usage and interpretation.
• Learn about different measures of shape, with their usage and interpretation.

Introduction
Understanding how data is distributed is an essential part of the data analysis process.
Measures of dispersion and shape characterize a dataset by describing its spread,
symmetry, and tailedness. These measures offer valuable insights into the dataset's
characteristics, facilitating informed decision-making.

Measures of Dispersion
Measures of dispersion quantify the spread of data, i.e. the degree to which the
data values deviate from a measure of central tendency such as the mean, median, or
mode. The commonly used measures of dispersion are listed below:

Table 1: Measures of Dispersion

1. Range
Definition: Range is defined as the difference between the maximum and the minimum
values in the data. It measures the spread of the data.
Formula: $\text{Range} = \text{Maximum Value} - \text{Minimum Value}$
Interpretation & Usage: Helpful to understand how the data is distributed.
Vulnerable to outliers.

2. Inter-Quartile Range (IQR)
Definition: Measures the spread of the middle half of the distribution of data. IQR is
defined as the difference between the 75th and 25th percentiles of the data.
Formula: $\text{IQR} = Q_3 - Q_1$ (Third Quartile − First Quartile)
Interpretation & Usage: IQR is not affected by outliers; however, it accounts for only
half of the data. It is often used to find outliers in data. Outliers are defined as
values that fall below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR.

3. Variance
Definition: Measures the average squared deviation of data values from the mean (µ).
It is represented as σ².
Formula: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Interpretation & Usage: Variance is based on all data values, but it gives more weight
to extreme values. It is expressed in squared units.

4. Standard Deviation
Definition: Measures the square root of the average squared deviation of data values
from the mean. It is represented as σ.
Formula: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
Interpretation & Usage: Standard Deviation is expressed in the same unit as the data.
It also gives more weight to extreme values.

5. Coefficient of Variation (CV)
Definition: Relative measure of dispersion where the Standard Deviation is expressed
as a percentage of the mean.
Formula: $CV = \frac{\sigma}{\mu} \times 100$
Interpretation & Usage: Coefficient of Variation is helpful to compare the degree of
variation from one data distribution to another, even when there is a huge difference
in their means.
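
The 1.5 × IQR rule for outliers described under IQR can be sketched in Python as follows. This is a minimal illustration, not part of the original module: the function name `iqr_outliers` and the sample `readings` are hypothetical, and the quartiles come from the standard library's `statistics.quantiles`, whose default method is only one of several quartile conventions.

```python
import statistics

def iqr_outliers(values):
    """Return (lower_fence, upper_fence, outliers) using the 1.5 * IQR rule."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    # Its default method ("exclusive") is one of several quartile conventions;
    # other tools may return slightly different Q1 and Q3 values.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    outliers = [v for v in values if v < lower_fence or v > upper_fence]
    return lower_fence, upper_fence, outliers

# Hypothetical readings, invented for illustration only.
readings = [12, 13, 13, 14, 15, 15, 16, 40]
lower, upper, flagged = iqr_outliers(readings)
print(f"Fences: [{lower}, {upper}]  Outliers: {flagged}")
```

With these made-up readings, only the value 40 falls outside the fences. Because quartile conventions differ between tools, borderline values may be flagged by one tool and not by another.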

Figure 1: Characteristics of Normal Distribution

Example
Consider the temperature readings recorded for a week as given below:
Table 2: Temperature Record

Day          Temperature (in ºC)
Monday       10
Tuesday      11
Wednesday    9
Thursday     12
Friday       14
Saturday     16
Sunday       20

The measures of dispersion for the above data are calculated as follows:
1. Range = Highest Value – Lowest Value = 20 – 9 = 11ºC
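2. IQR: First, arrange the data in ascending order (9, 10, 11, 12, 14, 16, 20) and then
calculate the first and third quartiles. The median is 12; Q1 is the median of the
lower half (9, 10, 11) and Q3 is the median of the upper half (14, 16, 20).
Q1 = 10 and Q3 = 16
IQR = 16 – 10 = 6ºC
3. Variance: First, calculate the mean (92 / 7 = 13.14ºC). Then square each value's
difference from the mean and sum the squares (88.86). Variance is the average of
these squared differences: 88.86 / 7 = 12.69 (ºC)².
4. Standard Deviation: It is the square root of the variance.
SD = √12.69 = 3.56ºC
5. CV = (3.56 / 13.14) × 100 ≈ 27.11% (computed from the unrounded standard deviation
and mean)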
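
The calculations above can be checked with a short Python sketch. It assumes the population formulas from Table 1 (dividing by N) and uses only the standard library; the variable names are illustrative. The default method of `statistics.quantiles` happens to match the quartiles used in this example, but other tools may use a different quartile convention.

```python
import math
import statistics

# Temperature readings from Table 2, Monday to Sunday, in ºC.
temps = [10, 11, 9, 12, 14, 16, 20]

value_range = max(temps) - min(temps)

# Quartiles with the default "exclusive" method of statistics.quantiles.
q1, _median, q3 = statistics.quantiles(temps, n=4)
iqr = q3 - q1

mean = statistics.fmean(temps)
variance = statistics.pvariance(temps)  # population variance: divides by N
std_dev = math.sqrt(variance)
cv = std_dev / mean * 100

print(f"Range    = {value_range}")
print(f"IQR      = {iqr}")
print(f"Variance = {variance:.2f}")
print(f"SD       = {std_dev:.2f}")
print(f"CV       = {cv:.2f}%")
```

If the week of readings is treated as a sample rather than a whole population, `statistics.variance` (which divides by N − 1) would be the usual choice and would give a larger value.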

Measures of Shape
Measures of shape are used to understand the overall structure or form of the
dataset, revealing patterns and characteristics such as symmetry, skewness, and
tailedness.
1. Skewness – Measures the symmetry of the distribution.
Figure 2: Types of Skewness

a. Symmetrical Distribution – Two sides of the distribution are mirror images of
each other, as shown in Fig. 2(b).
The skewness is zero for this distribution, and Mean = Median = Mode.
A distribution with a skewness value between -0.5 and 0.5 is considered almost
symmetrical.
b. Asymmetrical Distribution – Two sides of the distribution are not mirror
images of each other, as shown in Fig. 2(a) and 2(c).

In Fig. 2(a), the distribution is negatively skewed, and Mean < Median < Mode. If the
skewness is between -1 and -0.5, then the distribution is moderately negatively
skewed, and if the value is less than -1, then the distribution is highly negatively
skewed.

In Fig. 2(c), the distribution is positively skewed, and Mean > Median > Mode. If the
skewness is between 0.5 and 1, then the distribution is moderately positively skewed,
and if the value is greater than 1, then the distribution is highly positively skewed.

2. Kurtosis – Measures the tailedness of the distribution, which indicates how often
outliers occur.
Figure 3: Types of Kurtosis

a. Mesokurtic Distribution – Medium-tailed distribution in which outliers are neither
highly frequent nor highly infrequent. A normal distribution, which has a kurtosis
of 3, is mesokurtic.
b. Leptokurtic Distribution – Heavy-tailed distribution with frequent outliers.
Leptokurtosis is also called positive kurtosis (positive excess kurtosis, i.e.
kurtosis minus 3) and has a kurtosis value of more than 3.
c. Platykurtic Distribution – Light-tailed distribution with infrequent outliers.
Platykurtosis is also called negative kurtosis (negative excess kurtosis), with a
kurtosis value of less than 3.
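
The module does not give formulas for skewness or kurtosis. A common choice, assumed here, is the moment-based definition: skewness = m3 / m2^1.5 and kurtosis = m4 / m2^2, where mk is the k-th population central moment. The sketch below applies these to the temperature data from Table 2 and classifies the results with the thresholds stated above. All function names are hypothetical, and the tolerance used to call a distribution mesokurtic is an arbitrary illustrative value.

```python
import statistics

def central_moment(values, order):
    """Population central moment: the mean of (x - mean) ** order."""
    mean = statistics.fmean(values)
    return sum((x - mean) ** order for x in values) / len(values)

def skewness(values):
    """Moment-based skewness: m3 / m2 ** 1.5 (zero for a symmetrical dataset)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def kurtosis(values):
    """Moment-based kurtosis: m4 / m2 ** 2 (a normal distribution gives 3)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2

def describe_skewness(g1):
    """Classify skewness with the rule-of-thumb thresholds 0.5 and 1."""
    if abs(g1) <= 0.5:
        return "approximately symmetrical"
    direction = "positively" if g1 > 0 else "negatively"
    degree = "moderately" if abs(g1) <= 1 else "highly"
    return f"{degree} {direction} skewed"

def describe_kurtosis(k, tolerance=0.1):
    """Compare kurtosis with 3; the tolerance is an assumed, illustrative value."""
    if abs(k - 3) <= tolerance:
        return "mesokurtic"
    return "leptokurtic" if k > 3 else "platykurtic"

# Temperature readings from Table 2, in ºC.
temps = [10, 11, 9, 12, 14, 16, 20]
g1 = skewness(temps)
k = kurtosis(temps)
print(f"Skewness = {g1:.2f} ({describe_skewness(g1)})")
print(f"Kurtosis = {k:.2f} ({describe_kurtosis(k)})")
```

For the temperature data this gives a skewness of about 0.74 (moderately positively skewed) and a kurtosis of about 2.39 (platykurtic). With only seven values these figures are rough indications at best. Statistical packages may apply bias corrections or report excess kurtosis (kurtosis minus 3), so their output can differ from this sketch.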

Key Takeaways
• This reading explains in detail the different measures of dispersion, along with
their formulas and significance.
• The concepts of different measures of shape are also explained with
appropriate diagrams.
