Chapter 2


See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/353378173

Chapter 2 Central Tendency and Variability Measures

Presentation · July 2021


DOI: 10.13140/RG.2.2.22991.20641


1 author:

Raid Salha
Islamic University of Gaza

All content following this page was uploaded by Raid Salha on 22 July 2021.



Chapter 2

Central Tendency and Variability Measures

2.1 Central Tendency Measures


2.2 Variability Measures
2.3 Outliers

2.1 Central Tendency Measures

Central tendency refers to the location of a “typical” data value: the data
value around which other data values tend to cluster. Because a value is
more likely to be typical if it is in the middle of a distribution than if it is at
an extreme, the term central tendency has come to be used for this class of
descriptive statistics.

1. The Mean
The most commonly used index of central tendency is the mean, the term
used in statistics for the arithmetic average. The mean is the sum of the
data values divided by the number of values N:

$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N}$$

Example 1: Consider the following set of values

10 12 15 5 10 8 6 14

$$\bar{X} = \frac{80}{8} = 10$$
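The arithmetic in Example 1 can be checked with a few lines of Python. This is only a minimal sketch using the standard library; the variable names are illustrative and are not part of the original slides, which use SPSS.

```python
from statistics import mean

# Data values from Example 1
values = [10, 12, 15, 5, 10, 8, 6, 14]

total = sum(values)   # sum of the data values
n = len(values)       # number of observations, N
x_bar = total / n     # arithmetic mean

print(total, n, x_bar)  # 80 8 10.0

# The library function gives the same result
assert x_bar == mean(values)
```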

2. The Median

A second descriptive statistic used to indicate central tendency is the
median. The median is the point in a data distribution that divides the
distribution into two equal halves: 50% of the data values lie above the
median, and the other 50% lie below it.

To calculate the median, the data values must first be sorted in ascending
order from the smallest to the largest value or in descending order from the
largest value to the smallest.

There are two cases:

1. If the data size N is odd, then the median is the observation whose
position in the sorted data is $(N+1)/2$.

2. If the data size N is even, then the median is the mean of the two
observations whose positions in the sorted data are $N/2$ and $(N+2)/2$.

Example 2: The median of the following data

9 10 15 5 10 8 6 14 7

Note the sample size N = 9 is odd


5 6 7 8 9 10 10 14 15

Median = 9

Example 3: The median of the following data

9 10 15 5 10 8 6 14 7 13

Note the sample size N = 10 is even

5 6 7 8 9 10 10 13 14 15

$$\text{Median} = \frac{9 + 10}{2} = 9.5$$
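The two cases of the median rule can be sketched in Python as follows. The function name `median_by_rank` is hypothetical and chosen for illustration; it follows the odd/even rule described above and is compared against the standard library's `statistics.median`.

```python
from statistics import median

def median_by_rank(data):
    """Median by position: (N+1)/2 for odd N, mean of positions N/2 and N/2+1 for even N."""
    ordered = sorted(data)          # the data must be sorted first
    n = len(ordered)
    if n % 2 == 1:
        # Positions are 1-based, list indexes are 0-based
        return ordered[(n + 1) // 2 - 1]
    lower = ordered[n // 2 - 1]     # observation at position N/2
    upper = ordered[n // 2]         # observation at position N/2 + 1
    return (lower + upper) / 2

example_2 = [9, 10, 15, 5, 10, 8, 6, 14, 7]       # N = 9 (odd)
example_3 = [9, 10, 15, 5, 10, 8, 6, 14, 7, 13]   # N = 10 (even)

print(median_by_rank(example_2))  # 9
print(median_by_rank(example_3))  # 9.5
```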

3. The Mode

The mode is the numerical value in a distribution that occurs most
frequently.

Example 4: Consider the following set of values


20 21 21 22 22 22 22 23 23 24
Mode = 22

Example 5: Consider the following set of values


21 21 21 22 22 22 23 24 25 25
There are two modes, 21 and 22.
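Examples 4 and 5 can be reproduced with the standard library's `statistics.multimode`, which returns every value tied for the highest frequency. This is a small sketch; the variable names are illustrative.

```python
from statistics import multimode

example_4 = [20, 21, 21, 22, 22, 22, 22, 23, 23, 24]
example_5 = [21, 21, 21, 22, 22, 22, 23, 24, 25, 25]

# multimode returns a list, so one mode and several modes are handled alike
modes_4 = multimode(example_4)
modes_5 = multimode(example_5)

print(modes_4)  # [22]
print(modes_5)  # [21, 22]
```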

Example 6: Using SPSS, find the central tendency measures for the data in
Example 2 (Heart rate data).

Statistics
heartrate
N            Valid     100
             Missing   0
Mean                   65.21
Median                 66.00
Mode                   66

The relationship between Mean, Median and Mode


For a unimodal distribution, the three central tendency measures are
typically related as follows:
1. If the distribution is symmetric, then Mean = Median = Mode.
2. If the distribution is positively skewed, then Mode < Median < Mean.
3. If the distribution is negatively skewed, then Mean < Median < Mode.
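As a quick illustration of the positively skewed case, the sketch below uses a small hypothetical dataset (not from the slides) in which one large value stretches the right tail. The ordering shown holds for this dataset; it is a common pattern rather than a guarantee for every skewed sample.

```python
from statistics import mean, median, mode

# Hypothetical positively skewed data: the value 10 pulls the mean upward
scores = [1, 2, 2, 3, 4, 10]

m_mode = mode(scores)      # most frequent value: 2
m_median = median(scores)  # mean of the two middle values: 2.5
m_mean = mean(scores)      # 22 / 6, roughly 3.67

print(m_mode < m_median < m_mean)  # True
```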

2.2 Variability Measures

In addition to a distribution’s shape and central tendency, another important
characteristic is its variability. Variability refers to how spread out or
dispersed the scores are about their mean. Two distributions with identical means
and similar shapes (e.g., both symmetric) could nevertheless differ
considerably in terms of variability.

Consider, for example, the two distributions in Figure 1. This figure shows
body weight data for two hypothetical samples, both of which have means of
150 pounds, but, clearly, the two samples differ markedly. In sample A,
there is great diversity: Some people weigh as little as 100 pounds, while
others weigh up to 200 pounds. In sample B, by contrast, there are few
people at either extreme: The weights cluster more tightly around the mean
of 150. We can verbally describe a sample’s variability. We can say, for
example, that sample A is more varied than sample B with regard to
weight.

Figure 1: Two distributions with the same mean and different variability

Variability indexes
Statisticians have developed indexes that express the extent to which data
values on quantitative variables deviate from one another in a distribution.
Four such indexes are described here.

1. The Range

The range, the simplest measure of variability, is the difference between the
highest data value and the lowest data value in the distribution.

For example, in Figure 1,

Range for sample A = 200 - 100 = 100
Range for sample B = 175 - 125 = 50

Note: In research reports, the range is often shown as the minimum and
maximum value, without the subtracted difference score.
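Because the individual weights behind Figure 1 are not listed, the sketch below computes the range for the Example 1 data instead. It reports both the minimum/maximum pair and the subtracted difference; the variable names are illustrative.

```python
# Data values from Example 1
values = [10, 12, 15, 5, 10, 8, 6, 14]

low, high = min(values), max(values)
data_range = high - low   # highest value minus lowest value

print(low, high, data_range)  # 5 15 10
```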

2. Interquartile Range

The interquartile range (IQR) is a variability index calculated on the basis
of quartiles. The lower (first) quartile (Q1) is the point below which 25% of
the data lie, while the upper (third) quartile (Q3) is the point below which
75% of the data lie. The interquartile range (IQR) is the distance between
these two values:

$$IQR = Q_3 - Q_1$$

Note: The median is the second quartile (Q2), the point below which 50% of the
data lie.
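The quartiles and the IQR can be sketched in Python with `statistics.quantiles`. The heart rate data are not listed in this chapter, so the Example 3 data are used purely for illustration. Note that statistical packages use slightly different interpolation rules for quartiles, so values from other software may differ a little from the ones below.

```python
from statistics import quantiles

# Data from Example 3, used here only for illustration
data = [9, 10, 15, 5, 10, 8, 6, 14, 7, 13]

# n=4 divides the data into four parts and returns the three cut points.
# The default method ("exclusive") interpolates at position p * (N + 1)
# in the sorted data.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 6.75 9.5 13.25 6.5
```

The second quartile, 9.5, agrees with the median found in Example 3.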

Figure 2: The three quartiles

Example 7: Use SPSS to find the following for the heart rate data:
a. The three quartiles and the IQR.
b. The percentiles in steps of 10% (10%, 20%, ..., 90%).
c. The 15%, 25%, 45%, 50%, 60%, 75%, and 95% percentile points.

a.
Statistics
heartrate
N            Valid     100
             Missing   0
Percentiles  25        62.00
             50        66.00
             75        68.00

IQR = 68 - 62 = 6

b.
Statistics
heartrate
N            Valid     100
             Missing   0
Percentiles  10        59.00
             20        61.00
             30        63.00
             40        64.00
             50        66.00
             60        66.60
             70        68.00
             80        69.00
             90        71.00

c.
Statistics
heartrate
N            Valid     100
             Missing   0
Percentiles  15        60.00
             25        62.00
             45        65.00
             50        66.00
             60        66.60
             75        68.00
             95        72.00

3. The Variance

Another important index of variability is the variance (often abbreviated
as Var). The variance is based on the squared differences between every data
value and the mean. The formula for the sample variance is given by:

$$Var = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1},$$

where
N is the sample size,
$\bar{X}$ is the sample mean, and
the $X_i$ are the data values.

Example 8: The following data represent the weights of 10 people in
pounds. Compute the variance of the data.

110 120 130 140 150 150 160 170 180 190

$X_i$    $X_i - \bar{X}$    $(X_i - \bar{X})^2$
110      -40                1600
120      -30                900
130      -20                400
140      -10                100
150      0                  0
150      0                  0
160      10                 100
170      20                 400
180      30                 900
190      40                 1600

$$\sum_{i=1}^{N} X_i = 1500, \quad \sum_{i=1}^{N} (X_i - \bar{X}) = 0, \quad \sum_{i=1}^{N} (X_i - \bar{X})^2 = 6000$$

$$\bar{X} = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{1500}{10} = 150.$$

$$Var = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1} = \frac{6000}{9} = 666.67$$

Note: Because the variance is not in the same measurement units as the
original data (in this example, it is in pounds squared), the variance is rarely
used as a descriptive statistic.
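The steps of Example 8 can be followed in Python as a minimal sketch. The variable names are illustrative; `statistics.variance` is the standard library's sample variance, which also divides by N - 1.

```python
from math import isclose
from statistics import variance

# Weights (in pounds) from Example 8
weights = [110, 120, 130, 140, 150, 150, 160, 170, 180, 190]

n = len(weights)
x_bar = sum(weights) / n                          # sample mean
squared_devs = [(x - x_bar) ** 2 for x in weights]
sum_sq = sum(squared_devs)                        # sum of squared deviations
var = sum_sq / (n - 1)                            # divide by N - 1

print(x_bar, sum_sq, round(var, 2))  # 150.0 6000.0 666.67

# Cross-check against the library's sample variance
assert isclose(var, variance(weights))
```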

4. The Standard Deviation

The most widely used index of variability is the standard deviation (often
abbreviated as SD). The standard deviation is the square root of the variance:

$$SD = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N - 1}} = \sqrt{Var}.$$

Example 9: The standard deviation for the data in Example 8 is given by

$$SD = \sqrt{Var} = \sqrt{666.67} = 25.82.$$
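The same result can be obtained in Python for the weights listed in Example 8. This is a short sketch: `statistics.stdev` is the standard library's sample standard deviation (N - 1 in the denominator), and the variable names are illustrative.

```python
from math import isclose, sqrt
from statistics import stdev, variance

# Weights (in pounds) from Example 8
weights = [110, 120, 130, 140, 150, 150, 160, 170, 180, 190]

sd = stdev(weights)   # sample standard deviation

print(round(sd, 2))   # 25.82

# SD is the square root of the sample variance
assert isclose(sd, sqrt(variance(weights)))
```

Unlike the variance, the result is in the original units (pounds).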

Note: The standard deviation is often easier to interpret in a comparative
context. For example, looking back at Figure 1, distributions A and B both
had a mean of 150, but sample A would have an SD of about 20, while
sample B would have an SD of about 10. The SD index communicates that
sample A is more varied than sample B.

Example 10: Using SPSS, find the variability measures for the heart rate
data.

Statistics
heartrate
N            Valid     100
             Missing   0
Std. Deviation         4.495
Variance               20.208
Range                  19

Example 11: Using SPSS, find the central tendency and variability
measures for the Heart rate data.

Statistics
heartrate
N            Valid     100
             Missing   0
Mean                   65.21
Median                 66.00
Mode                   66
Std. Deviation         4.495
Variance               20.208
Range                  19

2.3 Outliers

Outliers are often identified in relation to the value of a distribution’s IQR.

Types of outliers: There are two types of outliers:

1. A mild outlier is a data value that lies between 1.5 and 3.0 times the
IQR below Q1 or above Q3.
2. An extreme outlier is a data value that lies more than 3.0 times the IQR
below Q1 or above Q3.

Example 12: Are there any outliers in the heart rate data? If so, identify them.

From Example 7, for the heart rate data,

IQR = 6, Q1 = 62 and Q3 = 68
1.5 × IQR = 1.5 × 6 = 9
3 × IQR = 3 × 6 = 18
Q1 - 1.5 × IQR = 62 - 9 = 53
Q1 - 3 × IQR = 62 - 18 = 44
Q3 + 1.5 × IQR = 68 + 9 = 77
Q3 + 3 × IQR = 68 + 18 = 86

Figure 3: The outliers for the heart rate data

• A mild lower outlier would be any value between 44 and 53, and a
mild upper outlier would be any value between 77 and 86.
There are no mild outliers.

• An extreme lower outlier would be a value less than 44, and an
extreme upper outlier would be a value above 86.
There are no extreme outliers.

In our data distribution, there are no outliers.
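The classification rule can be sketched as a small Python function. The function name `classify_outlier` and the test values are hypothetical, chosen for illustration. The treatment of values that fall exactly on a limit is an assumption, since the definitions above do not specify it: here such values are placed in the less severe category.

```python
def classify_outlier(value, q1, q3):
    """Label a value as 'extreme', 'mild', or 'none' using IQR-based limits."""
    iqr = q3 - q1
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # 1.5 x IQR limits
    outer_low, outer_high = q1 - 3.0 * iqr, q3 + 3.0 * iqr   # 3 x IQR limits
    # Assumption: a value exactly on a limit gets the less severe label
    if value < outer_low or value > outer_high:
        return "extreme"
    if value < inner_low or value > inner_high:
        return "mild"
    return "none"

# Quartiles of the heart rate data from Example 7
Q1, Q3 = 62, 68

# Illustrative values only; these are not taken from the heart rate data
for v in [40, 50, 66, 80, 90]:
    print(v, classify_outlier(v, Q1, Q3))
# 40 extreme
# 50 mild
# 66 none
# 80 mild
# 90 extreme
```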

A graph called a boxplot is a useful way to visualize percentiles and to
identify outliers. Figure 4 shows the boxplot for the original heart rate
data for 100 people. A boxplot shows a box that has the 75th
percentile as its upper edge (here, at 68) and the 25th percentile at its
lower edge (here, at 62). The horizontal line through the middle of the
box is the median (here, 66). The “whiskers” that extend from the box
show the highest and lowest values that are not outliers, in relation to
the IQR, as defined earlier. This graph confirms that there are no
outliers in the original dataset.

Figure 4: The boxplot graph for the heart rate data

Example 13:

To illustrate what a computer-generated boxplot shows when there are
outliers, we added six extreme values to the original dataset: 40, 45, and 50
at the lower end and 90, 95, and 100 at the upper end. Figure 5 shows the
resulting boxplot. The six data values that we added are all shown as
outliers, outside the outer limits of the whiskers. Mild outliers are shown
with circles: for example, case number 106 with a value of 50 and case
number 105 with a value of 45 are mild outliers. Cases that are extreme
outliers are shown with asterisks. When there are outliers, the first thing to
do is to see if they are legitimate values or reflect errors in data entry. If
they are true values, researchers can decide whether it is appropriate to make
adjustments, such as trimming the mean.

Figure 5: The boxplot graph for the heart rate data after adding six extreme
values

Exercises

Q1. The following numbers represent the data values of 30 psychiatric
inpatients on a widely used measure of depression (the Center for
Epidemiologic Studies-Depression scale). What are the mean, the median, and
the mode for these data?

41 27 32 24 21 28 22 25 35 27
31 40 23 27 29 33 42 30 26 30
27 39 26 34 28 38 29 36 24 37

If the values of these indexes are not the same, discuss what they suggest
about the shape of the distribution.

Q2. Find the medians for the following distributions:

(a) 1 5 7 8 9
(b) 3 5 6 8 9 10
(c) 3 4 4 4 6 20
(d) 2 4 5 5 8 9

Q3. For which distribution in question Q2 would the median be preferred to
the mean as the index of central tendency? Why?

Q4. The following ten data values are systolic blood pressure readings.
Compute the mean, the range, the SD, and the variance for these data.

130 110 160 120 170 120 150 140 160 140
Q5. The following data represent blood pressure of 30 people

115 110 130 135 120 125 135 120 90 130

140 250 50 125 120 110 130 140 150 75

145 60 125 110 80 140 160 120 135 100

a. Calculate IQR.
b. Are there any outliers in the blood pressure data? If so, identify them and
classify them as mild or extreme outliers.
c. Graph the boxplot for the data.
