NE 2207 Part 4
Suppose our sample consists of $n$ values of a variable $X$. We use $\sum x$ to denote the sum of all these values.
There are three common measures to describe the central tendency of the sample
data. These are:
1. Mean
2. Median
3. Mode
Mean
Let a sample of $n$ values of the variable $X$ be taken, with data $x_1, x_2, \ldots, x_n$. The sample mean is defined as
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Example
Data: 4, 8, 5, 9, 15
$$\bar{x} = \frac{1}{n} \sum x = \frac{1}{5}(4 + 8 + 5 + 9 + 15) = 8.2$$
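The mean formula translates directly into code. The following Python sketch sums the values and divides by $n$. The function name `sample_mean` is illustrative and does not come from the notes. Python's standard `statistics.mean` computes the same quantity and is shown for comparison.

```python
import statistics

def sample_mean(values):
    """Return the sample mean (1/n) * sum(x_i) of a non-empty sequence."""
    if len(values) == 0:
        raise ValueError("the mean needs at least one value")
    return sum(values) / len(values)

data = [4, 8, 5, 9, 15]
print(sample_mean(data))      # 8.2
print(statistics.mean(data))  # 8.2, same result from the standard library
```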
Median
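The median is the middle value of the data after sorting. If $n$ is odd, the median is the middle value of the sorted data, i.e., the $\left(\frac{n+1}{2}\right)$th value. If $n$ is even, the median is the average of the two middle values, i.e., of the $\left(\frac{n}{2}\right)$th and $\left(\frac{n}{2}+1\right)$th values.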
When the sample size is large, approximately 50% of the values are less than the median and approximately 50% are greater than it.
Example
Data: 4, 8, 5, 9, 15
Sorted data: 4, 5, 8, 9, 15
Median = 8
Example
Data: 4, 8, 5, 9, 15, 13
Sorted data: 4, 5, 8, 9, 13, 15
Median = (8 + 9)/2 = 8.5
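The odd/even rule for the median can be sketched in Python as follows. It uses the two data sets from the examples above. The function name `sample_median` is hypothetical. The standard library's `statistics.median` follows the same rule.

```python
def sample_median(values):
    """Return the middle value (n odd) or the average of the two middle values (n even)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("the median needs at least one value")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]                    # single middle value
    return (xs[mid - 1] + xs[mid]) / 2    # average of the two middle values

print(sample_median([4, 8, 5, 9, 15]))      # 8
print(sample_median([4, 8, 5, 9, 15, 13]))  # 8.5
```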
Mode
The mode is the value that occurs most frequently in the data.
Example
Twenty people were asked to give a satisfaction rating after a restaurant meal on a scale of 1 (not satisfied) to 10 (extremely satisfied).
Data: 9, 3, 7, 5, 5, 10, 8, 9, 9, 10, 9, 8, 9, 6, 9, 8, 7, 7, 10, 6.
Mode = 9 (occurred 6 times in the data)
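Finding the mode amounts to counting how often each value occurs. The Python sketch below uses `collections.Counter` to reproduce the rating example. The function name `sample_modes` is hypothetical. It returns a list so that ties (several values sharing the highest count) are not silently dropped.

```python
from collections import Counter

def sample_modes(values):
    """Return every value that occurs with the highest frequency, in sorted order."""
    counts = Counter(values)
    if not counts:
        raise ValueError("the mode needs at least one value")
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)

ratings = [9, 3, 7, 5, 5, 10, 8, 9, 9, 10, 9, 8, 9, 6, 9, 8, 7, 7, 10, 6]
print(sample_modes(ratings))   # [9]
print(Counter(ratings)[9])     # 6
```

The standard library's `statistics.multimode` (Python 3.8+) also returns all modes. It lists them in order of first appearance rather than sorted.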
For numerical (discrete or continuous) data, any of the three measures can be used. However, for mathematical reasons, the mean or the median is preferred.
Data: 2, 3, 4, 5, 7
The median represents the majority of the data. The mean represents neither the majority nor the outlier. The median is preferred because it gives a more reasonable result.
For a negatively skewed distribution:
mean < median < mode (shown as three points in the plot below).
Exercise
Consider the data: 2, 4, 10, 10, 12, 6, 11, 12, 12, 8. Compute the mean, median and mode, and comment on the shape of the distribution.
Solution
Sorted data: 2, 4, 6, 8, 10, 10, 11, 12, 12, 12.
Mean = 87/10 = 8.7
Median = (10 + 10)/2 = 10
Mode = 12
Since Mean < Median < Mode, the distribution is negatively skewed (or skewed to
the left).
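The exercise can be checked with Python's `statistics` module, as sketched below. The function name `skew_by_rule_of_thumb` is hypothetical. The ordering comparison is only a rough rule, not a formal test of skewness. The notes state only the negative case. The mirrored positive-skew branch (mode < median < mean) is an addition based on the usual rule of thumb.

```python
import statistics

def skew_by_rule_of_thumb(values):
    """Compare mean, median and mode as in the notes (a rough rule, not a formal test)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    mode = statistics.mode(values)  # Python 3.8+: returns the first mode encountered on ties
    if mean < median < mode:
        shape = "negatively skewed (skewed to the left)"
    elif mode < median < mean:
        shape = "positively skewed (skewed to the right)"  # assumed mirror of the notes' rule
    else:
        shape = "inconclusive by this rule"
    return mean, median, mode, shape

data = [2, 4, 10, 10, 12, 6, 11, 12, 12, 8]
print(skew_by_rule_of_thumb(data))
# (8.7, 10.0, 12, 'negatively skewed (skewed to the left)')
```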
Percentiles
When the data are arranged in increasing order, the $p$th percentile is a value such that $p$ percent of the values fall at or below it and $(100 - p)$ percent of the values fall at or above it. There are 99 percentiles, which divide the total area of the histogram into 100 equal parts.
Example
Suppose the 83rd percentile is 17.5. This means that 83% of the values in the data are at or below 17.5, and $(100 - 83)\% = 17\%$ of the values are at or above 17.5.
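Several conventions exist for computing percentiles from a finite sample, and they can give slightly different answers. The sketch below assumes the nearest-rank method, since the notes do not specify one. Under that method, the $p$th percentile is the value at position $\lceil pn/100 \rceil$ in the sorted data. This guarantees that at least $p$ percent of the values are at or below it and at least $(100 - p)$ percent are at or above it. The function name is hypothetical, and the data are the exercise data used above.

```python
def nearest_rank_percentile(values, p):
    """Return the p-th percentile (integer 0 < p < 100) using the nearest-rank method."""
    if not 0 < p < 100:
        raise ValueError("p must be between 1 and 99")
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one value")
    rank = (p * n + 99) // 100   # ceil(p * n / 100) in exact integer arithmetic
    return xs[rank - 1]          # ranks are 1-based, list indices are 0-based

data = [2, 4, 10, 10, 12, 6, 11, 12, 12, 8]
print(nearest_rank_percentile(data, 25))  # 6
print(nearest_rank_percentile(data, 83))  # 12
```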
Quartiles
There are three quartiles that divide the total area of the histogram into 4 equal parts.
The first quartile Q1 is the 25th percentile. The second quartile Q2 (or median) is
the 50th percentile. The third quartile Q3 is the 75th percentile.
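Python's standard library provides `statistics.quantiles` (Python 3.8+), which returns the cut points Q1, Q2 and Q3 when called with `n=4`. It interpolates between data values and offers two methods: `'exclusive'` (the default) and `'inclusive'`. On the exercise data, the two methods give different Q1 and Q3, so it is worth stating which convention is used. Q2 agrees with the median under both methods here.

```python
import statistics

data = [2, 4, 10, 10, 12, 6, 11, 12, 12, 8]

# Cut points Q1, Q2 (median), Q3 under the two methods offered by the standard library
q_exclusive = statistics.quantiles(data, n=4)                      # method='exclusive' by default
q_inclusive = statistics.quantiles(data, n=4, method="inclusive")

print(q_exclusive)              # [5.5, 10.0, 12.0]
print(q_inclusive)              # [6.5, 10.0, 11.75]
print(statistics.median(data))  # 10.0, equal to Q2 under both methods
```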
Example
Five-number summary
We often describe a set of data by using a five-number summary. The summary
consists of (1) minimum (the smallest value) (2) the first quartile Q1 (3) the median
(4) the third quartile Q3 and (5) maximum (the largest value).
Example
The five-number summary of the previous data: 1, 7.5, 8, 9, 10.
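A five-number summary can be sketched as below. For the quartiles it assumes the "median of each half" convention, since the notes do not say how Q1 and Q3 were obtained. Under this convention, Q1 is the median of the lower half of the sorted data and Q3 is the median of the upper half. When $n$ is odd, the overall median is left out of both halves. Other conventions, such as the interpolating methods of `statistics.quantiles`, can give different quartiles. The function names are hypothetical.

```python
def median_of_sorted(xs):
    """Median of an already sorted, non-empty list."""
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using medians of the lower and upper halves."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two values")
    lower = xs[: n // 2]          # lower half (excludes the middle value when n is odd)
    upper = xs[(n + 1) // 2 :]    # upper half (excludes the middle value when n is odd)
    return (xs[0], median_of_sorted(lower), median_of_sorted(xs),
            median_of_sorted(upper), xs[-1])

print(five_number_summary([2, 4, 10, 10, 12, 6, 11, 12, 12, 8]))  # (2, 6, 10.0, 12, 12)
print(five_number_summary([4, 8, 5, 9, 15]))                       # (4, 4.5, 8, 12.0, 15)
```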