Stat 1101 4 7
Suppose our sample consists of $n$ values of a variable $X$. We use $\sum x$ to denote the
sum of all these values.
There are three common measures to describe the central tendency of the sample
data. These are:
1. Mean
2. Median
3. Mode
Mean
Let a sample of $n$ values of variable $X$ be taken. Data: $x_1, x_2, \cdots, x_n$. The sample
mean is defined as
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Example
Data: 4, 8, 5, 9, 15
$$\bar{x} = \frac{1}{n} \sum x = \frac{1}{5}(4 + 8 + 5 + 9 + 15) = 8.2$$
Median
It is the middlemost value in the sorted data. If $n$ is an odd number, the median is the
middle value of the sorted data. If $n$ is an even number, the median is the average of
the two middle values.
When the sample size is large, approximately 50% of the values are less than the
median and approximately 50% are greater.
Example
Data: 4, 8, 5, 9, 15
Sorted data: 4, 5, 8, 9, 15
Median = 8
Example:
Data: 4, 8, 5, 9, 15, 13
Sorted data: 4, 5, 8, 9, 13, 15
Median = (8 + 9)/2 = 8.5
Mode
It is the value that occurs most frequently in the data.
Example:
20 people were asked to give satisfaction rating after a restaurant meal on a scale of
1 (not satisfied) to 10 (extremely satisfied).
Data: 9, 3, 7, 5, 5, 10, 8, 9, 9, 10, 9, 8, 9, 6, 9, 8, 7, 7, 10, 6.
Mode = 9 (occurred 6 times in the data)
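The three measures can be checked with a short Python sketch. It assumes the standard-library `statistics` module and reuses the example data from this section; the variable names are illustrative only.

```python
from statistics import mean, median, mode

# Samples taken from the examples in this section
odd_sample = [4, 8, 5, 9, 15]
even_sample = [4, 8, 5, 9, 15, 13]
ratings = [9, 3, 7, 5, 5, 10, 8, 9, 9, 10, 9, 8, 9, 6, 9, 8, 7, 7, 10, 6]

sample_mean = mean(odd_sample)      # sum of the values divided by n
odd_median = median(odd_sample)     # n is odd: middle value of the sorted data
even_median = median(even_sample)   # n is even: average of the two middle values
ratings_mode = mode(ratings)        # most frequent rating

print(sample_mean, odd_median, even_median, ratings_mode)
```

Note that `median` sorts the data internally, so the input does not need to be sorted first.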
For numerical (discrete or continuous) data, any of the three measures can be used.
However, for mathematical reasons, mean or median is preferred.
Data: 2, 3, 4, 5, 7
The median represents the majority of the data. The mean represents neither the majority
nor the outlier. The median is preferred because it gives a reasonable result.
For a negatively skewed distribution:
mean < median < mode (shown with 3 bullet points in the plot below).
Exercise
Consider the data: 2, 4, 10, 10, 12, 6, 11, 12, 12, 8. Compute mean, median and
mode. Comment on the shape of the distribution.
Solution
Sorted data: 2, 4, 6, 8, 10, 10, 11, 12, 12, 12.
Mean = 8.7
Median = (10 + 10)/2 = 10
Mode = 12
Since Mean < Median < Mode, the distribution is negatively skewed (or skewed to
the left).
Percentiles
When data are arranged in increasing order, the $p$th percentile is a value such that $p$
percent of the values fall at or below the value, and $(100 - p)$ percent of the values
fall at or above the value. There are 99 percentiles that divide the total area of the
histogram into 100 equal parts.
Example
Let the 83rd percentile = 17.5. This means about 83% of the values in the data are at or
below 17.5, and about (100 – 83)% = 17% of the values are at or above 17.5.
Quartiles
There are three quartiles that divide the total area of the histogram into 4 equal parts.
The first quartile Q1 is the 25th percentile. The second quartile Q2 (or median) is
the 50th percentile. The third quartile Q3 is the 75th percentile.
Example
Five-number summary
We often describe a set of data by using a five-number summary. The summary
consists of (1) minimum (the smallest value) (2) the first quartile Q1 (3) the median
(4) the third quartile Q3 and (5) maximum (the largest value).
Example
The five-number summary of the previous data: 1, 7.5, 8, 9, 10.
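A five-number summary can be sketched in Python as follows. Quartile conventions differ between textbooks and software, so this is only one possible approach: it assumes the "median of each half" convention, which may not be the one used in these notes. The function name `five_number_summary` is hypothetical, and the sample is the data from the earlier exercise, not the data of the example above.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]            # values before the median position
    upper = data[half + n % 2:]    # values after the median position
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Data from the earlier exercise (assumed here purely for illustration)
sample = [2, 4, 10, 10, 12, 6, 11, 12, 12, 8]
summary = five_number_summary(sample)
print(summary)
```

The sketch needs at least two values, because each half must be non-empty.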
Variation in the data
Consider the following two datasets:
1st dataset: 49, 50, 51
2nd dataset: 0, 50, 100
Both datasets have the same mean: 50, but the 2nd dataset has more variability. We
will discuss how to measure variability.
Measures
(a) Range
(b) Mean deviation from mean
(c) Variance and Standard deviation
Range
Range = largest value − smallest value.
Example
Data: 2, 4, 8, 5
Range = 8 – 2 = 6
• Range is not very useful. It only gives a rough idea about the variation.
Note
For any data set
$$\sum_{i=1}^{n} (x_i - \bar{x}) = 0$$
Because these deviations always cancel out, the mean deviation from the mean (MD)
averages their absolute values:
$$\mathrm{MD} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}|$$
Example
Data: 2, 3, 5, 6
$\bar{x} = 4$
$$\mathrm{MD} = \frac{1}{4}(|2 - 4| + \cdots + |6 - 4|) = 1.5$$
Variance
$$s^2 = \frac{1}{n - 1} \sum (x - \bar{x})^2$$
Example
Data: 2, 3, 5, 6
$\bar{x} = 4$
$$\begin{aligned}
s^2 &= \frac{1}{n - 1} \sum (x - \bar{x})^2 \\
    &= \frac{1}{4 - 1}\left((2 - 4)^2 + (3 - 4)^2 + (5 - 4)^2 + (6 - 4)^2\right) \\
    &= 3.33
\end{aligned}$$
Note
The division is by $n - 1$ because the number of free values (degrees of freedom) is
$n - 1$. If $n = 4$ and we know 3 values of $(x_i - \bar{x})$, the 4th one can be calculated,
since the deviations sum to zero.
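The measures of variation above can be reproduced with a short Python sketch for the data 2, 3, 5, 6. It assumes the standard-library `statistics` module, whose `variance` and `stdev` functions divide by $n - 1$; the variable names are illustrative.

```python
from statistics import mean, variance, stdev

data = [2, 3, 5, 6]
x_bar = mean(data)

deviations = [x - x_bar for x in data]
data_range = max(data) - min(data)                             # largest minus smallest
mean_deviation = sum(abs(d) for d in deviations) / len(data)   # average absolute deviation
sample_variance = variance(data)                               # divides by n - 1
sample_sd = stdev(data)                                        # square root of the sample variance

print(sum(deviations))                     # deviations from the mean cancel out
print(data_range, mean_deviation)
print(round(sample_variance, 2), round(sample_sd, 2))
```

Printing the sum of the deviations illustrates why the plain deviations cannot be averaged directly, and why the absolute or squared deviations are used instead.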
Empirical Rule
If a distribution (histogram) appears to be symmetric and bell-shaped, we expect
that approximately
• 68% of the data values will fall in the interval $(\bar{x} - s, \bar{x} + s)$
(within one standard deviation of the sample mean)
• 95% of the data values will fall in the interval $(\bar{x} - 2s, \bar{x} + 2s)$
(within two standard deviations of the sample mean)
• 99.7% of the data values will fall in the interval $(\bar{x} - 3s, \bar{x} + 3s)$
(within three standard deviations of the sample mean)
Example
Let the mean and SD of commuting time (minutes) of workers be 60 and 10,
respectively. Let the histogram be more or less symmetric and bell-shaped. We
then have:
$\bar{x} - s = 60 - 10 = 50$
$\bar{x} + s = 60 + 10 = 70$
Approximately 68% of workers have a commuting time between 50 and 70 minutes.
$\bar{x} - 2s = 60 - 2 \times 10 = 40$
$\bar{x} + 2s = 60 + 2 \times 10 = 80$
Approximately 95% of workers have a commuting time between 40 and 80 minutes.
$\bar{x} - 3s = 60 - 3 \times 10 = 30$
$\bar{x} + 3s = 60 + 3 \times 10 = 90$
Approximately 99.7% of workers have a commuting time between 30 and 90 minutes.
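The three intervals in this example can be generated with a small Python sketch. The function name `empirical_rule_intervals` is hypothetical, and the percentages are the approximate ones stated by the Empirical Rule, so the output is only meaningful for roughly symmetric, bell-shaped data.

```python
def empirical_rule_intervals(mean, sd):
    """Map each approximate coverage (%) to the interval mean - k*sd to mean + k*sd."""
    coverage = {1: 68.0, 2: 95.0, 3: 99.7}   # k standard deviations -> approximate %
    return {pct: (mean - k * sd, mean + k * sd) for k, pct in coverage.items()}

# Commuting-time example: mean 60 minutes, standard deviation 10 minutes
intervals = empirical_rule_intervals(60, 10)
for pct, (low, high) in intervals.items():
    print(f"about {pct}% of workers: between {low} and {high} minutes")
```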
Mean and variance from frequency table
Example
$x$      Frequency
0        40
2        10
3        20
4        30
Total    100
Mean:
$$\bar{x} = \frac{1}{100}(0 \times 40 + 2 \times 10 + 3 \times 20 + 4 \times 30) = 2$$
That is,
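$$\bar{x} = \frac{1}{n} \sum_{i=1}^{k} x_i f_i$$
where $f_i$ is the frequency of the value $x_i$, $k$ is the number of distinct values, and
$n = \sum f_i$ is the total frequency.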
Variance:
$$s^2 = \frac{1}{99}\left((0 - 2)^2 \times 40 + (2 - 2)^2 \times 10 + (3 - 2)^2 \times 20 + (4 - 2)^2 \times 30\right) = 3.03$$
That is,
$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{k} (x_i - \bar{x})^2 f_i$$
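The two frequency-table formulas can be sketched in Python as follows. The function name `frequency_mean_variance` and the list-of-pairs layout are assumptions made for illustration; the numbers are those of the table above.

```python
def frequency_mean_variance(table):
    """Return (mean, sample variance) for a list of (value, frequency) pairs."""
    n = sum(f for _, f in table)                                  # total frequency
    x_bar = sum(x * f for x, f in table) / n                      # weighted mean
    s2 = sum((x - x_bar) ** 2 * f for x, f in table) / (n - 1)    # divide by n - 1
    return x_bar, s2

table = [(0, 40), (2, 10), (3, 20), (4, 30)]
x_bar, s2 = frequency_mean_variance(table)
print(x_bar, round(s2, 2))
```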
Exercise
Calculate mean deviation (from mean) from the above frequency table.
Solution:
Do it yourself.
Example
Class      Frequency
0 − 5      40
5 − 10     20
10 − 15    10
15 − 20    30
Total      100
We use mid-values of each class in our calculation.
Mean:
$$\bar{x} = \frac{1}{100}(2.5 \times 40 + 7.5 \times 20 + 12.5 \times 10 + 17.5 \times 30) = 9$$
That is,
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{k} m_i f_i$$
where $m_i$ is the mid-value of the $i$th class and $f_i$ is its frequency.
Variance:
$$s^2 = \frac{1}{99}\left((2.5 - 9)^2 \times 40 + (7.5 - 9)^2 \times 20 + (12.5 - 9)^2 \times 10 + (17.5 - 9)^2 \times 30\right) = 40.66$$
That is,
$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{k} (m_i - \bar{x})^2 f_i$$
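For grouped data, the same calculation can be sketched in Python by first replacing each class with its mid-value. The function name `grouped_mean_variance` and the `(lower, upper, frequency)` layout are assumptions for illustration. The result is an approximation, since the individual values inside each class are unknown.

```python
def grouped_mean_variance(classes):
    """Return (mean, sample variance) from (lower, upper, frequency) triples."""
    n = sum(f for _, _, f in classes)
    mids = [((lo + hi) / 2, f) for lo, hi, f in classes]        # mid-value of each class
    x_bar = sum(m * f for m, f in mids) / n
    s2 = sum((m - x_bar) ** 2 * f for m, f in mids) / (n - 1)   # divide by n - 1
    return x_bar, s2

classes = [(0, 5, 40), (5, 10, 20), (10, 15, 10), (15, 20, 30)]
grouped_mean, grouped_var = grouped_mean_variance(classes)
print(grouped_mean, round(grouped_var, 2))
```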
Exercise
Calculate mean deviation (from mean) from the above frequency table.
Solution
Do it yourself.
Probability
Probability is a measure of certainty (surety) of an event. It takes a value between
‘0’ and ‘1’. For an impossible event, the probability is zero. For a sure event, the
probability is one.
Example: If a regular (fair) coin is tossed, the probability that a ‘head’ occurs is 1/2
= 0.50 (50% chance).
Example: If a fair dice is tossed, the probability that a ‘3’ occurs is 1/6 = 0.167
(16.7% chance).
It should be noted here that in old English, ‘die’ is singular and ‘dice’ is plural. In
present-day English, ‘dice’ is both singular and plural.
Example: Tossing a coin, throwing a dice, counting the number of calls received in
an hour, etc. are random experiments.
Sample space:
The set of all possible outcomes of a random experiment is called the sample space.
It is usually denoted by $S$. It is comparable to the universal set in set theory.
Event:
Any subset of the sample space is called an event. When a random experiment is
going to be conducted, we are interested in the probabilities of different events.
Example: In dice throwing, $A = \{1\}$, $B = \{1, 4, 5\}$, etc. are events. Note that $B$
denotes the event that 1 or 4 or 5 occurs. They cannot happen together!
Events are sets. So, upper-case letters should be used for notation.
Operations on events:
Let a random experiment have $n$ possible outcomes that are mutually exclusive,
exhaustive and equally likely (all the outcomes have the same chance). If $m$ of these
outcomes are favorable to an event $A$, then the probability of $A$ is given by:
$$P(A) = \frac{m}{n}$$
Note that we cannot use this formula when the outcomes are not all equally likely,
or when the total number of possible outcomes, $n$, is infinite.
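The formula $P(A) = m/n$ can be sketched in Python for a fair die. The function name `classical_probability` is hypothetical; exact fractions from the standard-library `fractions` module are used so that 1/6 is not rounded. The sketch assumes a finite sample space with equally likely outcomes, as the formula requires.

```python
from fractions import Fraction

def classical_probability(event, sample_space):
    """P(A) = m / n for a finite sample space of equally likely outcomes."""
    if not event <= sample_space:
        raise ValueError("event must be a subset of the sample space")
    # m = number of favorable outcomes, n = total number of outcomes
    return Fraction(len(event), len(sample_space))

S = {1, 2, 3, 4, 5, 6}   # sample space of a fair die
A = {1}
B = {1, 4, 5}
print(classical_probability(A, S), classical_probability(B, S))
```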
Example: An unfair dice is tossed 1000 times. “6” occurred 400 times. Then,
$$P(6) \approx \frac{400}{1000} = 0.40$$
* We have not used set notation to write the event ‘6’. Some good books also do
this for convenience.
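The relative-frequency estimate in this example can be imitated with a simulation. The sketch below is only an illustration: the weights of the loaded dice are hypothetical (chosen so that ‘6’ has probability 0.40), and the seed is fixed only to make the run repeatable.

```python
import random

def relative_frequency(outcomes, target):
    """Proportion of trials in which the target outcome occurred."""
    return sum(1 for o in outcomes if o == target) / len(outcomes)

rng = random.Random(0)                            # fixed seed for repeatability
faces = [1, 2, 3, 4, 5, 6]
weights = [0.12, 0.12, 0.12, 0.12, 0.12, 0.40]    # hypothetical loading of the dice

tosses = rng.choices(faces, weights=weights, k=10_000)
estimate = relative_frequency(tosses, 6)
print(estimate)   # should be close to 0.40 when the number of tosses is large
```

Increasing the number of tosses `k` generally brings the estimate closer to the underlying probability.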
Example: Consider the statement: “There is a 90% chance (probability 0.90) that I’ll
get an ‘A’ in this course.” Here, the probability 0.90 shows the judgement of the
‘subject’ (the person who made the statement).
1. $0 \le P(A) \le 1$.
2. $P(S) = 1$.
3. If $A$ and $B$ are mutually exclusive events, then $P(A \cup B) = P(A) + P(B)$.
Explanation of Axiom 3:
Let a fair die be thrown. Let $A = \{1, 2, 3\}$ and $B = \{4, 5\}$. Then, $P(A) = \frac{3}{6}$ and
$P(B) = \frac{2}{6}$. Also, $A \cup B = \{1, 2, 3, 4, 5\}$, so that $P(A \cup B) = \frac{5}{6} = P(A) + P(B)$.
If $A$ and $B$ are not mutually exclusive, the equality does not hold in general.
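The explanation above can be checked with a short Python sketch for a fair die. The helper `prob` and the overlapping event `C = {3, 4}` are hypothetical additions for illustration; `A` and `B` are the events from the text.

```python
from fractions import Fraction

S = frozenset(range(1, 7))   # sample space of a fair die

def prob(event):
    """Classical probability of an event for equally likely outcomes."""
    return Fraction(len(event & S), len(S))

A = {1, 2, 3}
B = {4, 5}       # mutually exclusive with A
C = {3, 4}       # hypothetical event that overlaps A (both contain 3)

additive_when_disjoint = prob(A | B) == prob(A) + prob(B)
additive_when_overlapping = prob(A | C) == prob(A) + prob(C)

print(prob(A | B), prob(A) + prob(B), additive_when_disjoint)
print(prob(A | C), prob(A) + prob(C), additive_when_overlapping)
```

In the overlapping case the outcome 3 is counted twice on the right-hand side, which is why the sum exceeds the probability of the union.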
Extension of Axiom 3
For mutually exclusive events $C$, $D$ and $E$:
$$\begin{aligned}
P(C \cup D \cup E) &= P\big((C \cup D) \cup E\big) \\
                   &= P(C \cup D) + P(E) \\
                   &= P(C) + P(D) + P(E)
\end{aligned}$$
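Theorem 1: $P(\emptyset) = 0$.
Proof:
$A \cup \emptyset = A$
$\therefore P(A \cup \emptyset) = P(A)$
$\therefore P(A) + P(\emptyset) = P(A)$ (by Axiom 3, since $A$ and $\emptyset$ are mutually exclusive)
$\therefore P(\emptyset) = 0$.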
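Theorem 2: $P(A^c) = 1 - P(A)$
Proof:
$A \cup A^c = S$
$\therefore P(A) + P(A^c) = P(S) = 1$ (by Axioms 2 and 3, since $A$ and $A^c$ are mutually exclusive)
$\therefore P(A^c) = 1 - P(A)$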
Proof:
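[Venn diagram: events $A$ and $B$ overlap, splitting $A \cup B$ into three mutually exclusive
regions: $C$ (in $A$ only), $D$ (in both $A$ and $B$) and $E$ (in $B$ only).]
$A \cup B = C \cup D \cup E$
$\therefore P(A \cup B) = P(C) + P(D) + P(E)$
$\therefore P(A \cup B) = P(A) + P(B) - P(A \cap B)$, since $P(A) = P(C) + P(D)$, $P(B) = P(D) + P(E)$ and $P(A \cap B) = P(D)$.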
In a community, 25% of the families have cars, 15% have washing machines and
10% have both. A family is selected at random from the community. What is the
probability that the family has (i) a car or a washing machine? (ii) neither a car nor
a washing machine?
Solution:
$P(C) = 0.25$
$P(W) = 0.15$
$P(C \cap W) = 0.10$