Stat 1101 4 7

Notation
We use an upper-case letter to denote a variable, and the corresponding lower-case
letter to denote a general value of the variable. For example, when $X$ is used to denote
a variable, $x$ is used to denote a particular value of it.
Suppose our sample consists of $n$ values of a variable $X$. We use $\sum x$ to denote the
sum of all these values.
Describing the center of the data
There are three common measures to describe the central tendency of the sample
data. These are:
1. Mean
2. Median
3. Mode
Mean
Let a sample of $n$ values of variable $X$ be taken. Data: $x_1, x_2, \cdots, x_n$. The sample
mean is defined as
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \]
Example
Data: 4, 8, 5, 9, 15
\[ \bar{x} = \frac{1}{n} \sum x = \frac{1}{5}(4 + 8 + 5 + 9 + 15) = 8.2 \]
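The definition of the sample mean can be checked with a few lines of Python. This is only a minimal sketch; the function name `sample_mean` is an illustrative choice and does not come from the notes.

```python
def sample_mean(values):
    """Return the sample mean: the sum of the values divided by n."""
    n = len(values)
    if n == 0:
        raise ValueError("the sample must contain at least one value")
    return sum(values) / n


data = [4, 8, 5, 9, 15]
mean = sample_mean(data)
print(mean)  # 8.2
```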
Median
It is the middlemost value in the sorted data. If $n$ is an odd number, the median is the
middle value of the sorted data. If $n$ is an even number, the median is the average of
the two middle values.
When the sample size is large, approximately 50% of the values are less than the
median and approximately 50% are more than the median.
Example
Data: 4, 8, 5, 9, 15
Sorted data: 4, 5, 8, 9, 15
Median = 8
Example:
Data: 4, 8, 5, 9, 15, 13
Sorted data: 4, 5, 8, 9, 13, 15

\[ \text{Median} = \frac{1}{2}(8 + 9) = 8.5 \]
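The odd and even cases of the median can be sketched in Python as follows. The helper name `sample_median` is hypothetical and used only for illustration.

```python
def sample_median(values):
    """Return the middle value of the sorted data.

    For an even number of values, return the average of the two middle values.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the sample must contain at least one value")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sample_median([4, 8, 5, 9, 15]))      # 8
print(sample_median([4, 8, 5, 9, 15, 13]))  # 8.5
```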
Mode
Mode is the value that occurs most frequently.

Sometimes two or more values occur with highest frequency.
• If there are two modes, the data is bimodal.
• If there are more than two modes, the data is multimodal.
If all values occur with equal frequency, there is no mode.
Example:
20 people were asked to give satisfaction rating after a restaurant meal on a scale of
1 (not satisfied) to 10 (extremely satisfied).
Data: 9, 3, 7, 5, 5, 10, 8, 9, 9, 10, 9, 8, 9, 6, 9, 8, 7, 7, 10, 6.
Mode = 9 (occurred 6 times in the data)
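The rules above (one mode, several modes, or no mode) can be sketched in Python by counting frequencies. The function name `sample_modes` is an assumption made for this illustration; it returns a list so that bimodal and multimodal data can be reported.

```python
from collections import Counter


def sample_modes(values):
    """Return every value that occurs with the highest frequency.

    Follows the convention in the notes: if all distinct values occur equally
    often (and there is more than one distinct value), there is no mode.
    """
    counts = Counter(values)
    if not counts:
        raise ValueError("the sample must contain at least one value")
    highest = max(counts.values())
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


ratings = [9, 3, 7, 5, 5, 10, 8, 9, 9, 10, 9, 8, 9, 6, 9, 8, 7, 7, 10, 6]
print(sample_modes(ratings))   # [9]
print(Counter(ratings)[9])     # 6
```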
Which measure to use when
For categorical data, mode can be used.
For numerical (discrete or continuous) data, any of the three measures can be used.
However, for mathematical reasons, mean or median is preferred.
Center for numerical data: mean or median?
Data: 2, 3, 4, 5, 7
Here, mean = 4.2, median = 4.
(The results are close. The mean is preferred because it is easy to calculate and
mathematically convenient.)
Data: 2, 3, 4, 5, 507 (the last value is an ‘outlier’)
Here, mean = 104.2, median = 4.
The median represents the majority of the data. The mean represents neither the majority
nor the outlier. The median is preferred here because it gives a reasonable result.
• When data have outliers, median is preferred.
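The effect of a single outlier on the two measures can be reproduced with Python's standard `statistics` module. This is a small sketch using the two datasets from the comparison above; the variable names are illustrative.

```python
from statistics import mean, median

without_outlier = [2, 3, 4, 5, 7]
with_outlier = [2, 3, 4, 5, 507]

# The mean moves a long way when one value becomes extreme; the median does not.
for data in (without_outlier, with_outlier):
    print(data, "mean =", mean(data), "median =", median(data))
```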
Relation between mean, median and mode
For symmetric bell-shaped distribution:

mean = median = mode (shown with a bullet point in the plot below).
For positively skewed distribution:

mean > median > mode (shown with 3 bullet points in the plot below).
For negatively skewed distribution:
mean < median < mode (shown with 3 bullet points in the plot below).
Exercise
Consider the data: 2, 4, 10, 10, 12, 6, 11, 12, 12, 8. Compute mean, median and
mode. Comment on the shape of the distribution.
Solution
Sorted data: 2, 4, 6, 8, 10, 10, 11, 12, 12, 12.
Mean = 8.7
Median = (10 + 10)/2 = 10
Mode = 12
Since Mean < Median < Mode, the distribution is negatively skewed (or skewed to
the left).
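The solution can be checked with the standard `statistics` module. The sketch below applies the mean/median/mode comparison stated earlier as a rule of thumb; the label strings are illustrative and not part of the notes.

```python
from statistics import mean, median, mode

data = [2, 4, 10, 10, 12, 6, 11, 12, 12, 8]
data_mean, data_median, data_mode = mean(data), median(data), mode(data)

# Compare the three measures to suggest the direction of skewness.
if data_mean < data_median < data_mode:
    shape = "negatively skewed"
elif data_mean > data_median > data_mode:
    shape = "positively skewed"
else:
    shape = "no clear direction from this rule"

print(data_mean, data_median, data_mode, shape)
```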
Percentiles
When data are arranged in increasing order, the $p$th percentile is a value such that $p$
percent of the values fall at or below the value, and $(100 - p)$ percent of the values
fall at or above the value. There are 99 percentiles that divide the total area of the
histogram into 100 equal parts.
Example
Let the 83rd percentile = 17.5. This means that about 83% of the values in the data are
at or below 17.5, and about (100 − 83)% = 17% of the values are at or above 17.5.
Quartiles
There are three quartiles that divide the total area of the histogram in 4 equal parts.
The first quartile Q1 is the 25th percentile. The second quartile Q2 (or median) is
the 50th percentile. The third quartile Q3 is the 75th percentile.
[Figure: histogram with $Q_1$, $Q_2$ and $Q_3$ marked on the horizontal axis.]
Example
20 customers’ satisfaction ratings:

5, 1, 7, 3, 5, 10, 10, 9, 8, 8, 10, 8, 8, 9, 9, 8, 8, 10, 9, 9.
Sorted data:
1, 3, 5, 5, 7, 8, 8, 8, 8, 8, 8, 9, 9, 9, 9, 9, 10, 10, 10, 10.
Median = (8+8)/2 = 8
Q1 = (7+8)/2 = 7.5
Q3 = (9+9)/2 = 9
Five-number summary
We often describe a set of data by using a five-number summary. The summary
consists of (1) minimum (the smallest value) (2) the first quartile Q1 (3) the median
(4) the third quartile Q3 and (5) maximum (the largest value).
Example
The five-number summary of the previous data: 1, 7.5, 8, 9, 10.
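The five-number summary can be sketched in Python using the same approach as the quartile example: the quartiles are taken as the medians of the lower and upper halves of the sorted data. The notes only show an even sample size; for an odd sample size the sketch assumes the middle value is left out of both halves, which is one of several conventions in use. The function names are hypothetical.

```python
def median_of(ordered):
    """Median of an already sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = ordered[:half]
    # Assumption: for odd n, the middle value is excluded from both halves.
    upper = ordered[half + (n % 2):]
    return (ordered[0], median_of(lower), median_of(ordered),
            median_of(upper), ordered[-1])


ratings = [5, 1, 7, 3, 5, 10, 10, 9, 8, 8, 10, 8, 8, 9, 9, 8, 8, 10, 9, 9]
print(five_number_summary(ratings))  # (1, 7.5, 8.0, 9.0, 10)
```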
Variation in the data
Consider the following two datasets:
1st dataset: 49, 50, 51
2nd dataset: 0, 50, 100
Both datasets have the same mean: 50, but the 2nd dataset has more variability. We
will discuss how to measure variability.
Measures
(a) Range
(b) Mean deviation from mean
(c) Variance and Standard deviation
Range
Range = largest value − smallest value.
Example
Data: 2, 4, 8, 5
Range = 8 – 2 = 6
• Range is not very useful. It only gives a rough idea about the variation.
Mean deviation from mean

\[ \text{MD} = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{x}| \]
Note
For any data set,
\[ \sum_{i=1}^{n} (x_i - \bar{x}) = 0 \]
Example
Data: 2, 3, 5, 6
$\bar{x} = 4$
\[ \text{MD} = \frac{1}{4}(|2 - 4| + |3 - 4| + |5 - 4| + |6 - 4|) = 1.5 \]
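A minimal Python sketch of the mean deviation, which also confirms that the signed deviations from the mean add up to zero for this data. The function name `mean_deviation` is an illustrative choice.

```python
def mean_deviation(values):
    """Average absolute deviation of the values from their mean."""
    n = len(values)
    if n == 0:
        raise ValueError("the sample must contain at least one value")
    centre = sum(values) / n
    return sum(abs(x - centre) for x in values) / n


data = [2, 3, 5, 6]
centre = sum(data) / len(data)
signed_total = sum(x - centre for x in data)

print(mean_deviation(data))  # 1.5
print(signed_total)          # 0.0
```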
Variance
\[ s^2 = \frac{1}{n - 1} \sum (x - \bar{x})^2 \]
Example
Data: 2, 3, 5, 6
$\bar{x} = 4$
\[ s^2 = \frac{1}{n - 1} \sum (x - \bar{x})^2
       = \frac{1}{4 - 1}\left((2 - 4)^2 + (3 - 4)^2 + (5 - 4)^2 + (6 - 4)^2\right)
       = 3.33 \]
Note
The division is by $n - 1$ because the number of free values (degrees of freedom) is
$n - 1$. The deviations $(x_i - \bar{x})$ always sum to zero, so if $n = 4$ and we know
3 of them, the 4th one can be calculated.
Standard deviation (SD)

It is the positive square root of the variance and is denoted by $s$.
\[ s = \sqrt{s^2} \]
Example
In the previous example, $s = \sqrt{3.33} = 1.83$
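The variance and standard deviation of the example data can be reproduced with a short Python sketch that divides by $n - 1$, as in the definition. The function name `sample_variance` is hypothetical.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    centre = sum(values) / n
    return sum((x - centre) ** 2 for x in values) / (n - 1)


data = [2, 3, 5, 6]
s2 = sample_variance(data)
s = math.sqrt(s2)
print(round(s2, 2), round(s, 2))  # 3.33 1.83
```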

Note
Range, mean deviation, variance and SD cannot be negative.
Empirical Rule
If a distribution (histogram) appears to be symmetric and bell-shaped, we expect
that approximately
• 68% of the data values will fall in the interval $(\bar{x} - s, \bar{x} + s)$
(within one standard deviation of the sample mean)
• 95% of the data values will fall in the interval $(\bar{x} - 2s, \bar{x} + 2s)$
(within two standard deviations of the sample mean)
• 99.7% of the data values will fall in the interval $(\bar{x} - 3s, \bar{x} + 3s)$
(within three standard deviations of the sample mean)
Example
Let the mean and SD of commuting time (minutes) of workers be 60 and 10,
respectively. Let the histogram be more or less symmetric and bell-shaped. We
then have:
$\bar{x} - s = 60 - 10 = 50$
$\bar{x} + s = 60 + 10 = 70$
Approximately 68% of workers have a commuting time between 50 and 70 minutes.
$\bar{x} - 2s = 60 - 2 \times 10 = 40$
$\bar{x} + 2s = 60 + 2 \times 10 = 80$
Approximately 95% of workers have a commuting time between 40 and 80 minutes.
$\bar{x} - 3s = 60 - 3 \times 10 = 30$
$\bar{x} + 3s = 60 + 3 \times 10 = 90$
Approximately 99.7% of workers have a commuting time between 30 and 90 minutes.
[Figure: distribution of commuting time; horizontal axis marked at 30, 40, 50, 60, 70, 80, 90.]
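The three intervals of the empirical rule can be generated with a short Python sketch. The function name `empirical_rule_intervals` is an illustrative assumption; the percentages are the approximate values stated in the rule and apply only to roughly symmetric, bell-shaped data.

```python
def empirical_rule_intervals(mean, sd):
    """Map k = 1, 2, 3 to (lower, upper, approximate percentage)."""
    coverage = {1: 68.0, 2: 95.0, 3: 99.7}
    return {k: (mean - k * sd, mean + k * sd, pct)
            for k, pct in coverage.items()}


intervals = empirical_rule_intervals(60, 10)
for k, (low, high, pct) in intervals.items():
    print(f"about {pct}% within {k} SD: {low} to {high} minutes")
```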
Coefficient of variation (CV)

Let there be two datasets. First set contains small values and the second set contains
large values. Comparing their standard deviations may be misleading. In order to
compare their variability, we should use CV defined as
\[ \text{CV} = \frac{s}{\bar{x}} \times 100 \]
Example
The SD of a particular type of 10-mg tablets is 1 mg, while the SD of a particular
type of 50-mg tablets is 2 mg. Which type of tablets has more variability?
Solution
For 10-mg tablets
\[ \text{CV}(1) = \frac{s}{\bar{x}} \times 100 = \frac{1}{10} \times 100 = 10 \]
For 50-mg tablets
\[ \text{CV}(2) = \frac{s}{\bar{x}} \times 100 = \frac{2}{50} \times 100 = 4 \]
Therefore, 10-mg tablets have more variability.
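The tablet comparison can be sketched in Python. The function name `coefficient_of_variation` is hypothetical, and the tablet strengths (10 mg and 50 mg) are taken as the means, as in the solution.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation expressed as a percentage of the mean."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is zero")
    return 100 * sd / mean


cv_10mg = coefficient_of_variation(sd=1, mean=10)
cv_50mg = coefficient_of_variation(sd=2, mean=50)
print(cv_10mg, cv_50mg)  # 10.0 4.0

more_variable = "10-mg" if cv_10mg > cv_50mg else "50-mg"
print(more_variable)     # 10-mg
```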
Mean and variance from frequency table
Example
$x$      Frequency
0        40
2        10
3        20
4        30
Total    100
Mean:
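\[ \bar{x} = \frac{1}{100}(0 \times 40 + 2 \times 10 + 3 \times 20 + 4 \times 30) = 2 \]
That is,
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{k} x_i f_i \]
Here, $k$ is the number of distinct values, $f_i$ is the frequency of $x_i$, and $n = \sum f_i$.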
Variance:
\[ s^2 = \frac{1}{99}\left((0 - 2)^2 \times 40 + (2 - 2)^2 \times 10 + (3 - 2)^2 \times 20 + (4 - 2)^2 \times 30\right) = 3.03 \]
That is,
\[ s^2 = \frac{1}{n - 1} \sum_{i=1}^{k} (x_i - \bar{x})^2 f_i \]
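The frequency-table formulas for the mean and variance can be sketched in Python as below. The function name `frequency_mean_variance` is an illustrative assumption, and the table is entered as two parallel lists.

```python
def frequency_mean_variance(values, freqs):
    """Mean and sample variance from distinct values and their frequencies."""
    n = sum(freqs)
    if n < 2:
        raise ValueError("need a total frequency of at least two")
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    variance = sum((x - mean) ** 2 * f for x, f in zip(values, freqs)) / (n - 1)
    return mean, variance


values = [0, 2, 3, 4]
freqs = [40, 10, 20, 30]
table_mean, table_variance = frequency_mean_variance(values, freqs)
print(table_mean, round(table_variance, 2))  # 2.0 3.03
```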
Exercise
Calculate mean deviation (from mean) from the above frequency table.
Solution:
Do it yourself.
Example
Class Frequency
0−5 40
5 − 10 20
10 − 15 10
15 − 20 30
Total 100
We use mid-values of each class in our calculation.
Mean:
\[ \bar{x} = \frac{1}{100}(2.5 \times 40 + 7.5 \times 20 + 12.5 \times 10 + 17.5 \times 30) = 9 \]
That is,
\[ \bar{x} = \frac{1}{n} \sum_{i=1}^{k} m_i f_i \]
Here, $m_i$ is the mid-value of the $i$th class.
Variance:
\[ s^2 = \frac{1}{99}\left((2.5 - 9)^2 \times 40 + (7.5 - 9)^2 \times 20 + (12.5 - 9)^2 \times 10 + (17.5 - 9)^2 \times 30\right) = 40.66 \]
That is,
\[ s^2 = \frac{1}{n - 1} \sum_{i=1}^{k} (m_i - \bar{x})^2 f_i \]
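For class intervals, the only extra step is replacing each class by its mid-value. A minimal Python sketch, with the classes entered as hypothetical (lower, upper) pairs:

```python
classes = [(0, 5), (5, 10), (10, 15), (15, 20)]
freqs = [40, 20, 10, 30]

# Each class is represented by its mid-value m_i.
midpoints = [(low + high) / 2 for low, high in classes]
n = sum(freqs)

grouped_mean = sum(m * f for m, f in zip(midpoints, freqs)) / n
grouped_variance = sum((m - grouped_mean) ** 2 * f
                       for m, f in zip(midpoints, freqs)) / (n - 1)

print(midpoints)                                  # [2.5, 7.5, 12.5, 17.5]
print(grouped_mean, round(grouped_variance, 2))   # 9.0 40.66
```

Because the actual values inside each class are unknown, these results are approximations to the mean and variance of the raw data.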
Exercise
Calculate mean deviation (from mean) from the above frequency table.
Solution
Do it yourself.
Assignment (not to be handed in)

1. Learn ‘stem and leaf plot’ from the textbook. Find out its advantages and
disadvantages compared to a histogram.
2. Learn ‘cumulative frequency polygon’ (also called ‘ogive’) and ‘cumulative
relative frequency polygon’ from the textbook. Find out one of its uses.
Probability
Probability is a measure of certainty (surety) of an event. It takes a value between
‘0’ and ‘1’. For an impossible event, the probability is zero. For a sure event, the
probability is one.
Example: If a regular (fair) coin is tossed, the probability that a ‘head’ occurs is 1/2
= 0.50 (50% chance).
Example: If a fair dice is tossed, the probability that a ‘3’ occurs is 1/6 = 0.167
(16.7% chance).
It should be noted here that in traditional usage, ‘die’ is singular and ‘dice’ is plural. In
present-day English, ‘dice’ is often used as both singular and plural.
Deterministic and random experiments:
An experiment that has only one possible outcome is a ‘deterministic’ experiment.
Example: Counting the number of stairs of a particular building is a deterministic
experiment. If we do the experiment repeatedly, we will get the same result (if we
do not make a mistake).
A ‘random’ experiment has more than one possible outcome.
Example: Tossing a coin, throwing a dice, counting the number of calls received in
an hour, etc. are random experiments.
* Probability is associated with random experiments.
Sample space:
The set of all possible outcomes of a random experiment is called the sample space.
It is usually denoted by $S$. It is comparable to the universal set in set theory.
Example: In coin tossing experiment, $S = \{H, T\}$.
Example: In dice throwing experiment, $S = \{1, 2, 3, 4, 5, 6\}$.
Example: In a cricket match, $S = \{\text{win, loss, tie, postponed, cancelled}\}$.

Outcomes (elements) in S should be mutually exclusive (two outcomes cannot occur
together). Also, S should be exhaustive (complete, i.e., no possible outcome should
be left out).
Event:
Any subset of the sample space is called an event. When a random experiment is
going to be conducted, we are interested in the probabilities of different events.
Example: In dice throwing, $A = \{1\}$, $B = \{1, 4, 5\}$, etc. are events. Note that $B$
denotes the event that 1 or 4 or 5 occurs. These outcomes cannot occur together in a single throw!
Example: In cricket match, $C = \{\text{loss}\}$, $D = \{\text{win, tie}\}$, etc. are events.
Events are sets. So, upper-case letters should be used for notation.
Operations on events:
$A \cup B$ occurs when $A$ or $B$ (or both) occur.
$A \cap B$ occurs when both $A$ and $B$ occur.
$A^c$ occurs when $A$ does not occur.
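Since events are sets, these operations map directly onto Python's built-in `set` type. The sketch below uses the dice-throwing sample space and the events $A = \{1\}$ and $B = \{1, 4, 5\}$ from the earlier example; the variable names are illustrative.

```python
S = {1, 2, 3, 4, 5, 6}   # sample space for one throw of a dice
A = {1}
B = {1, 4, 5}

union = A | B            # A or B (or both) occurs
intersection = A & B     # both A and B occur
complement_A = S - A     # A does not occur

print(union)             # {1, 4, 5}
print(intersection)      # {1}
print(complement_A)      # {2, 3, 4, 5, 6}
```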
Classical or mathematical definition of probability:
Let a random experiment have $n$ possible outcomes that are mutually exclusive,
exhaustive and equally likely (all the outcomes have the same chance). If $m$ of these
outcomes are favorable to an event $A$, then the probability of $A$ is given by:
\[ P(A) = \frac{m}{n} \]
Note that we cannot use this formula when the outcomes are not all equally likely,
or when the total number of possible outcomes, $n$, is infinite.
Example: Let a fair dice be thrown. Let $A = \{1, 4, 5\}$. Then
\[ P(A) = \frac{m}{n} = \frac{3}{6} = 0.5. \]
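The classical definition amounts to counting outcomes, which can be sketched in Python with exact fractions. The function name `classical_probability` is hypothetical, and the sketch is valid only under the stated assumption that all outcomes in the sample space are equally likely.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """P(A) = m / n, assuming equally likely outcomes in a finite sample space."""
    if not event <= sample_space:
        raise ValueError("the event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


S = {1, 2, 3, 4, 5, 6}
A = {1, 4, 5}
p_A = classical_probability(A, S)
print(p_A, float(p_A))  # 1/2 0.5
```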
Example: Consider a cricket match. $S = \{\text{win, loss, tie, postponed, cancelled}\}$.
Let $D = \{\text{win, tie}\}$. Then, we should NOT say $P(D) = \frac{2}{5}$. (Why not?)
Empirical or statistical or frequency definition of probability
Empirical means observation-based, i.e., data-based.
Let an experiment be conducted $n$ times, where $n$ is large. If an event $A$ occurs $f_A$
times, then
\[ P(A) \approx \frac{f_A}{n} \]
Here, probability is approximately equal to relative frequency. We cannot use this
formula when the experiment cannot be repeated under the same conditions. Also,
\[ P(A) = \lim_{n \to \infty} \frac{f_A}{n} \]
Example: An unfair dice is tossed 1000 times, and “6” occurs 400 times. Then,
\[ P(6) \approx \frac{400}{1000} = 0.40 \]
* We have not followed set notation to write the event ‘6’. Some good books
also do this for convenience.
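The frequency definition can be illustrated with a small simulation. The sketch below is not the experiment from the example: it assumes a hypothetical unfair dice whose true probability of a “6” is 0.40 (the other faces share the rest equally), and it fixes a seed so the run is repeatable.

```python
import random

rng = random.Random(0)              # fixed seed for a repeatable run
faces = [1, 2, 3, 4, 5, 6]
weights = [0.12] * 5 + [0.40]       # assumed probabilities, for illustration only

n = 100_000
rolls = rng.choices(faces, weights=weights, k=n)

# Relative frequency f_A / n of the event "6 occurs".
estimate = rolls.count(6) / n
print(estimate)
```

With a large number of throws, the relative frequency should land close to the assumed value of 0.40, and it tends to get closer as `n` grows.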
Subjective definition of probability
Sometimes probability is someone’s judgement or belief.
Example: Consider the statement: “There is a 90% chance (probability 0.90) that I’ll
get an ‘A’ in this course.” Here, the probability 0.90 shows the judgement of the
‘subject’ (the person who made the statement).
We cannot use mathematical definition in the above example, because ‘getting A’
and ‘not getting A’ are not equally likely. Also, we cannot use statistical definition
here, because we cannot repeat the experiment (taking the course) under the same
conditions.
Axioms of probability
Probability follows the following three axioms:
1. $0 \le P(A) \le 1$.
2. $P(S) = 1$.
3. When $A$ and $B$ are mutually exclusive events, i.e., $A \cap B = \emptyset$, then
\[ P(A \cup B) = P(A) + P(B). \]
Explanation of Axiom 3:
Let a fair die be thrown. Let $A = \{1, 2, 3\}$ and $B = \{4, 5\}$. Then, $P(A) = \frac{3}{6}$ and
$P(B) = \frac{2}{6}$. Also, $A \cup B = \{1, 2, 3, 4, 5\}$, so that $P(A \cup B) = \frac{5}{6} = P(A) + P(B)$.
If $A$ and $B$ are not mutually exclusive, the equality does not hold in general.
Extension of Axiom 3
Axiom 3 can be extended to any number of sets.
If $C$, $D$ and $E$ are mutually exclusive sets, then
\begin{align*}
P(C \cup D \cup E) &= P((C \cup D) \cup E) \\
                   &= P(C \cup D) + P(E) \\
                   &= P(C) + P(D) + P(E)
\end{align*}
Some important theorems:
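Theorem 1: $P(\emptyset) = 0$.
Proof:
$A \cup \emptyset = A$
$\therefore P(A \cup \emptyset) = P(A)$
$\therefore P(A) + P(\emptyset) = P(A)$ [Axiom 3, since $A \cap \emptyset = \emptyset$]
$\therefore P(\emptyset) = 0$.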
Theorem 2: $P(A^c) = 1 - P(A)$
Proof:
$A \cup A^c = S$
$\therefore P(A \cup A^c) = P(S)$
$\therefore P(A) + P(A^c) = 1$ [Axiom 3 for left side, since $A \cap A^c = \emptyset$; Axiom 2 for right side]
$\therefore P(A^c) = 1 - P(A)$
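Theorem 3: For any two sets $A$ and $B$, we have
\[ P(A \cup B) = P(A) + P(B) - P(A \cap B) \]
Proof:
[Venn diagram: $A$ and $B$ overlap. $C$ is the part of $A$ outside $B$, $D = A \cap B$, and $E$ is
the part of $B$ outside $A$, so that $A = C \cup D$ and $B = D \cup E$, with $C$, $D$ and $E$
mutually exclusive.]
$A \cup B = C \cup D \cup E$
$\therefore P(A \cup B) = P(C \cup D \cup E)$
$\therefore P(A \cup B) = P(C) + P(D) + P(E)$ [Axiom 3]
$\therefore P(A \cup B) = P(C) + P(D) + P(D) + P(E) - P(D)$
$\therefore P(A \cup B) = P(C \cup D) + P(D \cup E) - P(D)$ [Axiom 3]
$\therefore P(A \cup B) = P(A) + P(B) - P(A \cap B)$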
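Theorem 3 can be checked numerically on a fair die, where probabilities are found by counting. The events $A = \{1, 2, 3, 4\}$ and $B = \{3, 4, 5\}$ below are hypothetical, chosen only because they overlap; this is a check of one case, not a proof.

```python
from fractions import Fraction

S = frozenset(range(1, 7))          # fair die: six equally likely outcomes


def prob(event):
    """Classical probability of an event within the sample space S."""
    return Fraction(len(event & S), len(S))


A = {1, 2, 3, 4}
B = {3, 4, 5}

lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)
print(lhs, rhs)                     # 5/6 5/6

# Without subtracting P(A ∩ B), the overlap {3, 4} would be counted twice.
print(prob(A) + prob(B))            # 7/6
```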

Exercise:
In a community, 25% of the families have cars, 15% have washing machines and
10% have both. A family is selected at random from the community. What is the
probability that the family has (i) a car or a washing machine? (ii) neither a car nor
a washing machine?
Solution:
Let $C$ be the event that the family has a car and $W$ the event that it has a washing machine.
$P(C) = 0.25$
$P(W) = 0.15$
$P(C \cap W) = 0.10$
(i) $P(C \cup W) = 0.25 + 0.15 - 0.10 = 0.30$
(ii) $P((C \cup W)^c) = 1 - P(C \cup W) = 0.70$
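The two answers follow from Theorem 3 and Theorem 2, and the arithmetic can be sketched in Python. Exact fractions are used here as a design choice to avoid floating-point rounding; the variable names are illustrative.

```python
from fractions import Fraction

p_car = Fraction(25, 100)
p_washer = Fraction(15, 100)
p_both = Fraction(10, 100)

# (i) Theorem 3: P(C ∪ W) = P(C) + P(W) − P(C ∩ W)
p_either = p_car + p_washer - p_both

# (ii) Theorem 2: P((C ∪ W)^c) = 1 − P(C ∪ W)
p_neither = 1 - p_either

print(float(p_either), float(p_neither))  # 0.3 0.7
```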
