Foundation Probability (Lecture Notes)
STA211: Foundations of Probability and Statistics i
Preface
This module presents simplified notes for an introductory course in probability and statistics
for university learners. It targets undergraduate degree students in science and mathematics,
quantitative social sciences, engineering, finance, and related fields. The presentation of material
is done in such a way that it maximises self-learning by students. A lot of examples have been
given to support the presented concepts. However, to fully understand the material, students
are expected to have a good background in elementary mathematics courses such as calculus.
All calculations in this module can be done by hand, with the aid of just a scientific calculator.
However, computer apps may be used where necessary for purposes of speed of calculations.
The content is organised in units. There are seven units, each of which is divided into sessions,
that are further organised into topics of study. A session is essentially a lesson or lecture. For
a university student with the required aptitude, a session is designed to last for 60 minutes. At
the end of each session, there are exercises for students’ practice. Each unit also has references
to motivate further learning by students. The units, sessions, and topics in this module are
presented sequentially, so that to understand the content of each session, a learner is
advised to first study the preceding session(s). Instructors who use these lecture notes are
therefore advised not to skip sessions or topics, to avoid creating knowledge gaps in learners.
The module first revisits material from set theory, as it is essential for learning the rest of the
content. Later, the concepts of probability and random variables are presented. Learners will benefit
from the various proofs that are provided for the given theorems in probability, which will help
in acquiring knowledge for proving theorems in statistics. These follow from basic techniques
in logic such as proof by mathematical induction and proof by counterexample. A number of
commonly used standard random variables have been presented together with their properties
to prepare learners for future courses in statistics, such as regression modelling.
1 UNIT 1: REVIEW OF SET THEORY
This unit revisits concepts of set theory, that are useful in studying probability theory.
Introduction
In this session the concept of a set is reviewed. Students will also be reminded of the ideas of
members of a set called elements. Various mathematical symbols studied in MAT100 are used.
Where a proof of a theorem is given, the symbol shall mean end of proof.
Objectives of Session 1
After studying this session, you should be able to:
A set is a well-defined list or collection of objects. An object that belongs to a particular set is
called an element or a member of that set.
Set notation
In most cases, uppercase letters of the alphabet (e.g. A, B, M, X, ...) are used to denote sets, while
lowercase letters (e.g. a, b, x, ...) denote members of sets. Let A denote a set and k one of its elements;
then the expression k ∈ A is read as "k is an element of A". If some item p is not a member of
A, this is written as p ∉ A.
Set examples
To specify that certain objects belong to a given set, the braces, {} (called set builders) are used,
by either providing a roster, i.e. complete list of all elements in the braces or the rule method,
i.e. stating properties that characterize the elements, within the braces. For example,
(b) B = {x : x is a prime number, x < 13} means B is a set of prime numbers less than 13,
(c) 17 ∉ A means that 17 is not a member of set A.
1.1 Session 1: Sets and Elements
1.1.2 Subset
Let A and B be any two sets. If every element of A also belongs to set B, i.e. if p ∈ A implies
p ∈ B, then A is called a subset of B or is said to be contained in B.
Two different types of subsets emerge from this definition of subset; proper and improper subsets.
A is a proper subset of B if it is a subset of B and there exists at least one element of B that
does not belong to A; otherwise A is an improper subset of B.
Subset notation
To show that A is a subset of B, we write it as A ⊂ B.
Subset examples
Let N = {1, 2, 3, ...} represent the set of positive integers, Z = {..., −2, −1, 0, 1, 2, ...} represent
the set of integers, and R = {x : x is a real number}. Then N ⊂ Z ⊂ R.
A subset of real numbers can also be defined as an interval on the real line. For example,
(a) (a, b) = {x : a < x < b, x ∈ R} means an open interval of real numbers from a to b,
Two sets are equal if each is contained in the other, i.e. let A and B be any two sets, then if
x ∈ A =⇒ x ∈ B and x ∈ B =⇒ x ∈ A, we have A = B. In a nutshell, A = B if and only if
A ⊂ B and B ⊂ A.
When, in a particular discussion, all the sets under analysis (A, B, C, etc.) are subsets
of one set that we denote by U, then U is called the universal set.
A null or empty set, denoted by ∅ is the set with no elements, i.e. ∅ = {}.
1.2 Session 2: Set operations
An empty set is a subset of every other set A, i.e. ∅ ⊂ A. This is the case because ”every x ∈ ∅
(there are none) also belongs to A” is a true statement for any set A since there is no x ∈ ∅ to
make the statement false.
Theorem
Let A, B, and C be any sets. Then,
(a) A ⊂ A
(b) if A ⊂ B and B ⊂ A, then A = B, and
(c) if A ⊂ B and B ⊂ C, then A ⊂ C.
Proof
(a) Let x ∈ A. Then trivially x ∈ A, i.e. x ∈ A =⇒ x ∈ A. Hence, by the definition of subset,
A ⊂ A.
(b) Suppose A ⊂ B and B ⊂ A. Then x ∈ A =⇒ x ∈ B, and x ∈ B =⇒ x ∈ A. Hence A and B
contain exactly the same elements, ∴ A = B.
(c) Let x ∈ A. Then A ⊂ B =⇒ x ∈ B, and B ⊂ C =⇒ x ∈ C. Thus x ∈ A =⇒ x ∈ C for
every x ∈ A. Hence, A ⊂ C.
Session 1 Exercises
Introduction
This session presents the operations of union, intersection, complement, difference, and Cartesian
product of two sets. Students will be reminded of the operations of addition and multiplication
of numbers, in which applying the operation to any pair of numbers results in another
number.
Objectives of Session 2
After studying this session, you should be able to:
1.2.1 Union
Let A and B be any sets. The union of A and B, denoted by A ∪ B, is the set that consists of
all the elements that belong to A or to B or to both, i.e. A ∪ B = {x : x ∈ A or x ∈ B}.
Example of Union
Let A = {1, 2, 3, 4} and B = {3, 4, 5, 6}. Then A ∪ B = {1, 2, 3, 4, 5, 6}.
1.2.2 Intersection
Let A and B be any sets. The intersection of A and B, denoted by A ∩ B, is the set of elements
which belong to both A and B, i.e. A ∩ B = {x : x ∈ A and x ∈ B}.
If A ∩ B = ∅, that is, if A and B do not have any elements in common, then A and B are said
to be disjoint.
Example of Intersection
Let A = {1, 2, 3, 4} and B = {3, 4, 5, 6}. Then A ∩ B = {3, 4}.
1.2.3 Complement
Let A be any set and U the universal set. The complement or absolute complement of A, denoted
by A^c, is the set of elements which do not belong to A, i.e. A^c = {x : x ∈ U, x ∉ A}. In essence,
A^c is the difference of the universal set U and A.
Example of Complement
Let A = {1, 2, 3, 4} and U = {1, 2, 3, ...}. Then A^c = {5, 6, 7, ...}.
1.2.4 Difference
Let A and B be any sets. The difference of A and B or the relative complement of B with
respect to A, denoted by A\B, is the set of elements which belong to A but not to B, i.e.
A\B = {x : x ∈ A, x ∉ B}.
Example of Difference
Let A = {1, 2, 3, 4} and B = {3, 4, 5, 6}. Then A\B = {1, 2}.
The symmetric difference of the sets A and B, denoted by A ⊕ B is the set consisting of those
elements which belong to A or B, but not both, i.e. A ⊕ B = (A ∪ B)\(A ∩ B) or A ⊕ B =
(A\B) ∪ (B\A).
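The operations defined so far map directly onto Python's built-in set type. The sketch below reuses A and B from the examples above; the finite universal set U is an assumption made only so that the complement can be listed.

```python
# Sets from the union, intersection and difference examples above.
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
# Assumption: a small finite universal set, chosen only for this sketch.
U = set(range(1, 10))

union = A | B            # A ∪ B
intersection = A & B     # A ∩ B
difference = A - B       # A\B
complement_A = U - A     # complement of A relative to U
sym_diff = A ^ B         # A ⊕ B

# Both descriptions of the symmetric difference give the same set.
assert sym_diff == (A | B) - (A & B) == (A - B) | (B - A)

print(sorted(union), sorted(intersection), sorted(difference))
print(sorted(complement_A), sorted(sym_diff))
```

Python sets are unordered, so `sorted` is used only to make the printed output readable.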
The Venn (or Euler) diagrams are frequently useful in picturing sets and relationships between
sets. These diagrams use geometric shapes to represent sets (the actual shape used has no real
bearing). Below is an example of a Venn diagram, with shaded region representing A ∩ C.
[Figure: Venn diagram of sets A, B and C, with the region A ∩ C shaded.]
Definition An n-tuple is an ordered array of n components, written (x_1, x_2, ..., x_n). For example, (1, 2), (0, 100) and (a, b)
are 2-tuples, while (1, 1, 1), (a, c, b) and (2, 1, 2) are 3-tuples.
Definition Let A and B be two sets. The product set or Cartesian product of A and B, denoted
by A × B, is the set of all possible 2-tuples (a, b), where a ∈ A and b ∈ B, i.e. A × B = {(a, b) :
a ∈ A, b ∈ B}.
The product of a set with itself, say A × A, is denoted by A^2. The concept of product set is
extended to any finite number of sets in a natural way. The product set of the sets A_1, A_2, ..., A_m,
written A_1 × A_2 × ... × A_m, is the set of all ordered m-tuples (a_1, a_2, ..., a_m) where a_i ∈ A_i for
each i.
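The definition of the product set can be explored with `itertools.product` from the Python standard library. This is a minimal sketch; the two small sets are illustrative choices.

```python
from itertools import product

A = {1, 2, 3}
B = {"a", "b"}

# A × B: every ordered pair (a, b) with a in A and b in B.
AxB = set(product(A, B))
# A^2 = A × A, using the repeat argument of itertools.product.
A_squared = set(product(A, repeat=2))
# Order matters inside a tuple, so A × B and B × A differ here.
BxA = set(product(B, A))

print(len(AxB), len(A_squared), AxB == BxA)
```

The same function accepts any finite number of sets, which matches the extension to A_1 × A_2 × ... × A_m.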
(a) Let A = {1, 2, 3} and B = {a, b}. Then A × B = {(1, a), (1, b), (2, a), (2, b), (3, a), (3, b)}
(b) The Cartesian plane R^2 = R × R has each point P representing an ordered pair (a, b)
of real numbers, with a representing a value on the horizontal X-axis, and b representing
a value on the vertical Y-axis. Hence, every region of the Cartesian plane is a subset of the product set R × R.
1.3 Session 3: Properties of set operations
Session 2 Exercises
1. Let U = {1, 2, 3, ..., 9}, A = {1, 2, 3, 4}, B = {2, 4, 6, 8} and C = {3, 4, 5, 6}. Find (i) A^c,
(ii) A ∩ C, (iii) (A ∩ B)^c, (iv) A ∪ B, (v) B\A
2. Let U = {a, b, c, d, e}, A = {a, b, d}, B = {b, d, e}. Find (i) A ∪ B, (ii) B^c, (iii) A^c ∩ B,
(iv) A^c ∩ B^c, (v) (A ∩ B)^c, (vi) B ∩ A, (vii) B\A, (viii) A ∪ B^c, (ix) B^c\A^c, (x) (A ∪ B)^c
3. Prove that B\A = B ∩ A^c
4. Prove that A ⊂ B if and only if A ∩ B = A
5. Let A = {1, 2}, B = {2, 1} and C = {10, 12}. Find A × B, B × A, A × A, A × B × C.
Introduction
This session introduces learners to various mathematical laws governing the set operations. The
laws form building blocks for most mathematical operations in the topics that follow.
Objectives of Session 3
After studying this session, you should be able to:
Idempotent laws
1. A ∪ A = A
2. A ∩ A = A
Associative laws
1. (A ∪ B) ∪ C = A ∪ (B ∪ C)
2. (A ∩ B) ∩ C = A ∩ (B ∩ C)
Commutative laws
1. A ∪ B = B ∪ A
2. A ∩ B = B ∩ A
Distributive laws
1. A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)
2. A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)
Identity laws
1. A ∪ ∅ = A
2. A ∩ U = A
3. A ∪ U = U
4. A ∩ ∅ = ∅
Complement laws
1. A ∪ A^c = U
2. A ∩ A^c = ∅
3. (A^c)^c = A
4. U^c = ∅
5. ∅^c = U
DeMorgan’s laws
1. (A ∪ B)^c = A^c ∩ B^c
2. (A ∩ B)^c = A^c ∪ B^c
The above laws follow from the laws of logic. For example, DeMorgan’s law 1) can be proved
as follows: (A ∪ B)^c = {x : x ∉ A ∪ B} = {x : x ∉ A and x ∉ B} = A^c ∩ B^c, which
corresponds to the logical law ¬(p ∨ q) ≡ ¬p ∧ ¬q.
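A quick numerical check can build intuition for these laws before attempting a proof. The sketch below tests DeMorgan's laws and the distributive laws on one choice of sets; the finite universal set and the helper name `complement` are assumptions of this example. Agreement on one example is not a proof.

```python
U = set(range(1, 10))   # assumed finite universal set
A = {1, 2, 3, 4}
B = {2, 4, 6, 8}
C = {3, 4, 5, 6}

def complement(X, universe=U):
    """Return the complement of X relative to the universal set."""
    return universe - X

# DeMorgan's laws
de_morgan_1 = complement(A | B) == complement(A) & complement(B)
de_morgan_2 = complement(A & B) == complement(A) | complement(B)

# Distributive laws
distributive_1 = A | (B & C) == (A | B) & (A | C)
distributive_2 = A & (B | C) == (A & B) | (A & C)

print(de_morgan_1, de_morgan_2, distributive_1, distributive_2)
```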
Session 3 Exercises
1.4 Session 4: Finite Sets, Classes of Sets
Introduction
In this session learners will be introduced to some descriptions of a set based on its size or number
of elements.
Objectives of Session 4
After studying this session, you should be able to:
Finite set
A set A is finite if A is empty or if A consists of exactly m elements, where m is a positive
integer; otherwise A is infinite.
3. Let Y be the set of (positive) even integers, i.e. Y = {2, 4, 6, ...}. Then Y is an infinite set.
4. Let I be the unit interval of real numbers, i.e. I = {x : 0 ≤ x ≤ 1}. Then I is also an
infinite set.
Countable set
A set A is countable if A is finite or if its elements can be arranged in the form of a sequence, in
which case A is said to be countably infinite; otherwise A is uncountable.
The set in example 3 above is countably infinite, whereas it can be shown that the set in example
4 above is uncountable.
Lemma
(a) Suppose A and B are disjoint finite sets. Then A ∪ B is finite, and n(A ∪ B) = n(A) + n(B)
(d) Suppose A and B are finite. Then A × B is finite and n(A × B) = n(A) × n(B).
The above lemma can be generalised to any finite number of finite sets. For example, if A, B,
C are finite sets, then A ∪ B ∪ C is finite and
n(A ∪ B ∪ C) = n(A) + n(B) + n(C) − n(A ∩ B) − n(A ∩ C) − n(B ∩ C) + n(A ∩ B ∩ C).
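The three-set counting formula can be checked on concrete sets. In this minimal sketch the sets A, B and C are illustrative choices, and `len` plays the role of n(·).

```python
from itertools import product

A = {1, 2, 3, 4}
B = {2, 4, 6, 8}
C = {3, 4, 5, 6}

# Left-hand side: count the union directly.
lhs = len(A | B | C)

# Right-hand side: add the sizes, subtract pairwise overlaps, add back the triple overlap.
rhs = (len(A) + len(B) + len(C)
       - len(A & B) - len(A & C) - len(B & C)
       + len(A & B & C))

# Part (d) of the lemma: n(A × B) = n(A) × n(B).
size_product = len(set(product(A, B)))

print(lhs, rhs, size_product)
```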
Solution: a) Let M be the set of students that study mathematics and E the set of students that study English;
then n(M\E) = n(M) − n(M ∩ E) = 30 − 20 = 10. b) n(M ∪ E) = n(M) + n(E) − n(M ∩ E) =
30 + 35 − 20 = 45. c) We use the symmetric difference to answer part c), i.e. n(M ⊕ E) =
n((M\E) ∪ (E\M)) = 10 + 15 = 25.
Members of a set can be sets themselves. To help clarify these situations, we usually use the
word class or family for such a set. The words subclass and subfamily have meanings analogous
to subset.
Examples of class of subsets Let S = {1, 2, 3, 4}. Let W be the class of subsets of S which
contain exactly three elements of S. Then W = [{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}]
Consider any set A. The power set of A, denoted by P(A), is the class of all subsets of A. In
general, if A is finite, so is P(A). The number of elements in P(A) is 2 raised to the power n(A),
i.e. n(P(A)) = 2^{n(A)}.
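For a small set, the power set can be listed explicitly and its size compared with 2^{n(A)}. The sketch below uses `itertools.combinations`; the helper name `power_set` is an assumption of this example.

```python
from itertools import chain, combinations

def power_set(s):
    """List every subset of s, from the empty set up to s itself."""
    items = sorted(s)
    all_sizes = (combinations(items, r) for r in range(len(items) + 1))
    return [set(c) for c in chain.from_iterable(all_sizes)]

S = {1, 2, 3, 4}
P = power_set(S)

# The class W of subsets of S with exactly three elements.
W = [subset for subset in P if len(subset) == 3]

print(len(P), 2 ** len(S))
print(W)
```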
1.5 Unit 1 Summary
1.4.4 Partitions
Examples of partition
Consider the following classes of subsets of X = {1, 2, 3, ..., 9}: (i) [{1, 3, 5}, {2, 6}, {4, 8, 9}], (ii)
[{1, 3, 5}, {2, 4, 6, 8}, {5, 7, 9}], (iii) [{1, 3, 5}, {2, 4, 6, 8}, {7, 9}].
Then (i) is not a partition of X since 7 ∈ X but 7 does not belong to any of the cells. Furthermore,
(ii) is not a partition of X since 5 ∈ X and 5 belongs to both {1, 3, 5} and {5, 7, 9}. On the other
hand, (iii) is a partition of X since each element of X belongs to exactly one cell.
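The reasoning in this example can be automated. The sketch below assumes the usual meaning of a partition: the cells are non-empty subsets of X and each element of X belongs to exactly one cell. The function name `is_partition` is hypothetical.

```python
def is_partition(cells, X):
    """Return True if cells are non-empty subsets of X covering each element exactly once."""
    cells = [set(c) for c in cells]
    # Every cell must be non-empty and contained in X.
    if any(not c or not c <= X for c in cells):
        return False
    # Every element of X must lie in exactly one cell.
    return all(sum(x in c for c in cells) == 1 for x in X)

X = set(range(1, 10))
class_i = [{1, 3, 5}, {2, 6}, {4, 8, 9}]
class_ii = [{1, 3, 5}, {2, 4, 6, 8}, {5, 7, 9}]
class_iii = [{1, 3, 5}, {2, 4, 6, 8}, {7, 9}]

results = [is_partition(c, X) for c in (class_i, class_ii, class_iii)]
print(results)
```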
Session 4 Exercises
1. Consider the set A = [{1, 2, 3}, {4, 5}, {6, 7, 8}]. Find (i) the elements of A; (ii) n(A).
1. Solving problems related to sets, based on definitions, some operations and properties of
sets
2. Proving some theorems using definitions related to sets and set operations
2. Larson, H.J. (1982). Introduction to probability theory and statistical inference, 3rd ed.
New York: John Wiley and Sons.
3.
4. Panik, M.J. (2005). Advanced statistics from an elementary point of view. Amsterdam:
Elsevier.
2 UNIT 2: PROBABILITY
This unit introduces learners to the concept of probability, a quantitative measure for uncertain
events. Learners will solve various probability problems using definitions and theorems related
to probability theory.
Introduction
In this session, an axiomatic definition of probability, that is based on set theory, is discussed.
Objectives of Session 5
After studying this session, you should be able to:
Definition A random experiment is any sort of operation whose outcome cannot be predicted
in advance with certainty. Examples include: flipping a coin and observing which face lands on
top; and planting a particular hybrid corn on a given plot of ground and observing its yield.
Sample space
The sample space for an experiment is the set of all possible outcomes that might be observed. In
other words, a sample space, denoted by S is the universal set pertinent to a given experiment.
Event
An event is a subset of a sample space. It is a set of basic outcomes. The event consisting of a
single point or outcome a ∈ S is called an elementary event.
The empty set ∅ and S are subsets of S, and hence they are events. The event ∅ is called the
impossible or null event, while S is called the certain or sure event.
2.1 Session 5: Sample space, Event, Probability
(a) A ∪ B is the event that occurs iff A occurs or B occurs (or both)
(b) A ∩ B is the event that occurs iff A occurs and B occurs
(c) A^c is the event that occurs iff A does not occur.
Events A and B are called mutually exclusive if they are disjoint, i.e. if A ∩ B = ∅. Three or
more events are mutually exclusive if every two of them are mutually exclusive.
Let S be a sample space, let E be the class of all events, and let P be a real-valued function
defined on E. Then P is called a probability function, and P(A) is called the probability of event
A, when the following axioms (also called the Kolmogorov axioms for a probability function) hold:
1. P(A) ≥ 0 for every event A
2. P(S) = 1
3. If A and B are mutually exclusive events, then P(A ∪ B) = P(A) + P(B)
When P satisfies the above axioms, the sample space S is called a probability space.
Theorem
The impossible event has probability zero, i.e. P (∅) = 0.
Proof : For any event A on sample space S, we have A ∪ ∅ = A, where A and ∅ are disjoint.
Then by axiom 3, P(A) = P(A ∪ ∅) = P(A) + P(∅). It follows that P(∅) = P(A) − P(A) = 0.
Theorem
If A ⊂ B, then P(A) ≤ P(B).
Proof : If A ⊂ B, then B can be decomposed into the mutually exclusive events A and B\A, so that
A ∪ (B\A) = B. Thus, P(A ∪ (B\A)) = P(B) =⇒ P(A) + P(B\A) = P(B) (axiom 3). Since
P(B\A) ≥ 0, then P(A) ≤ P(B).
2.2 Session 6: Finite Probability Spaces
Theorem
If A and B are any events, then P (A\B) = P (A) − P (A ∩ B).
Proof : Decompose A into mutually exclusive events A\B and A∩B, so that A = (A\B)∪(A∩B).
Then by axiom 3, P(A) = P((A\B) ∪ (A ∩ B)) = P(A\B) + P(A ∩ B) =⇒ P(A\B) =
P(A) − P(A ∩ B).
Theorem
(Addition rule) If A and B are any events, then P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Proof : The event A ∪ B can be decomposed into the disjoint events A\B and B, so that A ∪ B =
(A\B) ∪ B. It follows by axiom 3 that P(A ∪ B) = P((A\B) ∪ B) = P(A\B) + P(B). By the previous
theorem, P(A ∪ B) = P(A) − P(A ∩ B) + P(B), ∴ P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Just as the inclusion-exclusion principle for sets applies to any finite number of sets, the addition
rule for probability can be extended by induction to any finite number of events. For instance,
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).
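The difference rule and the addition rule for two and three events can be checked with exact arithmetic. This is a sketch only: the four-point sample space, its point probabilities and the helper `prob` are all hypothetical, chosen so that the probabilities sum to 1.

```python
from fractions import Fraction

# Hypothetical sample space {1, 2, 3, 4} with assumed point probabilities.
point_prob = {1: Fraction(1, 10), 2: Fraction(2, 10),
              3: Fraction(3, 10), 4: Fraction(4, 10)}

def prob(event):
    """P(event): sum of the probabilities of the points in the event."""
    return sum((point_prob[x] for x in event), Fraction(0))

A, B, C = {1, 2}, {2, 3}, {3, 4}

# P(A\B) = P(A) − P(A ∩ B)
difference_rule = prob(A - B) == prob(A) - prob(A & B)

# P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
two_events = prob(A | B) == prob(A) + prob(B) - prob(A & B)

# Three-event version, with all three pairwise terms subtracted.
three_events = prob(A | B | C) == (
    prob(A) + prob(B) + prob(C)
    - prob(A & B) - prob(A & C) - prob(B & C)
    + prob(A & B & C))

print(difference_rule, two_events, three_events)
```

Using `Fraction` rather than floating-point numbers keeps the equalities exact.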
Session 5 Exercises
1. Given S = {1, 2, 3}, A = {1}, B = {3}, C = {2}, P(A) = 1/3, P(B) = 1/3. Find (a) P(C),
(b) P(B ∪ C), (c) P(A^c ∩ B^c)
2. Suppose A and B are any events on a sample space and P(A^c ∩ B) = 0.1, P(A ∩ B^c) = 0.4,
P((A ∩ B)^c) = 0.6. Compute (a) P(A), (b) P(B), (c) P(A ∪ B), (d) P(A^c ∪ B).
3. Let a coin and a die be tossed. Specify the sample space. Define the event A with outcomes
of heads and an even number, and event B with outcomes of a number less than 3.
Introduction
In this session, further linkage of probability to events on sample space is presented.
Objectives of Session 6
After studying this session, you should be able to:
Let S be a finite sample space, say S = {a_1, a_2, ..., a_n}. A finite probability space is obtained
by assigning to each point a_i ∈ S a real number p_i, called the probability of a_i, satisfying the
following properties: (a) p_i ≥ 0; and (b) Σ_{i=1}^{n} p_i = 1.
The probability P(A) of an event A is defined as the sum of the probabilities of the points in A,
i.e. P(A) = Σ_{a_i ∈ A} P(a_i).
Sometimes the points in a finite sample space S and their assigned probabilities are given in the
form of a table. Such a table is called a probability distribution.
Example
A die has been loaded in a manner such that the probability of face i being uppermost, when it
stops rolling, is proportional to i, i = 1, 2, ..., 6. Specify the probability space.
Solution: The sample space is S = {1, 2, 3, 4, 5, 6}. Let’s define the six distinct single-element
events by A_i = {i}, i = 1, 2, 3, 4, 5, 6. Let p = P(A_1); then we have P(A_2) = 2p, P(A_3) = 3p, ...,
P(A_6) = 6p. It follows that p + 2p + 3p + 4p + 5p + 6p = 1, hence p = 1/21. ∴ P(1) = 1/21, P(2) =
2/21, P(3) = 3/21, P(4) = 4/21, P(5) = 5/21, P(6) = 6/21 and the probability function is completely
specified.
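The loaded-die distribution can be built and checked in a few lines. This is a minimal sketch: the helper `prob` and the event "an even face" are additions for illustration and are not part of the example.

```python
from fractions import Fraction

faces = range(1, 7)
total_weight = sum(faces)   # 1 + 2 + ... + 6 = 21

# Probability of face i is proportional to i, i.e. i/21.
P = {i: Fraction(i, total_weight) for i in faces}

def prob(event):
    """P(event): sum of the point probabilities of the faces in the event."""
    return sum((P[i] for i in event), Fraction(0))

# Illustrative event: an even face is uppermost.
p_even = prob({2, 4, 6})

print(P[1], P[6], sum(P.values()), p_even)
```

`Fraction` reduces automatically, so 6/21 is displayed as 2/7 and 3/21 as 1/7.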
Suppose S is a finite probability space with k elements. If the k single-element events are equally
likely, then their common probability must equal 1/k. Then, the probability space is called a finite
equiprobable space.
Further, if A ⊂ S is any event containing r single-element events whose union is A, then the r
points have total probability r/k. In other words, P(A) = n(A)/n(S).
Theorem
Let S be a finite sample space and, for any A ⊂ S, let P(A) = n(A)/n(S). Then P satisfies the three
axioms for a probability function.
Proof
If S has k elements, then n(S) = k and P(S) = n(S)/n(S) = k/k = 1 (axiom 2 satisfied). Since
n(A) ≥ 0 for all A ⊂ S, we have P(A) = n(A)/n(S) = n(A)/k ≥ 0 for all A ⊂ S (axiom 1 satisfied). If
A, B ⊂ S and A ∩ B = ∅, then n(A ∪ B) = n(A) + n(B). Hence n(A ∪ B)/n(S) = n(A)/n(S) + n(B)/n(S), i.e.
P(A ∪ B) = P(A) + P(B) (axiom 3 satisfied).
(b) Three horses A, B, C are in a race; A is twice as likely to win as B, and B is twice as likely
to win as C. Find: i) their respective probabilities of winning, i.e. P(A), P(B), P(C); ii) the
probability of B or C winning.
Solution: i) Let P(C) = p. Then P(B) = 2p and P(A) = 2P(B) = 2(2p) = 4p. The sum of
the probabilities must be 1. Hence, p + 2p + 4p = 1 =⇒ p = 1/7. Therefore P(A) = 4p = 4/7,
P(B) = 2p = 2/7 and P(C) = p = 1/7. ii) Since B and C are mutually exclusive, P(B ∪ C) = P(B) + P(C) = 3/7.
2.3 Session 7: Infinite sample spaces
Session 6 Exercises
1. Suppose a die is tossed once. Find the probability that: (i) an even number appears on
top; (ii) a number greater than 4 appears on top
2. A bag contains 4 white marbles, 3 red marbles, and 5 blue marbles. A marble is drawn at
random from the bag. What is the probability that it is red?
3. A trick coin is to be flipped one time. The probability of getting a head is three times
as large as the probability of getting a tail. What are the probabilities for the two single-
element events?
Introduction
In this session, probability problems based on infinite sample space are discussed.
Objectives of Session 7
After studying this session, you should be able to:
For infinite sample space S, two cases arise: either S is countably infinite or S is uncountable.
The examples of probability problems we have had so far are all based on a discrete sample
space. This section will look at cases of continuous sample space.
The key thing here is that a discrete sample space S may contain a finite or an infinite number
of elements. The elements in this case are isolated points on the real line; there are other points
on the real line, between any two elements of S, which do not belong to S. With discrete sample
spaces, it is meaningful to specify the probabilities for single-element events.
Definition: A sample space S that has as elements all the points in an interval or union of
intervals on the real line is called continuous. E.g. S = {x : 0 ≤ x ≤ 10}; S = {x : 0 ≤ x ≤ 1}
or S = {x : 2 ≤ x ≤ 3}.
A continuous sample space always has an infinite number of elements. If x ∈ S and y ∈ S, then
all the points between x and y also belong to S, if x and y are selected from one of the intervals
belonging to S.
1. Since the length of any interval is nonnegative, we have P(A) = L(A)/L(S) ≥ 0, hence axiom 1 is
satisfied
2. P(S) = (L(A_1) + L(A_2) + L(A_3) + ...)/L(S) = L(S)/L(S) = 1, i.e. axiom 2 is satisfied
Since the length of any point is zero, this rule also gives P (A) = 0 if A = {x}, x ∈ S is a
single-element event.
The probability definitions given for an uncountable sample space on the real line also apply to
sample spaces on a geometrical region or a region of the Cartesian plane. In that case, probabilities
are defined on measurements such as the area or volume of the region, i.e. P(A) = (area of A)/(area of S)
or P(A) = (volume of A)/(volume of S).
Examples
(a) John is a 2-year-old boy. From his family history, let’s assume that his adult height is equally
likely to lie between 169 cm and 174 cm. What is the probability that a) he will be at least 172
cm tall as an adult; b) he will have a height between 170 cm and 171 cm?
(b) A point is chosen at random inside a rectangle measuring 3 by 5 m. Find the probability
that the point is at least 1 m from the edge.
Solution: Let S denote the set of points inside the rectangle and A the points that are at least
1 m from the edge. Then, A is a 1 by 3 m rectangle. Hence, P(A) = (area of A)/(area of S)
= (1 × 3)/(3 × 5) = 1/5.
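Both examples reduce to a ratio of measures, so they can be computed directly. The sketch below assumes the length rule P(A) = L(A)/L(S) for example (a) and the area rule for example (b). The function name `length_probability` is hypothetical, and the values it gives for example (a) come from that rule rather than from a worked solution in these notes.

```python
from fractions import Fraction

def length_probability(a, b, lo, hi):
    """P that a point equally likely to fall anywhere in [lo, hi] lands in [a, b]."""
    overlap = max(0, min(b, hi) - max(a, lo))
    return Fraction(overlap, hi - lo)

# Example (a): adult height equally likely between 169 cm and 174 cm.
p_at_least_172 = length_probability(172, 174, 169, 174)
p_170_to_171 = length_probability(170, 171, 169, 174)
# A single point has length zero, hence probability zero.
p_single_point = length_probability(171, 171, 169, 174)

# Example (b): points at least 1 m from the edge of a 3 m by 5 m rectangle.
width, height, margin = 3, 5, 1
inner_area = (width - 2 * margin) * (height - 2 * margin)
p_inner = Fraction(inner_area, width * height)

print(p_at_least_172, p_170_to_171, p_single_point, p_inner)
```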
Session 7 Exercises
2.4 Unit 2 Summary
1. Suppose you daily ride a commuter train from your home in Blantyre to your work place in
Lilongwe. The station you leave from has trains leaving for Lilongwe at 07:00am, 07:13am,
07:20am, 07:25am, 07:32am, 07:45am and 07:55am. It is your practice to take the first
train that leaves after your arrival at the station. Suppose you are equally likely to arrive
at the station any instant between 07:15am and 07:45am. On a particular day, what is the
probability that you have to wait less than 5 minutes at the station? Suppose the 07:25am
and 07:45am trains are express, what is the probability that you catch an express on a
given day?
2. A 15-cm ruler is broken into 2 pieces at a random point along its length. What is the
probability that the longer piece is at least twice the length of the shorter piece?
2. Larson, H.J. (1982). Introduction to probability theory and statistical inference, 3rd ed.
New York: John Wiley and Sons.
3. Panik, M.J. (2005). Advanced statistics from an elementary point of view. Amsterdam:
Elsevier.
3 UNIT 3: CONDITIONAL PROBABILITY AND INDEPENDENCE
This unit presents techniques for computing probability problems of events that depend or do
not depend on each other.
Introduction
In this session, a review of methods for counting number of elements for events in a finite sample
space is given.
Objectives of Session 8
After studying this session, you should be able to:
Use permutation to solve problems related to number of elements in finite sample space
Examples
(a) If a minibus can travel from Blantyre to Lilongwe in 3 ways and from Lilongwe to Mzuzu in
4 ways, then the minibus can travel from Blantyre to Mzuzu in a total of 3 × 4 = 12 ways.
(b) If the operation of tossing a die gives rise to 6 possible outcomes and tossing a second die
also gives rise to 6 possible outcomes, then the operation of tossing a pair of dice will give rise to
6 × 6 = 36 possible outcomes.
(c) Suppose that a set A has n_1 elements and a second set B has n_2 elements. Then the Cartesian
product A × B has n_1 × n_2 elements. The Cartesian product A × A has n_1^2 elements, and B × B has
n_2^2 elements.
This rule applies to any number of operations. For instance, tossing three coins can give
rise to 2^3 = 8 possible outcomes. Let set A_i have n_i elements, i = 1, 2, ..., k; then the Cartesian
product A_1 × A_2 × ... × A_k has n_1 × n_2 × ... × n_k different elements. The tree diagram is often
used to graphically express the multiplication rule (check MAT100 notes).
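The multiplication rule can be confirmed by listing outcomes. This sketch uses `itertools.product`, which enumerates the same outcomes a tree diagram would show. The route labels are hypothetical placeholders for the 3 and 4 ways of travelling.

```python
from itertools import product

# Three coins: each toss is H or T.
coins = list(product("HT", repeat=3))

# Two dice: each shows a face from 1 to 6.
dice = list(product(range(1, 7), repeat=2))

# Hypothetical labels for the Blantyre-Lilongwe and Lilongwe-Mzuzu routes.
first_leg = ["r1", "r2", "r3"]
second_leg = ["s1", "s2", "s3", "s4"]
routes = list(product(first_leg, second_leg))

print(len(coins), len(dice), len(routes))
```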
3.1.2 Permutation
3.1 Session 8: Review of Counting techniques
of these objects in a given order is called an r permutation of the n symbols taken r at a time,
and it is denoted by P (n, r) or nPr .
For P (n, n), the total number of ways of performing n operations is given by the product of
number of ways of doing individual operations in the order of their performance, with first being
done in n ways, the second in n − 1 ways, up to the last in 1 way. Hence, total number of
operations is n × (n − 1) × (n − 2) × ... × 2 × 1, also written as n! (read as n-factorial). Thus
P (n, n) = n × (n − 1) × (n − 2) × ... × 2 × 1 = n!.
With P(n, r), the first operation can also be performed in n ways, the second in (n − 1) ways,
so that by the time the r-th operation is undertaken, (r − 1) symbols have already been used
and any of the remaining n − (r − 1) symbols could be a candidate for the r-th position. Hence,
P(n, r) = n × (n − 1) × (n − 2) × ... × (n − (r − 1)).
Since the very last operation is the r-th operation, multiplying P(n, r) by (n − r)! gives P(n, n).
Hence, multiplying and dividing by (n − r)!, we have
P(n, r) = n × (n − 1) × (n − 2) × ... × (n − (r − 1)) × (n − r)!/(n − r)! = n!/(n − r)!.
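The formula P(n, r) = n!/(n − r)! can be compared with a direct listing of arrangements. In this minimal sketch the six symbols "abcdef" are an arbitrary illustrative choice, and `math.perm` (Python 3.8 or later) is the standard-library equivalent of the hand-written function.

```python
import math
from itertools import permutations

def P(n, r):
    """Number of r-permutations of n distinct symbols, n!/(n − r)!."""
    return math.factorial(n) // math.factorial(n - r)

symbols = "abcdef"   # six distinct symbols, chosen for illustration

# Count the arrangements of 4 symbols by listing them.
listed = sum(1 for _ in permutations(symbols, 4))

print(P(6, 4), math.perm(6, 4), listed)
print(P(5, 5), math.factorial(5))
```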
Examples of Permutation
(a) Suppose that the same 5 people park their cars on the same side of the street in the same block
every night. How many different orderings of the 5 cars parked on the street are possible? (b)
How many four-letter ”words” can we make using the letters w,i,n,t,e,r (allowing no repetition)?
Solution: (a) The first parking slot can be filled by any of 5 different cars, the second by any of 4 different
cars, and so on. Hence, we have 5 × 4 × 3 × 2 × 1 = 5! = 120 as the total number of different orderings, without
repeating an ordering.
(b) Assume any four of the given letters form a "word", not necessarily a dictionary word; then
we have P(6, 4) = 6 × 5 × 4 × 3 = 360, or P(6, 4) = 6!/(6 − 4)! = (6 × 5 × 4 × 3 × 2!)/2! = 360.
The definition and examples of permutation we have had involve scenarios in which symbols
being arranged are not repeated at subsequent stages. But permutation may also apply to cases
where at each stage of operation, there is repetition of symbols to use.
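When some symbols are identical, the number of distinct arrangements can be counted by dividing n! by the factorial of each symbol's multiplicity. That formula is an assumption of this sketch and is not stated in the paragraph above. The word "BANANA" and the function name are illustrative only, and the result is cross-checked by brute-force listing.

```python
from collections import Counter
from itertools import permutations
from math import factorial

def arrangements_with_repeats(symbols):
    """Distinct orderings of symbols: n! divided by the factorial of each multiplicity."""
    total = factorial(len(symbols))
    for count in Counter(symbols).values():
        total //= factorial(count)
    return total

word = "BANANA"   # 6 letters: three A's, two N's, one B

by_formula = arrangements_with_repeats(word)
# Brute force: list all 6! orderings and keep only the distinct ones.
by_listing = len(set(permutations(word)))

print(by_formula, by_listing)
```

The brute-force check is only practical for short words, since it generates all n! orderings before removing duplicates.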
Example
(a) Find the number of seven-letter words that can be formed using the letters of the word
”BENZENE”. (b) Find the number of different signals, each consisting of eight flags in a vertical
line, that can be formed from four identical red flags, three identical white flags, and a blue flag.
(c) How many three-letter words can we make using the letters w,i,n,t,e,r, with one or more repeated
letters?
Solution: (a) This is the case of P(7; 3, 2), since there are 7 letters, of which 3 are E’s and 2 are
N’s. Hence P(7; 3, 2) = 7!/(3! 2!) = (7 × 6 × 5 × 4 × 3 × 2 × 1)/((3 × 2 × 1)(2 × 1)) = 420. (b) This is the case of P(8; 4, 3), since the signal
is to be made from 8 flags where 4 are red and 3 are white. Hence P(8; 4, 3) = 8!/(4! 3!) = 280. (c)
There will be 6 × 6 × 6 = 216 three-letter words we can make from the given 6 letters if repetition is allowed.
If repetition is not allowed, there will be P(6, 3) = 6 × 5 × 4 = 120 three-letter words we can
make. Hence, we will have 216 − 120 = 96 three-letter words with one or more repeated letters.
3.2 Session 9: Ordered samples, Combinations
Session 8 Exercises
1. 6 people are about to enter a cave in a single file. In how many ways could they arrange
themselves in a row to go through the entrance?
2. A statistics class contains 8 male and 6 female students. Find the number of ways that the
class can elect: (a) 2 class representatives, 1 male and 1 female; (b) 1 class representative
and 1 vice-class representative
3. There are 5 bus lines from city A to B and 4 bus lines from city B to C. Find the number
of ways a person can travel: (a) from A to C by way of B; (b) round-trip from A to C by
way of B; (c) round-trip from A to C by way of B, without using a bus line more than
once
4. Suppose there are 12 married couples at a party. Find the number of ways of choosing a
man and a woman from the party such that the two are: (a) married to each other; (b) not
married to each other
5. Suppose a password consists of 4 characters, the first 2 being letters in the alphabet and
the last 2 being digits. Find the number of: (a) passwords that can be generated; (b)
passwords beginning with a vowel
Introduction
In this session, further discussion of counting techniques is given.
Objectives of Session 9
After studying this session, you should be able to:
Use combinations to solve problems related to the number of elements in a finite sample space
The cases of permutations without and with repetition discussed in the previous session are analogous
to counting the number of ways of selecting elements from a set S containing n elements, one
after another until the required number is drawn, without and with replacement of the previously
chosen elements, respectively.
This is called sampling without replacement if the element is not returned to the set S before
the next is chosen, and sampling with replacement if the element is returned to S before the
next element is chosen. Therefore, we use the same permutation formula $P(n, r) = \frac{n!}{(n-r)!}$ to
count the different ordered samples of size r from a set with n elements, without replacement, while
$n \times n \times \cdots \times n = n^r$ is the total number of different ordered samples of size r that can be
selected from a set with n elements, with replacement.
Examples
(a) Three cards are chosen in succession from a deck of 52 cards. Find the number of ways this
can be done (i) with replacement, (ii) without replacement. (b) An urn contains 8 balls. In how
many ways can you choose 3 balls from the urn (i) with replacement (ii) without replacement.
Solution: (a-i) The 3 cards can be selected in a total of $52 \times 52 \times 52 = 52^3 = 140{,}608$ different
ways, with replacement; (a-ii) the 3 cards can be chosen in P(52, 3) = 52 × 51 × 50 = 132,600
different ways, without replacement. (b-i) The 3 balls can be chosen in $8 \times 8 \times 8 = 8^3 = 512$
different ways, with replacement; (b-ii) the 3 balls can be chosen in P(8, 3) = 8 × 7 × 6 = 336
different ways, without replacement.
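The counts in the examples above can be checked with a short Python sketch. The function name `ordered_samples` is introduced here purely for illustration; `math.perm` is the standard-library routine for n!/(n − r)! (Python 3.8 or later is assumed).

```python
import math

def ordered_samples(n, r, replacement):
    """Count ordered samples of size r drawn from n distinct elements."""
    if replacement:
        return n ** r            # n choices at each of the r draws
    return math.perm(n, r)       # n! / (n - r)!

cards_with = ordered_samples(52, 3, replacement=True)
cards_without = ordered_samples(52, 3, replacement=False)
balls_with = ordered_samples(8, 3, replacement=True)
balls_without = ordered_samples(8, 3, replacement=False)

print(cards_with, cards_without, balls_with, balls_without)
```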
3.2.2 Combinations
Definition: The number of distinct subsets, each of size r, that can be constructed from a set
S with n elements is called the number of combinations of n things taken r at a time, and it
is denoted by $\binom{n}{r}$ or C(n, r). The key point is that the order of the objects does not count
in combinations, unlike in permutations, where r different symbols can be used to construct r!
different r-tuples. Hence, $C(n, r) = \frac{P(n, r)}{r!} = \frac{n!}{r!\,(n-r)!}$.
Examples of combinations
(a) Let S = {1, 2, 3, 4}. Find the number of combinations of elements of S taken two at a time.
(b) Find the number of committees of 3 people that can be formed from 8 people. (c) A farmer
buys 3 cows, 2 pigs, and 4 hens from a person who has 6 cows, 5 pigs, and 8 hens. How many
choices does the farmer have?
Solution: (a) The actual combinations are: {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}. Hence,
in total we have $C(4, 2) = \frac{4!}{2!\,(4-2)!} = 6$. (b) Each committee is a combination of 8 people taken
3 at a time. Therefore, we have $C(8, 3) = \frac{8 \cdot 7 \cdot 6}{3 \cdot 2 \cdot 1} = 56$ committees. (c) The cows can be chosen in
$C(6, 3) = \frac{6 \cdot 5 \cdot 4}{3 \cdot 2 \cdot 1} = 20$ ways, the pigs in $C(5, 2) = \frac{5 \cdot 4}{2 \cdot 1} = 10$ ways, and the hens in
$C(8, 4) = \frac{8 \cdot 7 \cdot 6 \cdot 5}{4 \cdot 3 \cdot 2 \cdot 1} = 70$ ways.
In total, the choices can be made in C(6, 3) × C(5, 2) × C(8, 4) = 20 · 10 · 70 = 14,000 ways.
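As a quick check of the three answers above, the following sketch uses Python's standard `math.comb` and `itertools.combinations`. The variable names are illustrative only.

```python
import math
from itertools import combinations

# (a) list the 2-element subsets of S = {1, 2, 3, 4} explicitly
pairs = list(combinations([1, 2, 3, 4], 2))

# (b) committees of 3 chosen from 8 people
committees = math.comb(8, 3)

# (c) independent choices multiply: cows, pigs and hens
farmer_choices = math.comb(6, 3) * math.comb(5, 2) * math.comb(8, 4)

print(pairs)
print(committees, farmer_choices)
```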
Recall the definition of a partition of a set in Unit 1. Let an urn A contain 7 marbles that are
numbered 1 through 7. If the interest is to compute the number of ways we can draw first 2
marbles, next 3 marbles, and last 2 marbles from the urn, then we want the number of ordered
partitions of A into the cells A1, A2, A3, where A1 contains 2 marbles, A2 3 marbles and A3 2
marbles.
Hence, there are $\binom{7}{2}$ ways of drawing the first 2 marbles, i.e. determining cell one; $\binom{5}{3}$
ways of drawing the next 3 marbles; and $\binom{2}{2}$ ways of drawing the last 2 marbles. In total, we have
$\binom{7}{2}\binom{5}{3}\binom{2}{2} = 210$ different ordered partitions of A into cells A1, A2, A3. In short, $n_1$ objects
can be allocated in $\binom{n}{n_1}$ ways into cell A1, $n_2$ objects in $\binom{n-n_1}{n_2}$ ways into cell A2, and so on, and the last
$n_r$ objects can be allocated into cell Ar in $\binom{n-n_1-n_2-\cdots-n_{r-1}}{n_r}$ different ways.
This gives the total number of permutations as
$$\binom{n}{n_1, n_2, \ldots, n_r} = \binom{n}{n_1}\binom{n-n_1}{n_2}\cdots\binom{n-n_1-n_2-\cdots-n_{r-1}}{n_r} = \frac{n!}{n_1!\,n_2!\cdots n_r!} = \frac{n!}{\prod_{i=1}^{r} n_i!}.$$
In statistics, the quantity $\binom{n}{r}$ is called a binomial coefficient as it relates to the expansion of
binomial expressions, while $\binom{n}{n_1, n_2, \ldots, n_r}$ is called a multinomial coefficient. You will encounter problems
relating to binomial and multinomial functions later in this course.
Example
There are 12 balls in an urn. In how many ways can 3 balls be drawn from the urn 4 times in
succession, all without replacement?
Solution:
Let's partition the urn into 4 cells, each containing 3 balls. Then the number of ways is
$\binom{12}{3, 3, 3, 3} = \frac{12!}{3!\,3!\,3!\,3!} = 369{,}600$.
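The cell-by-cell argument above translates directly into code. The sketch below (the helper name `multinomial` is an assumption made for illustration) multiplies the binomial coefficients one cell at a time and compares the result with the factorial formula.

```python
import math

def multinomial(*cell_sizes):
    """Number of ordered partitions of n = sum(cell_sizes) objects into cells of the given sizes."""
    remaining = sum(cell_sizes)
    ways = 1
    for size in cell_sizes:
        ways *= math.comb(remaining, size)   # choose the members of this cell
        remaining -= size                    # objects left for later cells
    return ways

marble_partitions = multinomial(2, 3, 2)      # the 7-marble urn
ball_partitions = multinomial(3, 3, 3, 3)     # the 12-ball urn

# the closed form n! / (n1! n2! ... nr!) gives the same value
closed_form = math.factorial(12) // math.factorial(3) ** 4

print(marble_partitions, ball_partitions, closed_form)
```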
A tree diagram is a device used to enumerate all possible outcomes of a sequence of experiments
or events, where each event can occur in a finite number of ways.
Examples
John (J) and Grey (G) are to play a tennis tournament. The first person to win 2 games in a row
or to win a total of 3 games wins the tournament. Find the number of ways the tournament
can occur.
Solution: The tree diagram for the possible outcomes is given below. There are 10 endpoints, which
correspond to the following 10 ways that the tournament can occur: JJ, JGJJ, JGJGJ, JGJGG,
JGG, GJJ, GJGJJ, GJGJG, GJGG, GG.
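A tree diagram can also be walked programmatically. The recursive sketch below, with the illustrative function name `tournament_outcomes`, extends a game history one game at a time and stops a branch as soon as a player has won two games in a row or three games in total.

```python
def tournament_outcomes(history=""):
    """Return every complete tournament that starts with the given history of winners."""
    for player in "JG":
        # stopping rule: two wins in a row, or three wins in total
        if history.endswith(player * 2) or history.count(player) == 3:
            return [history]
    outcomes = []
    for winner in "JG":                       # branch on who wins the next game
        outcomes += tournament_outcomes(history + winner)
    return outcomes

outcomes = tournament_outcomes()
print(len(outcomes), outcomes)
```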
[Figure: tree diagram of the possible outcomes of the tournament between John (J) and Grey (G)]
Session 9 Exercises
1. Three light bulbs are chosen at random from 15 bulbs of which 5 are defective. Find the number
of ways of selecting the bulbs such that (i) none is defective, (ii) exactly one is defective.
2. A family has 3 boys and 2 girls. (a) Find the number of ways they can sit in a row. (b)
How many ways are there if the boys and girls are to sit together?
3. A student is to answer 8 out of 10 questions in an exam. (a) Find the number of ways the
student can choose the 8 questions. (b) Find the number of ways this can happen, if the
student must answer the first three questions.
4. Find the number of committees of 5 with a given chairperson that can be selected from 12
persons.
5. Jane has time to play a betting game at most 5 times. At each play she wins or loses K1000.
She begins with K1000 and will stop playing before 5 plays if she loses all her money. (a)
Find the number of ways the betting can occur. (b) In how many cases will she stop before
playing 5 times? (c) In how many cases will she leave without any money?
3.3 Session 10: Conditional probability
Introduction
In this session, the probability of an event occurring, conditional on another event, is discussed.
Objectives of Session 10
After studying this session, you should be able to:
Let A and B be any two events on some sample space S. If event B has occurred, it must have
occurred either with event A, i.e. A ∩ B has occurred, or it must have occurred without event A.
Definition
Let A and B be any two events on some sample space S, with P(B) > 0. The probability
that event A occurs once B has occurred, or the conditional probability of A given B, written
P(A|B), is
$$P(A|B) = \frac{P(A \cap B)}{P(B)} \qquad (1)$$
Solution:
(a) Let A be the event that two numbers that occur are different, i.e.
A = {(1, 2), (2, 1), (1, 3), (3, 1), (1, 4), (4, 1), (1, 5), (5, 1), (1, 6), (6, 1), (2, 3), (3, 2), (2, 4), (4, 2), (2, 5), (5, 2),
(2, 6), (6, 2), (3, 4), (4, 3), (3, 5), (5, 3), (3, 6), (6, 3), (4, 5), (5, 4), (4, 6), (6, 4), (5, 6), (6, 5)}
Let B be the event that the sum is 7, i.e. B = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}. Here,
S = {(1, 1), (1, 2), (2, 1), ..., (6, 6)}, with n(S) = 36 possible outcomes. Assume equally likely single-
element events, each with probability 1/36. Then $P(A) = \frac{n(A)}{n(S)} = \frac{30}{36} = \frac{5}{6}$, $P(B) = \frac{n(B)}{n(S)} = \frac{6}{36} = \frac{1}{6}$,
and $P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{6}{36} = \frac{1}{6}$. Hence, $P(B|A) = \frac{P(A \cap B)}{P(A)} = \frac{1/6}{5/6} = \frac{1}{5}$.
(b) Denote B = Boy, G = Girl child and assume the equiprobable space S = {BB, BG, GB, GG}, each
outcome with probability 1/4. Let A = {BB, BG, GB} be the event that at least one of the children is a boy,
D = {BB} the event that both children are boys, and C = {BB, BG} the event that the older child is a
boy. (i) $P(D|A) = \frac{P(A \cap D)}{P(A)} = \frac{1/4}{3/4} = \frac{1}{3}$; (ii) $P(D|C) = \frac{P(D \cap C)}{P(C)} = \frac{1/4}{2/4} = \frac{1}{2}$.
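Both solutions can be reproduced by brute-force enumeration of an equiprobable sample space. The sketch below is only an illustration of definition (1); the helper names `prob` and `cond_prob` are assumptions, and exact arithmetic is done with the standard `fractions` module.

```python
from fractions import Fraction
from itertools import product

def prob(event, space):
    """P(event) on an equiprobable finite sample space."""
    return Fraction(sum(1 for s in space if event(s)), len(space))

def cond_prob(a, b, space):
    """P(a | b) = P(a and b) / P(b), assuming P(b) > 0."""
    return prob(lambda s: a(s) and b(s), space) / prob(b, space)

# (a) two dice: probability the sum is 7 given the two numbers differ
dice = list(product(range(1, 7), repeat=2))
p_b_given_a = cond_prob(lambda s: s[0] + s[1] == 7, lambda s: s[0] != s[1], dice)

# (b) two children, older child listed first
children = ["BB", "BG", "GB", "GG"]
p_d_given_a = cond_prob(lambda s: s == "BB", lambda s: "B" in s, children)
p_d_given_c = cond_prob(lambda s: s == "BB", lambda s: s[0] == "B", children)

print(p_b_given_a, p_d_given_a, p_d_given_c)
```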
Conditional probability can help us assign probabilities to intersections of events. Since
$P(A|B) = \frac{P(A \cap B)}{P(B)}$ and A ∩ B = B ∩ A, it must be true that
$$P(A \cap B) = P(B)P(A|B) = P(A)P(B|A). \qquad (2)$$
The above result is called the multiplication theorem for conditional probability, which gives a formula
for computing the probability that events A and B both occur.
For three events A, B, and C, applying the theorem to the events A and B ∩ C gives
$$P(A \cap B \cap C) = P(B \cap C)P(A|B \cap C). \qquad (3)$$
Using the fact that P(B ∩ C) = P(B|C)P(C) and the commutative law for intersection, the above
result simplifies to
$$P(A \cap B \cap C) = P(C)P(B|C)P(A|B \cap C) = P(A)P(B|A)P(C|A \cap B) = P(B)P(A|B)P(C|A \cap B). \qquad (4)$$
Example
An urn contains 4 white balls and 8 black balls. If 2 balls are selected at random without
replacement, find the probability that (a) both are white balls, (b) the second ball is white.
Solution
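(a) Let A be the event that the first ball drawn is white, B the event that the second ball is white,
and A ∩ B the event that both are white. Then $P(A \cap B) = P(A)P(B|A) = \frac{4}{12} \cdot \frac{3}{11} = \frac{1}{11}$.
(b) Writing $A^c$ for the complement of A, we have $B = (A \cap B) \cup (A^c \cap B)$, so that
$$\begin{aligned}
P(B) &= P(A \cap B) + P(A^c \cap B) \\
     &= P(A)P(B|A) + P(A^c)P(B|A^c) \\
     &= \frac{4}{12} \cdot \frac{3}{11} + \frac{8}{12} \cdot \frac{4}{11} \\
     &= \frac{1}{3}.
\end{aligned} \qquad (5)$$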
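The two answers can be verified by listing every ordered pair of distinct balls. This is a sketch under the assumption that all 12 × 11 ordered draws are equally likely; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations

balls = ["W"] * 4 + ["B"] * 8                       # 4 white, 8 black
draws = list(permutations(range(len(balls)), 2))    # ordered pairs of distinct balls

both_white = sum(1 for i, j in draws if balls[i] == "W" and balls[j] == "W")
second_white = sum(1 for i, j in draws if balls[j] == "W")

p_both_white = Fraction(both_white, len(draws))
p_second_white = Fraction(second_white, len(draws))

print(p_both_white, p_second_white)
```

Note that the probability that the second ball is white equals 4/12, the same as for the first ball, which agrees with the result in (5).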
Session 10 Exercises
1. A fair coin is flipped 4 times. What is the probability that the fourth flip is a head, given
that each of the first 3 flips resulted in heads?
2. Urn 1 contains 2 red and 4 blue balls, urn 2 contains 10 red and 2 blue balls. If an urn is
chosen at random and a ball is removed from the chosen urn, what is the probability that
the selected ball is blue?
3. A lot contains 12 items of which 4 are defective. Three items are drawn at random from
the lot one after the other. Find the probability that all 3 are nondefective.
4. In a certain college, 25% of the students failed mathematics, 15% failed chemistry, and
10% failed both mathematics and chemistry. A student is selected at random. (a) If the
student failed chemistry, what is the probability that he or she failed mathematics? (b)
What is the probability that the student failed mathematics or chemistry? (c) What is the
probability that the student failed neither mathematics nor chemistry?
5. Let A and B be events with P(A) = 0.6, P(B) = 0.3, and P(A ∩ B) = 0.2. Find (a)
P(A|B) and P(B|A), (b) P(A ∪ B), (c) $P(A^c)$, (d) $P(A^c|B^c)$.
6. A box contains 7 red marbles and 3 white marbles. Three marbles are drawn from the box
one after the other. Find the probability that the first 2 are red and the third is white.
7. Students in a class are selected at random, one after the other, for an examination. Find
the probability that the men and women in the class alternate if: (a) the class consists of
4 men and 3 women, (b) the class consists of 3 men and 3 women
8. A box contains 3 red marbles and 7 white marbles. A marble is drawn from the box and
the marble is replaced by a marble of the other colour. A second marble is drawn from the
box. (a) Find the probability that the second marble is red. (b) If both marbles were of
the same colour, find the probability that they both were white.
3.4 Session 11: Independent events
Introduction
In this session, the multiplication theorem for conditional probability is applied to define proba-
bility of independent events.
Objectives of Session 11
After studying this session, you should be able to:
Events A and B in a probability space S are said to be independent if the occurrence of one of
them does not influence the occurrence of the other. In other words, A is independent of B if
P(A) is the same as P(A|B). Now, substituting P(B) for P(B|A) in the multiplication theorem
for P(A ∩ B), we have
$$P(A \cap B) = P(A)P(B|A) = P(A)P(B) \qquad (6)$$
Definition
Two events A and B are said to be independent of each other iff the following three conditions
hold:
$$P(A|B) = P(A), \quad P(B|A) = P(B), \quad \text{and} \quad P(A \cap B) = P(A)P(B), \qquad (7)$$
otherwise the two events are said to be dependent.
Note: Independence of events does not imply that the events are disjoint (mutually exclusive), or vice versa,
unless one of the events is a null event.
Solution
Let A be the event that the man lives 10 more years, and B the event that his wife lives 10 more years.
Therefore, $P(A) = \frac{1}{4}$ and $P(B) = \frac{2}{5}$. Then (a) $P(A \cap B) = P(A)P(B) = \frac{1}{4} \cdot \frac{2}{5} = \frac{1}{10}$; and (b)
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{1}{4} + \frac{2}{5} - \frac{1}{10} = \frac{11}{20}$.
Definition
Three events A, B, and C are independent if and only if:
1. P(A ∩ B) = P(A)P(B)
2. P(A ∩ C) = P(A)P(C)
3. P(B ∩ C) = P(B)P(C)
4. P(A ∩ B ∩ C) = P(A)P(B)P(C)
The above definition means that three events are said to be independent only when they are both
pairwise independent and jointly independent. This definition can be extended by mathematical
induction to any finite number of events, i.e. events A1, A2, ..., An are independent if any proper
subset of them is independent and P(A1 ∩ A2 ∩ ... ∩ An) = P(A1)P(A2)...P(An).
Example
A pair of coins is tossed, yielding the equiprobable space S = {HH, HT, TH, TT}. Consider the
events: A = {head on first toss} = {HH, HT}, B = {head on second toss} = {HH, TH},
and C = {head on exactly one coin} = {HT, TH}. Then $P(A) = P(B) = P(C) = \frac{2}{4} = \frac{1}{2}$.
Also, $P(A \cap B) = P(\{HH\}) = \frac{1}{4}$, $P(A \cap C) = P(\{HT\}) = \frac{1}{4}$, $P(B \cap C) = P(\{TH\}) = \frac{1}{4}$.
This means that conditions 1 to 3 are satisfied. Now, A ∩ B ∩ C = ∅, so that
$P(A \cap B \cap C) = P(\emptyset) = 0 \neq P(A)P(B)P(C)$. Thus, condition 4 is not satisfied, and hence the three events are
not independent.
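The example can be checked mechanically by representing events as Python sets on the equiprobable space. This is a minimal sketch; the names `P`, `pairwise` and `joint` are illustrative.

```python
from fractions import Fraction

space = {"HH", "HT", "TH", "TT"}
A = {"HH", "HT"}        # head on first toss
B = {"HH", "TH"}        # head on second toss
C = {"HT", "TH"}        # head on exactly one coin

def P(event):
    """Probability of an event on the equiprobable space."""
    return Fraction(len(event), len(space))

# pairwise products hold for every pair of events
pairwise = all(P(E & F) == P(E) * P(F) for E, F in [(A, B), (A, C), (B, C)])

# the triple product fails because A, B and C cannot all occur together
joint = P(A & B & C) == P(A) * P(B) * P(C)

print(pairwise, joint)
```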
Definition
Let S be a finite probability space. The probability space of n independent or repeated trials,
denoted by Sn , consists of ordered n-tuples of elements of S with probability of an n-tuple defined
to be the product of the probability of its components, i.e. P (s1 , s2 , ..., sn ) = P (s1 )P (s2 )...P (sn ).
Example
Suppose that three horses a, b, c race together, their respective probabilities of winning are 1/2,
1/3, and 1/6. Suppose the horses race twice. Find the probability that horse c wins first race
and a wins the second race.
Solution
The first trial has S = {a, b, c}, with $P(a) = \frac{1}{2}$, $P(b) = \frac{1}{3}$, and $P(c) = \frac{1}{6}$. Upon repeating the trial,
we have S2 = {(a, a), (a, b), (a, c), (b, a), (b, b), (b, c), (c, a), (c, b), (c, c)}. Therefore
$P((c, a)) = \frac{1}{6} \cdot \frac{1}{2} = \frac{1}{12}$.
Session 11 Exercises
1. Let A and B be independent events with P(A) = 0.3 and P(B) = 0.4. Find (a) P(A ∩ B)
and P(A ∪ B), (b) P(A|B) and P(B|A), (c) $P(A \cap B^c)$.
2. Box A contains 5 red marbles and 3 blue marbles and Box B contains 2 red marbles and
3 blue. Two marbles are drawn at random from each box. Find the probability that (a)
both are red, (b) they are all the same colour
3. Suppose A and B are independent events. Show that (a) A and $B^c$ are independent,
(b) $A^c$ and $B^c$ are independent.
4. Suppose the probability that Karonga United wins a home game in the TNM Super League is
0.5, the probability that it loses at home is 0.3, while the probability that it draws at home
is 0.2. The team plays twice at home. Find the probability that it wins at least once.
3.5 Session 12: Partitions, Total Probability, and Bayes' Theorem
Introduction
In this session, learners will be introduced to the law of total probability and Bayes’ theorem.
Objectives of Session 12
After studying this session, you should be able to:
The events E1, E2, ..., En are called a partition of the sample space S if Ei ∩ Ej = ∅ for all i ≠ j
and E1 ∪ E2 ∪ ... ∪ En = S. In other words, a partition cuts the whole sample space into mutually
exclusive pieces.
$$P(A) = P(A \cap E_1) + P(A \cap E_2) + \cdots + P(A \cap E_n) = \sum_{i=1}^{n} P(A \cap E_i). \qquad (9)$$
Using the multiplication theorem for conditional probability, the above theorem simplifies to
$$P(A) = P(E_1)P(A|E_1) + P(E_2)P(A|E_2) + \cdots + P(E_n)P(A|E_n) = \sum_{i=1}^{n} P(E_i)P(A|E_i). \qquad (10)$$
Example
A factory uses three machines X, Y , Z to produce lighting bulbs. Suppose machine X produces
50% of the bulbs, of which 3% are defective, machine Y produces 30% of the bulbs of which 4%
are defective, and machine Z produces 20% of the bulbs of which 5% are defective. Find the
probability that a randomly selected bulb is defective.
Solution
Let D denote the event that a bulb is defective. Then, by the law of total probability,
P(D) = P(X)P(D|X) + P(Y)P(D|Y) + P(Z)P(D|Z) = 0.50 × 0.03 + 0.30 × 0.04 + 0.20 × 0.05 =
0.037.
$$P(E_i|A) = \frac{P(E_i)P(A|E_i)}{\sum_{j=1}^{n} P(E_j)P(A|E_j)}, \quad i = 1, 2, \ldots, n. \qquad (11)$$
Proof
By definition, $P(E_i|A) = \frac{P(E_i \cap A)}{P(A)}$. Since $P(E_i \cap A) = P(E_i)P(A|E_i)$ and, by the law of total
probability, $P(A) = \sum_{j=1}^{n} P(E_j \cap A) = \sum_{j=1}^{n} P(E_j)P(A|E_j)$, the result (11) follows
immediately.
Example
Using the previous example, find the probability that a given defective bulb found was produced
by machine Y.
Solution
We want P(Y|D). Using Bayes' rule, we have
$P(Y|D) = \frac{P(Y)P(D|Y)}{P(D)} = \frac{0.30 \times 0.04}{0.037} = \frac{0.012}{0.037} = \frac{12}{37} \approx 0.324$.
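The two bulb examples can be combined in a few lines of Python. The sketch below applies the law of total probability (10) and Bayes' rule (11) with exact fractions; the dictionary names `prior` and `defect_rate` are illustrative.

```python
from fractions import Fraction

# share of production per machine, and defect rate within each machine
prior = {"X": Fraction(50, 100), "Y": Fraction(30, 100), "Z": Fraction(20, 100)}
defect_rate = {"X": Fraction(3, 100), "Y": Fraction(4, 100), "Z": Fraction(5, 100)}

# law of total probability: P(D) = sum of P(machine) * P(D | machine)
p_defective = sum(prior[m] * defect_rate[m] for m in prior)

# Bayes' rule: P(machine | D) for every machine
posterior = {m: prior[m] * defect_rate[m] / p_defective for m in prior}

print(float(p_defective))
print({m: str(p) for m, p in posterior.items()})
```

The posterior probabilities for X, Y and Z sum to 1, which is a useful sanity check when applying Bayes' rule by hand.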
Session 12 Exercises
1. Suppose 40% of residents of Mzuzu consider themselves as MCP supporters, 35% consider
themselves as UTM supporters, and 25% consider themselves as DPP supporters. During
the 2020 presidential election, 45% of MCP supporters voted, 40% of UTM supporters
voted, and 60% of DPP supporters voted. Suppose a person is randomly selected, (a) find
the probability that the person voted; (b) if the person voted, find the probability that the
voter was UTM supporter.
2. In a certain college, 4% of the men and 1% of the women are taller than 6 feet. Furthermore,
60% of the students are women. Suppose a randomly selected student is taller than 6 feet;
find the probability that the student is a woman.
3. Suppose that medical science has a cancer-diagnostic test that is 95% accurate on both
those who do and those who do not have the cancer. If 0.005 of the population does have
cancer, compute the probability that a particular individual has cancer, given that the test
says he has cancer.
3.6 Unit 3 Summary
2. Larson, H.J. (1982). Introduction to probability theory and statistical inference, 3rd ed.
New York: John Wiley and Sons.
3.
4. Panik, M.J. (2005). Advanced statistics from an elementary point of view. Amsterdam:
Elsevier.
4 UNIT 4: RANDOM VARIABLES AND THEIR PROPERTIES
This unit presents the concept of a random variable and introduces learners to the probability
distribution of a random variable.
4.1 Session 13: Random variable and its probability distribution
Introduction
In this session, the concept of random variable is defined. Further, the concept of probability of
a random variable is presented.
Objectives of Session 13
After studying this session, you should be able to:
Definition
Let S be a sample space. A random variable X is a function whose domain is S and
whose range is a set of real numbers.
Recall that an outcome of a random experiment, that generates members of S, cannot be pre-
dicted in advance but depends on chance factors (or it is random). So, the value of X, which
we denote using small letter x, varies from trial to trial. This is the reason X is referred to as
random variable.
Solution
Here S = {(x1, x2) : x1 = 1, 2, ..., 6; x2 = 1, 2, ..., 6}, with outcome ω = (x1, x2) ∈ S. Hence the random
variable is X(ω) = x1 + x2. This gives the range of X as RX = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.
(b) A couple wants to stop bearing children after four live births. Define a random variable X
as total number of female children this couple may have. Specify the range of values of X.
Solution
The sample space S will have 16 equally likely outcomes, i.e. S = {BBBB, BBBG, BGGB, ..., BGGG, GGGG},
where B = Boy and G = Girl. Hence, RX = {0, 1, 2, 3, 4}.
A random variable can be discrete or continuous, depending on its range. The distinction
between the two types of random variables is analogous to the cases of discrete and continuous
sample spaces. We will first deal with discrete random variables and later work with continuous
random variables.
Definition
A random variable X is called discrete if its range, RX , is a discrete set. Otherwise X is said to
be continuous.
In other words, X is discrete if the values it can assume form a countable set. On the
other hand, X is continuous if it can assume an uncountable number of values over
some interval. The two examples above are both cases of discrete random variables.
Here are some properties of random variables. If X and Y are random variables on the same
sample space S, ω is any point in S, and k is a real number, then X + Y, X + k, kX, and XY are the functions on S
defined as follows:
1. (X + Y)(ω) = X(ω) + Y(ω)
2. (X + k)(ω) = X(ω) + k
3. (kX)(ω) = kX(ω)
4. (XY)(ω) = X(ω)Y(ω)
Suppose X is a finite random variable on a sample space S, with an image set X(ω) = {x1 , x2 , ..., xn }.
X(ω) can be made into a probability space by defining the probabilities of xi as P (X = xi ), which
is sometimes written as f (xi ). The function f on X(ω) is called the probability function or dis-
tribution of X.
The set of ordered pairs [xi, f(xi)] can be given in the form of a table as follows:
xi        x1       x2       ...   xn
f(xi)     f(x1)    f(x2)    ...   f(xn)
The probability distribution f of the random variable X satisfies the following two conditions:
1. $f(x_i) \ge 0$
2. $\sum_{i=1}^{n} f(x_i) = 1$
Solution
The sample space is finite with 36 ordered pairs that are equally likely to occur, i.e. S = {(1, 1), (1, 2), (1, 3), (1, 4), ..., (6, 6)}.
Now, X[(x1, x2)] = x1 + x2, (x1, x2) ∈ S. The range of X is RX = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.
Only one outcome in S gives the value of X as 2, i.e. (1, 1), hence $P(X = 2) = \frac{1}{36}$. The value 3
can be obtained from 2 outcomes, i.e. (1, 2) and (2, 1), hence $P(X = 3) = \frac{2}{36}$. Continuing this way,
we have the distribution of X as follows:
Table 2: Probability distribution of X
x          2     3     4     5     6     7     8     9     10    11    12
P(X = x)   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
From the given distribution in Table 2, we note that for each x ∈ RX, P(x) ≥ 0 and
$\sum_x P(x) = \frac{1}{36} + \frac{2}{36} + \frac{3}{36} + \frac{4}{36} + \frac{5}{36} + \frac{6}{36} + \frac{5}{36} + \frac{4}{36} + \frac{3}{36} + \frac{2}{36} + \frac{1}{36} = 1$. Hence, the two conditions for
a probability distribution are satisfied, implying that the distribution of X forms a probability
function.
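The distribution of the sum of two fair dice can also be built by counting outcomes. The following sketch, with illustrative variable names, tabulates the probability function and checks the two conditions of a probability distribution.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# count how many of the 36 equally likely outcomes give each sum
counts = Counter(x1 + x2 for x1, x2 in product(range(1, 7), repeat=2))
pmf = {x: Fraction(n, 36) for x, n in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)          # fractions are shown in lowest terms, e.g. 6/36 prints as 1/6

non_negative = all(p >= 0 for p in pmf.values())
total = sum(pmf.values())
print(non_negative, total)
```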
Session 13 Exercises
1. An urn contains 4 balls numbered 1, 2, 3, 4, respectively. Two balls are drawn from the
urn at random without replacement. Let Z be the sum of the two numbers that occur.
Derive the probability function for Z.
2. Two fair dice are rolled one time and let M be the maximum of the two numbers that face
up. Derive the probability function for M .
3. A class in statistics contains 10 students, 3 of whom are 19, 4 are 20, 1 is 21, 1 is 24, and
1 is 26. Two students are selected at random without replacement from this class. Let X
be the average age of the two selected students and derive the probability function for X.
4. A fair coin is tossed 4 times. Let X denote the number of heads occurring. Derive the
probability distribution of X.
4.2 Session 14: Distribution functions and density functions
Introduction
This session introduces learners to probability distribution function and density function of a
random variable.
Objectives of Session 14
After studying this session, you should be able to:
Definition: The distribution function (or cumulative distribution function, cdf) for a random
variable X, denoted by $F_X(t)$, gives the value of P(X ≤ t) for any real t, i.e. $F_X(t) = P(X \le t)$
for −∞ < t < ∞.
Definition: Let X be a continuous random variable with distribution function $F_X(t)$. The
density function (or probability density function, pdf) for X is $f_X(t) = \frac{d}{dt} F_X(t)$.
If X is a continuous random variable, it can be shown that $P(X \le t) = F_X(t) = \int_{-\infty}^{t} f_X(u)\,du$
and $P(a \le X \le b) = F_X(b) - F_X(a) = \int_{a}^{b} f_X(t)\,dt$. In other words, the probability that a
continuous random variable X lies between any two values a and b is the area under $f_X(t)$ between a and b.
Solution:
The sample space is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. The random variable
X has range RX = {0, 1, 2, 3}, with probability distribution:
x          0     1     2     3     sum
P(X = x)   1/8   3/8   3/8   1/8   1
(b) Given that Y is a continuous random variable with fY (y) = 2(1 − y) for 0 < y < 1, and 0
otherwise. Verify that Y has a probability distribution function. Compute P (Y > 0.6).
Solution
Since $f_Y(y) \ge 0$ and $\int_{-\infty}^{\infty} f_Y(y)\,dy = \int_{0}^{1} 2(1 - y)\,dy = (2y - y^2)\big|_{0}^{1} = 1$, we have $0 \le F_Y(y) \le 1$. Hence, Y has a
probability distribution. Now $P(Y > 0.6) = 1 - F_Y(0.6) = 1 - \int_{0}^{0.6} 2(1 - y)\,dy = 1 - (2y - y^2)\big|_{0}^{0.6} = 0.16$.
Note: The probability distribution of a discrete random variable is also called a probability mass function
because it shows the concentration (or mass or weighting) of each point in the range of the
random variable.
Session 14 Exercises
1. Let the random variable Y have the probability mass function in Table 4. Find P(Y < 8).
Table 4
y       -2     5      8
g(y)    0.3    0.5    0.2
2. Let X be a continuous random variable with the probability density function f(x) = kx if
0 ≤ x ≤ 5 and 0 elsewhere. Find (a) k, (b) P(1 ≤ X ≤ 3), (c) P(X > 2).
4.3 Session 15: Expectation and variance of a random variable
Introduction
This session introduces learners to expected value and variance of a random variable.
Objectives of Session 15
After studying this session, you should be able to:
Knowing a probability law of a random variable and computing its associated probabilities alone
may not be all that the analyst wants. In some cases, one may wish to know the centre of the
distribution of the random variable, the spread of values from this centre, or even shape of the
distribution. This calls for further analysis of the probability distribution of the random variable.
Definition: Let X be a random variable, whose probability mass function is p(x) or probability
density function is f (x), depending on whether X is discrete or continuous, respectively. Then,
the mean, or expectation (or expected value) of X, denoted by E(X) is:
$$E(X) = x_1 p(x_1) + x_2 p(x_2) + \cdots + x_n p(x_n) = \sum_{i=1}^{n} x_i p(x_i), \quad [X \text{ discrete}] \qquad (12)$$
or
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx, \quad [X \text{ continuous}] \qquad (13)$$
Note that E(X) can be viewed as the weighted average of the values of X, where each value is
weighted by its probability, if X is discrete, or as the balance point of the density function, if X
is continuous.
y 2 3 6 10
g(y) 0.2 0.2 0.5 0.1
(b) Let Y be a continuous random variable with probability density function f (y) = 2(1 − y) for
0 < y < 1 and 0 otherwise. Find the expected value of Y .
Solution
(a) $E(X) = \sum_{x} x\,p(x) = 2(0.2) + 3(0.2) + 6(0.5) + 10(0.1) = 0.4 + 0.6 + 3.0 + 1.0 = 5$.
(b) $E(Y) = \int_{-\infty}^{\infty} y f(y)\,dy = \int_{0}^{1} y \times 2(1 - y)\,dy = \int_{0}^{1} (2y - 2y^2)\,dy = \left(y^2 - \frac{2}{3} y^3\right)\Big|_{0}^{1} = \frac{1}{3}$.
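Both expectations can be reproduced exactly in Python. The sketch below is illustrative only: the helper `integrate_poly` is an assumed name for a small routine that integrates a polynomial term by term, which suffices for the density f(y) = 2(1 − y) = 2 − 2y used in the examples.

```python
from fractions import Fraction

def integrate_poly(coeffs, a, b):
    """Integrate sum(coeffs[k] * y**k) exactly over [a, b] using the power rule."""
    def antiderivative(t):
        return sum(Fraction(c) * Fraction(t) ** (k + 1) / (k + 1)
                   for k, c in enumerate(coeffs))
    return antiderivative(b) - antiderivative(a)

# (a) discrete case: weighted average of the values
pmf = {2: Fraction(1, 5), 3: Fraction(1, 5), 6: Fraction(1, 2), 10: Fraction(1, 10)}
mean_x = sum(x * p for x, p in pmf.items())

# (b) continuous case with f(y) = 2 - 2y on (0, 1)
total_area = integrate_poly([2, -2], 0, 1)                    # should be 1
p_tail = 1 - integrate_poly([2, -2], 0, Fraction(3, 5))       # P(Y > 0.6)
mean_y = integrate_poly([0, 2, -2], 0, 1)                     # integral of y * f(y)

print(mean_x, total_area, p_tail, mean_y)
```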
1. E(c) = c
2. E(cX) = cE(X)
3. E(X + c) = E(X) + c
Proof
1) If X is a discrete random variable, then
$E(c) = \sum_{i=1}^{n} c \times p(x_i) = c \times p(x_1) + c \times p(x_2) + \cdots + c \times p(x_n) = c \times [p(x_1) + p(x_2) + \cdots + p(x_n)] = c \times 1 = c$.
If X is continuous, then $E(c) = \int_{x} c f(x)\,dx = c \int_{x} f(x)\,dx = c \times 1 = c$.
or
$$Var(X) = E[(X - \mu)^2] = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx, \quad [X \text{ continuous}]. \qquad (16)$$
If we expand the square in the above definition, we get an alternative formula for
calculating the variance of X as follows:
$$Var(X) = E(X^2) - \mu^2 = \sum_{i=1}^{n} x_i^2 p(x_i) - \mu^2, \quad [X \text{ discrete}]$$
or
$$Var(X) = E(X^2) - \mu^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2, \quad [X \text{ continuous}].$$
Examples of variance
Using the data from the previous example, we have:
(a) $Var(X) = \sum_{i=1}^{n} x_i^2 p(x_i) - \mu^2 = 2^2(0.2) + 3^2(0.2) + 6^2(0.5) + 10^2(0.1) - 5^2$
$= 4(0.2) + 9(0.2) + 36(0.5) + 100(0.1) - 25 = 30.6 - 25 = 5.6$.
(b) $Var(Y) = \int_{-\infty}^{\infty} y^2 f(y)\,dy - \mu^2 = \int_{0}^{1} y^2 \times 2(1 - y)\,dy - \left(\frac{1}{3}\right)^2 = \int_{0}^{1} (2y^2 - 2y^3)\,dy - \frac{1}{9}$
$= \left[\frac{2y^3}{3} - \frac{y^4}{2}\right]_{0}^{1} - \frac{1}{9} = \frac{1}{6} - \frac{1}{9} = \frac{1}{18} \approx 0.056$.
Definition: The standard deviation of a random variable X, denoted by σ, is the positive square
root of the variance of X, i.e. $\sigma = \sqrt{Var(X)}$. The standard deviation of X measures the
dispersion of values of X around the mean µ.
It is often more practically meaningful to analyse the spread of the distribution of X around the
mean using the standard deviation rather than the variance. For instance, in example (a) above,
$\sigma_X = \sqrt{5.6} \approx 2.37$, which is interpreted as: values of X are, on average, at a distance of 2.37
away from the mean of X. For example (b), $\sigma_Y = \sqrt{1/18} \approx 0.24$, implying values of Y are
on average at a distance of 0.24 away from the mean of Y.
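The variance and standard deviation calculations can be checked with the shortcut formula Var = E(square) − (mean)². The sketch below is an illustration under the same assumptions as the examples: the discrete distribution with values 2, 3, 6, 10 and the density f(y) = 2 − 2y on (0, 1). The helper name `integrate_poly` is hypothetical.

```python
import math
from fractions import Fraction

def integrate_poly(coeffs, a, b):
    """Integrate sum(coeffs[k] * y**k) exactly over [a, b] using the power rule."""
    def antiderivative(t):
        return sum(Fraction(c) * Fraction(t) ** (k + 1) / (k + 1)
                   for k, c in enumerate(coeffs))
    return antiderivative(b) - antiderivative(a)

# (a) discrete case
pmf = {2: Fraction(1, 5), 3: Fraction(1, 5), 6: Fraction(1, 2), 10: Fraction(1, 10)}
mean_x = sum(x * p for x, p in pmf.items())
var_x = sum(x ** 2 * p for x, p in pmf.items()) - mean_x ** 2

# (b) continuous case: y^2 * f(y) = 2y^2 - 2y^3
mean_y = integrate_poly([0, 2, -2], 0, 1)
var_y = integrate_poly([0, 0, 2, -2], 0, 1) - mean_y ** 2

sd_x = math.sqrt(var_x)
sd_y = math.sqrt(var_y)
print(var_x, var_y, round(sd_x, 2), round(sd_y, 2))
```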
Properties of variance
2. $Var(cX) = c^2 Var(X)$
4. If $E(X^2)$ exists, then E(X) exists and thus Var(X) exists. Hence, the existence of Var(X)
implies that E(X) exists.
Session 15 Exercises
1. Let the probability distribution of a random variable X be given by $\{(1, \frac{1}{4}), (3, \frac{1}{2}), (9, \frac{1}{4})\}$.
Find (a) $E(X^2 - 1)$, (b) Var(X).
2. Suppose Y has the pdf f(y) = 1/5 for 5 < y < 10 and 0 elsewhere. Find (a) E(Y), (b)
$E(Y^2)$, (c) Var(Y).
3. Prove that, for any random variable X and real numbers a, b, $Var(a + bX) = b^2 Var(X)$.
4. Evaluate the expression $E[(aX + b)^n] = \sum_{i=0}^{n} \binom{n}{i} a^{n-i} b^{i} E(X^{n-i})$ for n = 1, 2, 3.
5. Two cards are selected from a box that contains five cards numbered 1, 1, 2, 2, 3. Let Y
denote the sum of the two numbers drawn. Find: (i) the probability distribution of Y;
(ii) the mean of Y, E(Y); (iii) the variance of Y, Var(Y), and the standard deviation of Y, σY.
4.4 Session 16: Bivariate probability distributions
Introduction
In this session, bivariate probability distribution is discussed.
Objectives of Session 16
After studying this session, you should be able to:
Definition: Suppose (X, Y ) is a pair of real-valued functions defined on a sample space S. The
pair (X, Y ) is a bivariate random variable if both X and Y map elements in S into real numbers.
Definition: Let X and Y be random variables defined on the same sample space S. Define
event A = {(X, Y ) : a ≤ X ≤ b, c ≤ Y ≤ d} ⊂ S. Then, the pair (X, Y ) is a bivariate continuous
random variable.
              Y
           0       1
X    1    0.25    0.25
     2    0.25    0.25
(b) Let X and Y have the bivariate probability function $f(X, Y) = \frac{X + Y}{36}$ for X = 1, 2, 3 and
Y = 1, 2, 3, and 0 otherwise. Find P(X = 2, Y ≤ 2).
(c) Let X and Y have the distribution f(x, y) = 3x(1 − xy), for 0 < x, y < 1 and 0 elsewhere.
Find $P(X \le \frac{1}{2}, Y \le \frac{1}{2})$.
Solution
(a) $P(X = 1, Y \le 1) = P(X = 1, Y = 0) + P(X = 1, Y = 1) = 0.25 + 0.25 = 0.50$.
(b) $P(X = 2, Y \le 2) = P(X = 2, Y = 1) + P(X = 2, Y = 2) = \frac{2+1}{36} + \frac{2+2}{36} = \frac{7}{36}$.
(c) $$P(X \le \tfrac{1}{2}, Y \le \tfrac{1}{2}) = \int_0^{1/2}\!\int_0^{1/2} 3x(1 - xy)\,dy\,dx = \int_0^{1/2}\left(\frac{3x}{2} - \frac{3x^2}{8}\right)dx = \frac{3}{16} - \frac{1}{64} = \frac{11}{64}.$$
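The three probabilities in the solution above can be cross-checked numerically. The following is a minimal Python sketch and not part of the original notes: the names `table`, `f`, and the grid size `m` are illustrative assumptions, and part (c) uses a simple midpoint rule instead of exact integration.

```python
from fractions import Fraction

# (a) Joint pmf from the table of example (a); keys are (x, y) pairs.
table = {(1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
         (2, 0): Fraction(1, 4), (2, 1): Fraction(1, 4)}
p_a = sum(p for (x, y), p in table.items() if x == 1 and y <= 1)

# (b) f(x, y) = (x + y) / 36 for x, y in {1, 2, 3}.
p_b = sum(Fraction(x + y, 36)
          for x in (1, 2, 3) for y in (1, 2, 3)
          if x == 2 and y <= 2)

# (c) Midpoint-rule approximation of the double integral of 3x(1 - xy)
#     over 0 < x <= 1/2, 0 < y <= 1/2.
def f(x, y):
    return 3 * x * (1 - x * y)

m = 200          # illustrative number of grid cells per axis
h = 0.5 / m      # width of each cell
p_c = sum(f((i + 0.5) * h, (j + 0.5) * h)
          for i in range(m) for j in range(m)) * h * h

print(p_a, p_b, p_c)  # p_c should be close to 11/64 = 0.171875
```

Exact fractions are used for the discrete cases so that the results can be compared directly with 0.50 and 7/36.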
Definition: Given the random variables X and Y with bivariate probability mass function $P(X = x, Y = y)$ or density function $f(x, y)$, the marginal probability mass function (respectively density function) of X is given by:
$$p(x) = \sum_{y} P(X = x, Y = y), \quad \text{with } p(x) \ge 0 \text{ and } \sum_{x} p(x) = 1 \quad [X, Y \text{ discrete}];$$
and
$$g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy, \quad \text{with } g(x) \ge 0 \text{ and } \int_{-\infty}^{\infty} g(x)\,dx = 1 \quad [X, Y \text{ continuous}].$$
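Definition: The marginal cumulative distribution function of X is given by:
$$F(b) = \sum_{x = x_0}^{b} \sum_{y} P(X = x, Y = y), \quad [X \text{ discrete, with } x_0 \text{ the smallest value of } X]$$
and
$$F(b) = \int_{-\infty}^{b}\!\int_{-\infty}^{\infty} f(x, y)\,dy\,dx, \quad [X \text{ continuous}].$$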
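For the joint distribution in example (a) (Table 6), summing over the values of Y gives
$$P(X = 1) = \sum_{y} f(1, y) = f(1, 0) + f(1, 1) = 0.25 + 0.25 = 0.50$$
and
$$P(X = 2) = \sum_{y} f(2, y) = f(2, 0) + f(2, 1) = 0.25 + 0.25 = 0.50. \tag{20}$$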
This gives the marginal probability mass functions of X and Y, respectively, as:

Table 7: Marginal probability mass function of X
x       1       2       sum
p(x)    0.50    0.50    1

and

Table 8: Marginal probability mass function of Y
y       0       1       sum
g(y)    0.50    0.50    1
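The marginal tables above come from summing the joint table along each variable. A short Python sketch of that summation is given here as an illustration; the function name `marginals` and the dictionary layout are assumptions made for this example, not notation from the notes.

```python
from collections import defaultdict
from fractions import Fraction

# Joint pmf of example (a); keys are (x, y) pairs.
joint = {(1, 0): Fraction(1, 4), (1, 1): Fraction(1, 4),
         (2, 0): Fraction(1, 4), (2, 1): Fraction(1, 4)}

def marginals(joint_pmf):
    """Return (p_x, p_y), the marginal pmfs obtained by summing out the other variable."""
    p_x, p_y = defaultdict(Fraction), defaultdict(Fraction)
    for (x, y), p in joint_pmf.items():
        p_x[x] += p   # sum over y for fixed x
        p_y[y] += p   # sum over x for fixed y
    return dict(p_x), dict(p_y)

p_x, p_y = marginals(joint)
print(p_x)  # marginal pmf of X
print(p_y)  # marginal pmf of Y
```

Each marginal should sum to 1, which is a quick sanity check on any joint table.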
For the continuous bivariate random variable in case (c), the marginal probability density function of X is:
$$g(x) = \int_{-\infty}^{\infty} f(x, y)\,dy = \int_0^1 3x(1 - xy)\,dy = \int_0^1 (3x - 3x^2 y)\,dy = \left[3xy - \tfrac{3}{2}x^2 y^2\right]_0^1 = 3x - \tfrac{3}{2}x^2, \quad 0 < x < 1. \tag{21}$$
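As a check that the marginal in case (c), $g(x) = 3x - \tfrac{3}{2}x^2$ on $0 < x < 1$, is a valid density, one can verify the two conditions from the definition of a marginal density. The working below is an added illustration:
$$g(x) = 3x\left(1 - \tfrac{x}{2}\right) > 0 \quad \text{for } 0 < x < 1,$$
$$\int_0^1 g(x)\,dx = \left[\tfrac{3}{2}x^2 - \tfrac{1}{2}x^3\right]_0^1 = \tfrac{3}{2} - \tfrac{1}{2} = 1.$$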
Definition: Let X and Y be random variables with bivariate probability function $f(X, Y)$. The expectation of a function of X and Y, $q(X, Y)$, is:
$$E[q(X, Y)] = \sum_{i}\sum_{j} q(X_i, Y_j)\, f(X_i, Y_j), \quad [X, Y \text{ discrete}] \tag{22}$$
or
$$E[q(X, Y)] = \int_{x}\int_{y} q(x, y)\, f(x, y)\,dy\,dx, \quad [X, Y \text{ continuous}]. \tag{23}$$
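Definition: The covariance between X and Y, denoted by $\mathrm{Cov}(X, Y)$, is the expectation of $q(X, Y) = (X - \mu_X)(Y - \mu_Y)$, where $\mu_X = E(X)$ and $\mu_Y = E(Y)$:
$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \sum_{i}\sum_{j} (X_i - \mu_X)(Y_j - \mu_Y)\, f(X_i, Y_j), \quad [X, Y \text{ discrete}]$$
or
$$\mathrm{Cov}(X, Y) = E[(X - \mu_X)(Y - \mu_Y)] = \int_{x}\int_{y} (x - \mu_X)(y - \mu_Y)\, f(x, y)\,dy\,dx, \quad [X, Y \text{ continuous}]. \tag{25}$$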
Note that, on expanding the term inside the expectation, the covariance simplifies to $\mathrm{Cov}(X, Y) = E(XY) - \mu_X \mu_Y$. The quantity $\mathrm{Cov}(X, Y)$ measures the joint variability of the random variables X and Y, i.e.:
(a) If the probability is high that large values of $X - \mu_X$ are associated with large values of $Y - \mu_Y$, and small values of $X - \mu_X$ are associated with small values of $Y - \mu_Y$, then X and Y are positively related and $\mathrm{Cov}(X, Y) > 0$.
(b) If the probability is high that large values of $X - \mu_X$ are associated with small values of $Y - \mu_Y$, and small values of $X - \mu_X$ are associated with large values of $Y - \mu_Y$, then X and Y are negatively related and $\mathrm{Cov}(X, Y) < 0$.
(c) If values of $X - \mu_X$ tend to show no association with values of $Y - \mu_Y$, then $\mathrm{Cov}(X, Y) = 0$.
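The expansion that leads to the shortcut formula for the covariance can be written out step by step. It uses only the linearity of expectation and the fact that $\mu_X = E(X)$ and $\mu_Y = E(Y)$ are constants:
$$\begin{aligned}
\mathrm{Cov}(X, Y) &= E[(X - \mu_X)(Y - \mu_Y)] \\
&= E[XY - \mu_Y X - \mu_X Y + \mu_X \mu_Y] \\
&= E(XY) - \mu_Y E(X) - \mu_X E(Y) + \mu_X \mu_Y \\
&= E(XY) - \mu_X \mu_Y - \mu_X \mu_Y + \mu_X \mu_Y \\
&= E(XY) - \mu_X \mu_Y.
\end{aligned}$$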
Whereas the sign of Cov(X, Y ) indicates the direction of the relationship between the random
variables X and Y , its magnitude depends upon the units in which X and Y are measured. To
correct for the scaling of X and Y , Cov(X, Y ) is divided by σX σY , which gives the coefficient of
correlation between the two random variables.
Definition: The coefficient of correlation between the random variables X and Y, denoted by $\mathrm{Corr}(X, Y)$ or $\rho_{XY}$, is given by:
$$\mathrm{Corr}(X, Y) = \frac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y}, \quad -1 \le \mathrm{Corr}(X, Y) \le 1. \tag{26}$$
The quantity $\mathrm{Corr}(X, Y)$ measures the strength as well as the direction of the linear relationship between X and Y. When $\mathrm{Corr}(X, Y) = 1$, X and Y have a perfect positive linear association, whereas $\mathrm{Corr}(X, Y) = -1$ implies a perfect negative linear association between the two variables. When there is no linear association between the two variables, $\mathrm{Corr}(X, Y) = 0$ (the reverse does not always hold: a zero correlation does not rule out a non-linear relationship).
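To see why a zero correlation does not settle the question of dependence, consider a small constructed example that is not taken from the notes: X is uniform on {-1, 0, 1} and Y = X squared. The Python sketch below, with illustrative names such as `expect`, shows that the covariance is zero even though Y is completely determined by X.

```python
from fractions import Fraction

# Hypothetical joint pmf (not from the notes): X uniform on {-1, 0, 1}, Y = X**2.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def expect(q, joint_pmf):
    """E[q(X, Y)] for a discrete joint pmf given as {(x, y): probability}."""
    return sum(q(x, y) * p for (x, y), p in joint_pmf.items())

mu_x = expect(lambda x, y: x, joint)
mu_y = expect(lambda x, y: y, joint)
cov = expect(lambda x, y: x * y, joint) - mu_x * mu_y   # E(XY) - mu_X * mu_Y

# Compare P(X = 0, Y = 0) with P(X = 0) * P(Y = 0) to test the product rule.
p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)
p_joint_00 = joint.get((0, 0), Fraction(0))

print(cov)                        # covariance, hence correlation, is zero
print(p_joint_00, p_x0 * p_y0)    # these differ, so X and Y are dependent
```

Since the covariance is zero, the correlation is zero as well, yet the product rule fails at the point (0, 0).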
Properties of Covariance
Given any two random variables X and Y on the same sample space S, and constants a, b:
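4. $E\left(\dfrac{X}{Y}\right) \approx \dfrac{\mu_X}{\mu_Y} - \dfrac{\mathrm{Cov}(X, Y)}{\mu_Y^2} + \dfrac{\mu_X\,\mathrm{Var}(Y)}{\mu_Y^3}$,
5. $\mathrm{Var}\left(\dfrac{X}{Y}\right) \approx \left(\dfrac{\mu_X}{\mu_Y}\right)^2 \left[\dfrac{\mathrm{Var}(X)}{\mu_X^2} + \dfrac{\mathrm{Var}(Y)}{\mu_Y^2} - \dfrac{2\,\mathrm{Cov}(X, Y)}{\mu_X \mu_Y}\right]$.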
Examples
From the data in Table 6, $\mathrm{Cov}(X, Y) = E(XY) - \mu_X \mu_Y$.
Now $E(XY) = 1 \times 0 \times 0.25 + 1 \times 1 \times 0.25 + 2 \times 0 \times 0.25 + 2 \times 1 \times 0.25 = 0 + 0.25 + 0 + 0.50 = 0.75$,
while $\mu_X = 1 \times 0.50 + 2 \times 0.50 = 0.50 + 1.00 = 1.50$ and $\mu_Y = 0 \times 0.50 + 1 \times 0.50 = 0.00 + 0.50 = 0.50$.
Hence, $\mathrm{Cov}(X, Y) = 0.75 - 1.50 \times 0.50 = 0$.
As for the correlation of X and Y, we need $\mathrm{Var}(X) = 1^2(0.50) + 2^2(0.50) - 1.5^2 = 2.50 - 2.25 = 0.25$, so that $\sigma_X = \sqrt{\mathrm{Var}(X)} = \sqrt{0.25} = 0.5$, and $\mathrm{Var}(Y) = 0^2(0.50) + 1^2(0.50) - 0.5^2 = 0.50 - 0.25 = 0.25$, so that $\sigma_Y = \sqrt{\mathrm{Var}(Y)} = \sqrt{0.25} = 0.5$.
Therefore, $\mathrm{Corr}(X, Y) = \dfrac{\mathrm{Cov}(X, Y)}{\sigma_X \sigma_Y} = \dfrac{0}{(0.5)(0.5)} = 0$.
This implies that the two random variables show no linear association: no linear pattern can be deduced from the joint distribution of X and Y.
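The hand calculation of the covariance and correlation for Table 6 can be reproduced with a few lines of Python. This is only a sketch under the assumption that the joint table is stored as a dictionary; the helper `expect` is an illustrative name.

```python
import math

# Joint pmf of Table 6; keys are (x, y) pairs.
joint = {(1, 0): 0.25, (1, 1): 0.25, (2, 0): 0.25, (2, 1): 0.25}

def expect(q):
    """E[q(X, Y)] under the joint pmf above."""
    return sum(q(x, y) * p for (x, y), p in joint.items())

mu_x = expect(lambda x, y: x)                     # E(X)
mu_y = expect(lambda x, y: y)                     # E(Y)
cov = expect(lambda x, y: x * y) - mu_x * mu_y    # E(XY) - mu_X * mu_Y
var_x = expect(lambda x, y: x ** 2) - mu_x ** 2   # E(X^2) - (E(X))^2
var_y = expect(lambda x, y: y ** 2) - mu_y ** 2   # E(Y^2) - (E(Y))^2
corr = cov / (math.sqrt(var_x) * math.sqrt(var_y))

print(mu_x, mu_y, cov, var_x, var_y, corr)
```

The same helper works for any finite joint table, so it can be reused to check the session exercises.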
Definition: Let X and Y be random variables with joint probability distribution $f(X, Y)$ and marginal probability functions $g(X)$ and $h(Y)$ respectively. Then, the conditional probability mass (or density) function of X given Y is:
$$g(X \mid Y) = \frac{f(X, Y)}{h(Y)}, \quad \text{for } h(Y) > 0.$$
It follows from the definition of conditional probability that the multiplication theorem for the probability functions of random variables is
$$f(X, Y) = g(X \mid Y)\, h(Y).$$
Definition: Let X and Y be discrete random variables with bivariate probability function $f(X, Y)$ and marginal probability functions $g(X)$ and $h(Y)$, respectively. Then, the random variable X is independent of the random variable Y if:
$$g(X \mid Y) = g(X)$$
and
$$f(X, Y) = g(X)\, h(Y). \tag{29}$$
These equalities must hold for all possible pairs of X and Y values. If equality fails for at least one point (X, Y), then the random variables X and Y are said to be dependent.
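The definitions of conditional expectation and conditional variance of a random variable given the other variable follow from the original definitions of expectation and variance. Writing $g(x \mid Y)$ for the conditional probability function of X given Y:
$$E(X \mid Y) = \sum_{x} x\, g(x \mid Y) \quad \text{and} \quad E(X^2 \mid Y) = \sum_{x} x^2\, g(x \mid Y), \tag{30}$$
and
$$\mathrm{Var}(X \mid Y) = E[(X - \mu_{X \mid Y})^2 \mid Y] = E(X^2 \mid Y) - (E(X \mid Y))^2. \tag{31}$$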
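As an added illustration of the conditional mean and variance, the sketch below applies them to the probability function of example (b), f(x, y) = (x + y)/36 on x, y in {1, 2, 3}, conditioning on Y = 1. The choice of Y = 1 and the names `VALUES` and `conditional_pmf_of_x` are assumptions made for this example only.

```python
from fractions import Fraction

VALUES = (1, 2, 3)  # support of both X and Y in example (b)

def f(x, y):
    """Joint pmf of example (b): (x + y) / 36 on the support, 0 otherwise."""
    if x in VALUES and y in VALUES:
        return Fraction(x + y, 36)
    return Fraction(0)

def conditional_pmf_of_x(y):
    """Return {x: f(x, y) / h(y)}, the conditional pmf of X given Y = y."""
    h_y = sum(f(x, y) for x in VALUES)   # marginal pmf of Y at y
    if h_y == 0:
        raise ValueError("the conditional pmf is defined only when h(y) > 0")
    return {x: f(x, y) / h_y for x in VALUES}

g = conditional_pmf_of_x(1)
cond_mean = sum(x * p for x, p in g.items())        # E(X | Y = 1)
cond_second = sum(x * x * p for x, p in g.items())  # E(X^2 | Y = 1)
cond_var = cond_second - cond_mean ** 2             # Var(X | Y = 1)

print(g, cond_mean, cond_var)
```

The conditional probabilities sum to 1, as any probability mass function must.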
Examples
From the data in Tables 6 and 7, $P(Y = 1 \mid X = 2) = \dfrac{f(X = 2, Y = 1)}{g(X = 2)} = \dfrac{0.25}{0.50} = 0.50$.
To establish independence of X and Y, the product rule must be checked for every pair of values and their probabilities in Tables 6 to 8. For example, $P(X = 1)P(Y = 0) = 0.50 \times 0.50 = 0.25 = P(X = 1, Y = 0)$. The same holds for all other joint probabilities in Table 6 and their marginals in Tables 7 and 8.
Hence, we can conclude that X is independent of Y.
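Checking the product rule for every pair of values is mechanical, so it can be sketched in Python. The function `is_independent` below is an illustrative helper, not notation from the notes; it is applied to the joint table of example (a) and, for contrast, to the probability function of example (b).

```python
from fractions import Fraction
from itertools import product

def is_independent(joint):
    """True if f(x, y) = g(x) h(y) for every pair (x, y) of possible values."""
    xs = sorted({x for x, _ in joint})
    ys = sorted({y for _, y in joint})
    g = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}  # marginal of X
    h = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}  # marginal of Y
    return all(joint.get((x, y), 0) == g[x] * h[y] for x, y in product(xs, ys))

# Joint pmf of example (a): every cell equals 1/4.
table6 = {(x, y): Fraction(1, 4) for x in (1, 2) for y in (0, 1)}

# Joint pmf of example (b): f(x, y) = (x + y) / 36.
example_b = {(x, y): Fraction(x + y, 36) for x in (1, 2, 3) for y in (1, 2, 3)}

print(is_independent(table6))     # product rule holds at every point
print(is_independent(example_b))  # product rule fails, e.g. at (1, 1)
```

Exact fractions are used because comparing floating-point products for equality can give misleading results.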
Session 16 Exercises
1. Let a random experiment consist of tossing a fair coin twice. Define the random variable X to be the number of heads obtained in the two tosses and let the random variable Y be the opposite face outcome. Determine the points within the sample space S and the values of X and Y on S. Next, specify the bivariate probability distribution between X and Y. Find: (a) $P(X = 2 \mid Y = 1)$, (b) the marginal distribution of Y, (c) the conditional distribution of X given Y = 0. Are X and Y independent?
5 UNIT 5: STANDARD PROBABILITY LAWS OF DISCRETE TYPE
This unit discusses some common probability laws for discrete random variables.
5.1 Session 17: Discrete Uniform and Bernoulli distributions
Introduction
In this session, two basic standard probability distributions, the discrete uniform and the Bernoulli, are discussed.
Objectives of Session 17
After studying this session, you should be able to:
Calculate probabilities and summary measures for a given discrete uniform or Bernoulli probability distribution problem
Suppose a discrete random variable X defined on a sample space S has as its range a finite set of n numbers, such as {1, 2, 3, ..., n}. Suppose each value of X is equally likely to occur, so that each value has the same probability $\frac{1}{n}$ of occurrence. Then, X is said to have a discrete uniform distribution, whose probability mass function, denoted by $f(x; n)$ or $P(X = x; n)$, is given by:
$f(x; n) = \frac{1}{n}$ for x = 1, 2, ..., n and 0 elsewhere.
The main benefit of identifying the probability law that fits the data is that it supplies a mathematical expression characterising the distribution, with which we can solve probability problems associated with the distribution. For example, if $X \sim \mathrm{Unif}(1, 2, ..., n)$, where the symbol $\sim$ is read as "is distributed as", then for any given integer $1 \le b \le n$:
$$F(b) = P(X \le b) = \frac{b}{n}, \tag{32}$$
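while
$$\begin{aligned}
E(X) &= \sum_{x=1}^{n} x\,p(x) \\
&= \sum_{x=1}^{n} x \cdot \frac{1}{n} \\
&= \frac{1}{n}\sum_{x=1}^{n} x \\
&= \frac{1}{n} \cdot \frac{n(n+1)}{2} \\
&= \frac{n+1}{2},
\end{aligned} \tag{33}$$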
and
$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = \frac{(n+1)(2n+1)}{6} - \frac{(n+1)^2}{4} = \frac{n^2 - 1}{12}.$$
Example
Suppose X has a discrete uniform distribution with n = 6 values, 1, 2, ..., 6. Find $E(X)$ and $\mathrm{Var}(X)$.
Solution: $E(X) = \frac{n+1}{2} = \frac{6+1}{2} = 3.5$, whereas $\mathrm{Var}(X) = \frac{n^2 - 1}{12} = \frac{6^2 - 1}{12} = \frac{35}{12}$.
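The closed forms used in this example can be compared with a direct computation from the probability mass function. The Python sketch below is an added illustration; the function name `discrete_uniform_moments` is an assumption, and the values are taken to be 1, 2, ..., n as in the definition of the discrete uniform distribution.

```python
from fractions import Fraction

def discrete_uniform_moments(n):
    """Mean and variance of X uniform on {1, ..., n}, computed from the pmf."""
    pmf = {x: Fraction(1, n) for x in range(1, n + 1)}
    mean = sum(x * p for x, p in pmf.items())        # E(X)
    second = sum(x * x * p for x, p in pmf.items())  # E(X^2)
    return mean, second - mean ** 2                  # Var(X) = E(X^2) - (E(X))^2

mean, var = discrete_uniform_moments(6)
print(mean, var)

# Compare with the closed forms (n + 1) / 2 and (n^2 - 1) / 12.
print(mean == Fraction(6 + 1, 2), var == Fraction(6 ** 2 - 1, 12))
```

A fair six-sided die is the familiar instance of this distribution with n = 6.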
Consider an experiment with only two possible outcomes, labelled 'success' and 'failure'. Let p be the probability of obtaining a 'success' in such an experiment and 1 − p the probability of obtaining a 'failure'. Define a random variable X as the total number of 'successes' obtained when the experiment is performed once. Then, X can be 0 or 1. It is 0 if the experiment ends in a 'failure', and it is 1 if the experiment ends in a 'success'. Further, X is 1 with probability p and 0 with probability 1 − p.
An experiment (or trial) whose outcome can only be a 'success' or a 'failure' is called a Bernoulli trial, named in honour of the Swiss mathematician Jakob Bernoulli (1654-1705). The random variable X generated by counting the number of 'successes' in one Bernoulli trial, which can be 1 or 0, is called a Bernoulli random variable. If $X \sim \mathrm{Bernoulli}(p)$, then its probability mass function is given by:
$$f(x; p) = p^{x}(1 - p)^{1 - x} \quad \text{for } x = 0, 1, \text{ and 0 elsewhere.}$$
Its mean is $E(X) = 0 \cdot (1 - p) + 1 \cdot p = p$,
and
$$\begin{aligned}
\mathrm{Var}(X) &= E(X^2) - (E(X))^2 \\
&= \sum_{x=0}^{1} x^2 p(x) - p^2 \\
&= 0^2 \cdot (1 - p) + 1^2 \cdot (p) - p^2 \\
&= p - p^2 \\
&= p(1 - p).
\end{aligned} \tag{37}$$
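The Bernoulli mean and variance can be confirmed directly from the two-point distribution in which X is 1 with probability p and 0 with probability 1 − p. The following Python sketch is an added illustration; the value p = 3/10 and the name `bernoulli_moments` are hypothetical choices made for this example.

```python
from fractions import Fraction

def bernoulli_moments(p):
    """Mean and variance of X with P(X = 1) = p and P(X = 0) = 1 - p."""
    pmf = {0: 1 - p, 1: p}
    mean = sum(x * q for x, q in pmf.items())        # E(X)
    second = sum(x * x * q for x, q in pmf.items())  # E(X^2)
    return mean, second - mean ** 2                  # Var(X) = E(X^2) - (E(X))^2

p = Fraction(3, 10)  # hypothetical success probability
mean, var = bernoulli_moments(p)
print(mean, var)
print(mean == p, var == p * (1 - p))
```

Because x squared equals x when x is 0 or 1, E(X^2) coincides with E(X), which is why the variance reduces to p − p^2.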
5.2 Session 18: Binomial and Negative Binomial Distributions
6 UNIT 6: STANDARD PROBABILITY LAWS OF CONTINUOUS TYPE
7 UNIT 7: RELATING PROBABILITY LAWS