Foundation Probability (Lecture Notes)


School of Natural and Applied Sciences

Department of Mathematical Sciences

STA211: Foundations of Probability and Statistics

Learners’ Lecture Notes

Tsirizani M. Kaombe and Elias P. Mwakilama

Revised Version, 2022

Preface
This module presents simplified notes for an introductory course in probability and statistics
for university learners. It targets undergraduate degree students in science and mathematics,
quantitative social sciences, engineering, finance, and related fields. The material is presented
in a way that maximises self-learning by students, and numerous examples are given to support
the concepts presented. However, to fully understand the material, students are expected to
have a good background in elementary mathematics courses such as calculus. All calculations
in this module can be done by hand with the aid of just a scientific calculator, although
computer applications may be used where necessary for speed of calculation.

The content is organised in units. There are seven units, each of which is divided into sessions
that are further organised into topics of study. A session is essentially a lesson or lecture. For
a university student with the required aptitude, a session is designed to last for 60 minutes. At
the end of each session, there are exercises for students' practice. Each unit also has references
to motivate further learning by students. The units, sessions, and topics in this module are
presented in a sequential manner, such that, to understand the content of each session, a learner
is advised to first study the preceding session(s). Instructors who may use these lecture notes
are therefore advised not to skip sessions or topics, to avoid inducing knowledge gaps in learners.

The module first revisits material from set theory, as it is essential for learning the rest of the
content. Later, the concepts of probability and random variables are presented. Learners will
benefit from the various proofs that are provided for the given theorems in probability, which
will help in acquiring knowledge for proving theorems in statistics. These follow from basic
techniques in logic, such as proof by mathematical induction and proof by counterexample. A
number of commonly used standard random variables are presented together with their
properties, to prepare learners for future courses in statistics, such as regression modelling.

We convey our special thanks to the Department of Mathematical Sciences at the University of
Malawi for sponsoring the development of this module.


1 Unit 1: Review of Set Theory

This unit revisits concepts of set theory, that are useful in studying probability theory.

1.1 Session 1: Sets and Elements

Introduction
In this session the concept of a set is reviewed. Students will also be reminded of the members
of a set, called elements. Various mathematical symbols studied in MAT100 are used.
Where a proof of a theorem is given, the symbol ∎ shall mean end of proof.

Objectives of Session 1
After studying this session, you should be able to:

• Define a set and its element

• Give examples of sets

• Define subset, equal sets, and null set

• Identify proper subsets

1.1.1 Sets, Elements

A set is a well-defined list or collection of objects. An object that belongs to a particular set is
called an element or a member of that set.

Set notation
In most cases, uppercase letters of the alphabet (e.g. A, B, M, X, ...) are used to denote sets,
while lowercase letters (e.g. a, b, x, ...) denote members of sets. Let A denote a set and k its
element; then the expression k ∈ A is read as "k is an element of A". If some item p is not a
member of A, it is written as p ∉ A.

Set examples
To specify that certain objects belong to a given set, braces {} (called set builders) are used,
either by providing a roster, i.e. a complete list of all elements within the braces, or by the rule
method, i.e. stating the properties that characterise the elements within the braces. For example,

(a) A = {1, 2, 3, 5, 7, 11} means A is a set consisting of numbers 1, 2, 3, 5, 7, and 11,

(b) B = {x : x is a prime number, x < 13} means B is a set of prime numbers less than 13,

(c) 17 ∉ A means that 17 is not a member of set A.
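These notations map directly onto Python's built-in set type. Below is a quick sketch mirroring examples (a)–(c); the helper `is_prime` is ours, not part of the notes:

```python
# Roster method: list the elements explicitly.
A = {1, 2, 3, 5, 7, 11}

# Rule method: a set comprehension states the defining property.
def is_prime(n):
    """True iff n is a prime number (trial division up to sqrt(n))."""
    return n >= 2 and all(n % d != 0 for d in range(2, int(n ** 0.5) + 1))

B = {x for x in range(13) if is_prime(x)}   # primes less than 13

print(11 in B)     # membership test: 11 ∈ B
print(17 in A)     # 17 ∉ A, so this prints False
```

Note that the roster for A above contains 1, which is not prime, so A and B are close but not identical sets.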


1.1.2 Subset

Let A and B be any two sets. If every element of A also belongs to set B, i.e. if p ∈ A implies
p ∈ B, then A is called a subset of B or is said to be contained in B.

Two different types of subsets emerge from this definition: proper and improper subsets.
A is a proper subset of B if it is a subset of B and there exists at least one element of B that
does not belong to A; otherwise A is an improper subset of B.

Subset notation
To show that A is a subset of B, we write it as A ⊂ B.

Subset examples
Let N = {1, 2, 3, ...} denote the set of positive integers, Z = {..., −2, −1, 0, 1, 2, ...} the set of
integers, and R = {x : x is a real number}. Then N ⊂ Z ⊂ R.

A subset of real numbers can also be defined as an interval on the real line. For example,

(a) (a, b) = {x : a < x < b, x ∈ R} means an open interval of real numbers from a to b,

(b) [a, b] = {x : a ≤ x ≤ b, x ∈ R} means a closed interval of real numbers from a to b,

(c) (a, b] = {x : a < x ≤ b, x ∈ R} means an open-closed interval of real numbers from a to b,

(d) [a, b) = {x : a ≤ x < b, x ∈ R} means a closed-open interval of real numbers from a to b.

1.1.3 Equal sets

Two sets are equal if each is contained in the other, i.e. let A and B be any two sets, then if
x ∈ A =⇒ x ∈ B and x ∈ B =⇒ x ∈ A, we have A = B. In a nutshell, A = B if and only if
A ⊂ B and B ⊂ A.

The definition A ⊂ B includes the case A = B. If A ⊂ B but A ≠ B, then we say that A is a
proper subset of B.

Equal sets examples


Let A = {1, w, 10} and B = {w, 10, 1}. Since all elements of A are also in B and vice versa, then
A = B.
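The mutual-containment test for equality can be checked mechanically. A sketch using Python sets (w is represented as the string "w", and finite stand-ins replace the infinite sets N and Z, since Python sets must be finite):

```python
A = {1, "w", 10}
B = {"w", 10, 1}

# Equal sets: each is contained in the other.
print(A.issubset(B) and B.issubset(A))   # True
print(A == B)                            # True

# Proper subset: '<' tests containment with inequality.
N = {1, 2, 3, 4, 5}                      # finite stand-in for N
Z = {-2, -1, 0, 1, 2, 3, 4, 5}           # finite stand-in for Z
print(N < Z)                             # True: N is a proper subset of Z
```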

1.1.4 Universal set, Empty set

When, in a particular discussion, all the sets under analysis, i.e. A, B, C, etc., are subsets of
one set that we denote by U, then U is called the universal set.

A null or empty set, denoted by ∅, is the set with no elements, i.e. ∅ = {}.


An empty set is a subset of every other set A, i.e. ∅ ⊂ A. This is the case because "every x ∈ ∅
(there are none) also belongs to A" is a true statement for any set A, since there is no x ∈ ∅ to
make the statement false.

Empty set example


Let C = {x : x² = 4, x is odd}. Then C = ∅.

Theorem
Let A, B, and C be any sets. Then,

(a) A ⊂ A
(b) if A ⊂ B and B ⊂ A, then A = B, and
(c) if A ⊂ B and B ⊂ C, then A ⊂ C.

Proof

(a) Let x ∈ A (the first copy); then x ∈ A (the second copy), since the two are the same set.
Hence, by the definition of subset, A ⊂ A. ∎
(b) Let x ∈ A. Then A ⊂ B implies x ∈ B. Likewise, x ∈ B together with B ⊂ A implies
x ∈ A. So every element of A is in B and every element of B is in A; hence, by the
definition of equal sets, A = B. ∎
(c) Let x ∈ A. Then A ⊂ B implies x ∈ B, and B ⊂ C implies x ∈ C. Thus x ∈ A implies
x ∈ C, hence A ⊂ C. ∎

Session 1 Exercises

(a) Let A = {x : 3x = 6}. Does A = 2?


(b) Which of these sets are equal: {r, s, t}, {t, s, r}, {s, r, t}, {t, r, s}?
(c) Determine whether or not each set is the null set: (i) X = {x : 2x = 4, x² = 9}, (ii)
Y = {x : x + 8 = 8}
(d) Prove that A = {2, 3, 4, 5} is not a subset of B = {x : x is even}.

1.2 Session 2: Set operations

Introduction
This session presents the operations of union, intersection, complement, difference, and Cartesian
product of two sets. Students will be reminded of the various operations of addition and
multiplication with numbers, in which applying the operation to any pair of numbers results
in another number.

Objectives of Session 2
After studying this session, you should be able to:


• Find the union of two or more sets

• Find the intersection of two or more sets

• Find the complement and difference of two sets

• Analyse product sets

1.2.1 Union

Let A and B be any sets. The union of A and B, denoted by A ∪ B, is the set that consists of
all the elements that belong to A or to B or to both, i.e. A ∪ B = {x : x ∈ A or x ∈ B}.

Example of Union
Let A = {1, 2, 3, 4} and B = {3, 4, 5, 6}. Then A ∪ B = {1, 2, 3, 4, 5, 6}.

1.2.2 Intersection

Let A and B be any sets. The intersection of A and B, denoted by A ∩ B, is the set of elements
which belong to both A and B, i.e. A ∩ B = {x : x ∈ A and x ∈ B}.

If A ∩ B = ∅, that is, if A and B do not have any elements in common, then A and B are said
to be disjoint.

Example of Intersection
Let A = {1, 2, 3, 4} and B = {3, 4, 5, 6}. Then A ∩ B = {3, 4}.

1.2.3 Complement

Let A be any set and U the universal set. The complement or absolute complement of A, denoted
by Aᶜ, is the set of elements which do not belong to A, i.e. Aᶜ = {x : x ∈ U, x ∉ A}. In essence,
Aᶜ is the difference of the universal set U and A.

Example of Complement
Let A = {1, 2, 3, 4} and U = {1, 2, 3, ...}. Then Aᶜ = {5, 6, 7, ...}.

1.2.4 Difference

Let A and B be any sets. The difference of A and B, or the relative complement of B with
respect to A, denoted by A\B, is the set of elements which belong to A but not to B, i.e.
A\B = {x : x ∈ A, x ∉ B}.

Note that A\B and B are disjoint, i.e. (A\B) ∩ B = ∅.

Example of Difference
Let A = {1, 2, 3, 4} and B = {3, 4, 5, 6}. Then A\B = {1, 2}.


1.2.5 Symmetric Difference

The symmetric difference of the sets A and B, denoted by A ⊕ B is the set consisting of those
elements which belong to A or B, but not both, i.e. A ⊕ B = (A ∪ B)\(A ∩ B) or A ⊕ B =
(A\B) ∪ (B\A).

Example of Symmetric Difference


Using the previous example, A ⊕ B = (A ∪ B)\(A ∩ B) = {1, 2, 3, 4, 5, 6}\{3, 4} = {1, 2, 5, 6}.
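All five operations defined above map onto Python set operators. A sketch, with U a finite stand-in for the universal set:

```python
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}
U = set(range(1, 10))    # finite stand-in for the universal set

print(A | B)    # union:                {1, 2, 3, 4, 5, 6}
print(A & B)    # intersection:         {3, 4}
print(U - A)    # complement of A in U: {5, 6, 7, 8, 9}
print(A - B)    # difference A\B:       {1, 2}
print(A ^ B)    # symmetric difference: {1, 2, 5, 6}
```

Note that `A ^ B` equals `(A | B) - (A & B)`, matching the identity A ⊕ B = (A ∪ B)\(A ∩ B).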

Venn (or Euler) diagrams are frequently useful in picturing sets and the relationships between
sets. These diagrams use geometric shapes to represent sets (the actual shape used has no real
bearing). [Figure: a Venn diagram of sets A, B, and C, with the shaded region representing A ∩ C.]

1.2.6 Product set or Cartesian product

Definition An n-tuple is an ordered array of n components, written (x1, x2, ..., xn). For example,
(1, 2), (0, 100), and (a, b) are 2-tuples, while (1, 1, 1), (a, c, b), and (2, 1, 2) are 3-tuples.

Definition Let A and B be two sets. The product set or Cartesian product of A and B, denoted
by A × B, is the set of all possible 2-tuples (a, b), where a ∈ A and b ∈ B, i.e. A × B = {(a, b) :
a ∈ A, b ∈ B}.

The product of a set with itself, say A × A, is denoted by A². The concept of a product set is
extended to any finite number of sets in a natural way. The product set of the sets A1, A2, ..., Am,
written A1 × A2 × ... × Am, is the set of all ordered m-tuples (a1, a2, ..., am) where ai ∈ Ai for
each i.

Examples of product set

(a) Let A = {1, 2, 3} and B = {a, b}. Then A × B = {(1, a), (1, b), (2, a), (2, b), (3, a), (3, b)}
(b) The Cartesian plane R² = R × R has each point P representing an ordered pair (a, b)

of real numbers, with a representing a value on the horizontal X-axis and b a value on the
vertical Y-axis. Hence, every region of the Cartesian plane is a product set.
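Example (a) above can be reproduced with `itertools.product` (a sketch; the variable names are ours):

```python
from itertools import product

A = {1, 2, 3}
B = {"a", "b"}

# Cartesian product A × B as a set of 2-tuples.
AxB = set(product(A, B))
print(len(AxB))                  # n(A × B) = n(A) · n(B) = 6

# Product of a set with itself three times: A × A × A.
A3 = set(product(A, repeat=3))
print(len(A3))                   # 3³ = 27
```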

Session 2 Exercises

1. Let U = {1, 2, 3, ..., 9}, A = {1, 2, 3, 4}, B = {2, 4, 6, 8} and C = {3, 4, 5, 6}. Find (i) Aᶜ,
(ii) A ∩ C, (iii) (A ∩ B)ᶜ, (iv) A ∪ B, (v) B\A
2. Let U = {a, b, c, d, e}, A = {a, b, d}, B = {b, d, e}. Find (i) A ∪ B, (ii) Bᶜ, (iii) Aᶜ ∩ B,
(iv) Aᶜ ∩ Bᶜ, (v) (A ∩ B)ᶜ, (vi) B ∩ A, (vii) B\A, (viii) A ∪ Bᶜ, (ix) Bᶜ\Aᶜ, (x) (A ∪ B)ᶜ
3. Prove that B\A = B ∩ Aᶜ
4. Prove that A ⊂ B if and only if A ∩ B = A
5. Let A = {1, 2}, B = {2, 1} and C = {10, 12}. Find A × B, B × A, A × A, A × B × C.

1.3 Session 3: Properties of set operations

Introduction
This session introduces learners to various mathematical laws governing the set operations. The
laws form building blocks for most mathematical operations in the topics that follow.

Objectives of Session 3
After studying this session, you should be able to:

• Apply the idempotent, associative, commutative, distributive, identity, complement, and De
Morgan's laws on sets

• Prove some of the above properties where applicable

1.3.1 Idempotent Laws

1. A ∪ A = A
2. A ∩ A = A

1.3.2 Associative Laws

1. (A ∪ B) ∪ C = A ∪ (B ∪ C)
2. (A ∩ B) ∩ C = A ∩ (B ∩ C)

1.3.3 Commutative Laws

1. A ∪ B = B ∪ A
2. A ∩ B = B ∩ A


1.3.4 Distributive Laws

1. A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C)

2. A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C)

1.3.5 Identity Laws

1. A ∪ ∅ = A

2. A ∩ U = A

3. A ∪ U = U

4. A ∩ ∅ = ∅

1.3.6 Complement Laws

1. A ∪ Aᶜ = U

2. A ∩ Aᶜ = ∅

3. (Aᶜ)ᶜ = A

4. Uᶜ = ∅

5. ∅ᶜ = U

1.3.7 De Morgan’s Laws

1. (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ

2. (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ

The above laws follow from the laws of logic. For example, the proof of De Morgan's law 1 can
go as follows: (A ∪ B)ᶜ = {x : x ∉ (A or B)} = {x : x ∉ A and x ∉ B} = Aᶜ ∩ Bᶜ, which is
equivalent to the logical law ¬(p ∨ q) = ¬p ∧ ¬q.
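Both De Morgan laws can be verified concretely on a small finite universe. A sketch, where `comp` is our own helper computing the complement relative to U:

```python
U = set(range(10))
A = {1, 2, 3, 4}
B = {2, 4, 6, 8}

def comp(X):
    """Complement of X relative to the universal set U."""
    return U - X

# De Morgan's laws as set identities.
print(comp(A | B) == comp(A) & comp(B))   # True
print(comp(A & B) == comp(A) | comp(B))   # True
```

Checking one pair of sets is not a proof, of course; the proof above argues from the logical equivalence for all elements x.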

Session 3 Exercises

1. Prove that each of the following conditions is equivalent to A ⊂ B: (i) A ∩ B = A, (ii)
A ∪ B = B, (iii) Bᶜ ⊂ Aᶜ, (iv) A ∩ Bᶜ = ∅, (v) B ∪ Aᶜ = U


1.4 Session 4: Finite Sets, Classes of Sets

Introduction
In this session learners will be introduced to some descriptions of set based on its size or number
of elements.

Objectives of Session 4
After studying this session, you should be able to:

• Define finite set and countable set

• Find the power set of a set

• Find a partition of a set

1.4.1 Finite and Countable Sets

Finite set
A set A is finite if A is empty or if A consists of exactly m elements, where m is a positive
integer; otherwise A is infinite.

Examples of finite set

1. Let M be the set of the days of the week; that is,
M = {Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday}. Then
M is finite.

2. Let P = {x : x is a river on earth}. Although it may be difficult to count the number of


rivers on earth, P is a finite set.

3. Let Y be the set of (positive) even integers, i.e. Y = {2, 4, 6, ...}. Then Y is an infinite set.

4. Let I be the unit interval of real numbers, i.e. I = {x : 0 ≤ x ≤ 1}. Then I is also an
infinite set.

Countable set
A set A is countable if A is finite or if its elements can be arranged in the form of a sequence, in
which case A is said to be countably infinite; otherwise A is uncountable.

The set in example 3 above is countably infinite, whereas it can be shown that the set in example
4 above is uncountable.

Counting elements in finite sets


Let A be any set; the notation n(A) will denote the number of elements in A. For example,
if A = {1, a, w, 4, 8}, then n(A) = 5. For the empty set ∅, n(∅) = 0.

Lemma


(a) Suppose A and B are finite disjoint sets. Then A ∪ B is finite, and n(A ∪ B) = n(A) + n(B)

(b) n(A\B) = n(A) − n(A ∩ B)

(c) If U is the universal set, then n(Aᶜ) = n(U) − n(A)

(d) Suppose A and B are finite. Then A × B is finite and n(A × B) = n(A) × n(B).

Theorem (Inclusion-Exclusion Principle)


Suppose A and B are finite sets. Then A ∩ B and A ∪ B are finite, and n(A ∪ B) = n(A) +
n(B) − n(A ∩ B).

The above theorem can be generalised to any finite number of finite sets. For example, if A, B,
C are finite sets, then A ∪ B ∪ C is finite and
n(A ∪ B ∪ C) = n(A) + n(B) + n(C) − n(A ∩ B) − n(A ∩ C) − n(B ∩ C) + n(A ∩ B ∩ C).
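The three-set formula can be checked numerically. A sketch, reusing the sets A, B, C from the Session 2 exercises:

```python
A = {1, 2, 3, 4}
B = {2, 4, 6, 8}
C = {3, 4, 5, 6}

# Left side: count the union directly.
lhs = len(A | B | C)

# Right side: inclusion-exclusion, alternating signs.
rhs = (len(A) + len(B) + len(C)
       - len(A & B) - len(A & C) - len(B & C)
       + len(A & B & C))

print(lhs, rhs)   # both equal 7
```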

Example of counting elements

Suppose in a Form 3 class, 30 students take mathematics and 35 students take English, while 20
students take both subjects. Find the number of students that study: a) only mathematics, b)
mathematics or English, c) exactly one of the two subjects.

Solution: a) Let M be the set of students that study mathematics and E the set for English;
then n(M\E) = n(M) − n(M ∩ E) = 30 − 20 = 10. b) n(M ∪ E) = n(M) + n(E) − n(M ∩ E) =
30 + 35 − 20 = 45. c) We use the symmetric difference to answer part c), i.e. n(M ⊕ E) =
n((M\E) ∪ (E\M)) = 10 + 15 = 25.

1.4.2 Classes of Sets

Members of a set can be sets themselves. To help clarify these situations, we usually use the
word class or family for such a set. The words subclass and subfamily have meanings analogous
to subset.

Examples of class of subsets Let S = {1, 2, 3, 4}. Let W be the class of subsets of S which
contain exactly three elements of S. Then W = [{1, 2, 3}, {1, 2, 4}, {1, 3, 4}, {2, 3, 4}].

1.4.3 Power set

Consider any set A; the power set of A, denoted by P(A), is the class of all subsets of A. If A
is finite, so is P(A). The number of elements in P(A) is 2 raised to the power n(A),
i.e. n(P(A)) = 2^n(A).

Example of power set


Let A = {a, b, c}, then P(A) = {A, {a, b}, {a, c}, {b, c}, {a}, {b}, {c}, ∅}.
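A power set can be generated by taking all combinations of every size. A sketch, where `power_set` is our own helper, not a standard library function:

```python
from itertools import chain, combinations

def power_set(A):
    """All subsets of A, as a list of frozensets (frozenset is hashable)."""
    s = list(A)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(s, r) for r in range(len(s) + 1))]

P = power_set({"a", "b", "c"})
print(len(P))   # n(P(A)) = 2^3 = 8
```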


1.4.4 Partitions

Let X be a nonempty set. A partition of X is a subdivision of X into nonoverlapping, nonempty
subsets whose union is X. The subsets in a partition are called cells.

Examples of partition
Consider the following classes of subsets of X = {1, 2, 3, ..., 9}: (i) [{1, 3, 5}, {2, 6}, {4, 8, 9}], (ii)
[{1, 3, 5}, {2, 4, 6, 8}, {5, 7, 9}], (iii) [{1, 3, 5}, {2, 4, 6, 8}, {7, 9}].

Then (i) is not a partition of X since 7 ∈ X but 7 does not belong to any of the cells. Furthermore,
(ii) is not a partition of X since 5 ∈ X and 5 belongs to both {1, 3, 5} and {5, 7, 9}. On the other
hand, (iii) is a partition of X since each element of X belongs to exactly one cell.
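The three checks in the example (every element covered, no element in two cells, no empty cell) can be automated. A sketch, where `is_partition` is a hypothetical helper:

```python
def is_partition(cells, X):
    """True iff cells are nonempty, pairwise disjoint, and cover X exactly."""
    if any(not c for c in cells):
        return False                       # an empty cell is not allowed
    total = [x for c in cells for x in c]  # all cell members, with repeats
    # No repeats (pairwise disjoint) and the union equals X (covers X).
    return len(total) == len(set(total)) and set(total) == set(X)

X = set(range(1, 10))
print(is_partition([{1, 3, 5}, {2, 6}, {4, 8, 9}], X))        # False: 7 uncovered
print(is_partition([{1, 3, 5}, {2, 4, 6, 8}, {5, 7, 9}], X))  # False: 5 in two cells
print(is_partition([{1, 3, 5}, {2, 4, 6, 8}, {7, 9}], X))     # True: a partition
```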

Session 4 Exercises

1. Consider the set A = [{1, 2, 3}, {4, 5}, {6, 7, 8}]. Find (i) the elements of A; (ii) n(A).

2. List the elements of the power set P(A) of A = {a, b, c, d}

1.5 Unit 1 Summary

In this unit you have studied

1. Solving problems related to sets, based on definitions, some operations and properties of
sets

2. Proving some theorems using definitions related to sets and set operations

1.6 Unit 1 References


1. Lipschutz, S. and Lipson, M. Probability, 2nd ed. Schaum's Outlines.

2. Larson, H.J. (1982). Introduction to probability theory and statistical inference, 3rd ed.
New York: John Wiley and Sons.

3. Panik, M.J. (2005). Advanced statistics from an elementary point of view. Amsterdam:
Elsevier.


2 Unit 2: Probability

This unit introduces learners to the concept of probability, a quantitative measure for uncertain
events. Learners will solve various probability problems using definitions and theorems related
to probability theory.

2.1 Session 5: Sample space, Event, Probability

Introduction
In this session, an axiomatic definition of probability, based on set theory, is discussed.

Objectives of Session 5
After studying this session, you should be able to:

• Calculate the probability of an event

• Prove some probability relations using the axioms of probability

2.1.1 Sample space, Event

Definition A random experiment is any sort of operation whose outcome cannot be predicted
in advance with certainty. Examples include: flipping a coin and observing which face lands on
top; and planting a particular hybrid corn on a given plot of ground and observing its yield.

Definition An outcome of an experiment is some observation or measurement. A single outcome
of an experiment is called a basic or simple outcome. In the above example of coin flipping,
either of the two faces of the coin is a basic outcome.

Sample space
The sample space for an experiment is the set of all possible outcomes that might be observed. In
other words, a sample space, denoted by S, is the universal set pertinent to a given experiment.

Example of sample space


A sample space for an experiment of flipping a coin could be S = {Head, Tail}, while a sample
space for an experiment of rolling a die once, to observe the number (of dots) that appears on
top, would be S = {1, 2, 3, 4, 5, 6}.

Event
An event is a subset of a sample space. It is a set of basic outcomes. The event consisting of a
single point or outcome a ∈ S is called an elementary event.

The empty set ∅ and S are subsets of S, and hence they are events. The event ∅ is called the
impossible or null event, while S is called the certain or sure event.

We can build new events using set operations:


(a) A ∪ B is the event that occurs iff A occurs or B occurs (or both)
(b) A ∩ B is the event that occurs iff A occurs and B occurs
(c) Aᶜ is the event that occurs iff A does not occur.

Events A and B are called mutually exclusive if they are disjoint, i.e. if A ∩ B = ∅. Three or
more events are mutually exclusive if every two of them are mutually exclusive.

A sample space S is continuous if it is an interval or a product of intervals. In such a case,
only special subsets (called measurable sets) will be events. On the other hand, if the sample
space S is discrete, i.e. if S is finite or countably infinite, then every subset of S is an event.

2.1.2 Probability Axioms

Let S be a sample space, let E be the class of all events, and let P be a real-valued function
defined on E. Then P is called a probability function, and P(A) is called the probability of event
A, when the following axioms (also called the Kolmogorov axioms for a probability function) hold:

1. For any event A, we have P(A) ≥ 0.

2. P(S) = 1.

3. For any two disjoint events A and B, we have P(A ∪ B) = P(A) + P(B).

4. For any infinite sequence of mutually disjoint events A1, A2, A3, ..., we have P(A1 ∪ A2 ∪
A3 ∪ ...) = P(A1) + P(A2) + P(A3) + ...

When P does satisfy the above axioms, the sample space S will be called a probability space.

Theorem
The impossible event has probability zero, i.e. P (∅) = 0.

Proof: For any event A on sample space S, we have A ∪ ∅ = A, where A and ∅ are disjoint.
Then by axiom 3, P(A) = P(A ∪ ∅) = P(A) + P(∅). It follows that P(∅) = P(A) − P(A) = 0. ∎

Theorem (Complement Rule)


For any event A, we have P(Aᶜ) = 1 − P(A).

Proof: S = A ∪ Aᶜ, where A and Aᶜ are disjoint. By axiom 2, P(S) = 1. Using axiom 3,
P(S) = P(A ∪ Aᶜ) = P(A) + P(Aᶜ) = 1. Hence, P(Aᶜ) = 1 − P(A). ∎

Theorem
If A ⊂ B, then P (A) ≤ P (B)

Proof: If A ⊂ B, then B can be decomposed into the mutually exclusive events A and B\A.
Then A ∪ (B\A) = B. Thus, P(A ∪ (B\A)) = P(B) =⇒ P(A) + P(B\A) = P(B) (axiom 3).
Since P(B\A) ≥ 0, then P(A) ≤ P(B). ∎


Theorem
If A and B are any events, then P (A\B) = P (A) − P (A ∩ B).

Proof: Decompose A into the mutually exclusive events A\B and A ∩ B, so that A = (A\B) ∪ (A ∩ B).
Then by axiom 3, P(A) = P((A\B) ∪ (A ∩ B)) = P(A\B) + P(A ∩ B) =⇒ P(A\B) =
P(A) − P(A ∩ B). ∎

Theorem (Addition Rule)


For any two events A and B, P (A ∪ B) = P (A) + P (B) − P (A ∩ B).

Proof: The event A ∪ B can be decomposed into the disjoint events A\B and B, so that A ∪ B =
(A\B) ∪ B. It follows by axiom 3 that P(A ∪ B) = P((A\B) ∪ B) = P(A\B) + P(B). By the previous
theorem, P(A ∪ B) = P(A) − P(A ∩ B) + P(B), ∴ P(A ∪ B) = P(A) + P(B) − P(A ∩ B). ∎

Just as the inclusion-exclusion principle for sets applies to any finite number of sets, the addition
rule for probability can be extended by induction to any finite number of events. For instance,
P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).

Session 5 Exercises

1. Given S = {1, 2, 3}, A = {1}, B = {3}, C = {2}, P(A) = 1/3, P(B) = 1/3. Find (a) P(C),
(b) P(B ∪ C), (c) P(Aᶜ ∩ Bᶜ)
2. Suppose A and B are any events on a sample space and P(Aᶜ ∩ B) = 0.1, P(A ∩ Bᶜ) = 0.4,
P((A ∩ B)ᶜ) = 0.6. Compute (a) P(A), (b) P(B), (c) P(A ∪ B), (d) P(Aᶜ ∪ B).
3. Let a coin and a die be tossed. Specify the sample space. Define the event A with outcomes
of heads and an even number, and event B with outcomes of a number less than 3.

2.2 Session 6: Finite Probability Spaces

Introduction
In this session, further linkage of probability to events on a sample space is presented.

Objectives of Session 6
After studying this session, you should be able to:

• Specify events from a finite sample space

• Solve probability problems on a finite sample space

• Solve probability problems from equiprobable spaces

2.2.1 Finite probability space

Let S be a finite sample space, say S = {a1, a2, ..., an}. A finite probability space is obtained
by assigning to each point ai ∈ S a real number pi, called the probability of ai, satisfying the
following properties: (a) pi ≥ 0; and (b) p1 + p2 + ... + pn = 1.


The probability P(A) of an event A is defined as the sum of the probabilities of the points in A,
i.e. P(A) = Σ P(ai), summing over all ai ∈ A.

Sometimes the points in a finite sample space S and their assigned probabilities are given in the
form of a table. Such a table is called a probability distribution.

Example
A die has been loaded in a manner such that the probability of face i being uppermost, when it
stops rolling, is proportional to i, i = 1, 2, ..., 6. Specify the probability space.

Solution: The sample space is S = {1, 2, 3, 4, 5, 6}. Let's define the six distinct single-element
events by Ai = {i}, i = 1, 2, 3, 4, 5, 6. Let p = P(A1); then we have P(A2) = 2p, P(A3) = 3p, ...,
P(A6) = 6p. It follows that p + 2p + 3p + 4p + 5p + 6p = 1, hence p = 1/21. ∴ P(1) = 1/21,
P(2) = 2/21, P(3) = 3/21, P(4) = 4/21, P(5) = 5/21, P(6) = 6/21, and the probability function
is completely specified.
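The loaded-die distribution can be verified with exact rational arithmetic, a sketch using Python's `fractions` module:

```python
from fractions import Fraction

# P(face i) is proportional to i: P(i) = i*p, with p + 2p + ... + 6p = 1,
# so p = 1/21.
p = Fraction(1, 21)
P = {i: i * p for i in range(1, 7)}

print(sum(P.values()))   # 1, as axiom 2 requires
print(P[6])              # 6/21 = 2/7, the most likely face
```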

2.2.2 Finite Equiprobable Spaces

Suppose S is a finite probability space with k elements. If the k single-element events are equally
likely, then their common probability must equal 1/k. The probability space is then called a finite
equiprobable space.

Further, if A ⊂ S is any event containing r single-element events whose union is A, then
P(A) = r/k. In other words, P(A) = n(A)/n(S).

Theorem
Let S be a finite sample space and, for any A ⊂ S, let P(A) = n(A)/n(S). Then P satisfies the
axioms for a probability function.

Proof
If S has k elements, then n(S) = k, so P(S) = n(S)/n(S) = k/k = 1 (axiom 2 satisfied). If
A ⊂ S, then n(A) ≥ 0, so P(A) = n(A)/n(S) = n(A)/k ≥ 0 for all A ⊂ S (axiom 1 satisfied). If
A, B ⊂ S and A ∩ B = ∅, then n(A ∪ B) = n(A) + n(B). Hence n(A ∪ B)/n(S) = n(A)/n(S) +
n(B)/n(S), i.e. P(A ∪ B) = P(A) + P(B) (axiom 3 satisfied). ∎

Examples of equiprobable spaces


(a) A card is selected at random from an ordinary deck of 52 playing cards. Consider the
following events (where a face card is a jack (J), queen (Q), or king (K)): A = {heart} and
B = {face card}. Find P(A), P(B), P(A ∩ B) and P(A ∪ B).
Solution: P(A) = (no. of hearts)/(no. of cards) = 13/52 = 1/4; P(B) = (no. of face cards)/(no.
of cards) = 12/52 = 3/13; P(A ∩ B) = (no. of heart face cards)/(no. of cards) = 3/52. Use the
addition rule to answer the last part.

(b) Three horses A, B, C are in a race; A is twice as likely to win as B, and B is twice as likely
to win as C. Find: i) their respective probabilities of winning, i.e. P(A), P(B), P(C); ii) the
probability of B or C winning.
Solution: i) Let P(C) = p. Then P(B) = 2p and P(A) = 2P(B) = 2(2p) = 4p. The sum of
the probabilities must be 1. Hence, p + 2p + 4p = 1 =⇒ p = 1/7. Therefore P(A) = 4p = 4/7,
P(B) = 2p = 2/7, P(C) = p = 1/7. ii) P(B or C) = P(B) + P(C) = 2/7 + 1/7 = 3/7.

Session 6 Exercises

1. Suppose a die is tossed once. Find the probability that: (i) an even number appears on
top; (ii) a number greater than 4 appears on top

2. A bag contains 4 white marbles, 3 red marbles, and 5 blue marbles. A marble is drawn at
random from the bag. What is the probability that it is red?

3. A trick coin is to be flipped one time. The probability of getting a head is three times
as large as the probability of getting a tail. What are the probabilities for the two single-
element events?

2.3 Session 7: Infinite sample spaces

Introduction
In this session, probability problems based on infinite sample space are discussed.

Objectives of Session 7
After studying this session, you should be able to:

• Specify events from an infinite sample space

• Solve probability problems based on an uncountable infinite sample space

2.3.1 Countably infinite sample space

For infinite sample space S, two cases arise: either S is countably infinite or S is uncountable.

A countably infinite probability space S is said to be discrete, while when S is uncountable,
it is said to be nondiscrete or continuous. S is uncountable when it consists of a continuum of
points, such as an interval or a product of intervals.

The examples of probability problems we have had so far are all based on a discrete sample
space. This section will look at cases of continuous sample space.

Definition: A real-element sample space S is called discrete if no finite-length interval on the
real line contains an infinite number of elements of S. E.g. S = {1, 2, 3, ..., n} - the first n
natural numbers; S = {0, 1, 2, 3, ...} - the set of nonnegative integers; S = {0, 2, 4, 6, ...} - the set
of even integers; S = {..., −1, 0, 1, ...} - the set of integers, are all examples of discrete sample
spaces.

The key thing here is that a discrete sample space S may contain a finite or an infinite number
of elements. The elements in this case are isolated points on the real line: between any two
elements of S there are other points on the real line which do not belong to S. With discrete
sample spaces, it is meaningful to specify the probabilities for single-element events.


Definition: A sample space S that has as elements all the points in an interval or union of
intervals on the real line is called continuous. E.g. S = {x : 0 ≤ x ≤ 10}; S = {x : 0 ≤ x ≤ 1}
or S = {x : 2 ≤ x ≤ 3}.

A continuous sample space always has an infinite number of elements. If x ∈ S and y ∈ S, then
all the points between x and y also belong to S, if x and y are selected from one of the intervals
belonging to S.

2.3.2 Probability of events on a continuous sample space

Suppose S = {x : a ≤ x ≤ b}, a, b ∈ R, and let A ⊂ S be an interval. Let L(A) represent the
length of interval A. Define Ai ⊂ S as nonoverlapping intervals on S, i.e. A1 ∪ A2 ∪ A3 ∪ ... = S.
Then P(A) = L(A)/L(S) satisfies the axioms for a probability function, as follows:

1. Since the length of any interval is nonnegative, we have P(A) = L(A)/L(S) ≥ 0, hence
axiom 1 is satisfied

2. P(S) = (L(A1) + L(A2) + L(A3) + ...)/L(S) = L(S)/L(S) = 1, i.e. axiom 2 is satisfied

3. P(A1 ∪ A2 ∪ A3 ∪ ...) = (L(A1) + L(A2) + L(A3) + ...)/L(S) = L(A1)/L(S) + L(A2)/L(S) +
L(A3)/L(S) + ... = P(A1) + P(A2) + P(A3) + ..., so axiom 3 is also satisfied.

Since the length of a single point is zero, this rule also gives P(A) = 0 if A = {x}, x ∈ S, is a
single-element event.

The probability definitions given on an uncountable sample space on the real line also apply to
sample spaces on a geometrical region, such as a region on a Cartesian plane. In that case,
probabilities are defined via measurements such as the area or volume of the region, i.e.
P(A) = (area of A)/(area of S) or P(A) = (volume of A)/(volume of S).

Examples
(a) John is a 2-year-old boy. From his family history, let’s assume that his adult height is equally
likely to lie between 169 cm and 174 cm. What is the probability that a) he will be at least 172
cm tall as an adult; b) he will have a height between 170 cm and 171 cm?

Solution: We have sample space S = {x : 169 ≤ x ≤ 174}. Define A = {x : 172 ≤ x ≤ 174}
and B = {x : 170 ≤ x ≤ 171}. Then L(S) = 5, L(A) = 2, and L(B) = 1. ∴ P(A) = 2/5 and
P(B) = 1/5.
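The length-ratio rule is easy to put into code. A minimal Python sketch (the helper name `interval_prob` is ours, not from the notes):

```python
def interval_prob(a, b, c, d):
    """P(A) for the event A = [c, d] inside the sample space S = [a, b],
    computed as the length ratio L(A)/L(S)."""
    assert a <= c <= d <= b, "A must be a subinterval of S"
    return (d - c) / (b - a)

# John's adult height: S = [169, 174]
print(interval_prob(169, 174, 172, 174))  # P(A) = 2/5 = 0.4
print(interval_prob(169, 174, 170, 171))  # P(B) = 1/5 = 0.2
```

Any sub-interval event on S follows the same pattern.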

(b) A point is chosen at random inside a rectangle measuring 3 by 5 m. Find the probability
that the point is at least 1 m from the edge.

Solution: Let S denote the set of points inside the rectangle and A the set of points that are at
least 1 m from the edge. Then A is a 1 by 3 m rectangle. Hence, P(A) = (area of A)/(area of S)
= (1 × 3)/(3 × 5) = 1/5.

Session 7 Exercises


1. Suppose you daily ride a commuter train from your home in Blantyre to your workplace in
Lilongwe. The station you leave from has trains leaving for Lilongwe at 07:00am, 07:13am,
07:20am, 07:25am, 07:32am, 07:45am and 07:55am. It is your practice to take the first
train that leaves after your arrival at the station. Suppose you are equally likely to arrive
at the station any instant between 07:15am and 07:45am. On a particular day, what is the
probability that you have to wait less than 5 minutes at the station? Suppose the 07:25am
and 07:45am trains are express, what is the probability that you catch an express on a
given day?

2. A 15-cm ruler is broken into 2 pieces at a random point along its length. What is the
probability that the longer piece is at least twice the length of the shorter piece?

2.4 Unit 2 Summary

In this unit, you have studied

1. Specifying probability distributions based on events on sample space

2. Proving probability theorems based on axioms of probability

3. Evaluating problems using counting techniques.

2.5 Unit 2 References


1. Lipschutz, S. and Lipson, M. Probability, 2nd ed. Schaum’s Outlines.

2. Larson, H.J. (1982). Introduction to probability theory and statistical inference, 3rd ed.
New York: John Wiley and Sons.

3. Panik, M.J. (2005). Advanced statistics from an elementary point of view. Amsterdam:
Elsevier.


3 Unit 3: Conditional Probability and Independence

This unit presents techniques for computing probabilities of events that do or do not depend
on each other.

3.1 Session 8: Review of Counting techniques

Introduction
In this session, a review of methods for counting the number of elements of events in a finite
sample space is given.

Objectives of Session 8
After studying this session, you should be able to:

• Apply the multiplication rule to solve problems involving finite sample spaces

• Use permutations to solve problems related to the number of elements in a finite sample space

3.1.1 Multiplication principle

Definition: If an operation can be performed in n1 ways and a second operation can then be
performed in n2 ways, then both operations together can be performed in n1 × n2 ways.

Examples
(a) If a minibus can travel from Blantyre to Lilongwe in 3 ways and from Lilongwe to Mzuzu in
4 ways, then the minibus can travel from Blantyre to Mzuzu in a total of 3 × 4 = 12 ways.

(b) If tossing one die gives rise to 6 possible outcomes and tossing a second die also gives rise
to 6 possible outcomes, then the operation of tossing a pair of dice will give rise to
6 × 6 = 36 possible outcomes.

(c) Suppose that a set A has n1 elements and a second set B has n2 elements. Then the Cartesian
product A × B has n1 × n2 elements, the Cartesian product A × A has n1^2 elements, and
B × B has n2^2 elements.

This definition applies to any number of operations. For instance, tossing three coins can give
rise to 2^3 = 8 possible outcomes. Let set Ai have ni elements, i = 1, 2, ..., k; then the Cartesian
product A1 × A2 × ... × Ak has n1 × n2 × ... × nk different elements. A tree diagram is often
used to express the multiplication rule graphically (check MAT100 notes).
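The multiplication principle can be verified by brute-force enumeration with `itertools.product`; a sketch (the route labels are ours):

```python
from itertools import product

# 3 routes Blantyre -> Lilongwe, 4 routes Lilongwe -> Mzuzu (labels are ours)
bt_to_ll = ["r1", "r2", "r3"]
ll_to_mz = ["rA", "rB", "rC", "rD"]
print(len(list(product(bt_to_ll, ll_to_mz))))       # 3 * 4 = 12

# Tossing a pair of dice: 6 * 6 outcomes
print(len(list(product(range(1, 7), repeat=2))))    # 36

# Tossing three coins: 2^3 outcomes
print(len(list(product("HT", repeat=3))))           # 8
```

Listing `product(...)` directly also enumerates the Cartesian product itself, not just its size.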

3.1.2 Permutation

Definition: An arrangement of n symbols in a definite order is called a permutation of the n


symbols (taken all at a time), and it is denoted by P (n, n) or nPn . An arrangement of any r ≤ n


of these objects in a given order is called an r permutation of the n symbols taken r at a time,
and it is denoted by P (n, r) or nPr .

For P (n, n), the total number of ways of performing the n operations is given by the product of
the number of ways of doing the individual operations in the order of their performance, with the
first done in n ways, the second in n − 1 ways, down to the last in 1 way. Hence, the total number
of arrangements is n × (n − 1) × (n − 2) × ... × 2 × 1, also written as n! (read as n-factorial). Thus
P (n, n) = n × (n − 1) × (n − 2) × ... × 2 × 1 = n!.

With P (n, r), the first operation can also be performed in n ways and the second in (n − 1) ways,
so that by the time the r-th operation is undertaken, (r − 1) symbols have already been used
and any of the remaining n − (r − 1) symbols could be a candidate for the r-th position. Hence,
P (n, r) = n × (n − 1) × (n − 2) × ... × (n − (r − 1)).

Multiplying and dividing by (n − r)! then gives, by an algebraic trick,
P (n, r) = n × (n − 1) × (n − 2) × ... × (n − (r − 1)) × (n − r)!/(n − r)! = n!/(n − r)!.

Examples of Permutation
(a) Suppose that the same 5 people park their cars on the same side of the street in the same block
every night. How many different orderings of the 5 cars parked on the street are possible? (b)
How many four-letter ”words” can we make using the letters w,i,n,t,e,r (allowing no repetition)?

Solution: (a) The first parking slot can be filled by 5 different cars, the second by 4 different
cars, and so on. Hence, we have 5.4.3.2.1 = 5! = 120 total different orderings, without
repeating an ordering.
(b) Assume any four of the given letters form a "word", not necessarily a dictionary word; then
we have P (6, 4) = 6 × 5 × 4 × 3 = 360, or P (6, 4) = 6!/(6 − 4)! = (6 × 5 × 4 × 3 × 2!)/2! = 360.
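These counts are available directly in Python's standard library; `math.perm(n, r)` computes n!/(n − r)! (available since Python 3.8):

```python
import math

# (a) Orderings of 5 cars: P(5, 5) = 5!
print(math.perm(5, 5))   # 120

# (b) Four-letter "words" from the 6 letters of "winter": P(6, 4)
print(math.perm(6, 4))   # 360

# The defining identity P(n, r) = n! / (n - r)!
assert math.perm(6, 4) == math.factorial(6) // math.factorial(6 - 4)
```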

3.1.3 Permutation with repetitions

The definition and examples of permutation given so far involve scenarios in which the symbols
being arranged are all distinct. But permutation may also apply to cases where some of the
symbols to be arranged are alike.

Definition: The total number of ways of arranging n symbols in a definite order (a permutation
of the n symbols), in which n1 are alike, n2 are alike, ..., nr are alike, is denoted by
P (n; n1 , n2 , ..., nr ) = n!/(n1 ! n2 ! ... nr !).

Example
(a) Find the number of seven-letter words that can be formed using the letters of the word
"BENZENE". (b) Find the number of different signals, each consisting of eight flags in a vertical
line, that can be formed from four identical red flags, three identical white flags, and a blue flag.
(c) How many three-letter words can we make using the letters w,i,n,t,e,r, with one or more
repeated letters?

Solution: (a) This is the case of P (7; 3, 2), since there are 7 letters, of which 3 are E's and 2 are
N's. Hence P (7; 3, 2) = 7!/(3!2!) = 7.6.5.4.3.2.1/(3.2.1 × 2.1) = 420. (b) This is the case of
P (8; 4, 3), since the signal is to be made from 8 flags, of which 4 are red and 3 are white. Hence
P (8; 4, 3) = 8!/(4!3!) = 280. (c)


There will be 6 × 6 × 6 = 216 three-letter words we can make from the given 6 letters if repetition
is allowed. If repetition is not allowed, there will be P (6, 3) = 6.5.4 = 120 three-letter words we
can make. Hence, there will be 216 − 120 = 96 three-letter words with one or more repeated letters.

Session 8 Exercises

1. 6 people are about to enter a cave in a single file. In how many ways could they arrange
themselves in a row to go through the entrance?

2. A statistics class contains 8 male and 6 female students. Find the number of ways that the
class can elect: (a) 2 class representatives, 1 male and 1 female; (b) 1 class representative
and 1 vice-class representative

3. There are 5 bus lines from city A to B and 4 bus lines from city B to C. Find the number
of ways a person can travel: (a) from A to C by way of B; (b) round-trip from A to C by
way of B; (c) round-trip from A to C by way of B, without using a bus line more than
once

4. Suppose there are 12 married couples at a party. Find the number of ways of choosing a
man and a woman from the party such that the two are: (a) married to each other; (b) not
married to each other

5. Suppose a password consists of 4 characters, the first 2 being letters in the alphabet and
the last 2 being digits. Find the number of: (a) passwords that can be generated; (b)
passwords beginning with a vowel

3.2 Session 9: Ordered samples, Combinations

Introduction
In this session, further discussion of counting techniques is given.

Objectives of Session 9
After studying this session, you should be able to:

• Apply permutations on sets and events

• Use combinations to solve problems related to the number of elements in a finite sample space

3.2.1 Ordered samples

The cases of permutation without and with repetition discussed in the previous session are
analogous to counting the number of ways of selecting elements from a set S containing n
elements, one after another until the required number is reached, either without replacement
or with replacement of previously chosen elements.

This is called sampling without replacement if the element is not replaced in the set S before
the next is chosen, and sampling with replacement if the element is replaced in S before the


next element is chosen. Therefore, we use the same permutation formula P (n, r) = n!/(n − r)! to
count the different ordered samples of size r from a set with n elements, without replacement,
while n × n × n × ... × n = n^r is the total number of different ordered samples of size r that can
be selected from a set with n elements, with replacement.

Examples
(a) Three cards are chosen in succession from a deck of 52 cards. Find the number of ways this
can be done (i) with replacement, (ii) without replacement. (b) An urn contains 8 balls. In how
many ways can you choose 3 balls from the urn (i) with replacement (ii) without replacement.

Solution: (a − i) The 3 cards can be selected in a total of 52 × 52 × 52 = 52^3 = 140,608 different
ways, with replacement; (a − ii) the 3 cards can be chosen in P (52, 3) = 52 × 51 × 50 = 132,600
different ways, without replacement. (b − i) The 3 balls can be chosen in 8 × 8 × 8 = 8^3 = 512
different ways, with replacement; (b − ii) the 3 balls can be chosen in P (8, 3) = 8 × 7 × 6 = 336
different ways, without replacement.
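Both counts in example (a) and (b) follow from one-line formulas; a sketch:

```python
import math

# (a) Three cards from a deck of 52, drawn in order
print(52 ** 3)           # with replacement: n^r = 140608
print(math.perm(52, 3))  # without replacement: 52*51*50 = 132600

# (b) Three balls from an urn of 8
print(8 ** 3)            # with replacement: 512
print(math.perm(8, 3))   # without replacement: 336
```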

3.2.2 Combinations

Definition: The number of distinct subsets, each of size r, that can be constructed from a set
S with n elements is called the number of combinations of n things taken r at a time, and it is
denoted by C(n, r) (also written as the binomial coefficient "n choose r"). The key thing is that
the order of arranging objects does not count in combinations, unlike in permutations: each set
of r different symbols can be used to construct r! different r-tuples. Hence,
C(n, r) = P (n, r)/r! = n!/(r!(n − r)!).

Examples of combinations
(a) Let S = {1, 2, 3, 4}. Find the number of combinations of elements of S taken two at a time.
(b) Find the number of committees of 3 people that can be formed from 8 people. (c) A farmer
buys 3 cows, 2 pigs, and 4 hens from a person who has 6 cows, 5 pigs, and 8 hens. How many
choices does the farmer have?

Solution: (a) The actual combinations are: {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, {3, 4}. Hence,
in total we have C(4, 2) = 4!/(2!(4 − 2)!) = 6. (b) Each committee is a combination of 8 people
taken 3 at a time. Therefore, we have C(8, 3) = 8.7.6/3.2.1 = 56 committees. (c) The cows can
be chosen in C(6, 3) = 6.5.4/3.2.1 = 20 ways, the pigs in C(5, 2) = 5.4/2.1 = 10 ways, and the
hens in C(8, 4) = 8.7.6.5/4.3.2.1 = 70 ways. In total, the choices can be made in
C(6, 3) × C(5, 2) × C(8, 4) = 20 × 10 × 70 = 14,000 ways.
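Python's `math.comb(n, r)` gives C(n, r) directly, and `itertools.combinations` lists the subsets themselves; a sketch of all three parts:

```python
import math
from itertools import combinations

# (a) Subsets of size 2 from {1, 2, 3, 4}: listed and counted
print(list(combinations([1, 2, 3, 4], 2)))  # the 6 pairs
print(math.comb(4, 2))                      # 6

# (b) Committees of 3 from 8 people
print(math.comb(8, 3))                      # 56

# (c) The farmer: 3 of 6 cows, 2 of 5 pigs, 4 of 8 hens
print(math.comb(6, 3) * math.comb(5, 2) * math.comb(8, 4))  # 14000
```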

3.2.3 Counting and Partitions

Recall the definition of a partition of a set from Unit 1. Let an urn A contain 7 marbles numbered
1 through 7. If the interest is to compute the number of ways we can draw the first 2 marbles,
the next 3 marbles, and the last 2 marbles from the urn, then we want the number of ordered
partitions of A into the cells A1, A2, A3, where A1 contains 2 marbles, A2 contains 3 marbles,
and A3 contains 2 marbles.

Hence, there are C(7, 2) ways of drawing the first 2 marbles, i.e. determining cell one; C(5, 3)
ways of drawing the next 3 marbles; and C(2, 2) ways of drawing the last 2 marbles. In total, we
have C(7, 2) . C(5, 3) . C(2, 2) = 21 . 10 . 1 = 210 different ordered partitions of A into cells
A1, A2, A3. In short, n1 objects can be allocated to cell A1 in C(n, n1) ways, n2 objects to cell
A2 in C(n − n1, n2) ways, and so on, with the last nr objects allocated to cell Ar in
C(n − n1 − n2 − ... − n_{r−1}, nr) ways.

This gives the total number of ordered partitions as

C(n; n1, n2, ..., nr) = C(n, n1) . C(n − n1, n2) . ... . C(n − n1 − ... − n_{r−1}, nr)
= n!/(n1! n2! ... nr!).

In statistics, the quantity C(n, r) is called a binomial coefficient, as it relates to the expansion
of binomial expressions, while C(n; n1, n2, ..., nr) is called a multinomial coefficient. You will
encounter problems relating to binomial and multinomial functions later in this course.

Example
There are 12 balls in an urn. In how many ways can 3 balls be drawn from the urn 4 times in
succession, all without replacement?

Solution: Partition the urn into 4 cells, each containing 3 balls. Then the number of ways is the
multinomial coefficient 12!/(3! 3! 3! 3!) = 369,600 different ways.

3.2.4 Tree diagrams

A tree diagram is a device used to enumerate all possible outcomes of a sequence of experiments
or events, where each event can occur in a finite number of ways.

Examples
John (J) and Grey (G) are to play a tennis tournament. The first person to win 2 games in a row,
or who wins a total of 3 games, wins the tournament. Find the number of ways the tournament
can occur.

Solution: The tree diagram for the possible outcomes is given below. There are 10 endpoints,
which correspond to the following 10 ways that the tournament can occur: JJ, JGJJ, JGJGJ,
JGJGG, JGG, GJJ, GJGJJ, GJGJG, GJGG, GG

22
STA211: Foundations of Probability and Statistics 22
3.3 Session 10: Conditional
3 UNIT probability
3: CONDITIONAL PROBABILITY AND INDEPENDENCE

[Tree diagram: starting from the first game, each node branches into a win for G or J; a branch
terminates as soon as one player has won 2 games in a row or 3 games in total, giving 10 endpoints.]
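The tree can also be walked recursively in code; a sketch that reproduces the 10 outcomes (the stopping rules come from the problem statement, the function name is ours):

```python
def tournaments(history=""):
    """Yield completed game sequences: play stops when a player has won
    2 games in a row or 3 games in total."""
    for p in "JG":
        h = history + p
        won_two_in_row = len(h) >= 2 and h[-1] == h[-2]
        won_three_total = h.count(p) == 3
        if won_two_in_row or won_three_total:
            yield h
        else:
            yield from tournaments(h)

outcomes = list(tournaments())
print(len(outcomes))  # 10
print(outcomes)
```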

Session 9 Exercises

1. Three light bulbs are chosen at random from 15 bulbs of which 5 are defective. Find the
number of ways of selecting the bulbs such that (i) none is defective, (ii) exactly one is defective
2. A family has 3 boys and 2 girls. (a) Find the number of ways they can sit in a row. (b)
How many ways are there if the boys and girls are to sit together?
3. A student is to answer 8 out of 10 questions in an exam. (a) Find the number of ways the
student can choose the 8 questions. (b) Find the number of ways this can happen, if the
student must answer the first three questions
4. Find the number of committees of 5 with a given chairperson that can be selected from 12
persons
5. Jane has time to play a betting game at most 5 times. At each play she wins or loses K1000.
She begins with K1000 and will stop playing before 5 plays if she loses all her money. (a)
Find the number of ways the betting can occur. (b) In how many cases will she stop before
playing 5 times? (c) In how many cases will she leave without any money?

3.3 Session 10: Conditional probability

Introduction
In this session, the probability of an event occurring conditional on another event is discussed.

Objectives of Session 10
After studying this session, you should be able to:

• Compute conditional probabilities

• Apply the multiplication rule to calculate probability problems

3.3.1 Definition and computation of conditional probability

Let A and B be any two events on some sample space S. If event B has occurred, it must have
occurred either together with event A, i.e. A ∩ B has occurred, or without event A occurring,
i.e. A^c ∩ B has occurred. Then the probability of A occurring conditional on B occurring, which
we denote by P(A|B), is essentially the proportion of the time that A occurs with B relative to
the proportion of the time that B occurs (with or without A). That is,
P(A|B) = P(A ∩ B)/(P(A ∩ B) + P(A^c ∩ B)). Since A ∩ B and A^c ∩ B are disjoint and
(A ∩ B) ∪ (A^c ∩ B) = B, the denominator equals P(B).

Definition
Let A and B be any two events on some sample space S, with P(B) > 0. The probability that
event A occurs once B has occurred, called the conditional probability of A given B and written
P(A|B), is

P(A|B) = P(A ∩ B)/P(B).    (1)

Examples of conditional probability


(a) A pair of fair dice is rolled once. Given that the 2 numbers that occur are not the same,
compute the probability that the sum is 7. (b) Suppose a couple has two children. Find the
probability that both are boys if it is known that (i) at least one of the children is a boy, (ii) the
older child is a boy.

Solution:
(a) Let A be the event that the two numbers that occur are different, i.e.
A = {(1, 2), (2, 1), (1, 3), (3, 1), (1, 4), (4, 1), (1, 5), (5, 1), (1, 6), (6, 1), (2, 3), (3, 2), (2, 4), (4, 2), (2, 5), (5, 2),
(2, 6), (6, 2), (3, 4), (4, 3), (3, 5), (5, 3), (3, 6), (6, 3), (4, 5), (5, 4), (4, 6), (6, 4), (5, 6), (6, 5)},
and let B be the event that the sum is 7, i.e. B = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}. Here,
S = {(1, 1), (1, 2), (2, 1), ..., (6, 6)}, with n(S) = 36 possible outcomes. Assume equally likely
single-element events, each with probability 1/36. Then P(A) = n(A)/n(S) = 30/36 = 5/6,
P(B) = n(B)/n(S) = 6/36 = 1/6, and P(A ∩ B) = n(A ∩ B)/n(S) = 6/36 = 1/6. Hence,
P(B|A) = P(A ∩ B)/P(A) = (1/6)/(5/6) = 1/5.

(b) Denote B = Boy and G = Girl, and assume the equiprobable space S = {BB, BG, GB, GG},
each outcome with probability 1/4. Let A = {BB, BG, GB} be the event that at least one of the
children is a boy, D = {BB} the event that both children are boys, and C = {BB, BG} the event
that the older child is a boy. (i) P(D|A) = P(A ∩ D)/P(A) = (1/4)/(3/4) = 1/3;
(ii) P(D|C) = P(D ∩ C)/P(C) = (1/4)/(2/4) = 1/2.

3.3.2 Multiplication theorem for conditional probability

Conditional probability can help us assign probabilities to intersections of events. Since
P(A|B) = P(A ∩ B)/P(B) and A ∩ B = B ∩ A, it must be true that

P(A ∩ B) = P(B)P(A|B) = P(A)P(B|A).    (2)

The above result is called multiplication theorem for conditional probability, which gives a formula
for computing probability that events A and B both occur.

By mathematical induction, the definition of conditional probability can be easily extended to
more than two events. That is, P(A|B ∩ C) = P(A ∩ B ∩ C)/P(B ∩ C). This implies

P(A ∩ B ∩ C) = P(A|B ∩ C)P(B ∩ C).    (3)


Using the fact that P (B ∩ C) = P (B|C)P (C) and commutative law for intersection, the above
result simplifies to

P (A∩B∩C) = P (C)P (B|C)P (A|B∩C) = P (A)P (B|A)P (C|A∩B) = P (B)P (A|B)P (C|A∩B).
(4)

Example
An urn contains 4 white balls and 8 black balls. If 2 balls are selected at random without
replacement, find the probability that (a) both are white balls, (b) the second ball is white.

Solution
(a) Let A be the event that the first ball drawn is white and B the event that the second ball is
white, so that A ∩ B is the event that both are white. Then
P(A ∩ B) = P(A)P(B|A) = (4/12) . (3/11) = 1/11.
(b)

B = (A ∩ B) ∪ (A^c ∩ B)
P(B) = P(A ∩ B) + P(A^c ∩ B)
     = P(A)P(B|A) + P(A^c)P(B|A^c)    (5)
     = (4/12) . (3/11) + (8/12) . (4/11)
     = 1/3.

Session 10 Exercises

1. A fair coin is flipped 4 times. What is the probability that the fourth flip is a head, given
that each of the first 3 flips resulted in heads?

2. Urn 1 contains 2 red and 4 blue balls, urn 2 contains 10 red and 2 blue balls. If an urn is
chosen at random and a ball is removed from the chosen urn, what is the probability that
the selected ball is blue?

3. A lot contains 12 items of which 4 are defective. Three items are drawn at random from
the lot one after the other. Find the probability that all 3 are nondefective.

4. In a certain college, 25% of the students failed mathematics, 15% failed chemistry, and
10% failed both mathematics and chemistry. A student is selected at random. (a) If the
student failed chemistry, what is the probability that he or she failed mathematics? (b)
What is the probability that the student failed mathematics or chemistry? (c) What is the
probability that the student failed neither mathematics nor chemistry?

5. Let A and B be events with P(A) = 0.6, P(B) = 0.3, and P(A ∩ B) = 0.2. Find (a)
P(A|B) and P(B|A), (b) P(A ∪ B), (c) P(A^c), (d) P(A^c|B^c).

6. A box contains 7 red marbles and 3 white marbles. Three marbles are drawn from the box
one after the other. Find the probability that the first 2 are red and the third is white.

7. Students in a class are selected at random, one after the other, for an examination. Find
the probability that the men and women in the class alternate if: (a) the class consists of
4 men and 3 women, (b) the class consists of 3 men and 3 women


8. A box contains 3 red marbles and 7 white marbles. A marble is drawn from the box and
the marble is replaced by a marble of the other colour. A second marble is drawn from the
box. (a) Find the probability that the second marble is red. (b) If both marbles were of
the same colour, find the probability that they both were white.

3.4 Session 11: Independent events

Introduction
In this session, the multiplication theorem for conditional probability is applied to define the
probability of independent events.

Objectives of Session 11
After studying this session, you should be able to:

• Calculate the probability of two or more independent events

• Apply concepts of independent events to solve probability problems in real-life situations

3.4.1 Independence of events

Events A and B in a probability space S are said to be independent if the occurrence of one of
them does not influence the occurrence of the other. In other words, A is independent of B if
P(A) is the same as P(A|B). Substituting P(B) for P(B|A) in the multiplication theorem
for P(A ∩ B), we have

P(A ∩ B) = P(A)P(B|A) = P(A)P(B).    (6)

Definition
Two events A and B are said to be independent of each other iff the following three conditions
hold:
P (A|B) = P (A)
P (B|A) = P (B), and (7)
P (A ∩ B) = P (A)P (B),
otherwise the two events are said to be dependent.

Note: Independence of events does not imply that they are disjoint (mutually exclusive), or
vice versa, unless one of the events is a null event.

Example of independent events


The probability that a man will live 10 more years is 1/4, and the probability that his wife will
live 10 more years is 2/5. Find the probability that: (a) both will live 10 more years; (b) at least
one of them will live 10 more years.

Solution
Let A be the event that the man lives 10 more years, and B the event that his wife lives 10 more
years. Therefore, P(A) = 1/4 and P(B) = 2/5. Then (a) P(A ∩ B) = P(A).P(B) =
(1/4).(2/5) = 1/10; and (b) P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/4 + 2/5 − 1/10 = 11/20.
.


3.4.2 Independence of three or more events or trials

Definition
Three events A, B, and C are independent if and only if:

1. P(A ∩ B) = P(A)P(B)

2. P(B ∩ C) = P(B)P(C)

3. P(A ∩ C) = P(A)P(C)

4. P(A ∩ B ∩ C) = P(A)P(B)P(C).

The above definition means that three events are independent only when they are both pairwise
independent and jointly independent. This definition can be extended by mathematical induction
to any finite number of events, i.e. events A1, A2, ..., An are independent if every proper subset
of them is independent and P(A1 ∩ A2 ∩ ... ∩ An) = P(A1)P(A2)...P(An).

Example
A pair of coins is tossed, yielding the equiprobable space S = {HH, HT, TH, TT}. Consider the
events A = {head on first toss} = {HH, HT}, B = {head on second toss} = {HH, TH},
and C = {head on exactly one coin} = {HT, TH}. Then P(A) = P(B) = P(C) = 2/4 = 1/2.
Also, P(A ∩ B) = P({HH}) = 1/4, P(A ∩ C) = P({HT}) = 1/4, and P(B ∩ C) = P({TH}) = 1/4.
This means that conditions 1 to 3 are satisfied. Now, A ∩ B ∩ C = ∅, so that P(A ∩ B ∩ C) =
P(∅) = 0 ≠ P(A)P(B)P(C). Thus, condition 4 is not satisfied, and hence the three events are
not independent.
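The coin example, pairwise independent but not mutually independent, can be verified numerically; a sketch using exact fractions to avoid rounding (the helper names are ours):

```python
from fractions import Fraction

S = ["HH", "HT", "TH", "TT"]                 # equiprobable space

def P(event):
    return Fraction(len(event), len(S))

A = [s for s in S if s[0] == "H"]            # head on first toss
B = [s for s in S if s[1] == "H"]            # head on second toss
C = [s for s in S if s.count("H") == 1]      # head on exactly one coin

def inter(X, Y):
    return [s for s in X if s in Y]

# Conditions 1-3 (pairwise independence) hold ...
assert P(inter(A, B)) == P(A) * P(B)
assert P(inter(A, C)) == P(A) * P(C)
assert P(inter(B, C)) == P(B) * P(C)

# ... but condition 4 fails: the triple intersection is empty
print(P(inter(inter(A, B), C)), P(A) * P(B) * P(C))  # 0 1/8
```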

Definition
Let S be a finite probability space. The probability space of n independent or repeated trials,
denoted by Sn , consists of ordered n-tuples of elements of S with probability of an n-tuple defined
to be the product of the probability of its components, i.e. P (s1 , s2 , ..., sn ) = P (s1 )P (s2 )...P (sn ).

Example
Suppose that three horses a, b, c race together; their respective probabilities of winning are 1/2,
1/3, and 1/6. Suppose the horses race twice. Find the probability that horse c wins the first
race and a wins the second race.

Solution
The first trial has S = {a, b, c}, with P(a) = 1/2, P(b) = 1/3, and P(c) = 1/6. Upon repeating
the trial, we have S2 = {(a, a), (a, b), (a, c), (b, a), (b, b), (b, c), (c, a), (c, b), (c, c)}. Therefore,
P((c, a)) = P(c)P(a) = (1/6).(1/2) = 1/12.

Session 11 Exercises

1. Let A and B be independent events with P(A) = 0.3 and P(B) = 0.4. Find (a) P(A ∩ B)
and P(A ∪ B), (b) P(A|B) and P(B|A), (c) P(A ∩ B^c)
2. Box A contains 5 red marbles and 3 blue marbles, and Box B contains 2 red marbles and
3 blue marbles. Two marbles are drawn at random, one from each box. Find the probability
that (a) both are red, (b) both are the same colour


3. Suppose A and B are independent events. Show that: (a) A and B^c are independent,
(b) A^c and B^c are independent

4. Suppose the probability that Karonga United wins a home game in TNM Super League is
0.5, the probability that it loses at home is 0.3, while the probability that it draws at home
is 0.2. The team plays twice at home. Find the probability that it wins at least once.

3.5 Session 12: Partitions, Total Probability, and Bayes’ Theorem

Introduction
In this session, learners will be introduced to the law of total probability and Bayes’ theorem.

Objectives of Session 12
After studying this session, you should be able to:

• Calculate probabilities involving disjoint events

• Use Bayes' rule to solve probabilities of some overlapping events

3.5.1 Partitions and Total Probability

The events E1, E2, ..., En are called a partition of the sample space S if Ei ∩ Ej = ∅ for all i ≠ j
and E1 ∪ E2 ∪ ... ∪ En = S. In other words, a partition cuts the whole sample space into mutually
exclusive pieces.

If A ⊂ S is any event and E1 , E2 , ..., En is a partition of S, then E1 , E2 , ..., En also partition A,


i.e.
A=A∩S
= A ∩ (E1 ∪ E2 ∪ ... ∪ En ) (8)
= (A ∩ E1 ) ∪ (A ∩ E2 ) ∪ ... ∪ (A ∩ En ).

and (A ∩ Ei) ∩ (A ∩ Ej) = ∅ for all i ≠ j.

It follows that we can write

P(A) = P(A ∩ E1) + P(A ∩ E2) + ... + P(A ∩ En) = Σ_{i=1}^{n} P(A ∩ Ei).    (9)

This result is called the law (or theorem) of total probability.

Using the multiplication theorem for conditional probability, the above theorem simplifies to

P(A) = P(E1)P(A|E1) + P(E2)P(A|E2) + ... + P(En)P(A|En) = Σ_{i=1}^{n} P(Ei)P(A|Ei).    (10)


Example
A factory uses three machines X, Y, Z to produce light bulbs. Suppose machine X produces
50% of the bulbs, of which 3% are defective; machine Y produces 30% of the bulbs, of which 4%
are defective; and machine Z produces 20% of the bulbs, of which 5% are defective. Find the
probability that a randomly selected bulb is defective.

Solution
Let D denote the event that a bulb is defective. Then, by the law of total probability,
P(D) = P(X)P(D|X) + P(Y)P(D|Y) + P(Z)P(D|Z) = 0.50 × 0.03 + 0.30 × 0.04 + 0.20 × 0.05 =
0.037.
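The factory computation is a sum of the partition probabilities times the conditional defect rates; a sketch using exact fractions (the dictionary layout is our choice):

```python
from fractions import Fraction

# P(machine) and P(defective | machine), as exact fractions
machines = {
    "X": (Fraction(50, 100), Fraction(3, 100)),
    "Y": (Fraction(30, 100), Fraction(4, 100)),
    "Z": (Fraction(20, 100), Fraction(5, 100)),
}

# Law of total probability: P(D) = sum_i P(E_i) * P(D | E_i)
p_defective = sum(share * rate for share, rate in machines.values())
print(p_defective)  # 37/1000, i.e. 0.037
```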

3.5.2 Bayes’ Theorem

Let E1, E2, ..., En be a partition of S. Then, for any event A ⊂ S,

P(Ei|A) = P(Ei)P(A|Ei) / Σ_{j=1}^{n} P(Ej)P(A|Ej),   i = 1, 2, ..., n.    (11)

Proof
By definition, P(Ei|A) = P(Ei ∩ A)/P(A). Since P(Ei ∩ A) = P(Ei)P(A|Ei) and, by the law of
total probability, P(A) = Σ_{j=1}^{n} P(Ej ∩ A) = Σ_{j=1}^{n} P(Ej)P(A|Ej), the result (11)
follows immediately. ∎

Example
Using the previous example, find the probability that a bulb found to be defective was produced
by machine Y.

Solution
We want P(Y|D). Using Bayes' rule, we have
P(Y|D) = P(Y)P(D|Y)/P(D) = (0.30 × 0.04)/0.037 = 0.012/0.037 = 12/37.

Session 12 Exercises

1. Suppose 40% of residents of Mzuzu consider themselves as MCP supporters, 35% consider
themselves as UTM supporters, and 25% consider themselves as DPP supporters. During
the 2020 presidential election, 45% of MCP supporters voted, 40% of UTM supporters
voted, and 60% of DPP supporters voted. Suppose a person is randomly selected, (a) find
the probability that the person voted; (b) if the person voted, find the probability that the
voter was a UTM supporter.

2. In a certain college, 4% of the men and 1% of the women are taller than 6 feet. Furthermore,
60% of the students are women. Suppose a randomly selected student is taller than 6 feet,
find the probability that the student is a woman.

3. Suppose that medical science has a cancer-diagnostic test that is 95% accurate on both
those who do and those who do not have the cancer. If 0.005 of the population does have
cancer, compute the probability that a particular individual has cancer, given that the test
says he has cancer.


3.6 Unit 3 Summary

In this unit, you have studied

1. Calculation of conditional probability

2. Computation of probability of independent events

3. Application of Bayes’ rule to solve probability problems.

3.7 Unit 3 References


1. Lipschutz, S. and Lipson, M. Probability, 2nd ed. Schaum’s Outlines.

2. Larson, H.J. (1982). Introduction to probability theory and statistical inference, 3rd ed.
New York: John Wiley and Sons.

3. Panik, M.J. (2005). Advanced statistics from an elementary point of view. Amsterdam:
Elsevier.


4 Unit 4: Random variables and their properties

This unit presents the concept of random variable and introduces learners to probability distri-
bution of a random variable.

4.1 Session 13: Random variable and its probability distribution

Introduction
In this session, the concept of random variable is defined. Further, the concept of probability of
a random variable is presented.

Objectives of Session 13
After studying this session, you should be able to:

• Define a random variable and its values given a sample space

• Calculate probability of value of random variable of discrete type

• Specify probability distribution of a discrete random variable

4.1.1 Random variable

Definition
Let S be a sample space. A random variable X is a real-valued function defined on S; that is, X assigns a real number X(ω) to each outcome ω ∈ S.

Recall that an outcome of a random experiment, that generates members of S, cannot be pre-
dicted in advance but depends on chance factors (or it is random). So, the value of X, which
we denote using small letter x, varies from trial to trial. This is the reason X is referred to as
random variable.

Examples of random variable


(a) A pair of fair dice is rolled 1 time. Let X be the sum of 2 numbers that occur. Find the
possible values of X.

Solution
Here S = {(x1, x2) : x1 = 1, 2, ..., 6; x2 = 1, 2, ..., 6}, with outcome ω = (x1, x2) ∈ S. The random variable is X(ω) = x1 + x2, which gives the range of X as RX = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}.

(b) A couple wants to stop bearing children after four live births. Define a random variable X
as total number of female children this couple may have. Specify the range of values of X.

Solution
The sample space S will have 16 equally likely outcomes, i.e. S = {BBBB, BBBG, ..., BGGG, GGGG}, where B = Boy and G = Girl. Hence, RX = {0, 1, 2, 3, 4}.
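Listing outcomes by hand becomes tedious as experiments grow; since the preface allows computer apps for speed of calculation, here is a minimal Python sketch (variable names are illustrative) that enumerates the 16 birth sequences and recovers the range of X:

```python
from itertools import product

# Enumerate the 16 equally likely birth sequences (B = boy, G = girl)
sample_space = ["".join(seq) for seq in product("BG", repeat=4)]

# X counts the girls in each outcome; collect its set of possible values
range_of_X = {outcome.count("G") for outcome in sample_space}

print(len(sample_space), sorted(range_of_X))  # 16 [0, 1, 2, 3, 4]
```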


A random variable can be discrete or continuous, depending on its range. The distinction
between the two types of random variables is analogous to the cases of discrete and continuous
sample spaces. We will first deal with discrete random variables and later work with continuous
random variables.

Definition
A random variable X is called discrete if its range, RX , is a discrete set. Otherwise X is said to
be continuous.

In other words, X is discrete if the set of values it can assume is countable (finite or countably infinite). On the other hand, X is continuous if it can assume an uncountable number of values over some interval. The two examples we have had above are both cases of discrete random variables.

Here are some properties of random variables. If X and Y are random variables on the same
sample space S and ω any point in S, then X + Y , X + k, kX, and XY are the functions on S
defined as follows:

1. (X + Y )(ω) = X(ω) + Y (ω)

2. (X + k)(ω) = X(ω) + k

3. (kX)(ω) = kX(ω)

4. (XY )(ω) = X(ω)Y (ω)

5. In general, [h(X)](ω) = h[X(ω)].

4.1.2 Probability function of a random variable

Suppose X is a finite random variable on a sample space S, with image set X(S) = {x1, x2, ..., xn}. X(S) can be made into a probability space by defining the probability of each xi as P(X = xi), which is sometimes written as f(xi). The function f on X(S) is called the probability function or distribution of X.

The set of ordered pairs [xi , f (xi )] can be given in a form of a table as follows:

Table 1: Probability distribution of a discrete random variable X

xi x1 x2 ... xn
f (xi ) f (x1 ) f (x2 ) ... f (xn )

The probability distribution f of the random variable X satisfies the following two conditions:

1. f(xi) ≥ 0 for each i

2. Σ_{i=1}^n f(xi) = 1


Example of probability distribution


Let’s take a case of previous example in which two fair dice are rolled one time, and the random
variable X is the sum of two numbers that occur. Check if the values of X form a probability
distribution.

Solution
Sample space is finite with 36 ordered pairs that are equally likely to occur, i.e. S = {(1, 1), (1, 2), ..., (6, 6)}. Now, X[(x1, x2)] = x1 + x2, (x1, x2) ∈ S. The range of X is RX = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. Only one event in S gives the value of X as 2, namely (1, 1), hence P(X = 2) = 1/36. The value 3 can be obtained from 2 events, (1, 2) and (2, 1), hence P(X = 3) = 2/36. Continuing this way we have the distribution of X as follows:

Table 2: Distribution of values of a random variable X

x         2     3     4     5     6     7     8     9     10    11    12
P(X = x)  1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

From the distribution given in Table 2, we note that for each x ∈ RX, P(x) ≥ 0 and
Σ P(x) = 1/36 + 2/36 + 3/36 + 4/36 + 5/36 + 6/36 + 5/36 + 4/36 + 3/36 + 2/36 + 1/36 = 36/36 = 1.
Hence, the two conditions for a probability distribution are satisfied, implying that the distribution of X forms a probability function.
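The same tabulation can be cross-checked by machine. A short Python sketch (illustrative names, exact fractions rather than decimals) tallies P(X = x) for the dice sum:

```python
from itertools import product
from fractions import Fraction

# All 36 equally likely ordered pairs from rolling two fair dice
outcomes = list(product(range(1, 7), repeat=2))

# Tally P(X = x) for the sum X, keeping exact fractions
dist = {}
for a, b in outcomes:
    dist[a + b] = dist.get(a + b, Fraction(0)) + Fraction(1, 36)

# e.g. P(X = 2) = 1/36 and P(X = 7) = 6/36 = 1/6
print(dist[2], dist[7])
```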

Session 13 Exercises

1. An urn contains 4 balls numbered 1, 2, 3, 4, respectively. Two balls are drawn from the
urn at random without replacement. Let Z be the sum of the two numbers that occur.
Derive the probability function for Z.

2. Two fair dice are rolled one time and let M be the maximum of the two numbers that face
up. Derive the probability function for M .

3. A class in statistics contains 10 students, 3 of whom are 19, 4 are 20, 1 is 21, 1 is 24, and
1 is 26. Two students are selected at random without replacement from this class. Let X
be the average age of the two selected students and derive the probability function for X.

4. A fair coin is tossed 4 times. Let X denote the number of heads occurring. Derive the probability distribution of X.

4.2 Session 14: Distribution functions and density functions

Introduction
This session introduces learners to probability distribution function and density function of a
random variable.

Objectives of Session 14
After studying this session, you should be able to:


• Compute cumulative probability of any random variable

• Derive density function from distribution function of a continuous random variable

4.2.1 Distribution function and density function

Definition: The distribution function (or cumulative distribution function, cdf) for a random
variable X, denoted by FX (t) gives the value of P (X ≤ t) for any real t, i.e. FX (t) = P (X ≤ t)
for −∞ < t < ∞.

If X is discrete, the cdf of X is evaluated by simply summing up the probabilities of X for values of X that are no bigger than t, i.e. FX(t) = P(X ≤ t) = Σ_{x ≤ t} P(x). Since a continuous random variable is a real-valued function defined on interval scale, its probability function is described in the form of a distribution function FX(t), and not as probabilities of single-element events. FX(t) being a probability, it must satisfy the condition that 0 ≤ FX(t) ≤ 1 for any t.

Definition: Let X be a continuous random variable with distribution function FX(t). The density function (or probability density function, pdf) for X is fX(t) = (d/dt)FX(t).

If X is a continuous random variable, it can be shown that P(X ≤ t) = FX(t) = ∫_{−∞}^{t} fX(u)du and P(a ≤ X ≤ b) = FX(b) − FX(a) = ∫_{a}^{b} fX(t)dt. In other words, the probability that X lies between any two values a and b is the area under fX(t) between a and b.

Examples of distribution and density functions


a) A fair coin is tossed three times in succession, with a success defined as getting heads on any
individual toss. If the random variable X depicts the number of heads obtained, calculate: (i)
the probability that X is less than 2; (ii) the probability that X is higher than 2.

Solution:
The sample space, S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. The random variable X has values X = {0, 1, 2, 3}, with probability distribution:

Table 3: Distribution of values of a random variable X

x         0    1    2    3   sum
P(X = x)  1/8  3/8  3/8  1/8   1

Therefore (i) P(X < 2) = P(X = 0) + P(X = 1) = 1/8 + 3/8 = 1/2.

(ii) P(X > 2) = 1 − P(X ≤ 2) = 1 − [P(X = 0) + P(X = 1) + P(X = 2)] = 1 − (1/8 + 3/8 + 3/8) = 1 − 7/8 = 1/8 = P(X = 3).

(b) Given that Y is a continuous random variable with fY(y) = 2(1 − y) for 0 < y < 1, and 0 otherwise. Verify that fY is a valid probability density function. Compute P(Y > 0.6).

Solution
∫_{−∞}^{∞} fY(y)dy = ∫_{0}^{1} 2(1 − y)dy = (2y − y²)|₀¹ = 1, and fY(y) ≥ 0 on 0 < y < 1. Hence, Y has a


probability distribution. Now P(Y > 0.6) = 1 − FY(0.6) = 1 − ∫_{0}^{0.6} 2(1 − y)dy = 1 − (2y − y²)|₀^{0.6} = 1 − 0.84 = 0.16.
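The integral can also be checked numerically; the sketch below uses a simple midpoint-rule sum (the subinterval count is an arbitrary accuracy choice, not part of the solution above):

```python
# Midpoint-rule check of P(Y > 0.6) for f(y) = 2(1 - y) on (0, 1);
# the number of subintervals n is an arbitrary accuracy choice.
def f(y):
    return 2 * (1 - y)

n = 100_000
width = (1.0 - 0.6) / n
prob = sum(f(0.6 + (i + 0.5) * width) * width for i in range(n))

print(round(prob, 4))  # 0.16
```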

Note: The probability distribution of a discrete random variable is also called a probability mass function because it shows the concentration (or mass or weighting) of probability at each point in the range of the random variable.

Session 14 Exercises

1. Let the random variable Y have probability mass function as in Table 4: Find P (Y < 8).

Table 4: Probability mass function of Y

y -2 5 8
g(y) 0.3 0.5 0.2

2. Let X be a continuous random variable with the probability density function f (x) = kx if
0 ≤ x ≤ 5 and 0 elsewhere. Find (a) k (b)P (1 ≤ X ≤ 3), (c)P (X > 2).

4.3 Session 15: Expectation and variance of a random variable

Introduction
This session introduces learners to expected value and variance of a random variable.

Objectives of Session 15
After studying this session, you should be able to:

• Calculate mean of a random variable

• Compute variance of a random variable

4.3.1 Expectation of a random variable

Knowing a probability law of a random variable and computing its associated probabilities alone
may not be all that the analyst wants. In some cases, one may wish to know the centre of the
distribution of the random variable, the spread of values from this centre, or even shape of the
distribution. This calls for further analysis of the probability distribution of the random variable.

Definition: Let X be a random variable, whose probability mass function is p(x) or probability
density function is f (x), depending on whether X is discrete or continuous, respectively. Then,
the mean, or expectation (or expected value) of X, denoted by E(X) is:
E(X) = x1 p(x1) + x2 p(x2) + ... + xn p(xn) = Σ_{i=1}^n xi p(xi),   [X discrete]   (12)


or

E(X) = ∫_{−∞}^{∞} x f(x)dx,   [X continuous]   (13)

Another notation for expected value of X is µX or simply µ.

Note that E(X) can be viewed as the weighted average of the values of X, where each value is
weighted by its probability, if X is discrete, or as the balance point of the density function, if X
is continuous.

Examples for expectation


(a) Let X be a random variable with the following probability distribution. Find E(X):

Table 5: Probability distribution of X

x     2    3    6    10
p(x)  0.2  0.2  0.5  0.1

(b) Let Y be a continuous random variable with probability density function f (y) = 2(1 − y) for
0 < y < 1 and 0 otherwise. Find the expected value of Y .

Solution
(a) E(X) = Σ_x x p(x) = 2(0.2) + 3(0.2) + 6(0.5) + 10(0.1) = 0.4 + 0.6 + 3.0 + 1.0 = 5.
(b) E(Y) = ∫_{−∞}^{∞} y f(y)dy = ∫_{0}^{1} y · 2(1 − y)dy = ∫_{0}^{1} (2y − 2y²)dy = (y² − (2/3)y³)|₀¹ = 1/3.
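Both expectations can be verified with a short Python sketch: (a) is a direct probability-weighted sum, and (b) approximates the integral by a midpoint rule rather than the exact calculus above:

```python
# (a) E(X) for the discrete distribution in Table 5: a weighted sum
values = [2, 3, 6, 10]
probs = [0.2, 0.2, 0.5, 0.1]
mean_X = sum(x * p for x, p in zip(values, probs))

# (b) E(Y) for f(y) = 2(1 - y) on (0, 1), via a midpoint-rule integral
n = 100_000
width = 1.0 / n
mean_Y = sum((i + 0.5) * width * 2 * (1 - (i + 0.5) * width) * width
             for i in range(n))

print(round(mean_X, 6), round(mean_Y, 4))  # 5.0 0.3333
```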

Theorem (Linearity properties of random variables): Let X and Y be random variables


on the same sample space S and c be any real number, then

1. E(c) = c

2. E(cX) = cE(X)

3. E(X + c) = E(X) + c

4. E(X + Y ) = E(X) + E(Y )

5. E(Σ_{i=1}^n Xi) = Σ_{i=1}^n E(Xi)

Proof
1) If X is a discrete random variable,
E(c) = Σ_{i=1}^n c · p(xi) = c · p(x1) + c · p(x2) + ... + c · p(xn) = c · [p(x1) + p(x2) + ... + p(xn)] = c · 1 = c.
If X is continuous, then E(c) = ∫ c f(x)dx = c ∫ f(x)dx = c · 1 = c.


5) Using the discrete case for X, we have

LHS = E(Σ_{i=1}^n Xi)
    = Σ_x (Σ_{i=1}^n Xi) p(x)
    = Σ_x (X1 + X2 + ... + Xn) p(x)
    = Σ_x (X1 p(x) + X2 p(x) + ... + Xn p(x))                         (14)
    = Σ_x X1 p(x) + Σ_x X2 p(x) + ... + Σ_x Xn p(x)
    = E(X1) + E(X2) + ... + E(Xn)
    = Σ_{i=1}^n E(Xi)
    = RHS.

The proofs for parts 2) to 4) are left for your practice.

4.3.2 Variance of a random variable

Definition: The variance of a random variable X, denoted by Var(X) or σ², is the expected value of the squared deviations of the values of X from the expectation, i.e.:

Var(X) = E[(X − µ)²] = Σ_{i=1}^n (xi − µ)² p(xi),   [X discrete]   (15)

or

Var(X) = E[(X − µ)²] = ∫_{−∞}^{∞} (x − µ)² f(x)dx,   [X continuous].   (16)

If we expand the squaring in the above definition, we get an alternative formula for calculating the variance of X as follows:

Var(X) = E[(X − µ)²]
       = E(X² − 2µX + µ²)
       = E(X²) − 2µE(X) + µ²
       = E(X²) − µ²                                                   (17)
       = Σ_{i=1}^n xi² p(xi) − µ²,   [X discrete],

or


Var(X) = E[(X − µ)²]
       = E(X² − 2µX + µ²)
       = E(X²) − 2µE(X) + µ²
       = E(X²) − µ²                                                   (18)
       = ∫_{−∞}^{∞} x² f(x)dx − µ²,   [X continuous].

Examples of variance
Using the data from the previous example, we have:
(a) Var(X) = Σ_{i=1}^n xi² p(xi) − µ² = 2²(0.2) + 3²(0.2) + 6²(0.5) + 10²(0.1) − 5²
= 4(0.2) + 9(0.2) + 36(0.5) + 100(0.1) − 25 = 30.6 − 25 = 5.6.

(b) Var(Y) = ∫_{−∞}^{∞} y² f(y)dy − µ² = ∫_{0}^{1} y² · 2(1 − y)dy − (1/3)² = ∫_{0}^{1} (2y² − 2y³)dy − 1/9
= [(2/3)y³ − (1/2)y⁴]|₀¹ − 1/9 = 1/6 − 1/9 = 1/18 ≈ 0.056.

Definition: The standard deviation of a random variable X, denoted by σ, is the positive square root of the variance of X, i.e. σ = √Var(X). The standard deviation of X measures the dispersion of values of X around the mean µ.

It is often practically more meaningful to analyse the spread of the distribution of X around the mean using the standard deviation rather than the variance. For instance, in example (a) above, σX = √5.6 ≈ 2.37, which is interpreted as: values of X are, on average, at a distance of 2.37 away from the mean of X. While for example (b), σY = √(1/18) ≈ 0.24, implying values of Y are on average at a distance of 0.24 away from the mean of Y.
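A quick machine check of example (a) (a minimal sketch; names are illustrative):

```python
import math

# Var(X) = E(X^2) - mu^2 and sigma = sqrt(Var(X)) for Table 5
values = [2, 3, 6, 10]
probs = [0.2, 0.2, 0.5, 0.1]

mu = sum(x * p for x, p in zip(values, probs))                  # E(X) = 5
var = sum(x ** 2 * p for x, p in zip(values, probs)) - mu ** 2  # 30.6 - 25
sd = math.sqrt(var)

print(round(var, 2), round(sd, 2))  # 5.6 2.37
```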

Properties of variance

1. V ar(c) = 0, (for any real number c)

2. V ar(cX) = c2 V ar(X)

3. the variance of a random variable need not always exist

4. if E(X 2 ) exists, E(X) exists and thus V ar(X) exists. Hence, the existence of V ar(X)
implies that E(X) exists.

Given a random variable X, a standardised form of X is given by Z = (X − µ)/σ. It follows that E(Z) = E[(X − µ)/σ] = (1/σ)E(X − µ) = (1/σ)[E(X) − µ] = (1/σ)[µ − µ] = 0, while Var(Z) = Var[(X − µ)/σ] = (1/σ²)Var(X − µ) = (1/σ²)Var(X) (since µ is a constant) = σ²/σ² = 1.

Session 15 Exercises

1. Let the probability distribution of a random variable X be given by {(1, 1/4), (3, 1/2), (9, 1/4)}. Find (a) E(X² − 1), (b) Var(X).


2. Suppose Y has the pdf f(y) = 1/5 for 5 < y < 10 and 0 elsewhere. Find (a) E(Y), (b) E(Y²), (c) Var(Y).
3. Prove that, for any random variable X and real numbers a, b, Var(a + bX) = b²Var(X).

4. Evaluate the expression E[(aX + b)ⁿ] = Σ_{i=0}^n (n choose i) a^{n−i} b^i E(X^{n−i}) for n = 1, 2, 3.
5. Two cards are selected from a box that contains five cards numbered 1, 1, 2, 2, 3. Let Y
denotes the sum of the two numbers drawn. Find: (i) the probability distribution of Y ;
(ii) mean of Y , E(Y ); (iii) the variance of Y , V ar(Y ) and standard deviation of Y , σY .

4.4 Session 16: Bivariate probability distributions

Introduction
In this session, bivariate probability distribution is discussed.

Objectives of Session 16
After studying this session, you should be able to:

• Compute probability problems involving bivariate or jointly distributed random variables

• Calculate marginal probability given bivariate probability distribution

• Evaluate conditional probability of a random variable given the other variable

4.4.1 Bivariate (or joint) probability distribution

Definition: Suppose (X, Y ) is a pair of real-valued functions defined on a sample space S. The
pair (X, Y ) is a bivariate random variable if both X and Y map elements in S into real numbers.

Definition: Let X and Y be random variables defined on the same sample space S. Define
event A = {(X, Y ) : a ≤ X ≤ b, c ≤ Y ≤ d} ⊂ S. Then, the pair (X, Y ) is a bivariate continuous
random variable.

Definition: Given a bivariate random variable (X, Y) on S, let P(X = x, Y = y) be the joint probability that X = x and Y = y. Then P(X = x, Y = y) is called a bivariate probability mass function if:
(a) P(X = x, Y = y) ≥ 0 for all x, y; and
(b) Σ_x Σ_y P(X = x, Y = y) = 1.
Similarly, f(x, y), with P(A) = P(a ≤ X ≤ b, c ≤ Y ≤ d) = ∫_{a}^{b} ∫_{c}^{d} f(x, y)dy dx, is a bivariate probability density function if:
(a) f(x, y) ≥ 0 for all real x and y such that −∞ < x, y < ∞, and f(x, y) > 0 for (x, y) ∈ A; and
(b) ∫_{−∞}^{∞} ∫_{−∞}^{∞} f(x, y)dy dx = 1.

Examples of bivariate probability distribution


(a) Suppose X and Y have the following bivariate distribution, find P (X = 1, Y ≤ 1):


Table 6: Bivariate distribution of X and Y

Y
0 1
1 0.25 0.25
X 2 0.25 0.25

(b) Let X and Y have the bivariate probability function f(x, y) = (x + y)/36 for x = 1, 2, 3 and y = 1, 2, 3, and 0 otherwise. Find P(X = 2, Y ≤ 2).
(c) Let X and Y have the distribution f(x, y) = 3x(1 − xy), for 0 < x, y < 1 and 0 elsewhere. Find P(X ≤ 1/2, Y ≤ 1/2).

Solution
(a) P(X = 1, Y ≤ 1) = P(X = 1, Y = 0) + P(X = 1, Y = 1) = 0.25 + 0.25 = 0.50.
(b) P(X = 2, Y ≤ 2) = P(X = 2, Y = 1) + P(X = 2, Y = 2) = (2 + 1)/36 + (2 + 2)/36 = 7/36.
(c) P(X ≤ 1/2, Y ≤ 1/2) = ∫_{0}^{1/2} ∫_{0}^{1/2} 3x(1 − xy)dy dx = ∫_{0}^{1/2} (3x/2 − 3x²/8)dx = 3/16 − 1/64 = 11/64.
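The double integral in (c) can be approximated with a two-dimensional midpoint rule; the grid size below is an arbitrary accuracy choice:

```python
# Two-dimensional midpoint rule for P(X <= 1/2, Y <= 1/2) with
# f(x, y) = 3x(1 - xy); the grid size n is an arbitrary accuracy choice.
def f(x, y):
    return 3 * x * (1 - x * y)

n = 400
h = 0.5 / n
prob = sum(f((i + 0.5) * h, (j + 0.5) * h) * h * h
           for i in range(n) for j in range(n))

print(round(prob, 4))  # 0.1719, i.e. about 11/64
```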

Note: The quantity P (X ≤ t, Y ≤ s) = F (t, s) is the bivariate cumulative distribution function


of the random variables X and Y which is evaluated in the same way as in univariate probability
distributions.

4.4.2 Marginal probability distribution of a random variable

Definition: Given the random variables X and Y with bivariate probability mass function P(X = x, Y = y) or density function f(x, y), the marginal probability mass function (respectively density function) of X is given by:
p(x) = Σ_y P(X = x, Y = y), with p(x) ≥ 0 and Σ_x p(x) = 1, [X, Y discrete]; and
g(x) = ∫_{−∞}^{∞} f(x, y)dy, with g(x) ≥ 0 and ∫_{−∞}^{∞} g(x)dx = 1, [X, Y continuous].

The marginal probability mass or density function of Y is defined in a similar manner to that of X, by summing or integrating over all values in the range of X on the right hand side.

Definition: The marginal cumulative distribution function of X is given by:
F(b) = Σ_{x ≤ b} Σ_y P(X = x, Y = y), [X discrete], and
F(b) = ∫_{−∞}^{b} ∫_{−∞}^{∞} f(x, y)dy dx, [X continuous].

Examples of marginal distribution


Using case (a) in previous example, we have:
P(X = 1) = Σ_y f(1, y) = f(1, 0) + f(1, 1) = 0.25 + 0.25 = 0.50.   (19)


and

P(X = 2) = Σ_y f(2, y) = f(2, 0) + f(2, 1) = 0.25 + 0.25 = 0.50.   (20)

This gives us the marginal probability mass function of X and Y, respectively as:

Table 7: Marginal probability distribution of X

x 1 2 sum
p(x) 0.50 0.50 1

and

Table 8: Marginal probability distribution of Y

y 0 1 sum
g(y) 0.50 0.50 1

While from the case (c) for continuous bivariate random variables, the marginal probability density function of X is:

g(x) = ∫_{−∞}^{∞} f(x, y)dy = ∫_{0}^{1} 3x(1 − xy)dy = ∫_{0}^{1} (3x − 3x²y)dy = [3xy − (3/2)x²y²]|₀¹ = 3x − (3/2)x².   (21)

4.4.3 Expectation, Covariance and Correlation of bivariate random variables

Definition: Let X and Y be random variables with bivariate probability function f (X, Y ), the
expectation of a function of X and Y , q(X, Y ), is:
E[q(X, Y)] = Σ_i Σ_j q(xi, yj) f(xi, yj),   [X, Y discrete]   (22)

or

E[q(X, Y)] = ∫_x ∫_y q(x, y) f(x, y)dy dx,   [X, Y continuous].   (23)

Definition: The covariance of the random variables X and Y, denoted by Cov(X, Y) or σXY, is:

Cov(X, Y) = E[(X − µX)(Y − µY)] = Σ_i Σ_j (xi − µX)(yj − µY) f(xi, yj),   [X, Y discrete]   (24)

or

Cov(X, Y) = E[(X − µX)(Y − µY)] = ∫_x ∫_y (x − µX)(y − µY) f(x, y)dy dx,   [X, Y continuous].   (25)


Note that by expanding the term inside the expectation, Cov(X, Y) simplifies to Cov(X, Y) = E(XY) − µX µY. The quantity Cov(X, Y) measures the joint variability of the random variables X and Y, i.e.:

(a) If the probability is high that large values of X − µX are associated with large values of Y − µY, and small values of X − µX are associated with small values of Y − µY, then X and Y are positively related and Cov(X, Y) > 0,

(b) If the probability is high that large values of X − µX are associated with small values of
Y − µY and small values of X − µX are associated with large values of Y − µY , then X and Y
are negatively related and Cov(X, Y ) < 0,

(c) If the probability is high that values of X − µX will have no association with values of Y − µY ,
then Cov(X, Y ) = 0

Whereas the sign of Cov(X, Y ) indicates the direction of the relationship between the random
variables X and Y , its magnitude depends upon the units in which X and Y are measured. To
correct for the scaling of X and Y , Cov(X, Y ) is divided by σX σY , which gives the coefficient of
correlation between the two random variables.

Definition: The coefficient of correlation between the random variables X and Y , denoted by
Corr(X, Y ) or ρXY is given by:

Corr(X, Y) = Cov(X, Y) / (σX σY),   −1 ≤ Corr(X, Y) ≤ 1.   (26)
The quantity Corr(X, Y) measures the strength as well as the direction of the linear relationship between X and Y. When Corr(X, Y) = 1, X and Y have a perfect positive linear association, whereas Corr(X, Y) = −1 implies a perfect negative linear association between the two variables. When there is no linear association between the two variables, Corr(X, Y) = 0 (the reverse does not always hold).

Properties of Covariance
Given any two random variables X and Y on the same sample space S, and constants a, b:

1. Var(aX + bY) = a²Var(X) + b²Var(Y) + 2abCov(X, Y),

2. Var(aX − bY) = a²Var(X) + b²Var(Y) − 2abCov(X, Y),

3. Var(XY) = µY²Var(X) + µX²Var(Y) + 2µXµYCov(X, Y) − [Cov(X, Y)]² + E[(X − µX)²(Y − µY)²] + 2µY E[(X − µX)²(Y − µY)] + 2µX E[(X − µX)(Y − µY)²],

4. E(X/Y) ≈ µX/µY − Cov(X, Y)/µY² + µX Var(Y)/µY³,

5. Var(X/Y) ≈ (µX/µY)² [Var(X)/µX² + Var(Y)/µY² − 2Cov(X, Y)/(µXµY)].

Examples
From the data in Table 6, Cov(X, Y ) = E(XY ) − µX µY .
Now E(XY ) = 1 × 0 × 0.25 + 1 × 1 × 0.25 + 2 × 0 × 0.25 + 2 × 1 × 0.25 = 0 + 0.25 + 0 + 0.50 = 0.75.
While µX = 1×0.50+2×0.50 = 0.50+1.00 = 1.50 and µY = 0×0.50+1×0.50 = 0.00+0.50 = 0.50.
Hence, Cov(X, Y ) = 0.75 − 1.50 × 0.50 = 0.
As for correlation of X and Y , we need V ar(X) = 12 (0.50) + 22 (0.50) − 1.52 = 2.50 − 2.25 = 0.25,


so that σX = √Var(X) = √0.25 = 0.5, and Var(Y) = 0²(0.50) + 1²(0.50) − 0.5² = 0.50 − 0.25 = 0.25, so that σY = √Var(Y) = √0.25 = 0.5.
Therefore, Corr(X, Y) = Cov(X, Y)/(σX σY) = 0/((0.5)(0.5)) = 0.
This implies that the data for the two random variables do not show any linear association. In
other words, there is no linear association or pattern we can deduce from the observed values of
X and Y .
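These covariance and correlation calculations can be reproduced from the joint table with a short Python sketch (illustrative names):

```python
import math

# Joint distribution of Table 6: keys are (x, y) pairs, values P(X=x, Y=y)
joint = {(1, 0): 0.25, (1, 1): 0.25, (2, 0): 0.25, (2, 1): 0.25}

mu_x = sum(x * p for (x, y), p in joint.items())
mu_y = sum(y * p for (x, y), p in joint.items())
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())

var_x = sum((x - mu_x) ** 2 * p for (x, y), p in joint.items())
var_y = sum((y - mu_y) ** 2 * p for (x, y), p in joint.items())
corr = cov / math.sqrt(var_x * var_y)

print(cov, corr)  # 0.0 0.0
```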

4.4.4 Conditional probability and independence of random variables

Definition: Let X and Y be random variables with joint probability distribution f (X, Y ) and
marginal probability functions g(X) and h(Y ) respectively. Then, the conditional probability
mass (or density) function of X given Y is:
g(X|Y) = f(X, Y)/h(Y), for h(Y) > 0.

It follows from definition of conditional probability that the multiplication theorem for probability
function of random variables is

f (X, Y ) = g(X)h(Y |X) = g(X|Y )h(Y ). (27)

Definition: Let X and Y be discrete random variables with bivariate probability function
f (X, Y ) and marginal probability functions g(X) and h(Y ), respectively. Then, the random
variable X is independent of the random variable Y if:

g(X|Y ) = g(X) and h(Y |X) = h(Y ). (28)

and
f (X, Y ) = g(X)h(Y ). (29)
These equalities must hold true for all possible pairs of X and Y values. If the equality fails for at least one point (X, Y), then the random variables X and Y are said to be dependent.

The definitions of conditional expectation and conditional variance of a random variable given
the other variable follow from the original definitions of expectation and variance, as follows:
E(X|Y) = Σ_x x f(x|Y)  and  E(X²|Y) = Σ_x x² f(x|Y),   (30)

and
V ar(X|Y ) = E[(X − µX|Y )2 |Y ] = E(X 2 |Y ) − (E(X|Y ))2 . (31)

Examples
From the data in Tables 6 and 7, P(Y = 1|X = 2) = f(X = 2, Y = 1)/g(X = 2) = 0.25/0.50 = 0.50.

To verify the independence of X and Y, we must check all pairs and their probabilities in Tables 6 to 8, e.g. P(X = 1)P(Y = 0) = 0.50 × 0.50 = 0.25 = P(X = 1, Y = 0). This also holds for all other probability values of X and Y in Table 6 and their marginals in Tables 7 and 8. Hence, we can conclude that X is independent of Y.
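The cell-by-cell check f(x, y) = g(x)h(y) is easy to automate; a minimal Python sketch, with a small tolerance for floating-point rounding:

```python
# Check f(x, y) = g(x) h(y) for every cell of the joint table (Table 6)
joint = {(1, 0): 0.25, (1, 1): 0.25, (2, 0): 0.25, (2, 1): 0.25}

g, h = {}, {}  # marginals of X and Y
for (x, y), p in joint.items():
    g[x] = g.get(x, 0.0) + p
    h[y] = h.get(y, 0.0) + p

# A small tolerance guards against floating-point rounding
independent = all(abs(p - g[x] * h[y]) < 1e-12 for (x, y), p in joint.items())
print(independent)  # True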

Session 16 Exercises


1. Let a random experiment consist of tossing a fair coin twice. Define the random variable
X to be the number of heads obtained in the two tosses and let the random variable Y
be the opposite face outcome. Determine the points within the sample space S and the
values of X and Y on S. Next, specify the bivariate probability distribution between X
and Y . Find: (a)P (X = 2|Y = 1), (b) the marginal distribution of Y , (c) the conditional
distribution of X given Y = 0. Are X and Y independent?


5 Unit 5: Standard probability laws of discrete type

This unit discusses some common probability laws for discrete random variables.

5.1 Session 17: Discrete Uniform and Bernoulli distributions

Introduction
In this session, two basic standard probability distributions of discrete uniform and Bernoulli are
discussed.

Objectives of Session 17
After studying this session, you should be able to:

• Identify a discrete uniform or Bernoulli probability distribution given a real-life problem

• Calculate probabilities and summary measures given a discrete uniform or Bernoulli probability distribution problem

5.1.1 The Discrete Uniform Distribution

Suppose a discrete random variable X defined on a sample space S has as its range a finite set of n numbers, say RX = {1, 2, 3, ..., n}. Suppose each value of X is equally likely to occur, so that each value has the same probability 1/n of occurrence. Then X is said to have a discrete uniform distribution, whose probability mass function, denoted by f(x; n) or P(X = x; n), is given by:
f(x; n) = 1/n for x = 1, 2, ..., n and 0 elsewhere.

The greatest benefit of knowing a particular probability law for the data is in getting the right mathematical expression that characterises the distribution, with which we can compute probability problems associated with the distribution. For example, if X ∼ Unif(1, 2, ..., n), where the symbol ∼ is read as "is distributed as", then for any given integer 1 ≤ b ≤ n:
F(b) = P(X ≤ b) = b/n,   (32)
while

E(X) = Σ_{x=1}^n x p(x)
     = Σ_{x=1}^n x · (1/n)
     = (1/n) Σ_{x=1}^n x                                              (33)
     = (1/n) · n(n + 1)/2
     = (n + 1)/2,


and

Var(X) = E(X²) − (E(X))²
       = Σ_{x=1}^n x² p(x) − ((n + 1)/2)²
       = Σ_{x=1}^n x² · (1/n) − ((n + 1)/2)²
       = (1/n) Σ_{x=1}^n x² − ((n + 1)/2)²                            (34)
       = (1/n) · n(n + 1)(2n + 1)/6 − (n + 1)²/4
       = (n² − 1)/12.

Example
Given that X has a discrete uniform distribution with 6 values, find E(X) and Var(X).

Solution: E(X) = (n + 1)/2 = (6 + 1)/2 = 3.5, whereas Var(X) = (n² − 1)/12 = (6² − 1)/12 = 35/12.
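The closed-form results can be confirmed by direct summation over the range, here with exact fractions (a minimal sketch):

```python
from fractions import Fraction

# Direct summation over the range {1, ..., n} with exact fractions
n = 6
values = range(1, n + 1)

mean = sum(Fraction(x, n) for x in values)
var = sum(Fraction(x * x, n) for x in values) - mean ** 2

print(mean, var)  # 7/2 35/12
```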

5.1.2 The Bernoulli Distribution

Consider an experiment with only two possible outcomes, dubbed success or failure. Let p be
the probability of obtaining a ’success’ in such experiment and 1 − p the probability of obtaining
a ’failure’. Define a random variable X as total number of ’successes’ that can be obtained when
such experiment is performed once. Then, X can be 0 or 1. It is 0 if the experiment ends in
giving an outcome of ’failure’, and it is 1 if the experiment gives an outcome of ’success’. Further,
X is 1 with probability p and 0 with probability 1 − p.

Such an experiment (or trial) whose outcome can only be a ’success’ or a ’failure’ is called
a Bernoulli trial, named in honour of the Swiss mathematician Jakob Bernoulli (1654-1705).
While, the random variable X that is generated by counting the number of ’successes’ in one
Bernoulli trial, which can be 1 or 0, is called a Bernoulli random variable. If X ∼ Bernoulli(p),
then its probability mass function is given by:

P (X = x; p) = px (1 − p)1−x , x = 0, 1 and 0 elsewhere. (35)

Therefore, the distribution can be characterised by the following quantities:


E(X) = Σ_{x=0}^1 x p(x)
     = 0 · (1 − p) + 1 · p                                            (36)
     = p,


and

Var(X) = E(X²) − (E(X))²
       = Σ_{x=0}^1 x² p(x) − p²
       = 0² · (1 − p) + 1² · p − p²                                   (37)
       = p − p²
       = p(1 − p).
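Both moments can be confirmed by summing over the two-point range {0, 1}; a minimal Python sketch with an arbitrary illustrative choice of p:

```python
# E(X) and Var(X) for X ~ Bernoulli(p), enumerated over the range {0, 1};
# p = 0.3 is an arbitrary illustrative choice.
p = 0.3
pmf = {0: 1 - p, 1: p}

mean = sum(x * q for x, q in pmf.items())
var = sum(x * x * q for x, q in pmf.items()) - mean ** 2

print(mean, round(var, 10))  # 0.3 0.21
```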

5.2 Session 18: Binomial and Negative Binomial Distributions

5.3 Session 19: Poisson distribution


6 Unit 6: Standard probability laws of continuous type

6.1 Session 20: Exponential and Gamma distributions

6.2 Session 21: Uniform distribution

6.3 Session 22: Normal distribution


7 Unit 7: Relating probability laws

7.1 Session 23: Normal approximation of the binomial distribution
