
Discrete Time Markov Chain (Part 1)

Noha Youssef

The American University in Cairo


nayoussef@aucegypt.edu

Table of Contents

Introduction

Definitions

The Transition Matrix

The t-Step Transition Probability

Marginal Distributions

The Chapman-Kolmogorov Equations

Introduction: Transition Diagram

Consider this transition diagram.

Introduction
The transition diagram above shows a system with 7 possible states, with state space

S = {1, 2, 3, 4, 5, 6, 7}.

◮ Starting from state 1, what is the probability of ever reaching state 7?
◮ Starting from state 2, what is the expected time taken to reach state 4?
◮ Starting from state 2, what is the long-run proportion of time spent in state 3?
◮ Starting from state 1, what is the probability of being in state 2 at time t? Does the probability converge as t → ∞, and if so, to what?

Definitions

Let X0 , X1 , · · · be a discrete-time stochastic process where each Xi takes values in some discrete set S. The set S is called the state space. We say that the stochastic process has the Markov property
if
P (Xt+1 = i|Xt , Xt−1 , · · · , X0 ) = P (Xt+1 = i|Xt )
for all t ≥ 0 and i ∈ S. A stochastic process with the Markov
property is called a Markov chain. Note that a finite Markov chain
can be described in terms of the transition probabilities

pij = P (Xt+1 = j|Xt = i).


One can easily see that ∑j pij = 1, ∀ i ∈ S.
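As a quick illustration (added here, not part of the original slides), the following Python/NumPy sketch encodes a small transition matrix and samples the chain one step at a time; the three-state matrix and the helper name step are purely illustrative.

```python
import numpy as np

# Illustrative 3-state transition matrix: row i holds P(X_{t+1} = j | X_t = i),
# so every row must sum to 1.
P = np.array([[0.6, 0.2, 0.2],
              [0.4, 0.0, 0.6],
              [0.0, 0.8, 0.2]])

rng = np.random.default_rng(seed=0)

def step(state, P, rng):
    """Sample X_{t+1} given X_t = state, using only row `state` of P (Markov property)."""
    return rng.choice(len(P), p=P[state])

# Simulate a short trajectory starting from state 0.
x, trajectory = 0, [0]
for _ in range(10):
    x = step(x, P, rng)
    trajectory.append(x)
print(trajectory)
```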

Definitions

Definition 1
The state of a Markov chain at time t is the value of Xt . For
example, if Xt = 6, we say the process is in state 6 at time t.

Definition 2
The state space of a Markov chain, S, is the set of values that
each Xt can take. For example, S = {1, 2, 3, 4, 5, 6, 7}. Let S have
size N (possibly infinite).

Definitions

Definition 3
A trajectory of a Markov chain is a particular set of values for X0 , X1 , X2 , · · · . For example, if X0 = 1, X1 = 5 and X2 = 6, then the trajectory up to time t = 2 is 1, 5, 6. More generally, if we refer to the trajectory s0 , s1 , s2 , s3 , · · · , we mean that X0 = s0 , X1 = s1 , X2 = s2 , X3 = s3 , · · · . 'Trajectory' is just a word meaning 'path'.

Definition 4: Markov Property


The basic property of a Markov chain is that only the most recent
point in the trajectory affects what happens next. This is called
the Markov Property. It means that Xt+1 depends upon Xt , but it
does not depend upon Xt−1 , · · · , X1 , X0 .

Definitions

◮ The future path of a Markov process, given its current state (Xt ) and the past history before Xt , depends only on the current state (not on how this state has been reached).
◮ The current state contains all the information (a summary of the past) that is needed to characterize the future (stochastic) behaviour of the process.
◮ Given the state of the process at an instant, its future and past are independent.
◮ The future is conditionally independent of the past given the present.

Definitions

Definition 5
A Markov process is time homogeneous if the transition probability, pij = P (Xt+1 = j|Xt = i), does not depend on t.

Model specification means we need to
◮ identify the possible states,
◮ identify the possible transitions,
◮ identify the transition probabilities.

Definition 6: Probability of a Path
The probability of a path i0 , i1 , · · · , it is

\[
P\{X_0 = i_0, \cdots, X_t = i_t\} = P\{X_0 = i_0\}\, p_{i_0,i_1}\, p_{i_1,i_2} \cdots p_{i_{t-1},i_t}
\]

Proof

\[
\begin{aligned}
P\{X_0 = i_0, X_1 = i_1\} &= P\{X_1 = i_1 \mid X_0 = i_0\}\, P\{X_0 = i_0\} \\
&= p_{i_0,i_1}\, P\{X_0 = i_0\}
\end{aligned}
\]
\[
\begin{aligned}
P\{X_0 = i_0, X_1 = i_1, X_2 = i_2\}
&= P\{X_2 = i_2 \mid X_1 = i_1, X_0 = i_0\}\, P\{X_1 = i_1 \mid X_0 = i_0\}\, P\{X_0 = i_0\} \\
&= P\{X_2 = i_2 \mid X_1 = i_1\}\, P\{X_1 = i_1 \mid X_0 = i_0\}\, P\{X_0 = i_0\} \\
&= p_{i_1,i_2}\, p_{i_0,i_1}\, P\{X_0 = i_0\} = P\{X_0 = i_0\}\, p_{i_0,i_1}\, p_{i_1,i_2}.
\end{aligned}
\]

and so on · · ·

Probability of a Path

The sample paths of a Markov chain are completely characterized by the one-step transition probabilities, pij , i, j ∈ S, and initial-state probabilities, pi , i ∈ S. Given values for these probabilities, the joint probability of any finite sample path can be decomposed into a product of an initial-state probability and the appropriate sequence of transition probabilities.
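A minimal sketch of this decomposition (an added illustration, assuming Python with NumPy; the matrix, initial distribution and path below are hypothetical):

```python
import numpy as np

def path_probability(path, P, pi0):
    """P(X_0 = i_0, ..., X_t = i_t) = pi0[i_0] * p_{i_0,i_1} * ... * p_{i_{t-1},i_t}."""
    prob = pi0[path[0]]
    for a, b in zip(path[:-1], path[1:]):
        prob *= P[a, b]
    return prob

# Hypothetical two-state chain with a uniform initial distribution.
P = np.array([[0.9, 0.1],
              [0.5, 0.5]])
pi0 = np.array([0.5, 0.5])
print(path_probability([0, 0, 1, 1], P, pi0))  # 0.5 * 0.9 * 0.1 * 0.5 = 0.0225
```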

Transition Diagram
For a transition diagram, each node represents a state of the system and is numbered accordingly; a directed arc connects state i to state j if a one-step transition from i to j is possible. The one-step transition probability pij is written next to the arc. Notice that a transition from a state to itself is represented by a loop. The transition diagram shows the transitions between the different states.

We can also summarize the probabilities in a matrix. The matrix describing the Markov chain is called the transition matrix. It is the most important tool for analysing Markov chains.
Transition Matrix

All the elements of a Markov chain model can be encoded in a transition probability matrix:
\[
P = \begin{pmatrix}
p_{11} & p_{12} & \cdots & p_{1m} \\
p_{21} & p_{22} & \cdots & p_{2m} \\
\vdots & \vdots &        & \vdots \\
p_{m1} & p_{m2} & \cdots & p_{mm}
\end{pmatrix}
\]
Note that the sum of each row is equal to one. A matrix with
non-negative elements such that the sum of each row equals 1 is
called a stochastic matrix.

Transition Matrix
The transition matrix is usually given the symbol P = (pij ). In the
transition matrix P:
◮ the rows represent NOW, or FROM (Xt );
◮ the columns represent NEXT, or TO (Xt+1 );
◮ entry (i, j) is the CONDITIONAL probability that NEXT = j, given that NOW = i: the probability of going FROM state i TO state j,
  pij = P (Xt+1 = j|Xt = i).
◮ The transition matrix P must list all possible states in the
state space S.
◮ P is a square matrix (N × N ), because Xt+1 and Xt both
take values in the same state space S (of size N ).
◮ The rows of P should each sum to 1:
◮ This simply states that Xt+1 must take one of the listed
values.
◮ The columns of P do not in general sum to 1.
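A minimal NumPy check of these properties (an added sketch, not from the slides; the helper name is_stochastic is ours), using the two-state working/broken matrix of Example 1 on the next slide:

```python
import numpy as np

def is_stochastic(P, tol=1e-12):
    """True if P is square, non-negative, and every row sums to 1."""
    P = np.asarray(P, dtype=float)
    return (P.ndim == 2 and P.shape[0] == P.shape[1]
            and np.all(P >= 0)
            and np.allclose(P.sum(axis=1), 1.0, atol=tol))

# Working/broken machine chain (Example 1 below).
P = np.array([[0.99, 0.01],
              [0.80, 0.20]])
print(is_stochastic(P))   # True: each row sums to 1
print(P.sum(axis=0))      # [1.79, 0.21] -- columns need not sum to 1
```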
Example 1

A machine can be either working or broken on a given day. If it is working, it will break down the next day with probability 0.01, and will continue working with probability 0.99. If it breaks down on a given day, it will be repaired and be working the next day with probability 0.8, and will continue to be broken down with probability 0.2. We can model this machine by a Markov chain with two states: working, and broken down. The transition probability matrix is given by
\[
\begin{pmatrix}
0.99 & 0.01 \\
0.8  & 0.2
\end{pmatrix}.
\]

Example 2

The four possible states of a student in a music festival are

S = { “dancing”, “at a concert”, “at the bar”, “back home”}.

Let us assume that the student changes state during the festival according to the following transition matrix:
\[
\begin{pmatrix}
0   & 0   & 1   & 0   \\
1/2 & 0   & 1/2 & 0   \\
1/4 & 1/4 & 1/4 & 1/4 \\
0   & 0   & 0   & 1
\end{pmatrix}.
\]

This Markov chain can be represented by the following transition graph:

Exercise 1

Purpose-flea zooms around the vertices of the transition diagram opposite. Let Xt be Purpose-flea’s state at time t (t = 0, 1, · · · ).
1. Find the transition matrix, P .
2. Find P (X2 = 3|X0 = 1).

Exercise 1: Solution

1. The transition matrix is
\[
P = \begin{pmatrix}
0.6 & 0.2 & 0.2 \\
0.4 & 0   & 0.6 \\
0   & 0.8 & 0.2
\end{pmatrix}.
\]

2. P (X2 = 3|X0 = 1) = P (X1 = 1|X0 = 1) × P (X2 = 3|X1 = 1, X0 = 1) + P (X1 = 2|X0 = 1) × P (X2 = 3|X1 = 2, X0 = 1) + P (X1 = 3|X0 = 1) × P (X2 = 3|X1 = 3, X0 = 1). Because of the Markov Property we have
P (X2 = 3|X0 = 1) = P (X1 = 1|X0 = 1) × P (X2 = 3|X1 = 1) + P (X1 = 2|X0 = 1) × P (X2 = 3|X1 = 2) + P (X1 = 3|X0 = 1) × P (X2 = 3|X1 = 3), hence
P (X2 = 3|X0 = 1) = 0.6 × 0.2 + 0.2 × 0.6 + 0.2 × 0.2 = 0.28.
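The same calculation can be checked in a couple of lines of NumPy (an added verification; the states 1, 2, 3 are relabelled 0, 1, 2):

```python
import numpy as np

P = np.array([[0.6, 0.2, 0.2],
              [0.4, 0.0, 0.6],
              [0.0, 0.8, 0.2]])

# P(X2 = 3 | X0 = 1): sum over the intermediate state at time 1.
prob = sum(P[0, k] * P[k, 2] for k in range(3))
print(prob)                                 # 0.28
print(np.linalg.matrix_power(P, 2)[0, 2])   # the (1,3) entry of P^2 -- same value
```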

Exercise 2: (Ross, 2003, Example 4.3, p. 182)

On any given day, Evaristo is either cheerful (C), so-so (S), or glum
(G).
◮ If he is cheerful today, then he will be C, S, or G tomorrow
with probabilities 0.5, 0.4, 0.1, respectively.
◮ If he is feeling so-so today, then he will be C, S, or G
tomorrow with probabilities 0.3, 0.4, 0.3.
◮ If he is glum today, then he will be C, S, or G tomorrow with
probabilities 0.2, 0.3, 0.5.
Let Xn denote Evaristo’s mood on the nth day, then {Xn : n ∈ N0 }
is a three-state Markov chain (state 1 = C, state 2 = S, state 3 =
G). Identify the TPM of this DTMC.
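One way to write down and sanity-check the answer (an added sketch in Python/NumPy; the rows are copied directly from the three bullets above, with the states ordered C, S, G):

```python
import numpy as np

# Rows = today's mood (C, S, G); columns = tomorrow's mood (C, S, G).
P = np.array([[0.5, 0.4, 0.1],   # cheerful today
              [0.3, 0.4, 0.3],   # so-so today
              [0.2, 0.3, 0.5]])  # glum today
print(P.sum(axis=1))             # [1. 1. 1.]: each row sums to 1, as required
```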

The t-Step Transition Probability

Let {X0 , X1 , X2 , · · · } be a Markov chain with state space S = {1, 2, · · · , N }. Recall that the elements of the transition matrix P are defined as:
\[
(P)_{ij} = p_{ij} = P(X_1 = j \mid X_0 = i) = P(X_{t+1} = j \mid X_t = i)
\]
for any t.
pij is the probability of making a transition FROM state i TO state j in a SINGLE step. What is the probability of making a transition from state i to state j over two steps, i.e. what is P (X2 = j|X0 = i)?

The t-Step Transition Probability
We are seeking P (X2 = j|X0 = i), i.e. the 2-step transition probability. Use the Partition Theorem:
\[
\begin{aligned}
P(X_2 = j \mid X_0 = i) &= P_i(X_2 = j) \\
&= \sum_{k=1}^{N} P_i(X_2 = j \mid X_1 = k)\, P_i(X_1 = k) && \text{Partition Theorem} \\
&= \sum_{k=1}^{N} P(X_2 = j \mid X_1 = k, X_0 = i)\, P(X_1 = k \mid X_0 = i) \\
&= \sum_{k=1}^{N} P(X_2 = j \mid X_1 = k)\, P(X_1 = k \mid X_0 = i) && \text{Markov Property} \\
&= \sum_{k=1}^{N} p_{kj}\, p_{ik} && \text{by definition} \\
&= \sum_{k=1}^{N} p_{ik}\, p_{kj} = (P^2)_{ij} && \text{rearranging}
\end{aligned}
\]
The t-Step Transition Probability

Recall that if A is an N × N matrix, then
\[
(A^2)_{ij} = \sum_{k=1}^{N} A_{ik} A_{kj} = \sum_{k=1}^{N} a_{ik}\, a_{kj}.
\]

The t-Step Transition

3-step transition: We can find P (X3 = j|X0 = i) similarly, but conditioning on the state at time 2:
\[
P(X_3 = j \mid X_0 = i) = \sum_{k=1}^{N} P(X_3 = j \mid X_2 = k)\, P(X_2 = k \mid X_0 = i)
= \sum_{k=1}^{N} p_{kj}\, (P^2)_{ik} = \sum_{k=1}^{N} (P^2)_{ik}\, p_{kj} = (P^3)_{ij}.
\]

The t-Step Transition

By mathematical induction we can conclude that
\[
P(X_t = j \mid X_0 = i) = P(X_{n+t} = j \mid X_n = i) = (P^t)_{ij},
\]
for any n.
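In code this is just a matrix power; a short NumPy check (an added illustration, reusing the Exercise 1 transition matrix):

```python
import numpy as np

P = np.array([[0.6, 0.2, 0.2],
              [0.4, 0.0, 0.6],
              [0.0, 0.8, 0.2]])

t = 5
Pt = np.linalg.matrix_power(P, t)   # (P^t)_{ij} = P(X_t = j | X_0 = i)

# Same thing by repeated multiplication.
Q = np.eye(len(P))
for _ in range(t):
    Q = Q @ P

print(np.allclose(Pt, Q))   # True
print(Pt[0, 2])             # P(X_5 = 3 | X_0 = 1), with states labelled 1, 2, 3
```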

Marginal Distributions
Let {X0 , X1 , X2 , · · · } be a Markov chain with state space S = {1, 2, · · · , N }. Now each Xt is a random variable, so it has a probability distribution. We can write the probability distribution of Xt as an N × 1 vector. For example, consider X0 . Let π be an N × 1 vector denoting the probability distribution of X0 :
\[
\pi = \begin{pmatrix} \pi_1 \\ \pi_2 \\ \vdots \\ \pi_N \end{pmatrix}
    = \begin{pmatrix} P(X_0 = 1) \\ P(X_0 = 2) \\ \vdots \\ P(X_0 = N) \end{pmatrix}
\]
To predict the state of the system at time t, t ≥ 0, we need to compute the marginal distribution of Xt :
\[
P(X_1 = j) = \sum_{i=1}^{N} P(X_1 = j \mid X_0 = i)\, P(X_0 = i)
= \sum_{i=1}^{N} p_{ij}\, \pi_i = \sum_{i=1}^{N} \pi_i\, p_{ij} = (\pi^T P)_j .
\]
Marginal Distributions
The same method applies to the marginal distribution of X2 , conditioning on X0 :
\[
P(X_2 = j) = \sum_{i=1}^{N} P(X_2 = j \mid X_0 = i)\, P(X_0 = i)
= \sum_{i=1}^{N} (P^2)_{ij}\, \pi_i = (\pi^T P^2)_j .
\]

Theorem 1
Let {X0 , X1 , X2 , · · · } be a Markov chain with N × N transition matrix P . If the probability distribution of X0 is given by the 1 × N row vector π T , then the probability distribution of Xt is given by the 1 × N row vector π T P t . That is,
\[
X_0 \sim \pi^T \;\Rightarrow\; X_t \sim \pi^T P^t .
\]
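A short NumPy illustration of Theorem 1 (added here; the initial distribution is an arbitrary example and the matrix is the one from Exercise 1):

```python
import numpy as np

P = np.array([[0.6, 0.2, 0.2],
              [0.4, 0.0, 0.6],
              [0.0, 0.8, 0.2]])
pi = np.array([1/3, 1/3, 1/3])   # distribution of X_0 (example)

# Theorem 1: X_t ~ pi^T P^t.
for t in range(4):
    dist = pi @ np.linalg.matrix_power(P, t)
    print(t, dist, dist.sum())   # each marginal distribution sums to 1
```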

The Chapman-Kolmogorov Equations

Theorem 2
The t-step transition probabilities satisfy the following equations:
\[
p^{(n+l)}_{ij} = \sum_{k=1}^{N} p^{(n)}_{ik}\, p^{(l)}_{kj} .
\]

Proof of The Chapman-Kolmogorov Equations

\[
\begin{aligned}
p^{(n+l)}_{ij} &= P(X_{n+l} = j \mid X_0 = i) \\
&= \sum_{k=1}^{N} P(X_{n+l} = j, X_l = k \mid X_0 = i) \\
&= \sum_{k=1}^{N} P(X_{n+l} = j \mid X_l = k, X_0 = i)\, P(X_l = k \mid X_0 = i) \\
&= \sum_{k=1}^{N} P(X_{n+l} = j \mid X_l = k)\, P(X_l = k \mid X_0 = i) \\
&= \sum_{k=1}^{N} P(X_n = j \mid X_0 = k)\, P(X_l = k \mid X_0 = i) \\
&= \sum_{k=1}^{N} p^{(n)}_{kj}\, p^{(l)}_{ik} = \sum_{k=1}^{N} p^{(l)}_{ik}\, p^{(n)}_{kj} = (P^{n+l})_{ij} .
\end{aligned}
\]
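In matrix form the Chapman-Kolmogorov equations simply say P^{n+l} = P^n P^l; a quick numerical sanity check (an added illustration, again using the Exercise 1 matrix):

```python
import numpy as np

P = np.array([[0.6, 0.2, 0.2],
              [0.4, 0.0, 0.6],
              [0.0, 0.8, 0.2]])

n, l = 3, 4
lhs = np.linalg.matrix_power(P, n + l)                              # p_{ij}^{(n+l)}
rhs = np.linalg.matrix_power(P, n) @ np.linalg.matrix_power(P, l)   # sum_k p_{ik}^{(n)} p_{kj}^{(l)}
print(np.allclose(lhs, rhs))   # True
```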
Example

Consider a particle that moves randomly on vertices (sites, locations) labelled i ∈ Z. The particle moves by taking steps of size 0, 1 or −1 as follows: if it is on site i at time n, then at time n + 1 it moves to state i + 1 with probability pi , to state i − 1 with probability qi and to state i with probability ri = 1 − pi − qi . Let Xn be the location of the particle at time n. Find the one-step transition probabilities.

Solution: {Xn }, n ≥ 0 is a DTMC with state space E = Z and (one-step) transition probabilities

pi,i+1 = pi ,   pi,i−1 = qi ,   pi,i = ri ,   i ∈ E.

If ri = 0 and pi = p for all i, so that qi = q = 1 − p, then {Xn }, n ≥ 0 is called a simple random walk.
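A minimal simulation sketch of the simple random walk (added here, assuming Python with NumPy; the function name and defaults are ours):

```python
import numpy as np

def simple_random_walk(n_steps, p=0.5, x0=0, seed=0):
    """Simple random walk on Z: from i, move to i+1 w.p. p and to i-1 w.p. 1-p."""
    rng = np.random.default_rng(seed)
    steps = rng.choice([1, -1], size=n_steps, p=[p, 1 - p])
    return np.concatenate(([x0], x0 + np.cumsum(steps)))

print(simple_random_walk(10, p=0.5))   # positions X_0, X_1, ..., X_10
```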

Exercise 1 (Continued)
3. Suppose that Purpose-flea is equally likely to start on any
vertex at time 0. Find the probability distribution of X1 .
4. Suppose that Purpose-flea begins at vertex 1 at time 0. Find
the probability distribution of X2 .
5. Suppose that Purpose-flea is equally likely to start on any
vertex at time 0. Find the probability of obtaining the
trajectory (3, 2, 1, 1, 3).
Solution
3. π T = ( 1/3 1/3 1/3 ), so the distribution of X1 is
\[
\pi^T P = \begin{pmatrix} \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \end{pmatrix}
\begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.4 & 0 & 0.6 \\ 0 & 0.8 & 0.2 \end{pmatrix}
= \begin{pmatrix} \tfrac{1}{3} & \tfrac{1}{3} & \tfrac{1}{3} \end{pmatrix}.
\]

Exercise 1(Continued)

4. Here π T = ( 1 0 0 ), so the distribution of X2 is
\[
\pi^T P^2 = \begin{pmatrix} 1 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.4 & 0 & 0.6 \\ 0 & 0.8 & 0.2 \end{pmatrix}
\begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.4 & 0 & 0.6 \\ 0 & 0.8 & 0.2 \end{pmatrix}
= \begin{pmatrix} 0.6 & 0.2 & 0.2 \end{pmatrix}
\begin{pmatrix} 0.6 & 0.2 & 0.2 \\ 0.4 & 0 & 0.6 \\ 0 & 0.8 & 0.2 \end{pmatrix}
= \begin{pmatrix} 0.44 & 0.28 & 0.28 \end{pmatrix}.
\]

5. P (3, 2, 1, 1, 3) = P (X0 = 3) p32 p21 p11 p13 = (1/3) × 0.8 × 0.4 × 0.6 × 0.2 = 0.0128.
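All three answers can be reproduced with a few lines of NumPy (an added check; vertices 1, 2, 3 are relabelled 0, 1, 2):

```python
import numpy as np

P = np.array([[0.6, 0.2, 0.2],
              [0.4, 0.0, 0.6],
              [0.0, 0.8, 0.2]])

# 3. Uniform start: distribution of X_1 is pi^T P.
pi = np.array([1/3, 1/3, 1/3])
print(pi @ P)                              # [1/3 1/3 1/3]

# 4. Start at vertex 1: distribution of X_2 is pi^T P^2.
e1 = np.array([1.0, 0.0, 0.0])
print(e1 @ np.linalg.matrix_power(P, 2))   # [0.44 0.28 0.28]

# 5. Trajectory (3, 2, 1, 1, 3) -> (2, 1, 0, 0, 2) with 0-based labels.
path = [2, 1, 0, 0, 2]
print((1/3) * np.prod([P[a, b] for a, b in zip(path[:-1], path[1:])]))  # 0.0128
```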

